Home

Technical Correspondence

1. 1 and does not implement Linda in any case In claiming that messaging environments such as Ex press Trollius or Helios provide the equivalent of the buffering and routing capabilities of Linda Davidson appears not to understand that the point of implement ing a tuple space is not buffering and routing it is a shared associative memory Express et al provide no such thing The claim to get adequate performance custom read expensive hardware is needed is false Phrasing his comments in terms of what we must do to convince the market is silly because a our point in this response was to address the computer science com munity not the market and b the market is rapidly becoming convinced of Linda s significance of its own accord Well over half a dozen companies are involved in Linda work at this point Kahn and Miller compare two fundamentally differ ent approaches to parallelism in the following terms The CLP Concurrent Logic Programming camp strives for uniformity within a language while the Linda camp strives for uniformity across languages Agreed it is our view that this distinction in approach is a deep and important one for the evolution of pro gramming systems We can build a complete programming environment out of two separate orthogonal pieces the computation model and the coordination model The computation model supplies tools for performing single threaded step at a time
2. Technical Correspondence Calendar P s and Queues The article Calendar Queues A Fast 0 1 Priority Queue Implementation for the Simulation Event Set Problem by Randy Brown Communications Oct 1988 pp 1220 27 presents a mechanism whose time had come to be invented Working independently we have both come up with essentially the same mechanism In addition it appears that some features of the mecha nism were discovered independently by a third party 1 In April of 1988 IBM released its VM XA SP1 operat ing system which uses my implementation of the mechanism to do its timer request queuing In most op erating systems components of the operating system may request to be notified when a specified time is reached The software must queue each request until the hardware clock reaches the specified time I thought it would be appropriate to write and describe IBM s experiences with this new mechanism for doing timer request queuing I also want to congratulate Randy Brown for his independent discovery of the idea and for his excellent explanation of it In the past all versions of IBM s VM operating system implemented the timer request queue as a simple lin ear chain But with larger numbers of users on the systems in recent years the O N growth in the over head of searching this chain began to be a problem The solution used by the developers of VM SP HPO a prede cessor of VM XA SP was to find ways of
3. rent programming language may be more appreciative of the uniformity in concurrent logic programming Distributed Computing and Issues of Scale The Linda model is in a sense very high level The model provides a decoupling between clients and serv ers by providing a shared global tuple space It is left up to clever compilers to avoid central bottlenecks and to compile much of the code as point to point communi cation and specialized data structures The claim is that it is easier to think and program in this decoupled manner Linda cannot scale to large scale distributed comput ing open systems without making yet another shift in computational model It would seem that Linda s prac ticality rests upon global compiler optimizations As systems get larger and more distributed it seems im plausible that one module which adds a tuple of a cer tain form could be compiled knowing all the other modules that read tuples of that form A solution to this is to provide multiple tuple spaces and control their access Indeed at the end of the article Carriero and Gelernter mention that they are building a version of Linda with multiple first class tuple spaces We discuss this further below It is common practice to compare programming lan guages on the kinds of aesthetic grounds that underly most of the electronic discussion sparked by the Linda article Shapiro in a network conversation proposes that the ease and complexity with whi
4. s embedding experiment is designed pre sumably to shed light on this question which interpro cess communication operators are more expressive Linda communication is based on tuple spaces in the concur rent logic languages Shapiro discusses communication is based on streams constructed using partially instan tiated logical variables We have explained in detail why we believe that a shared associative object mem ory a tuple space is a more flexible and expressive medium than a CLP style Concurrent Logic Program ming stream Streams are merely one of a family of important distributed data structures we need to build vectors and matrices bags graphs associative pools and also a variety of streams We have argued that tuple spaces make a far simpler and more expressive basis for building these structures than do streams based on logical variables Roughly comparable posi tions may explain why for example Ciancarini s Shared Prolog eliminates stream based communication in favor of a blackboard model related to Linda 3 and why Kahn and Miller remark that We think a promis ing direction to explore is a CLP like language without logical variables which instead uses tuple spaces for communication These researchers will speak for themselves of course we merely present our own re action to their work Now what does Shapiro bring to counter our claims His argument seems to be in essence contrary to wh
5. Engineering 3179 Bell Engineering Center University of Arkansas Fayetteville AK 72701 REFERENCES 1 Jones D W An empirical comparison of priority queue and event set implementations Commun ACM 29 4 Apr 1986 300 311 DISK FRAGMENTATION AND CACHE MACHINES It is difficult to analyze disk and disk cache perfor mance so that the results are repeatable and useful in the real world It is my experience that the layout of directory blocks and files on a hard drive is both seem ingly random and very important often performance testing does not take this into account For example performance improvements which show up in a test program may be much lower than the improvements which occur in real life in a highly fragmented and almost full hard disk drive In such cases seek time can be greatly exaggerated over the case of essentially con tiguous files present on empty unfragmented drives It is my informal observation that in the real world seek times are more important than I O starts and I O pro cessing times Please note that none of the commer cially available de fragmentation programs help with this problem the newly contiguous files may still be many tracks away from each other The Flash cache software mentioned by Jalics and Mcintyre Communications Feb 1989 pp 246 255 has some important features which can help reduce seeking enormously Flash has a sticky mode during which blocks read into the cache
6. important as we had thought Mr Bondy s comments regarding hardware caches on 80386 computers are well taken Regarding compilations we do not say that there is no potential there What we do say is that we tried it on one high speed Cobol compiler and that the savings were unimpressive Paul J Jalics David R McIntyre James J Nance College of Business Administration Dept of Computer and Information Science Cleveland State University 2121 Euclid Avenue Cleveland OH 44115 LINDA IN CONTEXT In the recent Communications 1 Carriero and Gelernter compare Linda to the Big Three families of parallel languages concurrent object oriented languages con current logic languages and functional languages While I accept many of their statements concerning concurrent object oriented languages and functional languages I find the comparison with concurrent logic languages misguided Linda Prolog Logic Concurrency Carriero and Gelernter like to think of Linda as very different from all the three models they explore I do not think this is so and would like to suggest that Linda incorporates fundamental aspects of the logic program ming model although perhaps in a degenerate form Specifically observe that the content of a Tuple Space i e a multiset of tuples can be represented di rectly by a multiset of unit clauses also called facts and that the basic tuple space operations of Linda out and in are vari
7. market that a product is superior it must provide either supe rior run time performance or a far simplified program ming model for not much less performance In addition it is becoming necessary to provide these capabilities over a large range of hardware platforms Many of the benefits attributed to the tuple space by Carriero and Gelernter exist in most buffered message passing systems If we think of communication as a transaction be tween separate programs or processes a transaction that can t rely on standard intra program mecha nisms like shared variables and procedure calls then communication is a fundamental problem for which few unified models exist Two processes in a parallel language may communicate a program in one language may use communication mechanisms to deal with a program in another language a user program may communicate with the operating sys tem or a program may need to communicate with some future version of itself by writing a file Most systems classify these events under separate and un related mechanisms but the tuple space model cov ers them all 3 p 445 The occam CSP model of communication encompas ses all these features within the language without re sorting to a computation and communication intensive superstructure 5 6 10 Since occam is built around the concept of safe reliable communication all process to process program to program program to operating system and progra
8. oppose the C C model instead they advocate the use of integrated parallel languages in which a uniform model applies to compu tation and also to coordination We have given our rea sons for favoring a different approach but this disagrec ment cannot be resolved on objective grounds We emphasize that despite this disagreement we view our own work as strongly allied to Kahn and Miller s their work has significantly advanced our own understand ing of coordination and its importance Nicholas Carriero David Gelernter Yale University Dept of Computer Science New Haven CT 06520 2158 carriero a cs yale edu gelernter cs yale edu REFERENCES 1 Bjornson R et al The implementation and performance of hyper cube Linda RR 690 Dept Computer Science Yale University March 1989 2 Carriero N and Gelernter D Coordination languages and their significance Res Rep Dept Computer Science Yale University June 1989 3 Ciancarini P Blackboard programming in Shared Prolog In Proceed ings of the 2d Workshop on Program Languages and Compilers for Parallelism Aug 1989 To be published Research and technical papers providing you with comprehensive coverage of programming languages and systems acm Transactions on Programming Languages and Systems Editor in Chief Susan L Graham University of California Berkeley CA ACM Transactions on Programming Languages and Systems TOPLAS offers research technical p
9. primitives like assert and re tract Instead they specify concurrency and communi cation by pure logic programs augmented with syn chronization constructs The Dining Philosophers Carriero and Gelernter did not have to set up a straw man for the purpose of comparing Linda with concur rent logic programming Ringwood in Communications January 1988 has done this job for them 4 With its seventy lines of incomplete code six procedures and four illustrative figures Ringwood s Parlog86 program for solving the dining philosophers problem is indeed an easy mark However this program is not an appro priate basis for comparison with concurrent logic lan guages The reason is that a dining philosopher can be specified by a much simpler concurrent logic program phil Id eating LeftId done Left Right lt phil ld Left Right Left is eating wait until he is done phil id Left eating RightId done Right lt phil ld Left Right Right is eating wait until he is done phil Id eating Id Done Left f eating Id Done Right 1 eat when done unify Done done then think then become phil td Left Right Atomically grab both forks The program is independent of the number of philoso phers dining A dinner of n philosophers can be speci fied by the goal phil 1 Fork1 Fork2 phil 2 Fork2 Fork3 phil n Forkn Fork1 whose execution results in each of the Fork vari
10. such a diverse set of approaches with argu ments against a few members The points regarding ob ject orientation being orthogonal to parallel program ming and monitors being of limited use for parallel processing are well taken But they do not constitute an argument against all the approaches in the vast spectrum of formulations with synchronous or asyn chronous message passing processes threads concur rent objects Actors etc For example consider a lan guage which allows dynamic creation of many medium grained processes includes primitives for asynchronous communication between them and in cludes primitives for dealing with specific types of shared data The Chare Kernel parallel programming system we are developing 7 is an example of such a system The shared data may be as simple as read only variables or as general as dynamic tables Dynamic tables consist of entries with a key and data and allow asynchronous insertions deletions and lookups with optional suspension on failure to find an entry These are similar to some extent to tuple spaces None of the arguments in this section argues against such a sys tem or point out Linda s advantage over such an ap proach For example the arguments against actors are mainly that a they do not support shared data struc tures and b do not allow messages to be sent to not yet created tasks A system such as the Chare Kernel surmounts both these criticisms Because it ex
11. sug gests to aid in choosing the new values seems to me to be worthwhile The mechanism I settled on seems to work but it is essentially a guess I do not know how to prove that it would always work well Basing the new day length and calendar size on the number of empty days recently encountered and the number of chain nodes recently examined per item enqueued or de queued seems like a much more solid mechanism Fi nally comparing the expected gain from a readjust ment to the cost of readjustment before making one seems sensible Before commenting on whether it was inevitable that this algorithm be discovered soon let me describe briefly how I came to create it I was scanning through back issues of Communications one day recreational reading when I came across Jones s 1 excellent article on priority queues I did not even know what a priority queue was until then but I saw immediately that a desk calendar solves the event set problem in O 1 time and my intuition was outraged by the thought that a computer could not also be made to do it in O 1 time I felt sure that if I applied myself to the problem I could do better than was currently being done Since I was seeking publications in order to acquire tenure I began work on the problem immediately When I began work on the problem in earnest how ever I discovered it was more difficult than I had anti cipated as is frequently the case The problems of keeping the numbe
12. system that decide what to execute in parallel when and where Such auto pilots they argue currently do not and probably can never yield as good a performance as an explicitly parallel program written in Linda say However this argu ment overlooks an important possibility allowing the programmers to annotate the logic or functional pro Communications of the ACM grams The types of annotations must be chosen care fully In our experience grainsize contro and data de pendences are two attributes that can be specified fairly easily by programmers For example based on such annotations we have recently obtained excellent speedups on different multiprocessors with uni processor speeds comparable with sequential compiled Prolog programs in our parallel logic programming sys tem based on the REDUCE OR Process Model 6 8 Such annotations are qualitatively different than Linda fying Prolog to borrow the authors phrase because they are highly non invasive The annotated logic pro gram still reads feels very much like a logic program and the programmer does not have to deal directly with management of parallel execution The futures of Multi Lisp 5 and Qlets of QLisp 4 are a few of the other examples of systems that use annotations effectively My third point concerns the authors arguments against a rather broad collection of approaches they have categorized as concurrent objects It is difficult to rule out
13. was built upon VM XA SF code but too late to ship with VM XA SF The modified VM XA SF system was tested in the lab in May 1987 on an IBM 3090 600 a system having six processors with a workload consisting of five thou sand simulated users More users than the shipped ver sion of VM XA SF could handle As we expected the new mechanism reduced the overhead associated with the timer request queue dramatically Subsequently the new mechanism was included in VM XA SP1 and was shipped in April 1988 A spin lock is held during all manipulations of the timer request queue There fore the overhead under consideration is not only the direct overhead from manipulating the queue but also overhead generated by other processors spinning while one processor holds the lock to manipulate the queue Here is a comparison of Randy Brown s mechanism and IBM s VM XA SP mechanism The remarkable similari ties give credence to the idea that there is only one way to do it right Similarities 1 Each day of the queue is represented by a sorted linked list of the request scheduled for that day 2 A table or array containing one pointer for each day of the year is used to find the linked list for a particular day 3 There is no overflow list Requests scheduled for future years beyond the end of the table are wrapped back into this year s table They are placed in the table according to the day they come due disr
14. we enumerate the following hard criteria 1 Be able to define servers that are immune from the misbehavior of others 2 Be able to survive hardware failure 3 Be able to serialize simultaneous events 4 Be able to securely encapsulate entities with state 5 Be able to dynamically create and connect services 6 And finally it must not rely on global constructs These criteria are hard in the sense that languages which are Turing equivalent can formally differ in their abilities to provide security and to react to exter nal events A language which has no means of imple menting robust servers for example cannot simulate a language which can In addition we give the following soft criteria soft in the sense that these properties are not strictly necessary at the distributed language level since higher levels of language can provide them e Can reprogram running programs e Can use foreign services not implemented in the dis tributed language Can define services which can be used by foreign computations Is based upon computational principles which scale Supports the expression of very highly concurrent programs e Supports small scale programming well In 1 we explain why we believe actors and concur rent logic programming languages mect these criteria well except 2 and why functional programming Or paralle Prolog and CSP Occam do not With re spect to assessing Linda for open systems
15. 1 0 666700 0 333700 1 333400 0 000400 0 Z 0 667000 0 334000 1 000100 0 0 667000 3 1 667000 1 334000 0 0 334000 1 667000 4 2 667000 which shows that a match of length 4 is obtained at positions 7 4 The program is more efficient than the Crystal and Linda ones in two respects First it avoids recomputa tion of the auxiliary functions p and q Although in principle the Crystal program can be transformed into a more intricate program that avoids the recomputa tion a compiler capable of performing such a transfor mation has yet to be demonstrated Similarly the Linda program can be transformed manually I presume to store both the P and Q matrices in the Tuple Space in addition to H thus avoiding their recomputation It seems that this transformation will adversely affect pro gram length and readability Second the concurrent logic program specifies a regular process structure with local communication only This structure can be mapped manually using Turtle programs 8 or auto matically to a concurrent multicomputer in a way that near processes reside in near processors or even in October 1989 Volume 32 Number 10 the same processor Efficiently mapping the other two programs which are not specified by a process network and have no obvious notion of locality is more difficult Conclusions I have shown that the semantic gap between concur rent logic programming and the set of constructs of
16. ables being incrementally instantiated to a stream of terms eating Id done with the Id s on each Fork reflecting the order in which its two adjacent philosophers use it For example a partial run of this program augmented with October 1989 Volume 32 Number 10 Technical Correspondence eating and thinking components that take some random amount of time on a dinner of 5 philosophers provided the substitution Forki eating 1 done eating 5 done eating 1 done eating 5 Fork2 eating 1 done eating 2 done eating 1 done eating 2 _ _ Fork3 eating 3 done eating 2 done eating 3 done eating 2 _ _ Fork4 eating 3 done eating 4 done eating 4 done eating 3 done eating 4 done _ Fork5 eating 4 done eating 4 done eating 5 done eating 4 done eating 5 The run was suspended midstream in a state in which Fork4 is free and the second and fifth philosophers are eating using the other four forks Up to that point each of the philosophers ate twice except the fourth which ate three times This output was obtained by running this program under the Logix system 10 a concur rent logic programming environment developed at the Weizmann Institute The program is written in the language FCP f a var iant of Saraswat s FCP 5 The unannotated head arguments are matched against the corresponding terms in the goal atom as in P
17. ality of Linda per se but merely to demonstrate how expres sive CLP is by presenting an elegant solution to an interesting problem But the objection again hinges on the fact that Shapiro s embedding represents a bad way to implement Linda and claims regarding expressivity are presumably to be based on good solutions to inter esting problems The mere existence of some solution is uninteresting these languages are all Turing equivalent in any case One final significant aspect of the embedding experi ment FCP is a complete programming language Linda is a coordination language dealing with creation and co ordination of black box computations exclusively Shapiro has attempted to embed the coordination aspects of C Linda within the complete language FCP a com parable experiment would attempt to embed the coordi nation aspects of FCP within C Linda As we demon strated in our article it is trivial to express CLP style streams in C Linda But this embedding experiment yields code that is efficient enough to be used Such structures are in fact used routinely in Linda programming Shapiro s claims with respect to embedding remind us in short of what might be put forward by an enthu siastic exponent of Lego It is true that you can build all sorts of elegant working models using Lego But it is also true that this piece of good news has not proved sufficient in itself to establish Lego as a mainstay of the cons
18. antages It supports portability of con cept the machinery required for explicit parallelism is always the same no matter what the base language to get parallelism we must be able to create and coordi nate simultaneous execution threads The C C ap proach allows us to transform any base computing lan guage into a parallel language in essentially the same way thus C Linda Fortran Linda Scheme Linda and so on The C C scheme is granularity neutral so far as parallel applications go it allows the programmer to choose an appropriate granularity for his application without having granularity decided or limited by a pro gramming model It supports multi lingual applications the computations that are glued together by our coordi nation model may be expressed in many different lan guages The isolation of orthogonal elements it fosters tends to make natural the incorporation into program ming environments of advances along either axis Fi nally the C C model is an expressive framework for building the interesting class of applications that incor porate features of parallel distributed and information management systems and the view of coordination as a first class phenomenon promoted by the C C scheme simplifies our understanding of the computing environment and suggests new theoretical frameworks for modelling it more comprehensively These claims are explained and defended in 2 Kahn and Miller in effect
19. ants of Prolog s database manipulation operations assert and retract over a database of facts in is a degenerate form of retract in the sense that it uses a home brewed form of matching rather than full fledged unification The rd operation on a Tuple Space similarly corresponds to querying a database of facts it also uses matching rather than unification Assert and retract save concurrency and are the major extra logical operations in Prolog That is essentially all that Linda is offering hence the title of this section October 1989 Volume 32 Number 10 The remaining difference between Linda and the data base manipulation component of sequential Prolog is concurrency Linda allows concurrent Tuple Space op erations whereas by definition sequential Prolog is sequential Concurrent execution in Linda means first that the in and out operations are atomic and second that in suspends if a matching tuple is not found This is in contrast to retract in sequential Prolog which fails in such a case Database oriented versions of Prolog such as LDL 3 are enhanced with the concept of transactions to sup port concurrency control of database update However to my knowledge no one has attempted to use assert and retract as a basic communication and synchroniza tion mechanism as is done in Linda Research in concurrent logic programming has taken a different approach Concurrent logic languages do not incorporate extra logical
20. apers algorithms and technical correspondence over the whole spectrum of programming languages and systems f your interests include design analysis implemen tation evaluation or use of programming systems methodologies techniques or environments you should be reading this authoritative quarterly Among the topics covered in TOPLAS are design and comparison of language features analysis of design and implementation of particular languages descriptions of novel program development tech niques testing and verification methodology user experience with languages or methodologies pro gram specification languages and methods design and implementation of program development envi ronments storage allocation and garbage collection algorithms methods for specifying programming language semantics language constructs for and programming of asynchronous and distributed processes and applications of programming languages or methods to other areas such as databases and office automation Published quarterly ISSN 0164 0925 Included in STN s Compuscience AMS s Mathsci Science Abstracts Science Citation Index Automatic Subject Citation Alert British Maritime Technology Ltd Computer Literature Index Compumath Citation Index Ergonomics Abstracts and Index to Scientific Reviews Order No 108000 Vol 12 1990 Subscriptions 90 00 year Mbrs 22 00 Student Mbrs 17 00 Single Issues 33 00 Mbrs 16 00 Ba
21. arlog86 and GHC 14 In contrast the f annotated terms are unified with the corresponding terms as in Prolog A goal atom can be reduced with a clause if both the input matching and the unification succeed The reduction of a goal with a clause is an atomic action it either suspends or fails without leaving a trace of the attempt or it succeeds with some unifying substitution which is applied atom ically to all the variables it affects The key to the simplicity of the program is indeed the ability of FCP to specify atomic test unification the philosopher atomically tries to grab both forks ex cluding other philosophers from grabbing them The mutual exclusion is obtained by unifying the head of the Fork stream with a term containing the unique Jd of the philosopher Atomic test unification is incorpo rated in all the languages in the FCP family but not in Parlog86 or GHC Programs exploiting atomic test unification cannot be written easily in the latter two languages The deadlock freedom of the program is guaranteed by the language semantics specifically if several pro cesses compete to unify the same variable s one suc ceeds and the others will see the variable s instan tiated deadlock free atomic unification can be imple mented using a variant of the two phase locking protocol see 11 12 Like the Parlog86 and the Linda programs this program is not starvation free It should be noted however that the Linda an
22. as an integrated whole Uniformity Both camps are striving for uniformity but of different sorts The CLP camp strives for uniformity within a language while the Linda camp strives for uniformity across languages There is just one way to communicate in CLP not as in Linda where there is the dialect spe cific way of communicating in the small e g proce dure calls and the tuple space way of communicating between processes There has been research in embedding object oriented functional and search oriented logic program ming languages into CLP by implementing them in a CLP language This approach might be generalizable to deal with programming paradigm interoperability At least programmers would be able to program in the style they wish even though existing programming lan guages would not be supported Research in how these embedded languages might be inter callable remains to be done Communications of the ACM Someone who frequently programs in different con ventional languages C Fortran Lisp etc may value most highly the uniformity that Linda offers As they move from language to language they can deal with process creation communication and synchronization in a uniform manner From this point of view Linda should be compared with distributed operating systems like Mach which also provide similar capabilities as system calls from different languages On the other hand someone looking for a simple yet real concur
23. at you have said CLP communication is more expressive than Linda if you start with CLP you have all the resources of idiomatic CLP streams at your command and if you want Linda you can also have Linda observe how easily we can implement Linda given CLP But this contention as stated is transparently false As Shapiro himself realizes his CLP implementation of Linda is far too inefficient to be usable As he writes they his embeddings are inefficient since they use a naive data structure i e a list to realize a Tuple Space It follows that if a programmer working in the CLP framework indeed needs the expressivity of Linda Shapiro s embedding demonstration is irrelevant He needs a real implementation of Linda A real that is an acceptably efficient implementation of Linda Communications of the ACM would obviously be vastly more complex than the CLP implementation Shapiro presents For one thing C Linda as Kahn and Miller point out in their note relies upon some sophisticated compiler optimizations based upon a global analysis of the restricted ways in which classes of tuples are used A good embedding should be capable of such optimization This point is so obvious that one might suspect Shapiro of having something else a deeper and subtler point in mind We might imagine that his Linda embedding is merely a kind of expressivity experi ment It is not designed to realize the function
24. be written only for the actual write Finally even though total processing time may be no less with write caching the user of an interactive program may appreciate getting control back immediately even if the actual physical writes have not yet been performed Jalics and McIntyre s comments about cache size are useful but then I would never consider using a cache smaller than the programs I run I use a 1 5MB cache This brings up a related problem in the latest crop of 80386 computers they often offer huge main memories RAM with trivial caches 64KB which cannot be ex panded I know of one text editor which performs text insertions by block moving text to make room for the insertion This single and frequent operation can clear the entire cache eliminating its usefulness in an in stant The lessons which Jalics and McIntyre offer need to be better appreciated in other related arenas They seem to feel that caching does not help compi lations significantly but I disagree For one thing I pre load my compiler into cache as described above insur ing rapid loading Another issue is language related use Modula 2 a language which encourages the genera tion of many software modules When you reference a tool module from within an application module the compiler must get the interface specifications for the tool module each such specification resides in a differ ent file Thus if one is re compiling a given application mo
25. c 16 18 In dian Inst Tech Tata Inst Fund Res 1985 pp 347 368 6 Shapiro E The family of concurrent logic programming languages Comput Surveys 1989 To be published 7 Shapiro E Concurrent Prolog Collected Papers Vols i and Il MIT Press 1987 8 Shapiro E Systolic programming A paradigm of parallel processing In Proceedings of the International Conference on Fifth Generation Com puter Systems 1984 pp 458 471 Also Chapter 7 in 7 9 Shapiro F A subset of Concurrent Prolog and its interpreter TR 003 COT Tech Rep Inst for New Generation Comput Tech Tokyo 1983 Also chapter 2 in 7 10 Silverman W Hirsch M Houri A and Shapiro E The Logix system user manual Version 1 21 Chapter 21 in 7 11 Taylor S Parallel Logic Programming Techniques Prentice Hall 1989 12 Taylor S Safra S and Shapiro E A parallel implementation of Flat Concurrent Prolog J Parallel Programming 15 3 1987 245 275 Also Chapter 39 in 7 13 Taylor S Hellerstein L Safra S and Shapiro E Notes on the complexity of systolic programs J Parallel and Distributed Comput 4 3 1987 Also Chapter 8 in 7 14 Ueda K Guarded Horn clauses In Logic Programming E Wada Ed Springer Verlag 1986 pp 168 179 Also Chapter 4 in 7 I was not convinced that Linda 3 has any advantages over traditionally message oriented systems or monitors on today s parallel systems To convince the
26. ce ts1 can be retrieved into the variable SubTs using the goal rd tuple_space ts1 SubTs Ts Once retrieved detecting this requires rd to report suc cessful completion this is an easy extension it can be operated upon out 1 hi there SubTs in 2 hi X SubTs out 3 hi ho SubTs The result of these operations combined is 1 We note thal while multiple Tuple Spaces can be easily realized by multiple Prolog like databases nested Yuple Spaces cannot be easily mapped to Prolog like assert and retract operations October 1989 Volume 32 Number 10 Technical Correspondence Ts tuple 1 _ tuple_space ts1 tuple 1 2 hi there tuple 3 _ hi ho Ts1 tuple 2 _ tuple_ space ts2 Ts2 _ Ts1 SubTs tuple 1 2 hi there tuple 3 _ hi ho Ts1 X there Note that operations on a nested Tuple Space are inde pendent of operations on the parent Tuple Space and hence may use the same set of identifiers Using test and set in and out can be implemented even more simply in FCP f without using unique identifiers A Tuple Space is represented by an incom plete list of terms tuple T Gone where Gone is instan tiated to gone when T is deleted out T tuple T Gone Ts Found tail of Ts insert T out T Ts lt out Id T Ts Slot taken keep looking in T tuple T f gone Ts Found unifiable tuple delete it in T tuple T Gone Ts T f
27. cessing environment we can find that when a request is received it is already due to expire In other words we may re ceive requests that precede the current day cursor These expired requests cannot be readily accommo dated by the general mechanism Fortunately ex pired requests are very rare so we just maintain a separate queue a sorted linked list for them We check this queue first whenever looking for the ear liest event to dequeue Except for the insignificant activity at this special queue our implementation should run in O 1 time Gerald A Davison IBM Corp 48TA 914 Neighborhood Road Kingston NY 12401 October 1989 Volume 32 Number 10 REFERENCES 1 Varghese G and Lauch T Hashed and hierarchical timing wheels Data structures for the efficient implementation of a timer facility Operating Syst Rev 21 5 Nov 1987 25 38 AUTHOR S RESPONSE I applaud Gerald Davison s letter although I would per haps dispute his main contention I applaud his letter because he describes the solution to a major weakness of the calendar queue algorithm the mechanism for choosing calendar length and length of day The calen dar queue as I described it readjusts these parameters only when the queue size changes by a factor of two which may be too seldom One can visualize them needing change even in a situation where only hold operations are being performed In addition gathering efficiency statistics as he
28. ch one can embed one language within another is important There are many subtle issues here First we do not see how a naive implementation of Linda in a CLP language dem onstrates much If the point is about how expressive CLP is and how these are real languages for implementing systems in a serious way then one needs to present a serious implementation C Linda for example relies on some sophisticated compiler optimizations based upon a global analysis of the restricted ways in which classes of tuples are used A good embedding should be capable of such optimizations There are several other objective means of comparing languages besides embeddings which we explored in the paper Language Design and Open Systems 1 These other criteria do not correspond to expressive ness as normally defined Rather they relate to issues that appear in open systems large distributed systems with no global trust information or control Some of these issues are quite important with respect to a form of expressiveness which gets surprisingly October 1989 Volume 32 Number 10 little attention modularity To quote from Logical Secrets 3 Trust is a form of dependency between entities by reducing dependencies the modularity of the system is increased Thus the degree of cooperation pos sible among mutually mistrusted agents is one measure for how viable a language is in supporting a certain kind of modularity In 1
29. ck Volumes 132 00 Mbrs 64 00 Student Mbrs 15 year Please Send All Orders and Inquiries to P O Box 12115 Church Street Station New York New York 10249 Circle 110 on Reader Service Card
30. complished through a standard messaging interface The overall dif ference in the approach is that of designing a system instead of hacking it together Carriero and Gelernter also did not define what good speedup is when they ran Linda on a 64 node iPSC hypercube How does their speedup compare with that obtained by systems such as CrOS HI from CalTech or its commercial equivalent Express Fox et al also claim high efficiency for a large number of problems and state the values as being between eighty and ninety five percent 9 Finally what Gelernter and Carriero have not shown is that Linda offers either a run time performance ad vantage over a message based system or that the algo rithm design coding and implementation is signifi cantly easier What they have shown is that Linda does run on a number of hosts but to get adequate perfor mance custom read expensive hardware is needed This is a problem in a commercial environment with customers who are very price sensitive The currently available messaging systems offer higher performance more portability and surprisingly similar programming models to Linda and as such they are currently more commercially viable Craig Davidson Mechanical Intelligence 922 Grange Hall Rd Cardiff CA 92007 REFERENCES 1 Braner M Brenda A Tool for Parallel Programming Cornell Theory Center Ithaca N Y 1989 2 Braner M Trollius User s Reference Manual Cornell Theory C
31. computations any ordinary program ming language will serve the purpose The coordination model is a glue that can bind many separate active computations into a single composite program A co ordination language is the embodiment of a particular coordination model a coordination language allows us to create and to manage information exchange among a collection of independent asynchronous activities which may be separated either by space or by time Our point in introducing the term transcends nomen clature The issue is rather that Linda and other sys tems in its class are not mere extensions to base computing languages rather they are complete languages in their own right albeit coordination rather than com puting languages they are independent of and orthogonal to not mere extensions of the base computing languages with which they are joined Approaching coordination as one of two orthogonal base axes that jointly define the space of programming environments clarifies the significance of systems like Linda and conduces to conceptual economy allowing us to relate a series of superficially separate program ming problems ranging from file systems and databases to distributed and parallel applications We refer to the orthogonal axis approach as the C C scheme computation model plus coordination Communications of the ACM 1257 Technical Correspondence model This approach has a series of what we regard as significant adv
32. d Parlog86 solu tions can be made starvation free with the addition of standard fairness assumptions This is not true for our solution since two philosophers sharing a common neighbor can conspire against this neighbor so that his grabbing the two forks is never enabled leading to his Communications of the ACM Technical Correspondence 1246 starvation If this is to be avoided the FCP t program can be enhanced with the tickets method so that it achieves starvation freedom under similar fairness as sumptions The resulting program is very similar to the Linda one and still much simpler than the original Parlog86 one Assigning unique identifiers to philosophers can be avoided if the language is strengthened with read only variables 6 or with a test and set primitive We con sider the second extension In FCP f an annotated term in the clause head denotes an output assignment of the term to a variable in the corresponding position in the goal The assignment fails if the corresponding position is a non variable term It can be mixed but not nested with f annotations and all assignments and unifications should be done atomically In FCP f an anonymous dining philosopher can be specified by phil eating done Left Right phil Left Right Left is eating wait until he is done phil Left eating done Right phil Left Right Right is eating wait until he is done philfeating Done L
33. d that the applicability of a parallel system to a wide range of problems is heavily dependent on the efficiency in which the processors can coordinate and share data In distributed memory systems this is the efficiency of the message system Applications are already limited by the message system in many multi processors an order of magnitude slower message system would make the problem worse The authors claim that C Linda results in code which is more readable than Parlog86 or Crystal is true but that it is different from message based systems is not true The out command of Linda can easily be October 1989 Volume 32 Number 10 replaced by a message send and the in by a message read Compare the following occam code with the C Linda example for the Dining Philosophers Problem Note that in the occam program 11 the equivalent of the tuple space manager requires only twenty four lines of code The remaining program requires fewer lines of code in a lower level language than C Also note that no call to a mysterious initialize routine of unknown length is required PROC philosopher CHAN OF INT left right down up WHILE TRUE SEQ think down 0 get your room ticket PAR left 0 get the left chopstick right 0 and the right at the same time oe eat PAR left 0 put down the chopsticks right 0 up 0 and leave the room The following 2 routines implement the tuple space PROC chops
34. dule repeatedly having the interface specification files in the cache speeds up compilations The key issue here is that in some languages more files must be ref erenced than just the source and object files Thank you for printing an article such as that written by Jalics and McIntyre The Communications is often both dry and marginally applicable to the real world I encourage you to keep printing articles with relevance to those of us in the trenches Jon Bondy President JLB Enterprises Inc Box 148 Ardmore PA 19003 AUTHORS RESPONSE We agree with Mr Bondy that disk fragmentation can be a very substantial problem Our goal was to repeat experiments in as repeatable an environment as possi ble and concentrate on only one aspect caching We do Communications of the ACM agree that I O starts under a primitive operating sys tem like DOS have relatively small overhead when compared to full scale operating systems with multipro gramming While we believe caching mechanisms should be transparent to the user we do not for a moment deni grate the performance potential of sticky modes for files or preloading certain files into a cache We did not say and do not mean that write caching is unimportant It certainly produces impressive results in the hardware cache and as Mr Bondy indicates in some software cache applications as well However what we did say is that write caching with software caches is not as
35. e computation namely the similarity matrix Hm They provide each element H of the matrix to its corresponding cell pro cess which instantiates H to its final value Although this schema for constructing recursive pro cess networks may require some meditation the first time encountered programs employing it can be con structed almost mechanically and understood with little effort once the schema is grasped Communications of the ACM h Xv Yv Hm Hm is the similarity matrix of vectors Xv and Yv h Xv Yv Hm matrix Xv Yv Hm Bs initialize Bs matrix X Xv Yv Hv Hm f Bs row X Yv Hv 0 0 Bs Ts matrix Xv Yv Hm Ts matrix Yv f Bs row X Y Yv H Hv f Left Bottom Bs 7 Top Ts t cell X Y H Left Right Bottom Top row X Yv Hv Right Bs Ts row X f Left 1 1 1 initialize 0 0 0 f Bs initialize Bs initialize J cell X Y H HI Pl H P f Hbl Hb Qb HI H Q compare X Y D P max Hl 1 333 Pl 0 333 Q max Hb 1 333 Qb 0 333 H max Hbl D max P max Q 0 compare X X 1 1 compare X Y 0 3333 1 X Y true For example a run of the program on the initial goal h a b a a b c d a b c d e Hm produces the answer Hm 1 0 0 0 0 0 2 0 667000 0 334000 0 001000 1 0 667000 1 666700 0 333700 0 000700
36. ed a monitor in Linda Linda only moves these problems to a different location but still implements a monitor to handle the tuple space On a distributed memory environment such as is needed for a Transputer based or hypercube system each processor must have a monitor task and a message routing task operating in parallel to the user s tasks The following description is from Cornell s implemen tation of Linda called Brenda and is based heavily on Carriero and Gelernter s work at Yale 1 Brenda operates by having a server process running on each processor that may have objects stored in its address space The library functions send messages to these servers and receive replies In other words a Brenda call is a classic Remote Procedure Call with some twists The storage processor is deter mined by a randomizing hashing function of the tag When an object is to be grabbed the request is sent to the server on only one processor as deter mined by the same hashing function In a system with many parallel processors the execution of one Brenda function typically involves the transmission of several messages over a path that passes through several processors Some tricks are utilized to reduce the number of messages passed whenever possible The speed of execution of Brenda functions as compared with the speed of sequential computation is a crucial parameter It determines what algo rithms should be chosen for a given task o
37. eft eating Done Right eat when done unify Done done then think then become phil Left Right Atomically grab both forks A dinner of n anonymous philosophers phil Fork1 Fork2 phil Fork2 Fork3 phil Forkn Fork1 is of course not as interesting Each Fork variable is bound to a stream of eating done Embedding Linda Sure the picture of Linda versus concurrent logic lan guages looks different after rehabilitating the dining philosophers But is this the whole story Both in Linda and in concurrent logic programs pro cesses communicate by creating a shared data struc ture A stream of messages between two concurrent logic processes is just a shared data structure con structed incrementally from binary tuples However a shared structure in Linda cannot contain the equiva lent of logical variables The ability to construct and communicate structures with logical variables incom plete terms incomplete messages is the source of the most powerful concurrent logic programming tech niques The proposed extension to Linda of nested Tu ple Spaces is yet another approximation from below of the power of the logical variable as will be demon strated shortly Another difference between Linda and concurrent logic languages is that Linda incorporates no protection mechanism any process can access any tuple in the Tuple Space by reinventing the appropriate keywords Communications of the ACM I
38. egarding the year They are placed in the sorted linked list with the year taken into account in the sorting 4 The size of the table length of the year and the width of a day are both adjusted dynamically 5 The table size is always a power of 2 The width of the day is also a power of 2 6 We use a special direct search to find the next request to dequeue whenever the forward scan fails to find anything in an entire year 7 We maintain a current day cursor to mark where we are in the scan This cursor consists of several data fields including index to the table day num ber and time at end of current day 8 We have similar targets for efficient operation Randy Brown recommends that at least seventy five percent of the requests should fall within the coming year the VM XA target is to get eighty and a half percent seven eighths within the coming year Communications of the ACM RM Technical Correspondence Randy Brown recommends that three or four re quests from the current year should be found in the current day whereas our target is five Differences Extensions 1 Our operating system requires a cancel request function To support this function our chains of re quests are doubly linked forward and backward so that the chain can be easily closed after a request is removed Also each request contains a pointer to the anchor word of its chain The anchor may re
39. enter Ithaca N Y 1988 Communications of the ACM Technical Correspondence 1252 ww Carriero N and Gelernter D Linda in context Commun ACM 32 4 Apr 1989 444 458 4 Fox G et al Solving Problems on Concurrent Processors Vol I Prentice Hall Englewood Cliffs N J 1988 5 Inmos Lid occam 2 Reference Manual Prentice Hall Cambridge Mass 1988 6 Inmos Ltd Transputer Development System Prenlice Hall Cambridge Mass 1988 7 ParaSoft Corp Multitasking Running Multiple Processes under Express Mission Viejo Calif 1988 8 ParaSoft Corp Express Documentation Mission Viejo Calif 1988 9 Perihelion Software Helios Technical Manual Shepton Mallet United Kingdom 1988 10 3L Ltd Transputer Pascal Documentation 31 Ltd Edinburgh United Kingdom 1987 11 Welch P H Occam and transputer engineering class notes Univ of Kent Canterbury 1989 I read Carrierro and Gelernter s article Linda in Con text in the April 1989 issue 2 with interest As one of the first pragmatic machine independent parallel programming languages Linda is worthy of much praise However as the theme of the article is a demon stration of the superiority of Linda and the tuple space approach over other extant approaches to parallel pro gramming I find it necessary to point out a few errors of omission One approach that was omitted in their discussion is logic programming This statement may a
40. entifier Id out Id T tuple Id _ T Ts 7 Found tail of Ts insert T out Id T Ts lt out Id T Ts Slot taken keep looking An in operation with tuple template T and identifier InId finds a term tuple _ InId T in the Tuple Space such that T unifies with T and InId unifies with InId in ld T Ts lt Tuple T in Tuple Space Ts deleted by an in operation with identifier Jd in ld T tuple _ Id f T 1 Tsp Found unifiable tuple delete it in Id T tuple _ Id T Ts Id T Id T in ld T Ts Tuple deleted or doesn t unify keep looking The precise interrelation between in and rd is not speci fied in the Linda article Assuming it is defined so that a tuple deleted from the tuple space by in should not be visible to any subsequent rd the corresponding concur rent logic program is rd T tuple _ Inid T 1 Ts var InId true Found tuple that has not been deleted yet rd T tuple _ InId T Ts foo T Inld T rd T Ts Tuple deleted or doesn t unify keep looking The program assumes that foo is not used as an identi fier It uses the var guard construct to check that InId is still a variable i e the tuple has not been deleted by a October 1989 Volume 32 Number 10 concurrent in process If weaker synchronization is al lowed between in and rd then the corresponding logic program may use weaker synch
41. f the embedding it has been extensively used for comparing concurrent logic languages cf 6 In a response to an earlier draft of this letter Carriero and Gelernter imply that embedding one language in Communications of the ACM 1247 Technical Correspondence 1248 another is always such an easy and trivial task that showing its possibility for two particular languages is hardly interesting This is far from being truc Try embedding Linda in Occam in a parallel functional language or in an actor language and see the mess you get Conversely try embedding these models and some concurrent logic language in Linda I doubt if the re sults of this exercise will help in promoting Linda but they will certainly increase our understanding of the relationship between Linda and the rest of the concur rent world The DNA Example Another example in Carriero and Gelernter s article is the DNA sequence analyzer program They show a functional Crystal program and a Linda program for solving it and argue 1 that the functional program is only slightly preferable on readability and elegance grounds and 2 that the Linda program is better in giving explicit control of parallelism to the programmer and in not relying on a sophisticated compiler to decide on its parallel execution I show below a concurrent logic program solving this problem that is shorter clearer and specifies a more efficient algorithm than the other two The p
42. fered by Linda can be closed with just six clauses This I believe is not because Linda was consciously designed with the concurrent logic programming model in mind or vice versa Rather finding that an ad hoc construct for concurrency is easily embedded in the concurrent logic programming model is a recurrent phenomenon This was demonstrated in various contexts for streams futures actors monitors remote procedure calls guarded synchronous communication atomic transac tions and other constructs of concurrency 6 7 I be lieve that this is a consequence of and the best kind of evidence for the generality and versatility of concur rent logic programming Concurrent logic languages are not just abstract com putational models general and versatile as they may They are real programming languages This may be sur prising given their fine grained concurrency For exam ple in the Logix session required to compile test de bug and produce the sample runs of the programs shown in this note 283 711 processes were created and thirteen minutes of Sun 3 CPU were consumed Since the Logix compiler and programming environment are written entirely in FCP most of these processes were created during system activities such as compilation rather than during the actual runs of the programs See 11 for a report on the parallel performance of FCP A fair estimate is that more than 50 000 lines of FCP code were developed using the Logi
43. hese macro agents is not very different from Actors a tuple space is like a mailbox or CLP a tuple space is like a logic variable M Linda seems to have many of the flaws of which the Linda researchers accuse message passing systems and CLP languages If the Linda people are effectively arguing that S Linda is superior to Actors and CLP for parallel pro gramming in the small and medium while M Linda gives you the good properties of Actors and CLP in the large at the costs of Actors and CLP that would in deed be an interesting and significant argument It would however concede our point about uniformity Historical Footnote Linda very closely resembles Bill Kornfeld s Ether lan guage from the late 1970s 17 The major difference being that Ether had constructs equivalent to Linda s rd out etc but not in Acknowledgments Much of this note is derived from a network discussion which included Ehud Shapiro Nicholas Carriero Graem Ringwood Ian Foster and ourselves We are also grateful to Jacob Levy Vijay Saraswat and Danny Bobrow for their comments Kenneth M Kahn Xerox Palo Alto Research Center 3333 Coyote Hill Road Palo Alto CA 94304 Mark S Miller Xanadu Operating Company 550 California Avenue Palo Alto CA 94306 REFERENCES 1 Kahn K and Miller M S Language Design and Open Systems In The Ecology of Computation North Holland 1988 2 Kornfeld W Ether A paralle
44. ies We are happy to concede for what it is worth that FCP f s dining philosophers solution is far more concise than Parlog86 s Finally Shapiro s remarks on the DNA example are irrelevant We gave Linda code for one algorithm Shapiro gives the CLP implementation of a different algorithm Shapiro himself notes and surely it is ob vious that we could write a Linda program to imple ment this other algorithm as well It is trivially clear that in order to compare the expressivity of two lan guages we must control for the problem being solved We claimed in our article that Linda was in gen eral more elegant more expressive and more efficient than CLP Nothing in Shapiro s note challenges this claim We continue to believe that a Linda based paral lel Prolog is potentially a more elegant expressive and efficient alternative to the CLP languages Shapiro dis cusses Several groups appear to be contemplating such systems and we look forward to their results Other Comments We have no fundamental argument with Professor Kale s note Obviously we reject his claim that the uniformity of tuple space operations may be detrimen tal the approach Kale has taken in his Chare Kernel is far too complicated for our tastes As he intimates in his note he has provided a system in which there is a vaguely tuple space like communication construct and a large number of other devices as well Although none of the others is
45. l problem solving system In IJ CAI 79 August 1979 3 Miller M Bobrow D Tribble E and Levy J Logical secrets In Concurrent Prolog Collected Papers Vol I The MIT Press 1987 pp 140 161 AUTHORS RESPONSE Of the comments published here we believe Kahn and Miller s to be by far the most interesting and signifi cant We have responded to Kahn and Miller our in We think a promising direction tu explore is a CLP like language wilhout logic variables which instead uses tuple spaces for communication Communications of the ACM 1255 Technical Correspondence 1256 tent was not so much to refute their comments as to amplify them and to explore their implications in a brief separate paper called Coordination Languages and their Significance 2 We summarize the paper s main points after addressing other comments We are glad that Professor Shapiro agrees with us on the topics of concurrent object oriented and functional languages and we appreciate the candor of his admis sion that The time when this loss in performance sus tained by concurrent logic languages as against systems like Linda is small enough to be easily compensated by the gain in expressiveness has yet to come But in our view his comments add nothing significant to our com parison between Linda and concurrent logic languages We will consider his main points in what we take to be descending order of significance Shapiro
46. ly pro gram in Organizations prefer to minimize the re training of their programmers 2 More and more applications are multi lingual More and more programmers deal with multiple lan guages as they work Concurrent logic programming has traditionally been addressing another problem namely how can one de sign a single language which is expressive simple clean and efficient for general purpose parallel com puting It does not mix concurrent and sequential pro gramming but instead takes the radical stance that all one needs and should ever want is a good way to describe concurrent computations The challenges are to design and understand such a language to provide useful formal models of the language to implement it well and to discover important programming tech niques The great hope is that this combination of being both semantically clean and real will enable radically new tools e g partial evaluators verifiers algorithmic debuggers program animators etc The practicality of such tools may rest upon the sim plicity of the language In principle such tools could be built for C Linda Fortran Linda etc but I expect they would be too complex to do in practice The tools the Linda folk have developed such as the Tuple space visualizer only deal with the part of the computation which occurs in Linda This is rather incomplete What is needed are tools that can deal with computation and communication together
47. m to file systems are handled through the built in communication primi tives As for different types of languages communicat ing the only way they are allowed to share data is through message passing Since each language has its own requirements for data representation in stack Communications of the ACM Technical Correspondence frames the easiest way to coordinate different lan guages is through message passing The basic messaging system defined in occam docs not allow for buffering since all the message passing is handled at the hardware level This restriction does require blocking send communications with a result of well synchronized code If the algorithm does not need to be synchronized buffer processes may be introduced to allow non blocking message sends Messaging environments such as Express Trollius or Helios provide the equivalent of the buffering and rout ing capabilities of Linda with much lower overhead 2 8 9 Express even offers parallel task invocation in the form of an interrupt routine on receipt of a mes sage of a given type 7 I am stressing the run time overhead of Linda since performance is the key issue in production programs on parallel processing systems Although the authors have dismissed monitors because the normal method of im plementation requires a single address space and a synchronous procedure call to get access to the data protected by the monitor they have effectively imple ment
48. n contrast shared logical variables cannot be rein vented and are unforgeable and therefore they pro vide a natural capabilities and protection mechanism 2 concurrent logic processes can only access struc tures bound to variables they were given explicit _ access to Another difference is that in a concurrent logic lan guage a shared structure can only be increased mono tonically by further instantiating variables in it and can only be reclaimed by garbage collection when un referenced In Linda tuples can be explicitly deleted from the shared Tuple Space but the Jack of discipline in Tuple Space access makes garbage collection impos sible The similarity between the cooperative incremental construction of logical terms and Linda Tuple Space operations is perhaps best illustrated by showing a con current logic program implementation of Linda s primi tives For simplicity the concurrent logic program rep resents the Tuple Space as an incomplete list of terms of the form tuple OutId InId T The operations are defined of course so that several processes can per form them concurrently on the same Tuple Space Each in and out operation must be given a unique identifier for the purpose of mutual exclusion An out operation with tuple T and identifier OutId inserts the term tu ple Outld _ T at the tail of the Tuple Space list out Id T Ts lt Tuple T inserted to Tuple Space Ts with an out operation with id
49. oo T Gone in T Ts Tuple deleted or doesn t unify keep looking Supporting eval in a language that already supports dy namic process creation seems redundant so we do not address it The concurrent logic programs for out in and rd are concise executable specifications of the semantics of these operations But they are inefficient since they use a naive data structure i e a list to realize a Tuple Space Their efficiency can be increased by applying the same ideas to a more sophisticated data structure such as search tree or a hash table as done in real implementations of Linda Embeddings of concurrent object oriented languages and of functional languages in concurrent logic lan guages have been reported elsewhere The trivial embeddings of Linda in FCP f and in FCP f are further evidence to the generality and versatility of concurrent logic programming Embedding as a Method of Language Comparison Language comparison is a difficult task with no agreed upon methods There are many soft methods for com paring languages and we have seen them being used in arguments in the sequential world ad nausia No reason why we should make better progress with such argu ments in the concurrent world However I know of one objective method for comparing and understanding the relations between languages and computational models embedding compiling one language in another and investigating the complexity o
50. orward past empty anchors in the table In assembler this takes just one instruction in the loops to increment a reg ister We keep the following statistics a Count of recent enqueue operations b Count of nodes skipped recently while scanning chains to find the insert point c Count of recent dequeues of lowest valued request d Count of empty anchors skipped recently while scanning forward through the table to find lowest valued request e Count of requests currently in the queue This data is checked once every thirty seconds to see if the width of day or size of table should be changed 4 Our goal is to have the width of day set so that on the average we are dequeuing for expiration five requests from each day of the table before advanc Communications of the ACM ing to the next day If the statistics show that we are above one hundred and eighty percent of that or below fifty five percent of it then we consider halv ing or doubling the width We also keep statistics on how often we make a change Re organizing the table causes quite a bit of overhead so we do not want to oscillate back and forth every thirty seconds between the same two day widths The decision to make a change is in two parts first to decide if changing the day width or table size might reduce overhead and second to de cide if making the change is worth the effort We allow the table to grow but not to
51. passing or Actor like systems do not suffer from the disadvantages attributed by the authors to specific systems within this broad class In fact the uniformity of tuple space operations may be detrimental because it passes on the burden of identifying the type of communication to the compiler while forcing the programmer to hide that informa tion from the system Linda does make a significant contribution The tuple space approach is a unique and interesting way of pro gramming which in addition can lead to good speeds and speedups over the best sequential programs on multiprocessors available today Only the context of other competing approaches to parallel programming is much richer than the authors argue L V Kal Dept of Computer Science October 1989 Volume 32 Number 10 Technical Correspondence University of Illinois at Urbana Champaign 1304 W Springfield Ave Urbana IL 61801 REFERENCES 1 Athas W C and Seitz C Multicomputers Message Passing Concur rent Computers Computer Aug 1988 9 24 2 Carriero N and Gelernter D Linda in context Commun ACM 32 4 Apr 1989 444 458 3 Dally W J A VLSI Architecture for Concurrent Data Structures 1987 4 Gabriel R P and McCarthy J Queue based multi processing Lisp In ACM Symposium on Lisp and Functional Programming August 1984 5 Halstead R Parallel Symbolic cornputing Computer 19 8 Aug 1986 6 Kale L V Parallel exec
52. plicitly supports dynamic shared data structures it is possible to deposit data in such structures and to have another new process pick that data later on Of course the au thors may not have known about this particular system But they seem to be arguing that ali of the possible approaches in this broad spectrum are irrelevant for parallelism In fact the Chare Kernel like approaches may have some advantages over Linda on the basis of the authors own criteria The communication of data is much more specific in Chare Kernel In particular when a process A needs to communicate data to process B it specifi October 1989 Volume 32 Number 10 cally directs the message to B when many processes share a dynamic table they may read and write to this shared and distributed data structure In contrast in Linda when a tuple is outed it is not immediately clear what sort of sharing and communication is to take place The programmers usually know what the sharing is to be but they are not allowed to express their knowledge in Linda leaving the burden to the compi ler In a few cases that the programmers cannot specify the type of communication it is reasonable to use the most general mechanisms The compiler their own auto pilot cannot be expected to be perfect regarding this and even if it were it would be unnatural for programmers to hide this information from the sys tem Yet knowledge of communication patterns is e
53. ppear strangely superficial as the authors devote a substantial portion of the paper to CLP Concurrent Logic Program ming However CLP is not at all the same as parallel execution of logic programs The CLP approach in volves logic based explicitly parallel languages which a allow specification of processes with tail recursive predicates and b use streams represented as partially instantiated Logical Variables for specifying commu nication between processes The logic programming ap proach in contrast is an implicitly parallel one much like the Functional Programming approach considered in their article Logic programming formulations are useful among other domains in symbolic computation applications involving search backtracking and prob lem reduction There are over a dozen research groups not counting the CLP related groups working on effi cient parallel execution schemes for logic programs The authors may wish to use the same arguments against logic programming as they used against functional pro gramming but this significant effort certainly deserves consideration Functional and logic languages may not be appropri ate for expressing certain types of computations How ever the authors contend that even for domains where computations can be naturally expressed in such lan guages they are not appropriate parallel programming languages because they rely on an auto pilot the compiler and the run time
54. programming there are really two Lindas Linda in the small and medium which relies on a single shared tuple space and Linda in the large which relies on multiple first class tuple spaces Let us call the first S Linda and the second M Linda for Single versus Multiple tuple spaces S Linda does not satisfy properties 1 or 6 The rea son is that any other process can in any of the tuples that a given server is trying to serve and thereby dis rupt communication between that server and its client This is clearly unacceptable for a secure distributed system but also makes reasoning about the correctness of programs a less local and modular affair One cannot show that a given subset of a Linda system will proceed successfully without knowing at least something about the correctness of all computations participating in that Linda system October 1989 Volume 32 Number 10 Technical Correspondence M Linda seems to satisfy all the criteria as well as Actors and CLP In M Linda used for open systems there would be at least one tuple space private to each trust boundary Therefore with regard to open system properties we can regard all computation proceeding inside a single tuple space as a single agent Presum ably the way such multiple agents interact is by creat ing new tuple spaces passing tuple spaces around and having processes that communicate to more than one tuple space The computational model of M Linda among t
55. quire updating when a request is removed and this allows the anchor to be located quickly 2 When inserting a new request in a queue we can scan the chain either forward or backward We are careful about making this choice as there is a possi bility that there may be a few very long queues If they exist we do not want to be scanning them in the wrong direction Such queues can exist if there are some favorite times that people request There would be such times as now plus ten seconds in finity minus one second etc Infinity here means the largest time the clock can handle all 1 bits in the time field If the new request is one of these then it is equal to or greater than the last such request in the chain So it makes sense to scan the chain backward from the far end This is also true in the general normal short queue case We would expect that the new request will most likely be later than most of the requests already in the queue There is one case however where we do a forward scan when inserting into a queue We scan forward if the last request in the chain is within one hour of infinity suggesting that this queue may be one of the long ones and the new request is not within one hour of infinity 3 We do not do trial dequeuing to sample the over head Instead we track the overhead continuously by counting the nodes that we pass whenever we scan a chain and also when we scan f
56. r of days in a calendar and the length of day appropriate and of handling overflow all in O 1 time soon presented themselves I prayed over the problem asking the Lord s help and began men tally examining possible solutions After several days work I arrived at essentially the published algorithm In my case at least the discovery of the algorithm did not come from an extensive knowledge of the litera ture Nearly everything I knew about priority queues came from Jones s article The driving force behind my work was a strong intuition and a good understanding of data structure techniques in general If calendar queues are an invention whose time has come then the necessary foundation is the general state of a computer science plus the information in Jones s article It is not obvious to me that these were sufficient to guarantee October 1989 Volume 32 Number 10 Technical Correspondence discovery of the algorithm I will note however that an article such as the one written by Douglas Jones is a powerful aid and incentive to invention I would like to take this opportunity to thank him for his article Since the algorithms Gerald Davison and I discovered are so nearly identical in spite of having considerable complexity it may be that the requirements of the problem do determine the structure of the solution rather precisely I would like to complement Gerald Davison on his excellent work Randy Brown Dept of Electrical
57. r whether Brenda is suitable for the task at all The execution of one Brenda function call is typically an order of magnitude slower than passing one message to the nearest neighbor It takes on the order of 10 milli seconds in the current implementation In addition to the communication delay the proces sor which is handling the tuple space is performing a non trivial amount of work to search update garbage Communications of the ACM collection and otherwise maintain the tuple space This work must significantly reduce the overall per formance of the system If it did not why build custom hardware to handle the tuple space functions One aspect of the tuple space that was not discussed in great detail in the article by Carriero and Gelernter 3 was the reading of values from the tuple space Whereas the insertion of values into the tuple space is non blocking retrieval of the data is and can become a bottleneck that has no equal in a well designed algo rithm on a message based system With distributed memory to read a tuple from the tuple space a mes sage must be sent to the appropriate processor the tuple space must be searched for a match the required tuple must then be sent to the requesting process which can then continue In a message based system the algo rithm can be constructed so that the required data is in a buffer in the processor when needed The task need only input from the buffer without the overhead of
58. reducing the number of timer requests per user while still keeping the O N mechanism Obviously that was only a tem porary solution While running VM XA SF another predecessor of VM XA SP in the lab with experimental workloads large N there were hints that the O N timer request queuing was becoming a problem Some time in 1986 I devised and then implemented and de bugged a timer request queue using a heap mecha nism But we did not bother testing the heap in the lab with interesting workloads large N because before fin ishing the heap I had figured out how the timer request queue could be better implemented using a hash table I had toyed with the hash table idea ear lier but at first could not see how to make it work Normally a practitioner might be satisfied to stay with the heap at that point since the heap could be expected to produce a dramatic overall reduction in overhead for the large N to be tested in the lab But in this case there was reason to go beyond the heap Al though the heap would give a vast reduction in en queuing overhead nevertheless there would be a mod erate increase in dequeuing overhead This was a concern because due to a peculiarity of the VM XA operating system the overhead on the dequeuing side October 1989 Volume 32 Number 10 is more critical than the overhead on the enqueuing side So I discarded the heap and implemented the new hash table mechanism This implementation
59. rogram specifies a soft systolic algorithm along the lines of the verbal description of the problem by Carriero and Gelernter See 8 for the concept of systolic programs and 13 for an analysis of their com plexity An array process network is spawned one cell process for each pair i j with near neighbors connec tions Each cell process receives from the left the tuple HI PI containing the h and p values from its left neighbor and from the bottom the tuple Hbl Hb Qb containing the bottom h and q values as well as the diagonal bottom left h value It computes the local values of the functions p q and h and sends them in corresponding tuples to its top and right neighbors The top neighbor is also sent HI the h value from the left The matrix and row procedures realize the standard concurrent logic programming method of spawning a process array network 8 The matrix process is called with two vectors Xv and Yv and a variable Hm It recur sively spawns rows with adjacent rows connected by a list of shared variables the Ts and Bs of each row are its top and bottom connections Each row process re cursively spawns cell processes A cell process has ele ments X and Y one from each vector and it shares with its left right top and bottom cell processes in its Left Right Top and Bottom variables respectively Dur ing the spawning phase the matrix and row processes construct the skeleton of the output of th
60. ronization as well These programs can be executed as is under the Logix system A Logix run of the following goal consist ing of several concurrent operations on an initially empty Tuple Space Ts out 1 hi there Ts in 2 hi X Ts in 3 hi Z Ts rd hi Y Ts out 4 hi ho Ts results in the following substitution X there Z ho Ts tuple 1 2 hi there tuple 4 3 hi ho _ indicating that hi there was inserted to Ts with opera tion 1 deleted with 2 and hi ho was inserted with 4 and deleted with 3 In this particular run in 3 hi Z Ts reached the second tuple before rd hi Y Z so the rd process remains suspended If we execute an additional out operation out 5 hi bye Ts then both out and the suspended rd terminate which results in the substitution Y bye Ts tuple 1 2 hi there tuple 4 3 hi ho tuple 5 _ hi bye _ Note that the last tuple inserted has not been deleted yet so its InId is still uninstantiated This implementation uses genuine unification rather than the home brewed matching mechanism of Linda Hence tuple templates and tuples in the Tuple Space may contain logical variables Hence this implementa tion can realize nested Tuple Spaces as is For exam ple the following goal sets up in Ts a Tuple Space consisting of two nested Tuple Spaces called ts1 and ts2 out 1 tuple_space tsi Ts1 Ts out 2 tuple_space ts2 Ts2 Ts Tuple Spa
61. s sential for efficient implementation on message passing machines Although there is a large body of Linda ap plications reported in literature the subset for which the performance data on iPSC 2 hypercube is available is fairly small and consists of simpler programs This may simply be indicative of the transient state of de velopment of that system However expect that when they attempt to port all the other Linda programs to the hypercube and or write more complex Linda programs with more complex communication patterns to run on hypercube the difficulties mentioned above will sur face It is not clear whether the Linda approach can be made to be viable in that situation The work of Athas and Seitz 1 on Cantor and Dally s work 3 on Concurrent Smalltalk on the Jelly Bean machine represents other interesting extensions to the actors and object oriented models that are viable in a large fine grained multi computer and they are both constructing such machines To summarize a logic programming is an approach to parallel programming on par with functional pro gramming and is distinct from concurrent logic pro gramming b the objections raised against efficiency of parallel execution schemes for functional or logic lan guages can be surmounted with simple annotations So such schemes present fairly effective approaches in ap plication domains where they naturally apply and c appropriate extensions of message
62. shrink After the load on the system has peaked the table size no longer changes However we do continue to adjust the width of the day up or down as necessary As the load falls off after a peak the width can increase to prevent the critical part of the table the part just ahead of the cursor from becoming too sparsely pop ulated It is true that if the distribution of times requested is very erratic e g bimodal then we might benefit from allowing the table to shrink to reduce the time spent in occasional long forward scans but so far we have not found this to be neces sary Before we will make the table bigger two conditions must be met a the density queued requests di vided by table size must justify a bigger table this is the only condition that Randy Brown requires and b the overhead of enqueuing a request must have exceeded a threshold The threshold is based in a complex way upon the targets of eighty seven and a half percent and criterium 5 mentioned earlier Ra tionale there seems to be no point in making the table bigger even when it is very dense if the density is not causing overhead And we can hope that our backward scan when inserting into the sorted linked lists will keep overhead low even when the table is densely populated Whether or not these considerations have actually resulted in our table being smaller than it otherwise might be is unknown In our multi tasking multipro
63. stick there and are never unloaded Whenever I boot my PC I do a complete directory scan with set Flash in sticky mode There after I never pay seek penalties when accessing the directory structures on my disk This speeds file access not only at file open time but when reading largefiles where additional file layout information must be accessed Another Flash feature with which Jalics and McIntyre seem unimpressed is the ability to specify files to be loaded into the cache I often pre load my editor compiler and linker at boot time which speeds up the load times for these frequently used utilities While the following comment may be somewhat out side of the context of their article one thing which they seem to have missed about Flash is that it is not Communications of the ACM 1243 Technical Correspondence 1244 strictly a cache it offers other related features which can increase system performance Jalics and McIntyre seem to feel that write caching is not important but it can help in three ways First in cases where a directory block is being written repeat edly as when a series of files are being copied into a directory that block is actually only written once This one effect can be almost magical when copying to a floppy Second while deferred writes cannot eliminate the processing required to do the actual write the pro gram does not suffer performance penalties during the time required to seek the block to
64. strictly necessary all could be imple mented using the tuple space construct the Chare au thors believe that in providing them they have antici pated a significant set of special cases This is the design approach that has informed a series of important programming languages from PL 1 through Ada and Common Lisp Unfortunately making simple things complicated is not a cost free exercise You burden the programmer with a more complex model to master the program reader with more complicated sources and the implementer with a harder optimization problem We choose to program in languages that provide opti mized support for a few simple tools and then let us build what we need But this is an aesthetic question Kale is obviously entitled to his views and we wish him luck in developing Chare We applaud the absence of ideological fustian in his note Davidson s note is full of errors and misrepresenta tions The claim that Linda implements a monitor to handle the tuple space is false Inferences about Linda s performance or implementation drawn from Braner s work on Brenda are puzzling cf 1 letter by October 1989 Volume 32 Number 10 Technical Correspondence Davidson on Linda this issue True Moshe Braner has done interesting work on this Linda variant but his system executes in a different environment from our hypercube system shares no code with our system is roughly an order of magnitude slower than our system
65. the request response message and the database search The difference in performance of the approaches may be best seen when some data must be shared globally such as in a simulation In Linda one process would insert a value into the tuple space and all other processes would then read the value With message passing a broadcast would be used Examining the message traffic alone on a 64 processor hypercube we find that the broadcast re quires only sixty three messages and only six message transmission times With Linda each process would have to send a request and wait for a response Assum ing a simple implementation this would generate on average three message hops to and from the processor with the data for each of the requesting processors Each processor would have to wait six message hop times and a total of more than three hundred sixty messages would be generated Each communication port of the destination processor would have to handle on average ten incoming and ten outgoing messages All sixty four requests would have to queue for access to the tuple space The queuing may be reduced through a look ahead mechanism but the time consuming message passing is not easily avoidable Since local memory access times can be measured on the scale of microseconds and message access is two to three orders of magnitude larger for most systems 9 the message traffic time will dominate the system per formance It must be remembere
66. tick CHAN OF INT left right WHILE TRUE INT any ALT left any phil to left gets it left any right any phil to right gets it right any PROC security CHAN OF INT down VAL INT max IS 4 INT n sat down up SEQ n sat down 0 WHILE TRUE INT any ALT i 0 FOR 5 ALT n sat down lt max amp down i any n sat down n sat down 1 up i any n sat down n sat down I a October 1989 Volume 32 Number 10 Technical Correspondence the main part PROC secure phil 5 CHAN OF INT left PAR security up down PAR i 0 FOR 5 PAR philosopher left i right i down i up i chopstick left i right i 1 5 right up down The stated benefit of not having to know about re ceivers of messages and vice versa in Linda is poten tially dangerous Much of the required synchronization in message based systems is to ensure complete and accurate execution of the program at all times By re quiring the programmer to think about the destination of the data and the overall data flow algorithms can be better optimized and loose tuples will not be left in the machine at program termination When designing the individual modules the physical source and destination of the data need not be defined only the message type At system integration the task of allocating message flow is no different from assigning names to the tuples in the tuple space and the routing can be ac
67. truction industry Furthermore Lego structures always wind up looking like assemblages of little boxes That is fine if you are dealing with boxy structures Otherwise you might want to conduct your building experiments using a more flexible medium Turning to the dining philosophers example with which Shapiro begins if we were members of the CLP community we would find this demonstration deeply alarming As we noted in our article dining philoso phers is neither an especially important nor an espe cially difficult problem The Linda solution is trivial In order to get a comparably concise solution in CLP Shapiro needs to jettison Flat Concurrent Prolog his own invention and introduce us to the super duper new logic language FCP which is a variant of Saraswat s FCP which is in turn a variant of FCP Shapiro seems unaware of the fact that when a simple problem is posed introducing a new language in order to solve it can hardly strengthen our confidence in this October 1989 Volume 32 Number 10 new language s ability to handle the unanticipated problems that it will be expected to tackle This same new language becomes the basis of the CLP Linda embedding But as we have already noted the Linda embedding is not so far as we are concerned an im pressive demonstration of anything and so we are left wondering what exactly motivates the new language and whether it is the definitive CLP solution or merely one of a long ser
68. ure can be squeezed into this box somehow or other although the spectacle may not be pretty But we believe that on balance Linda is a more practical and a more elegant alternative This article has generated a flurry of electronic dis cussions in which many participants argue for their own particular hobby horse and are deaf to what the others are saying In this note we attempt to step back from these religious platforms to extract the good ideas from the different approaches as well as to understand their relationships Different Problems Linda and concurrent logic programming seem to be solving two different problems Linda is best not thought of as a language but rather as an extension that can be added to nearly any language to enable process creation communication and synchronization For very clean languages such as functional program ming languages adding Linda constructs may destroy Communications of the ACM 1253 Technical Correspondence 1254 their clean semantics Linda is a model of how to coordinate parallel processes As a solution to this prob lem Linda seems pretty good We do not see how logic variables and unification can be added to Fortran and C as well as tuple spaces have been The utility of Linda rests on two observations about the world we live in 1 There is a lot of programming language inertia There are lots of programmers who do not want to or cannot leave the languages they present
69. ution of Logic Programs The REDUCE_OR process model In Proceedings of the 4th International Conference on Logic Programming Parkville Victoria Australia May 25 29 Univ of Melbourne 1987 pp 616 632 7 Kale L V and Shu W The Chare Kernel language for parallel programming A perspective UJUCNCS R 88 1451 Dept of Com puter Science Univ of Illinois 1988 8 Ramkumar B and Kale L V Compiled execution of the Reduce Or process model on multiprocessors UIUCDCS R 89 1513 Dept of Computer Science Univ of Illinois 1989 The article Linda in Context by Carriero and Gelernter in April s Communications pp 440 458 pre sents an interesting model for parallel computing Much of the article is a comparison with other ap proaches Much of this comparison looks like attacks on concurrent logic programming along with actors con current object oriented programming and functional programming The following paragraph from the con clusions is milder than most of the body of the article Logic programming is obviously a powerful and attractive computing model as well Notwithstand ing we continue to believe that the tools in concur rent logic languages sic are too policy laden and inflexible to serve as a good basis for most parallel programs Applications certainly do exist that look beautiful in concurrent logic languages and we tend to accept the claim that virtually any kind of parallel program struct
70. x system including the Logix system itself and numerous applications 7 and Logix is just one of several practical concurrent logic programming systems However presently concurrent logic languages offer a gain in expressiveness at the cost of a loss in performance compared to conventional ap proaches to concurrency The time when this loss in performance is small enough to be easily compensated by the gain in expressiveness has yet to come Acknowledgments 1 thank Nicholas Carriero David Gelernter Ken Kahn Amir Pnueli and Bill Silverman for their comments on an earlier manuscript Ehud Shapiro Dept of Applied Mathematics and Computer Science The Weizmann Institute of Science Rehovot 76100 Israel REFERENCES 1 Carriero N and Gelernter D Linda in context Commun ACM 32 4 Apr 1989 444 458 2 Miller M S Bobrow D G Tribble E D and Levy J Logical secrets In Proceedings 4th International Conference MIT Press 1987 pp 704 728 Also Chapter 24 in 7 3 Naqvi S and Tsur S A Logical Language for Data and Knowledge Bases Computer Science Press 1989 4 Ringwood G A PARLOG86 and the dining logicians Commun ACM 31 1 Jan 1988 10 25 October 1989 Volume 32 Number 10 Technical Correspondence 5 Saraswat V A Partial correctness semantics for CP J amp In Pro ceedings of the 5th Conference on Foundations of Software Technology and Theoretical Computer Science New Delhi India De

Technical Correspondence

Contents

Download Pdf Manuals

Related Search

Related Contents