Home

STATIC DETERMINATION OF DYNAMIC PROPERTIES OF

1. unknown ii records of collection C3 is unknown The type of a record will not change once allocated var C4 col ection of R any DEE Hi The records of collection C4 can change from one variant to another during execution by assigning values of different variants to the records The main defect of collections is that the num ber of collections is determined at compile time Thus we cannot declare an array of disjoint linear lists TL1 TL2 TEn It has not been recognized that a globally declared collection is in fact the union of smaller collec tions which are valid at various program points which would be useful to the compiler A similar criticism is that the concept of collection cannot be used recursively tha t is to say one cannot par tition a coliection into disjoint sub collections 87 Although of quite limited expressive power the no tion of collection in EUCLID may appear somewhat difficult to understand However its usefulness to compilers seems undeniable and we may in PASCAL let the compiler discover the collections 4 2 Compiler Discovery of Disjoint Colleettons We will represent a collection by the set of pointer variables which point within that collec tion Examole C1 Collection C1 will be denoted V W collection C2 will be denoted X Y Z We will try to partition the pointer variables of a program
2. integer digit character letter charint alphanum The tag field is hidden from the programmer and may be checked using conformity clauses The antagonism with PASCAL is more obvious in EUCLID 1976 which handles variant records in a type safe ALGOL 68 like manner Since EUCLID al lows parameterized types the tag will usually be a formal parameter of the type declaration type mode int char type charint tag mode record case tag of int gt var i integer end int char gt var c character end char end case end charint When a variable of the record type charint is declared the actual tag parameter may be a cons tant var digit charint int var letter charint char or any which allows type unions var alphanum charint any ALGOL 68 or EUCLID are type safe when dealing with type unions since No assignments to the tag fields are authorized once they have been initialized Uniting is allowed and safe ealphanum letter is legal because the type of the right hand side value charint char may be coerced to the type of the left hand side variable charintlany the type charint any permits alphanum to hold either a value of type charint char ora value of type charint int There is no de uniting coercion since if letter alphanum were allowed the principle of type checking would be violated The only way to retrieve an object which has been united and to retri
3. value which is assigned to pt Our system of equations is of the form lt P1 P2 P3 P4 PS5 gt F lt P1 P2 P3 P4 PS gt where F is an order preserving application from the complete lattice L in itself Therefore the Knas ter Tarski theorem states that the application F has a least fixpoint Tarski 1955 Moreover since F is a complete order preserving morphism from the complete lattice L in itself thisleast fixpoint can be defined as the limit of Kleene s sequence Kleene 1952 Ay lt L i 4 l L gt Ay EFCA lt 10r 1 and non nil 12 5 lrorTt gt Ea 1 4 gt E gt T gt a FOX lt T TorT and non nil L A 4 Lor T gt lt T non nil ne F L gt T gt Ag FAS lt T Tor T and non nil non nil non nil 4 or T gt SRE non nil snon nil non nil T gt y F A lt T TOr T and non nil non nil non nil non nil or T gt lt T non nil non nil non nil F gt Thus Kleene s sequence converges in a finite num ber of steps which is obvious since L is a finite lattice The solution to our system of equations tells us that P2 P3 P4 non nil which according to our interpretation means that pt is not nil at lines 4 7 and 9 of our program which implies that the accesses of records through ot at lines 5 and 140 are staticly shown to be correct With regard to the value of P1 and PS its interpretation is that pt may be nil at program point
4. pe notion They differ by the fact that one needs a type checker whereas the other uses a type dis coverer but we show the close connexion between type checking and discovery We show that strong type enforcement or dis covery may be equivalent e g nil references type unions collections of non intermixing pointers This is not the case for infinite type systems e g integer ranges which are not compile time checkable In such a case type discovery is really needed and can be facilitated by appropriate syntactic cons tructs Finally we propose a means by which langua ge designers can establish a balance between the 78 security offered by full typing within a suitable linguistic framework to properly propagate strong type properties and the simplicity offered by the flexible but incomplete classical type systems 2 Nil and Non nil Pointers Among the objections against the use of poin ters are the factsthat they can lead to serious ty pe violations PL 1 and that they may be left dang ling One can take care of these objections by gua ranteeing the type of the object pointed at PASCAL 1974 except for variant of records and ensuring that pointers point only to explicitly allocated heap cells disjoint from variable cells which re main allocated until they are no longer accessible PASCAL 1974 when dispose is not used However a pointer may always have the nil value which points to no eleme
5. 2 3 Remarks It is remarkable that both approaches necessi tate the same secure type system yet they differ in the choices of making it available or not to the programmer The refined type system considers the pointer type as the union of two sybtypes pure non nil pointers and dummy nil pointers Type safety is guaranteed by requiring strong typing the type of a value determines which operations may be mea ningfully applied to it In both cases the type correctness has to be verified or established by the compiler For that purpose an often implicit system of equations is used In one case the solution to that system of equations has to be found by the compiler in the other case the compiler simply verifies that the solution supplied by the programmer by means of 81 adequate syntactic constructs is correct Since in this example the type system is finite both ap preaches are equivalent as far as type verifications are concerned 3 Variants of Record Structures 3 1 Unsafe Type Untons in PASCAL In ALGOL 68 1975 a variable may assume values of different types The tyoe of this variable is then said to be the union of the types of these va lues In PASCAL 1974 the concent of type unions is embodied in the form of variants of record struc tures a record type may be specified as consis ting of several variants optionally discriminated by a tag field Example type mode int char type char
6. C1 S2 C2 L u S81 C 1 S2 C2 L K pa x P2 1 C1 S2 C2 L ta 8 c 3 gt P3 extract C2 P2 51 C1 S2 L C2 E 1 4 gt P4 extract L P3 81 C1 52 L C2 We come back for P4 with the value of the previous pass so we stop on that path 89 whereas EUCLID uses a simplified type inclusion scheme A B C T a A B c Si fe 2 Besides and in opposition with EUCLID the collec tions are defined as local invariants Very pree cise and detailed information can beogathered whereas the EUCLID programmer would have to globally specify the union of such information This localization of collections may have impor tant consequences An optimizing compiler will be able to limit the number of objects which are supposed to have been modified by side effects when assi gning to objects designated by pointers use ful in register allocation Run time tests may be inserted before a sta tement dispose X to verify that no variable in the collection of X may access the record which X points to The garbage collector may be called when all variables in a collection are dead i e are not used before being assigned to etc The simple abstract interpretation of programs we illustrated here may be further investigated to recognize that data structures are used in stylized ways Boom 1974 Karr 1975 It is fair however to say that EUCLID compilers may use the
7. Cousot ed Nb 5 Sept 1976 pn 91 172 Habermannl 1973 Critical Comments on the Programming Language Acta Informatica 3 47 57 1973 Pascal Sprin ger Verlag Hoare and Wirth 1973 An Axiomatic Definition of the Programming Lan guage PASCAL Acta Informatica 2 1973 op 335 355 Karr 1975 Gathering Information About Programs Mass Comp Associates Inc CA 7507 1411 July 1974 Kildal1 1973 A Unified Approach to Global Program Optimiza tion Conf Record of ACM Symposium on Princi ples of Programming Languages Boston Mass pp 194 2U6 Uctober 1973 Kleenel1952 Introduction to Metamathematics North Holland Publishing Co Amsterdam LIS 1974 Ichbiah J D Rissen J P Heliard J C Cousot P The System Implementation Language LIS Reference Manual ber 1974 CII Technical Report 4549 E1 EN Decem Revised January 1976 Meertens 19751 Mode and Meaning in New Directions in Algo rithmic Languages 1975 S Schuman ed po 125 138 IRIA Paris Pascal 1974 Jensen K Wirth N PASCAL User Manual and Report Lecture Notes in Commuter Science Nb 18 Springer Verlag 1974 SIMULA 1974 Birtwistle Dahl Myhrhaug Nygaard SIMULA Begin Studentlitteratur Lund 1974 Sintzoff 1972 Calculating Properties of Programs by Valuations on Specific Models SIGPLAN Notices Vol Z Nb 1 Jan 1972 op 203 207 Sint zo
8. However these expensive run time checks are usually turned off before the last programming error has been discovered In the interest of increased reliability of soft ware products the language designer may reply upon The design of a refined and safe type system which necessitates linguistic constructs which propaga te strong type properties The rules of the lan guage must then be checkable by a mere textual scan of programs e g ALGOL 68 1975 and EUCLID 1976 provide a secure use of type unions This language design approach may degenerate to large and baroque programming languages The design of a refined campiler which performs a static treatment of programs and provides im proved error detection capabilities The language then remains simple and flexible but security is offered by compiler verifications e g EUCLID legality assertions which the compiler generates for the verifier This compiler design approach may degenerate into futurustic and mysterious au tomatic program verifiers We illustrate the two approaches by means of examples The compiler techniques we propose for the static ana lysis of programshave a degree of sophistication comparable to program optimization techniques ra ther than program verification techniques Cousot 1976 It proach and is shown that the language design ap the compiler design approach are strong ly related since both need a refinement of the ty
9. and therefore the mutation from one abstract variable to another may be reported 1 The need for heterogeneous structures in two to the user Main cases 84 3 Realization of implicit type transfer functions EUCLID in recognition of the fact that control led breaches of the type system are sometimes necessary provides unchecked type conversions by means of type converters i unsigned int lt lt character H assigns to i the internal code of the character H We have seen how a PASCAL compiler might report this fact to the user Finally it is clear that PASCAL provides fie xibility at the expense of security We have shown that a compiler may report to the user which cons tructs have been used in either secure or insecure ways The results of this static treatment of pro grams might also be useful in code generation Thus we get a sophisticated compiler for a simple lan It is obvious then guage that the programs will not be very readable since the programmer has no preestablished constructs for expressing his inten tions However some simple intentions of the program mer which can be simply caught by compilers may ne cessitate rich and not necessarily easy to understand language constructs This is the case in our next examnle concerning dynamic allocation of records 4 Disjoint Collections of Linked Records 4 1 Colleettons tn EUCLID Suppose in PASCAL we have to renresent two sets of rec
10. can recognize secure programs by the following facts There are no assignments to tag fields other than for initialization which is recognized by the fact that the tag value is changed from null to some value We can also authorize use less tag assignments i e those which assign to a tag without changing its value The unsafe de uniting coercions must be checked This cannot occur when a record variable is as signed to another since all record variables are considered to be of union types Note that such an assignment may indirectly modify a tag value but this is safe De uniting coer cions only occur when accessing a field ina record This is safe only if the tag has been staticly established to be of correct value Otherwise a warning is reported to the user and a run time check inserted in the program guage as 3 4 Flexibility Versus Security Static variants to describe classes of data which are different but yet closely related For example Men and Women may be described as Persons depending on their sex thus EUCLID authorizes Male Female type Person Sex type Man Person Male type Woman Person Female In PASCAL however variables of abstract type Man and Woman may be staticly recogni zed when their tag values never change Dynamic variants to describe objects whose components depend on a possibly changing state For example a car may be moving or stopped thus EUCLI
11. else n 3x n 1 write i end Cousot 1975 will discover an approximate range for n which will be 1 However if the actual range of n were known by the programmer and if the programmer could tell this to the compiler then a verification would be simpler in most cases but not on this difficult example We can now state our main objection egainst sub range types in PASCAL the fact that range asser tions must be given globally in the declaration pre vent the programmer from giving the solution of the system of equations to the compiler The programmer can only give an epproximatian of the solution which is usually insufficient for the 91 compiler to discover local subranges To make it clear instead of P1 P2 P3 P4 the programmer is 1 7001 say that P4 U P2 u P3 u P4 c 1 1001 which adds only able to declare var i that is to an inequation to the system of equations but does not provide its solution We then consider integer suorange types as union tynes since the global de claration must be the union of all local subranges Thus if we declare var i O 2 we really want to say that the type of i at each program point is one of the following alternatives 2 2 2 We then understand a criticism by Habermann 1973 1 that stbranges are not types since a global sub range type definition does not determine the set of operators that are applicable to variablesof that type For example
12. in the definition of type elem to indicate that records of type elem point to col lections of tybe listsupport We get type listsupport DS listsupport parameter forward type elem C listsupport oarameter record var next tC var val R end record tyne listsuoport collection of elem DS var DS1 listsupport DS1 54 DS4 var DS2 listsupport 0S2 S2 DS2 which is precise but somewhat overcomplicated when compared with the PASCAL declarations type list telem elem record next Lists val R end var 51 52 list Si and S2 are disjoint linked linear lists Anart from the difficulty of copeing with a new lin guistic notion the EUCLID approach has the advan tage of the orecision Since the compiler knows that S1 and S2 are disjoint lists it can produce better code especially for register allocation Moreover the combination of collections and restricted variants in records may yield efficient memory allocation strategies Suppose we have a re cord type R with two variants Ra Rb of different memory sizes sey 1 and 3 words Type Rtyoe Ra R Type R tag Rtype case tag in Ra Rb b Jee record end Ra end Rb end case end record We have the cation of co Following al lections of ection of R ection of R var C1 col var C2 col var C3 col the type of it may be R ection of R Ra or R Rb ternatives for memory allo R Ra Rb
13. same techniques to locally refine the collections provided by the programmer The advantage of EUCLID is then that when the programmer has declared his intentions or better part of intentions since the expressive power of col lections is limited he is forced to confarm to his declarations For example he will not be able to use the same pointer variable to traverse two lists which are built in diffe rent collections On the contrary this may con 90 fuse the automatic discovery of collections The advantage however must be counterbalanced by the fact that parameterized collections which are necessary with recursive data structures may become inflexible and difficult to use We now come to an example where a cooperation between the programmer and the compiler is absolu tely necessary for secure and cheap use of type unions that is to say a case when the compiler has definite disadvantages over the programmer 5 Integer Subrange Type A subrange type such as type index 0 9 is used to specify that variables of type index will possess the properties of variables of the base integer type under the restriction that its value remains within the specified range Wirth 1975 In Cousot 1975 we developed a techni que to have the compiler discover the subrange of integer variables Let us take an obvious example P1 while i lt 1000 do P2 i itt P3 P4 Let us denote by a b the predicate a lt
14. this is the case for variables of enumerated type Thus the abstract values of the tag will be chosen in 3 the power set of T which is a finite complete lattice More over if the program contains simple variables of enumerated type T it is convenient to take ac count of them in the program abstract interpreta tion process Finally if the program contains m simple variables of type T or record variables with tag of type T our abstract data space is 27 x x 21 m times Since this space is a com plete finite lattice the abstract execution of programs can be performed at compile time Example type person record case sex male female of var paul mary senior person 1 paul sex male 2 paul age 3 3 mary sex female 4 mary age 3 5 if paul age mary age then 6 senior paul 7 else 8 senior mary 9 10 The symbolic execution of that piece of program would be 83 mary senior null male null null the assignment to paul male female null null age is ignored null the assignment to mary age is ignored Since the value of the test is staticly un known this gives rise to two execution paths 6 and 8 male female null male female male male female null male female female The two execution paths are joined together male u mate female u female mate u
15. D PASCAL CR categortes 4 12 4 13 4 2 5 4 1 Introduetton reference ignores the fact that a reference may be relates to other objects and which actions may be applied to it Unfortunately the classical tyne systems of ALGOL 60 1973 PASCAL 1974 ALGOL 68 1975 do not convey enough information to de termine staticly whether a given action applied to a value will be meaningful For example in AL GOL 60 the type procedure does not include the ty pe of acceptable parameters in ALGOL 68 the type dummy in PASCAL type unions variants of record TRE EY RES Ot AN OD Ieee panes HEAR Ene OE ECE structures are unsafe because of the possibility of erring on the current alternative of the union In all these languages the problem of subscript ran ge is not safely treated by the type concept Like wise the classical type systems define only loose relationships between objects For example in PAS CAL a xointer to a record must be considered as potentially designating any record of a given type One cannot express the fact that two linked linear Attach de Recherche au C N R S Laboratoire lists of the same type do not intermix Finally the Associ N 7 SB heen a rules of the language or the programming discioline This work was supported by IRIA SESORI under grants 75 035 and 76 160 accepted by the programmer are not staticly enfor 77 ced by the compilers so that run time checks are the widely used remedy
16. D authorizes type Car State moving stopped des troyed var mycar Car any Since the actual parameter supplied for the tag is any the variable can be changed from one variant to another during execution by assigning values of different variants to no refinement is al the variable However lowed and no proper subset of the possible tag values can be used var mycar Car moving stopped This fact may be discovered by a static ana lysis of the program and might be useful in memory allocation 2 Storage Sharing Overlays This implies the use of the same storage area expressed in the lan the same actual variable for diffe rent purposes i e for representing different abstract variables whose lifetimes are disjoint This compiler approach has the advantage of fle xibility over the secure language approach It is clear that all EUCLID programs translated in to PASCAL will be recognized to be safe by this technique Following Wirth 1975 there appear to be three different motivations behind the desire for variants of record structures simulated block structure is not incorporated in PASCAL This is a typical case of free union where no tag will be carried along to indicate the cur rently valid variant This tag may be staticly provided that one ensures an appro priate setting of the tag upon assignment to fields of the variant Unsafe assignments will be identified
17. However when the type union system is infinite in teger subrange type it has been shown that static type checking necessitates language constructs which allow subtypes to be locally derived The argument was based upon the observation that type verification consists in establishing a solution to a system of type equations Global type declarations give an approximation of the so lutions to that system The discovery of a particu lar solution from that approximation may involve infinite computations On tne contrary if the lan guage is designed to directly provide a solution to the compiler type checking consists ina straightforward verification This reasoning might turn out to be useful to language designers who until now could not lo gically prove the validity of their design of lan guage constructs Moreover this reasoning may ser ve as a basis to define type satety in languages and prove particular languages to be type reliable 7 Acknowledgements P Cousot benefitted from discussions on this topic with colleagues in IFIP Working Group 2 4 Machine oriented higher level languages The organization with J Horning and K Corrello a debate on EUCLID was esoecially helpful We were very lucky to have F Slanc do the typing for us 8 Bibliography ALGOL 60 1963 Naur P Ed Revised Report on the Algorithmic Language AL GOL 60 CACM 6 Jan 1963 pop 1 17 ALGOL 68 1975 Van Wijn
18. STATIC DETERMINATION OF DYNAMIC PROPERTIES OF GENERALIZED TYPE UNIONS Patrick Cousot and Radhia Cousot Laboratoire d Informatique U S M G BP 53 38041 Grenoble Cedex France Abstract The classical vrogramming languages such as PASCAL or ALGOL 68 do not provide full data type security Run time errors are not precluded on basic onerations Tyne safety necessitates a refinement of the data tyne notion which allows subtynes The comniler must also be able to ensure that basic operations are applicable This verification consists in determining a local suotyoe of globally declared variables or constants This may be achieved by imoroved comniler capabilities to analyze the program oronerties or by language constructs which permit the expression of these properties Both approaches are discussed and illustrated by the oroblems of access to records via pointers access to variants of record structures determination of disjoint collections of linked records and determination of integer subrange Both annroaches are complementary and a balance must be found between what must be specified by the orogrammer and what must be discovered by the compiler Key words and phrases Type safety type unions subtyne data type system of equations type verification discovery error detection canabilities abstract interpretation of programs secure use of pointers variants of record istructures domains collections integer subrange tyne ALGOL 68 EUCLI
19. d the type of an object to be changed from the type nil or non nil pointer to the type non nil pointer and syntactically check the correct use of ope rations i e authorize dereferencing for non nil pointers only Compile time checks to recognize the safe use of a type scheme which is too tolerant We illustrate now this last strategy 2 1 Statie Correctness Check of Aecess to Records via Pointers Consider the simple problem of searching for a record with value n in a linked linear list L value next fa gee N The PASCAL solution is given by PASCALI1974 p 64 as follows 1 pt L b true 2 P14 3 while pt lt gt nil and b do 4 P2 5 if ptt value n then 6 b false 7 P3 8 else 9 P4 10 pt ptt next 11 P5 The above piece of program is correct with re gard to accesses to records via pointers since pt is not nil when dereferenced at lines 5 and 10 This fact is established by the programmer 79 using a simole propagation algorithm from the test of line 3 This reasoning is easily mechanized as follows associate invariantsP1 P2 P3 P4 and P5 to points 2 4 7 9 and 11 respectively According to the semantics of the programming language PASCAL Hoare and Wirth 1973 these in variants are related as defined by the subsequent system of equations 1 P1 pt L and b true 2 P2 P1 or PS and pt l
20. eve it in its original type is by a discriminating case statement This ensures that the type transfer is safe since the tag is explicitly tested 1 case discriminating x alphanum on tag of I x int gt digit 3 end int char gt letter x end char end case This discrimating case statement ensures a com Dlete run time check of which variant of a record is in use corresponding to the checks which can be carried out by the comoiler for all non union types 3 3 Statice Treatment of Type Unions PASCAL has been deliberately designed to pro vide flexible tyne unions at the expense of secu rity Wirth 1975 however a wise compiler should be able to discern the secure programs by using the following abstract interpretation of these programs Record values will be abstractly represented by their tag fields We will consider a program with a single record type with variants identified by a single tag the generalization to nested variants and numerous record tyoes is straightforward The tag is of enumerated type T which is a finite set of discrete values This set is augmented by a null value which represents the non initialized value Since at the same program point but at two diffe rent moments of program execution two different values may be assumed by a tag field of a record variable a static summary of the potential pro gram executions must consider a set of values for tag fields More generally
21. female male female male female Note that at line 10 it is clear that senior may have tag values male or female However we don t appreciate the fact that senior sex if paul age 2 mary age then male else female fi but neither do ALGOL 68 nor EUCLID With these lan guages it is evident that in some cases the program mer knows perfectly well which alternative of a union type is used but is unable to exploit this knowledge since he must use a discriminating case statement This same limitation arises with our static treatment of programs more powerful schemes exist Sintzoff 1975 Finally in the static treatment of programs useful information will be gathered from case sta tements and if statements used as ALGOL 68 confor mity tests Example Paul male Mary Female Senior Male Female if Senior Sex 1 ae Paul sex then else The abstract interpretation of a test A B ina context where A and B are variables which may as sume set of values S4 and Sg delivers a context where A and B may assume the set of values SaN Sg on the true path Thus in 4 we get Paul Senior Male n Male Female Male The context de livered for the false path is A if S nS 1 and not S4 c Sg then S A B Safi Thus in 2 we get Paul male and Senior Fe al else male When this abstract interpretation of programs is terminated we
22. ff 1975 Verifications d assertions pour les fonctions utilisables comme valeurs et affectant des va riables ext rieures in Proving and Improving programs ed G Huet and G Kahn IRIA Tarski 1955 A lattice theoretical Fixooint Theorem and its Apolications Pacific Journal of Mathe matics 5 1955 pp 235 309 WegbreitL1975 Property Extraction in Well Founded Property Sets I E E E Trans on Software Engineering Vole SE 1 Abs Sa Bos 270 285 Wirth 1975 An Assessment of the Programming Language PAS CAL SIGPLAN Notices Vol 10 nbd 6 June 1975 pp 23 30 94
23. gaarden A et al Eds Revised Report on the Algorithmic Language AL GOL 68 Acta Informatica 5 1975 pp 1 236 Boom 1974 Optimization Analysis of Programs in Languages with Pointer Variables Ph D Thesis Dept Appl Anal and Comp Scien ce University of Waterloo Ontario June Cheatham and Townley 1976 Symbolic Evaluation of Programs a look at lo op analysis Proceedings of the 1976 ACM Sym posium on Symbolic and Algebraic Computation August 1976 Cousot 1975 Static Determination of Dynamic Properties of Programs U S M G RR 25 Nov 1975 to appear in the proceedings of Colloque International sur la programmation Paris 13 15 April 1976 Dunod Cousot 1976 Adstract Interpretation a unified lattice model for static analysis of programs by cons truction or approximation of fixpoints Fourth ACM Symposium on Principles of Program ming Languages Los Angeles January 1977 Dembinski and Schwartz 1976 The pointer tyne in programming languages a new approach to appear in the proceedings of Colloque International sur la Programma Paris Uuned tion 13 15 April 1976 Dijkstral1975 Guarded commands non determinacy and formal derivation of programs CACM 18 pp 453 457 1975 EUCLID 1976 Lampson B W Horning J J London R L Mitchell J G Popek G J Report on the Programming Language EUCLID 93 MOL Buliletin P
24. he same collection The output predicate will be P2 P1 u X Y A sensible remark is that the value delivered by the right hand side of the assignment may be nil in which case this may cause a collection to be broken into two disjoint sub collections For sim plicity we ignore this fact other than in the obvious case P1 X P2 Ytenext tnext Nye optional which will cause X to be disconnected from its collection and be connected to a record of the collection of Y When X and Y are not the same variable the output assertion P2 will be rela ted to the input assertion P1 by P2 extract X P4 u x Y Now we will give an example We have cho sen the copying of a linked linear list val stn Ex 88 The following PASCAL procedure is supposed to do the job procedure copy S1 list var 32 list var C1 C2 L list Po C1 Sis 52 i nil L nil p1 while C1 lt gt nil do P2 new C2 C2t val C1t val C2t next nil P3 if L nil then P4 S2 C2 P5 else Ps Lt next C2 P7 P8 L C2 C4 Ci nexts P9 end P10 end According to our abstract interpretation of the basic constructs of the language we can now establish the following system of equations 1 P41 extract L extract S2 extract C1 PQ0 u C1 51 2 P2 P4 u PS Since the test C1 lt gt nil gives us no infor mation on collections when true 3 P3 extrac
25. i lt b The system of equations corresponding to our exam ple is 1 P4 1 1 2 P2 P4 u P3 n 1000 3 P3 P2 1 1 4 P4 P4 u P3 n 1001 o where is defined by a b c d Latc b d and u and n are union and intersection of inter vals Suppose we know the solution to that system i e P1 P4 1 1 P2 1 1000 P3 2 1001 1001 1001 It is ob ious to let the compiler verify that this solution is a fixpoint of the system 1 gt P4 1 1 2 gt P2 P4 u P3 n 1000 1 1 u 2 1001 n 1900 1 1001 n 10007 1 1000 3 gt P3 P2 1 1 1 1000 1 1 1 14 1000 1 2 1001 4 gt P4 P1 u P3 n 1001 1 1 u 2 1001 n 1001 1 1001 n 1001 1001 1001 If on the contrary we want the compiler to dis cover this fixpoint we may try to solve the equa tions by algebraic manipulations Cheatham and Townley 1976 which may be quite inextricable The other way is to use Kleene s sequence but the trou ble is that our abstract data soece is an infinite lattice and we may have infinite sequences Since compilers must work even for programs which may turn out to loop the only way to cope with the undecidable problem is to accept approximative ans wers For examole in the program for i 1 to 100 do maesi while n lt gt 1 go if even n then n i n 2
26. inite advantage since this verification will involve no symbolic formula manipulations Run time tests will remain necessary in difficult cases but their number will be decreased 6 Coneluston We illustrated the fact that unsecure data types which do not guarantee all operations on values of that type to be meaningful can be con sub sidered as the union of secure types Exam ples of these were pointers variants in records records in collections integer subranges A type safe programming system must staticly de termine which safe subtype of the union is used when checking correct use of operations on union typed objects The language designer may achieve this goal by one of the following alternatives Incorporate rules and constructs in the language so that any operation of the langua ge can be staticly shown to be operating on correctly typed arguments Design a compiler in order to verify that the security rules have not been transgres sed although not enforced by the language It was argued that in both cases the same compiling techniques must be used and comparable 92 results will be obtained by type checking or type discovery as long as finite type systems are consi dered The main difference between these approaches is the one between security at the expense of fle xibility or simplicity at the expense of preci sion and of the possibility that compiler warnings be ignored
27. int record case tag mode of int i integer char c character end var digit letter alphanum charint In a program containing these declarations the occurrence of a variable designator alphanum c is only valid if at this point that variable is of type character It is so if and only if alpha num tag char However this is not staticly veri fied by the PASCAL compilers for the following rea sons The tag field of a variant record definition is optional and may exist only in the pro grammer s mind When present the tag field may be assigned thus allowing to realize implicit type trans a variable of fer functions For instance type character alphanum tag char alphanum c H may be interpreted as being of type integer for the purpose of printing the internal representation alphanum tag int writeln alphanum i Note that the tag is appropriately set but without care about its value one can write as well alphanum c H writeln alphanum i 3 2 Safe Type Untons in ALGOL 68 EUCLID Suggestions have been made to provide syntactic structures which ensure that type unions are safe i e compile time checkable Such features forbid assignments to the tag fields and let the compiler determine the current tag value from context using a statement similar to the inspect when of SIMULA 1974 In ALGOL 68 975 we would write mode charint union integer character
28. into disjoint in opposition to EUCLID we collections However will not try to find global collections but local ones Thus the local invariants we will try to com pute at each program doint will be restricted to be of the form V W are pointers to the same collection and X Y Z are pointers to the same collection which we will denote v W X Y Z We now have to define the conjunction U of such pre dicates i e the union of sets of collections for example A B C D E u F A G H A B C F G D E H If on one hand A may point to a record referenced by B and C or on the other hand A may point to a record referenced by F and G it is clear that A B C F and G may point on the same record The instructions of the program provide useful in formation After the instructions X r nil X Y where Y is known to be nil APX amil Ehen new X it is known that X will point to no record at all or will be the only pointer to the newly allocated record Thus we have isolated a collection empty or consisting of a single record With an input predicate P4 X X X x Y EE E oe err E the above instructions lead to an output predicate P2 extract X P1 T E Il More generally with an input predicate PA a poin ter assignement such as Xt next enext Yt next 4 next optional optional may cause X and Y to indirectly point to a common record Hence they are put in t
29. let f be a function with one formal parameter of type 2 10 and i a variable globally declared of type U The variable i may be used at program voint p in the expression fli provided that i may be united to the subrange 2 10 Dyna mically the locai type of i at program point p is i i which is simply derived from the value i of the variable i In the expression flij i must be coerced from the type i i to the type 2 170 This is safe when 2 lt i and i lt 1G Staticly this signi fies that the subrange of i at program point p must be a subrange of 2 5 This subrange of 2 5 cannot be locally specified in PASCAL This understanding of subranges leads us to the con clusion that integer subranges should be specified locally Moreover and in opposition with our pre vious examples we cannot expect the compiler to be able to discover these local subrange properties It is then essential that programmers provide them by means of assertions or as previously by means of conformity clauses so that we would write in the spirit of ALGOL 68 MeertensL1975 i 13 while case i in 1 1000 o i p 14 true out false esac do skip od These constructs give the solution of the system of equations which the compiler has to solve for strong type checking The redundancies equations identically verified can be eliminated Moreover the PASCAL restriction that the bounds of ranges must be manifest constantsis a def
30. lt is that although S1 and S2 may share records on entry of the procedure copy po S1 2 1 C2 L extract L extract S2 51 52 C1 C2 L it is guaranteed that this is not the case on exit extract L S1 C1 S2 C2 L of the procedure P4 81 01 82 C2 L a P10 C1 S1 S2 C2 L 2 gt P2 Pt u P9 SPEU E PI 3 gt P3 extract C2 P2 S1 C1 82 C2 L 4 gt P4 extract L P3 1 01 S2 C2 L 4 3 Remarks 5 gt PS extract S2 P4 u S2 C2 81 C1 S2 C2 L u 2 02 P5 1 1 S2 C2 L 6 gt P6 P3 S1 01 S2 C2 L 7 gt P7 PB L c2 of tags of all records in the collection This a This abstract interpretation of programs may be refined as in EUCLID when records have variants one can associate with each collection the set tj 1 C1 S2 C2 L J L C2 in fact will be the main application of our de S 1 C1 S2 C2 1L 8 gt P8 P5 u P7 54 C1 82 C2 L u 4 C1 82 C2 L S1 1 S2 C2 t 9 gt P9 extract L P8 u L C2 x Pg S41 C1 S2 C2 L velopments of paragraph 3 We will be more fle xible than the one of or any of EUCLID and authorize collections with say two variants A B among three possibilities A B C Other wise stated we reason on the following type hie rarchy We go on cycling in the while loop until the in A B C T variant PO P10 have stabilized we i A B A C B C 2 gt P2 P4 u PS 1
31. nt at all this is a source of frequent errors The type of a value may be viewed as a static summary of the meaningful operations on that value However the operations prescribed by a syntactically valid construct are not always dynamically meaning ful This is the case when dereferencing a pointer value which happens to be nil The pointer type notion must then be refined so that one can distinguish the type of pointers to a record type the subtype of non nil pointers to that re cord type the subtype of nil pointers to that record type which happens to have only one value The rule is that dereferencing can be applied only to pointers of non nil subtype Since this rule must be enforceable by the programming system the language designer has three solutions Run time checks these checks are usually ve ry cheap for pointers when using the hard ware memory protection facilities However for system implementation languages genera ting code in master mode this hardware detec tion is not always utilizable Moreover for more complicated examples such as array sub scripting these run time checks are very expensive Safe language design with strong typing i e a type system which ensures that any opera tion prescribed by a syntactically valid cons truct will always be dynamically meaningful This type scheme must distinguish between nil and non nil pointer types disallow type vio lations i e forbi
32. ords of type R we can use two arrays var 51 52 arrayl1 n of R With such a declaration the PASCAL compiler knows that the sets S1 and S2 are disjoint that is to S1 has no side effect on 52 that n say any modification of and vice versa Suppose the maximal cardina lity of the two sets is not known we will use dy namically linked linear lists type list elem elem record next list Val R end var 541 52 list This time the readers of the program e g PASCAL compilers have to supnose that the sets S1 and S2 85 may share elements and it is now necessary to scan all the nrogram to state the contrary In LIS 1974 one can specify that two pointers never refer to the same record the declarations DS1 DS2 domain of elem domain of elem snecify that 081 and DS2 will be sets of disjoint dynamic variables Now if S1 and S2 are pointers into different domains S1 DS4 S2 DS2 they noint to different records of the same type Unfortunately the confusion between a pointer to the first element of the linked structure and the list is valid only in the programmer s intellect S1 and S2 point to different records of type elem which themselves may point to the same record Thus the idea of domains has to be recursively applied in order to specify that elements of domain DS1 point only to elements of DS1 DSi domain of elem1 tyne elemi record next DS13
33. s 2 and 11 in particular the test on pt at line 3 may not be identically true The simple programmer s idea of generalizing constant propagation may be derived from the above 80 Kleene s sequence when eliminating useless computa tions A symbolic execution of the program where elementary actions are interpreted according to the simplified equations previously established gives the following computation sequence P1 T Pi y te p25 P2 P4 or P5 and non nil Tor 1 and non nil non nil P3 P2 non nil P4 P2 non nil PS P3 or T non nil or T 7 P2 P1 or P5 and non nil T or 7 and non nil non nil same as above stop Kildall 1973 and Wegbreit 1975 algorithms have been recognized they are efficient versions of the Kleene s sequence Following Sintzoff 1972 we call this technique the abstract interpretation of programs Abstract since some details about the data af the program are forgotten and interpretation since both a new meaning is given to the program text and the information is gathered about the pro gram by means of an interpretor which executes the program according to this new meaning We then get a static summary of some facets of the possible exe cutions of the program A theoretic framework of abstract interpretation of programs together with various examples are given in Cousot 1976 2 2 A Safe Lingutstie Framework to Handle Nil Poin
34. s 051 and 082 orocedure insert DS collection of elem DS var S DS val R It is clear that DS 0S1 DS2 are just formal or actual but different collections of the same tyne To make conspicuous that different collections will have the same type we now want to give the name listsupport to the type of the collections sup porting linked linear lists type listsupport collection of elem Since the tyoe of a collection such as OS1 de pends on its name 0S1 the type of the collec tion must be parameterized by that name type listsupport O0S collection of elem DS A declaration such as var DS41 listsupport DS1 means that DS1 is a collection of elements pointing to DS1 However the above declaration of listsup port is incomplete since DS is a collection of type listsupport type listsupport DS listsupport 86 collection of elem DS Since we have entered a recursive question each use of listsunport in the definition of listsunport must be provided by an actual narameter we have to sol ve it by some language convention tyne listsupport DS listsunport parameter collection of elem 0S The keyword parameter indicates that a shorthand has been used the actual parameter will be pro vided later Since we succeeded in defining what is the type of collection supporting lists we now want to replace the definitions of this type by the name of that type in particular
35. t gt nil and b 3 P3 P2 and ptt value n and b false 4 P4 P2 and ntt value lt gt n 5 PS P3 or 3 pt she P4 and pt ot t next Equation 5 has been deliberately oversimplified see Dembinski and Schwartzl1976 Since in general it is undecidable to find a solution to systems such as the one above we must consider simplifications to the prejudice of the precision of our results For that purpose we will ignore the existence of the boolean variable b of the fields value in records of the linear list and thus focusing on pointers Moreover we will consider only the pointer variable pt and the fol lowing predicates on pt pt nil pt lt gt nil pt nil or pt lt gt nil respectively denoted by nil non nil T These pre dicates form a complete lattice whose HASSE s dia gram is T L nil ze ener ee Where 1 is used to denote the fact that nothing is known about the variable pt Since we are only considering an oversimplified subset Of the set of predicates our system of equa tions can be simplified accordingly 44 Pret C2 P2 P1 or P5 and non nil 3 P3 P2 4 P4 P2 5 PS P3 ort In equation 1 we consider pt L since L may be an empty or non empty linear list we get pt nil or pt lt gt nil denoted T in equation 5 we only consider the fact that the function next when defined delivers a nil or non nil pointer
36. t C2 P2 The assignment of non vointer values and a deep modification in the structure pointed to by C2 are ignored 4 P4 extract L P3 5 PS extract S2 P4 u S2 C2 6 P6 P3 since we ignore the fact that L lt gt nil 7 P7 P6 u L C2 8 PB PS u P7 9 P9 extract L P8 u L C2 5 gt P6 P3 The statement C4 51 C1 S2 L C2 C1 next leaves C1 in the 7 gt P7 P6 u L C2 W H same collection P7 S1 1 S2 L C2 8 gt PB P5 u P7 81 01 S2 C2 L u 81 C1 52 L C2 Since the theoretical conditions which ensure pe 81 C01 82 L C2 10 P10 extract C1 P1 U PS that the above system of equations has a solution Same value as above stop on that path It remains ified C 97 are verified Cousot 1976 we can compute the least only the path out of the loop fixpoint using a finite Kleene s sequence 10 gt P10 extract C1 P1 u P9 extract C1 S1 01 S2 C2 L u 1 C1 82 C2 L extract C1 S1 C1 S2 C2 L P10 01 81 S2 C2 L We start with the most disadvantageous initial ore dicate PO where on the one hand the parameters S1 52 and on the other hand the local variables C1 C2 L are supposed to be in the same collection PO 81 82 C1 C2 L Pi 1 Vi e 1 10 The final results are marked by a star The 1 gt P4 extract L extract S2 extract C1 P0 vu c1 81 extract L extract S2 S1 S2 C1 C2 L vu C1 81 main resu
37. ters A comolete and satisfactary solution of the problem of dereferencing or assigning to a nil name as in ref real nil 3 14 is proposed by Meer tens 1976 within the framework of ALGOL 68 The pointer types are restricted to non nil values by exclusion of nil names this is achieved by not pro viding a representation for the nil symbol so that any name refers to a value The type void is used to represent nil names Finally the type of nil and non nil pointers is the union of the previous ones For example we can write a construction like mode list union ref ceil void mode cell struct integer value list next to represent linked linear lists An empty list is represented by the value empty the only void value Our routine would have to be rewritten list pt L while case pt in ref cell pt gt if value of pt n then false else ot next of ot true fi out gt false esac do skip od This program is safe since in ALGOL 68 the non safe coercion of pt from mode union ref cell void to made ref cell has to be made explicit by a conformity case construct The idea is therefore to force the programmer to explicitly perform the run time tests which in this example is dictated anyway by the logic of the problem the rewritten version admittedly looks a bit cumbersome but more convenient ways of expressing such a flow of con trol may be exhibited Dijkstraf 1975
38. val R end and that elements of DS2 can noint only to elements of DS2 DS2 domain of elem2 type elem2 record next DS2 Val 4 Ry ends Since we want to guarantee that two pointers into different domains can never refer to the same va riable we have to consider that 051 and 052 are different types of pointers The trouble is now that elem1 and elem2 are different types so that we have to write twice the algorithms insertion search deletion which handle the two similar lists 1 and S2 EUCLID 1976 is more flexible and authorizes tynes to be parameterized Thus we will describe the types of lists S1 and 52 ance as depending on the domain called collection in EUCLID to which they belong The tyoe elem is parameterized by the name C of the collection to which elements of type elem noint This collection C is a collection of records o7 tyne elem pointing to C type elem C collection of elem C record var next C var val R end record var DS1 collection of elem DS1 var 1 t OSI var DS2 collection of elem 052 var 52 DS2 Now the operations on lists S1 and S2 can be des cribed once it just suffices to pass the name of the collection 0S1 or DS2 to which they refer as a parameter insert DS1 S1 r will insert the record r in list S1 which belongs to collection DS1 Now we have to declare the type of the formal parameter DS corresoonding to the possible actual parameter

STATIC DETERMINATION OF DYNAMIC PROPERTIES OF

Contents

Download Pdf Manuals

Related Search

Related Contents