Home

OCaml + XDuce - Virtual building 8

image

Contents

1. erations and inclusion test for attribute element constraint s In Eighth International Conference on Implementation and Application of Automata 2003 Haruo Hosoya Regular Expression Types for XML PhD thesis The University of Tokyo Japan December 2000 Haruo Hosoya Regular expression filters for XML In Programming Languages Technologies for XML PLAN X 2004 Haruo Hosoya and Benjamin C Pierce XDuce A typed XML processing language In Proceed ings of Third International Workshop on the Web and Data bases WebDB2000 2000 Haruo Hosoya and Benjamin C Pierce Regular expression pattern matching for XML Journal of Functional Programming 13 4 2002 Haruo Hosoya and Benjamin C Pierce A typed XML processing language ACM Transactions on Internet Technology 3 2 117 148 2003 Haruo Hosoya J r me Vouillon and Ben jamin C Pierce Regular expression types for XML In JCFP 00 volume 35 9 of SIGPLAN Notices 2000 Christian Kirkegaard Anders Mller and Michael I Schwartzbach Static analysis of XML transformations in Java IEEE Transac tions on Software Engineering 30 3 181 192 March 2004 Xavier Leroy et al The Objective Caml system release 3 08 Documentation and user s manual 2001 Mil78 MOW99 SLO5 Vou06 Robin Milner A theory of type polymorphism in programming Journal of Computer and System Sciences 1978 Martin Sulzmann Martin Odersky and Martin Wehr Typ
2. lt a gt _ The additional constraint corresponding to the id operator is thus simply lt a gt _ gt 45 which is satisfied by the same instantiation for 5 as in the original example As a consequence the type for z2 is not changed The preprocessing pass is quite simple It consists of an other run of the OCaml type checker where all the XML types are considered equal This allows to identify which sub expressions are of kind XML Section 5 describes for mally this pass Breaking cycles The key condition which allowed us to compute an instantiation for XML type variables in the ex ample was the acyclicity of the constraints This property does not always hold For instance let s extend the original example with the following definition let z3 f z1 Without the preprocessing pass mentionned above this line would force the OCaml type checker to unify 2 and v5 The preprocessing pass actually replaces this definition by let z3 f idi z1 11 The type checker then unifies 11 with v5 and t12 with 2 the resulting constraint for id is thus ly 2 bs which corresponds to the fact that the output of f can flow back to its input We observe that the set of constraints has now a cycle between variabls 01 5 and tg Our type system cannot deal with such a situation It would issue an error explaining that the inferred data flow on XML values has a cycle The programmer is then required to break expli
3. XDuce type The technical presentation introduces a notion of foreign type variables but they are nothing more than a technical device for inferring ground XDuce types Overview In Section 2 we give some intuitions about the behavior of OCam1Duce s type checker The formalization of the type system will be developped by abstracting away from details about XDuce In Sec tion 3 we introduce an abstract notion of extension for eign types and foreign operators and show of XDuce can be seen as an extension In Section 4 we present the type system and type inference algorithm for a calculus made of ML Mil78 Dam85 plus an arbitrary extension The basic idea is to rely on standard techniques for ML type inference Indeed we start from a type system which is an instance of ML where foreign types are considered as atomic types and foreign operators are explicitly annotated with their in put and output types Then we present an algorithm to infer these annotations This algorithm is described as two suc cessive passes the first one is a slightly modified version of an ML type checker and the second one is a simple forward computation on foreign types In Section 5 we present a preprocessing pass called strengthening whose purpose is to make more programs ac cepted by the type system by allowing implicit use of sub typing In Section 6 we present other details of the concrete integra tion in OCaml In Section 7 we compare o
4. For instance if 7 is a XDuce sub type of T and e has type T list then it is possible to coerce it to type T2 list e gt 72 list Crossing the boundary In our system XDuce values are opaque from the point of view of ML and XDuce types cannot be identified with other ML type construc tors Sometimes we need to convert values between the two worlds For instance we have a foreign type St ring which is different from OCaml string This foreign type conceptually represents immutable sequences of arbi trary Unicode characters whereas the OCaml type should be thought as representing mutable buffers of bytes As a consequence we don t even try to collapse these two types into a single one Instead OCamlDuce comes with a runtime library which exports conversion functions such as Utf8 make string gt String Utf8 get String gt string Latinl make string gt Latinl Utf8 get Latini gt string The type Latin1 is a subtype of String it represents all the strings which are only made of latin 1 characters latin 1 is a subset of the Unicode character set The function Ut 8 make checks at runtime that the OCaml string is a valid representation of a Unicode string encoded in utf 8 Similarly we often need to translate between XDuce s se quences and OCaml s lists For any XDuce type 7 we can easily write two functions of types r T list and T list rx the star between square brackets denotes Kleene sta
5. are considered equal but does not touch the expression because no sub expression has a foreign type in a principal typing derivation Still the next pass of the type inference algorithm attempts to unifiy the types 7 and T and thus fails Relaxing the acyclicity condition Inserting applications of the id operator can break the acyclicity condition We can actually relax this condition to deal with the id oper ator more carefully Let us consider a constraint C with a cycle 41 a 41 such that all the edges in this cycle come from elements of the form id Clearly any valu ation p such that I Cp will map all the v in the cycle to the same ground foreign type So instead of considering the most general pre solution and then face a cyclic constraint we may as well unify all these v first all the solutions can still be obtained from this less general pre solution The relaxed condition is There must be no cycle in the con straint except maybe cycles whose edges are all produced by the id operator To illustrate the usefulness of the relaxed condition let us consider the expression e fix Ag Ax f c g x with T fix Va a gt a gt qa f Va a gt a gt a c T The strengthening pass builds a principal typing derivation for e in a type algebra where all the foreign types are identified Here is such a derivation where we write x for foreign types and t a gt x I T g t x a we collapse rules for multip
6. definition is well founded and yields a unique function p Furthermore this function is a valuation if and only if the typing problem has a solution To check this property only constraints whose right hand side is a ground foreign type need to be consid ered 1 Voz Co Epo lt T Also any other valuation p is such that Ve E V tpo lt tp In other words under the acyclicity condition we can check in a very simple way whether a given typing problem has a solution and if this is the case we can compute the smallest valuation for the point wise extension of the subtyping rela tion This computation only involves one call to the abstract semantics for each application of a foreign operator in the expression to be type checked Remark In some cases it is possible to find manifest type errors even when the constraint is not acyclic In practice the computation of po the verification of 1 and the check for acyclicity can be done in parallel e g with a deep first graph traversal algorithm It can detect some violation of 1 before a cycle In this case we know that the typing problem has no solution and thus a proper type error can be issued Manually working around the incompleteness When the algorithm described above infers a cyclic constraint it can not detect whether the typing problem I e t has a solution or not However we have the following property If a solu tion exists then we can always produ
7. is from the example above Here the preprocessing pass succeeds but does not change the expression because no sub expression has a foreign type in the principal type derivation The type scheme inferred for h is a pure ML type schema which makes the type system subsequently fail on the expression We believe that this restriction of the F lt system is rea sonnable It can be implemented very simply by reusing the same type checker as in Section 4 in a different mode where all the foreign types can be unified The simple ex amples at the beginning of this section are now accepted Indeed the preprocessing pass transforms the expressions to f idx id y and Ax id x T1 T2 respectively This allows the type system F to use subtyping where needed Properties The strenghtening pass cannot transform a well typed program into an ill typed one Note however that it might break the acyclicity condition if it was already met See below for a way to relax the acyclicity condition Also if strenghtening fails the typing problem has no pre solution for the typing judgment F and thus no solution However it is not true that if it succeeds a pre solution nec essarily exists for the new program where applications of the id operators have been added As an example let us consider the situation where T x 7 gt T1 y T2 gt T2 f Va a a gt qa and e f x y The preprocessing succeeds because all the foreign types
8. kernel We found no problem to extend it to deal with the whole OCaml type system including recursive types modules classes and other fancy features The two ML like typing passes the one used during strengthening and the one using for the real type checking are done on whole compilation units in the toplevel they are done on each phrase The information 10 from the compilation unit interface the cmi file is in tegrated before checking the acyclicity condition Indeed this information acts as additional type annotations on the values exported by the compilation unit and can thus con tribute to obtaining this condition Also in addition to type annotations on expressions OCaml provides several ways to introduce explicit type informations and thus obtain the acyclicity condition datatype definitions explicit types for constructor and exception arguments record fields module signatures type annotations on ML pattern variables OCaml subtyping OCaml comes with a structural subtyp ing relation generated by object types and polymorphic vari ants subtyping and extended structurally by considering the variance of ML type constructors The use of this subtyping relation in programs is explicit The syntax is e t gt t2 sometimes the type t can be inferred and it simply checks that t is a subtype of t2 Of course the OCaml subtyping relation has been extended in OCamlDuce to take XDuce subtyping into account
9. of ex isting ML implementations 8 Conclusion and future work We have presented a simple way to integrate XDuce into OCaml The modification to the ML type system is small enough so as to make it possible to easily extend existing ML type checkers Realistic sized examples of code have been written in OCamlDuce such as an application that parses XML Schema documents into an internal OCaml form and pro duces an XHTML summary of its content Compared to a pure OCaml solution this OCamlDuce application was easier to write and to get right XDuce s type system en sures that all possible cases in XML Schema are treated by pattern matching and that no invalid XHTML output can be produced We refer the reader to OCam Duce s website for the source code of this application The main limitation of our approach is that it doesn t allow parametric polymorphism on XDuce types Adding poly morphism to XDuce is an active research area In a previous work with Hosoya and Castagna HFC05 we presented a solution where polymorphic functions must be explictly in stantiated Integrating this kind of polymorphism into the same mechanism as ML polymorphism is challenging and left for future work The theory recently developped by Vouillon Vou06 could be a relevant starting point for such a task Another direction for improvement is to further relax the acyclicity conditions that is to accept more programs with out requiring extra type ann
10. typed expressions in ML X in an empty typing environment or an environment which con tains built in ML operators cannot go wrong We also as sume that the operational semantics for an of operator de pends only on o not on the annotations 7 7 This allows us to lift the semantics of ML _X to the full calculus Typing problems A substitution is an idempotent func tion from types to types that maps type variables to types foreign type variables to foreign types ground foreign types to themselves and that commutes with ML type construc tors We use a post fix notation to denote a capture avoiding application of this substitution to typing environments ex pressions types or constraints A substitution is more general than a substitution 2 if o2 2 0 gy Or equivalently because substitutions are idempotent there exists a substitution such that 2 do Q A typing problem is a tuple T e t Usually t is a fresh type variable A solution to this problem is a substitution such that T o e t is a valid judgment in ML X We will now rephrase this definition in terms of a typing judg ment on the full calculus This judgment T Fx e tis defined by the same rules as in Figure 2 except for foreign operators for which we take Thx 06 1 9 9 En DE Typing environment and type schemes that are used in the judgment x are allowed to contain foreign type variables We say that is a pre solu
11. OCaml XDuce Alain Frisch INRIA Rocquencourt Alain Frisch inria fr January 2006 Abstract This paper presents the core type system and type inference algorithm of OCamlDuce a merger between OCaml and XDuce The challenge was to combine two type checkers of very different natures while preserving the best proper ties of both principality and automatic type reconstruction on one side very precise types and implicit subtyping on the other side Type inference can be described by two succe sive passes the first one is an ML like unification based al gorithm which also extracts data flow constraints about XML values the second one is an XDuce like algorithm which computes XML types in a direct way An optional prepro cessing pass called strenghtening can be added to allow more implicit use of XML subtyping This pass is also very similar to an ML type checker 1 Introduction This paper presents the core type system of OCamIDuce a merger between OCaml L 01 and XDuce Hos00 HPOO HP03 HVP00 OCamlDuce source code documentation and sample applications are available at http www cduce org ocaml OCaml is a widely used general purpose multi paradigm programming language with automatic type reconstruction based on unification techniques XDuce is a domain specific and type safe functional language adapted to writing trans formations of XML documents It comes with a very precise and flexible type system based on regul
12. ar expression types and a natural notion of subtyping The basic type checking primitives for XDuce constructions are rather involved but the structure of the type checker is simple types are com puted in a bottom up way along the abstract syntax tree the input and output types of functions are explicitly provided by the programmer The high level objective of the OCamlDuce project is to enrich OCaml with XDuce features in order to provide a robust development platform for applications that need to deal with XML but which are not necessarily focused on XML The challenge was to combine two type checkers of very dif ferent natures while preserving as much as possible the best properties of both principality and automatic type recon struction on one side very precise types and implicit sub typing on the other side Our main guideline was to design a type system which can be implemented by reusing existing implementations of OCaml and CDuce BCFO03 Fri04 CDuce can be seen as a di alect of XDuce with first class and overloaded functions for the merger with OCaml we don t consider these extra features Because of the complexity of OCaml s type sys tem it was out of question to reimplement it The typing algorithm we describe in this paper has been successfully im plemented simply by combining a slightly modified OCaml type checker with the CDuce type checker and by adding some glue code As a result OCamlDuce is a stric
13. atch T 7 X x pi gt e1 Ph gt en where 7 lp f and p p 71 U U 74 1 the restric tion of p to values which do not match any pattern form an preceding branch We have used in this translation a new built in ML constant dispatchjr Tn of type scheme Va mn U U Tt gt a gt a gt B gt a gt p 6 which we assume to be present in the initial typing envi ronment Its intuitive semantics is to drop the first argument it is used only to force the type checker to verify that x has type T U Tn which corresponds to the XDuce s pattern pth matching exhaustivity condition and to call the func tional argument 1 lt k lt n on the second argument when k is the smallest integer such that this argument has type Tx In principle the technique described in this paper could be used to integrate many of the existing extensions to the original XDuce design such as attribute element con straints HM03 or XML filters Hos04 without any addi tional theoretical complexity In its current form however OCam1Duce integrates all the CDuce extensions except over loaded functions XML attributes as extensible records se quence and tree pattern based iterators strings as sequences of characters hence string regular expression types and pat terns etc 4 Type system In this section we present a type system and a type inference algorithm for a fixed extension X Our language will be
14. ce an expression e by adding annotations to e such that the algorithm succeeds for the typing problem T e t and that is equivalent for the equivalence induced by the more general ordering to the solution o computed by the algorithm In other words even if the algorithm is not complete be cause of the acyclicity condition and makes a choice be tween most general solutions the smallest one for the sub typing relation for any solution to a typing problem the programmer can always add annotations so that the algo rithm infers this very solution or an equivalent one Partial operators The foreign operators were assumed to be total This means they should apply to any foreign value We can simulate partial operators by adding a new top ele ment T to the set of ground foreign types 7 and by requiring the abstract semantics of operators to be such that whenever an argument is T the result is also T Since the typing algo rithm infers the smallest valuation for foreign type variables we can simply look at it and check that no foreign type vari able is mapped to T 5 Strengthening As we mentioned above we can see the type system for the calculus as an elaboration into its ML X fragment which immediatly gives type soundness In this section we consider another elaboration from the cal culus into itself Namely this elaboration is intended to be used as a preprocessing pass rewriting expressions into ex pr
15. citly this cycle by providing more type annotations For instance the programmer could use the same annotation as above on z1 let z1 lt a gt _ f lt a gt lt b gt lt a gt lt b gt or maybe he will prefer to annotate the input or output type of f 3 Abstract extension of ML The previous section explained the behavior of OCaml Duce s type checker on a example It should be clear from this example that the type system is largely independent of the actual definitions of values types patterns and opera tors from XDuce and could be applied to other extensions of OCaml as well In this section we will thus introduce an abstract notion of extension and show how XDuce fits into this notion This more abstract presentation should help the reader to understand the structure of the type checker with out having to care about the details of XDuce type system Definition 1 An extension X is defined by e a set of ground foreign types T e a subtyping relation lt on T which is a partial order with a finite least upper bound operator U e a set of foreign operators O e for each operator o O an arity n gt 0 and an ab stract semantics T T which is monotone with respect to lt on each of its argument We use the meta variable 7 to range over ground foreign types The foreign operators are used to model both foreign value constructors and operations on these foreign values Si
16. e inference with constrained types TAPOS 5 1 1999 Martin Sulzmann and Kenny Zhuo Ming Lu A type safe embedding of XDuce into ML In The 2005 ACM SIGPLAN Workshop on ML 2005 J r me Vouillon Polymorphic regular tree types and patterns In POPL 2006 To appear 13
17. eoretical contribu tion is in the definition of a subtyping relation which com bines C named subtyping inheritance and XDuce set theoretic subtyping Since the resulting type algebra does not have least upper bounds the nice locally complete type inference algorithm for XDuce patterns HP02 cannot be transferred to Xtatic In Xtatic XDuce types and C types are stratified but the two algebras are mutually recursive XDuce types can appear in class definitions and C classes can be used as basic items in XDuce regular expression types This does not really introduce any difficulty because C types are not structural The equivalent in OCamlDuce would be to allow OCaml abstract types as part of XDuce types which would not be difficult except for scoping rea sons abstract types are scoped by the module system In the last ten years a lot of research effort has been put into developping type inference techniques for extensions of ML with subtyping and other kinds of constraints For instance the HM X framework MOW99 could serve as a basis to express the type system presented here The main modifi cation to bring to HM X would be to make foreign type variables global Another way to express it is to disallow constraints in type schemes which is what we do in the cur rent presentation We have chosen to present our system in a setting closer to ML so as to make our message more explicit our system can be easily implemented on top
18. es the XDuce types quotiented by the equivalence induced by the subtyping relation that is types with the same set theoretic interpretation are consid ered equal The least upper bound operator L corresponds to XDuce s union type constructor usually written We use the following families of foreign operators e a unary operator for each XML label a a unary opera tor e a binary operator corresponding to the concatenation e a constant operator corresponding to the empty se quence e for any pattern p and variable x in Var p a unary op erator written match x p its semantics is to return the value bound to x when matching its argument against the pattern p The abstract semantics for all these operators follows directly from XDuce s theory Element constructor concatenation and the empty sequence expressions can directly be seen as foreign operators This is not the case for a pattern matching match e with p e1 Pn en We are going to present an encoding of pattern matching in terms of operators and normal ML expressions This encoding is rather heavy in practice the implementation deals with pattern matching directly First we define the translation p e of a single branch where Var p x1 Xn as the expression AX let x match x p x in let x match x p x in e Then the translation of match e with p gt e1 Pn gt en is defined as letx ein disp
19. essions in order to make the type system accept more pro grams We call this elaboration procedure strengthening The issue addressed by strengthening is a lack of implicit subsumption in our calculus We already hinted at this issue in Section 2 We will now give more examples Subsumption missing in action We consider the typing problem T1 e1 8 where x 71 y T2 f Va a gt a a and e f x y It admits a solution if and only if 71 T2 In a system with implicit subtyping we might expect to give type T T U 72 to both x and y so that the application succeeds and the result type is 7T Similarly the expression Ax x 7 T2 is not well typed even if 7 lt T unless 7 72 A naive solution Let us see how to implement the amount of implicit subtyping we need to make these examples type check The following rule could be a reasonable candidate as an addition to the type system we write F lt for the new typing judgment Tree T T H lt e T TLT A concrete way to see this rule is that any subexpres sion e can be magically transformed to the application id 2 e where id is a distinguished foreign operator such that id r T and 11 t2 are fresh foreign type variables The type system extended with this rule would accept the examples given above to illustrate the lack of implicit sub sumption However this rule as it stands would add a lot of complexity to the type inference algo
20. esulting constraint T ground foreign type L foreign type variable Types constructed type variable foreign type vow ti Foreign types ex Expressions x program variable Ax e abstraction ee application letx eine local definition e t annotation e l Q e existential variable foreign operator Figure 1 Types and expressions T x amp t T x t Fe tz The ty te Theo tj They t T x Genp t1 F eg t2 TeEx t T F Ax e ty gt t2 T F eies te T F let x e4 in e te TFe t T F efto a t a 7 IT TH e t t TF dae t TF nS S mOT Figure 2 Type system for the ML X fragment It is almost straightforward to adapt unification based exist ing algorithms for ML type inference and their implementa tions to compute a most general pre solution if there exists a pre solution or to report a type error otherwise Indeed the typing judgment Hx is very close to a normal ML type system In particular it satisfies a substitution lemma if TFx e t then x ed to for any substitution Of course if the typing problem has no pre solution it has no solution as well For the remaining of the discussion we assume given a most general pre solution o Let us write V for the set of foreign type variables that appear in To e 0 t o and Co for the constraint C edo A solution to the typing problem is in particular a pre solution As a consequence a substitution is a so
21. gt lt c gt from which we read the following constraints lg gt matchl y pl 41 l5 2 l2 le u 2 lt a gt lt b gt lt a gt lt b gt n 2 lt a gt lt a gt i 2 lt a gt lt c gt In this system we consider match y p as a function from XML types to XML types given by XDuce s type in ference algorithm for pattern matching Similarly the oper ator is now intrepreted as a function from pair of types to types The set of constraints generates dependencies between vari ables We say that a variable on a left hand side of a con straint depends on variables of the right hand side In our example the graph of dependencies between variables is acyclic In this case we can topologically order the vari ables and find the least possible ground XML type for each of them we assign to a variable the union of all its lower bounds In the example we will thus compute the following instantiation u RI tg match y p R1 R2 l5 l2 l2 R2 R2 where R1 is the regular epxression lt a gt lt b gt lt a gt lt b gt lt a gt lt a gt lt c gt and R2 is the regular expression lt a gt lt a gt lt b gt lt a gt lt a gt lt c gt Type checking is over we have found an instantiation for XML type variables which satisfies all the constraints In essence the type checker has collected all the XML types that can flow to the inpu
22. hing or when calling a func tion So we can say that XDuce tries to compute a min imal type for each sub expression by applying basic type checking primitives We will do the same and to make it work we need some acyclicity property which corresponds to the bottom up structure of XDuce s type checker Definition 2 Let C be a constraint We write t a lg if C contains an element o such that t2 and 1 appears in We say that C is acyclic if the directed graph defined by this relation is acyclic Our type inference algorithm only deals with the case of an acyclic constraint Co this condition does not depend on the particular choice of the most general pre solution If the condition is not met we issue an error message It is not a type error with respect to the type system but a situation where the algorithm is incomplete Remark The acyclicity criterion is of course syntactical it does not depend on the semantics of constraints but on their syntax but it is not defined in terms of a specific inference algorithm Instead it is defined in terms of the most general pre solution of an ML like type system In particular it does not depend on implementation details such as the order in which sub expression are type checked Below we furthermore assume that Co is acyclic We define the function po V T by the following equation We V tpo _ 6 Epo 0 Co The acyclicity condition ensures that this
23. ion is then used twice includ ing once indirectly through a call to the function List map from the OCaml standard library of type Va 3 a 3 gt alist list For the purpose of explaining type checking we will rewrite the body of the function f as let f x let y match y p x in y y The y and p parameters of the mat ch operator represent the capture variable under consideration and the pattern itself In OCamlDuce XML values elements sequences and regular OCaml values are kept appart An XML value can of course appear as part of an OCaml value e g the XML ele ments which are put into an OCaml list but an OCaml value cannot appear within an XML value The same applies to types an XML type can appear as part of a complex OCaml type expression but the converse is impossible XML op erators can be applied to XML values and return new XML values In the example we can see three kind of XML oper ators XML literals no argument XML concatenation two arguments and XML pattern matching one argument The basic idea of the OCamlDuce type system is to infer XML types for the inputs and outputs of XML operators This is done by introducing internally a new kind of type variables called XML type variables Before proper type checking starts each XML operator used in the program is annotated with fresh XML type variables in subscript po sition for the inputs and in superscript position for the
24. ion of a type system and do not study the interac tion between XDuce type checking and ML type inference XDuce code can call ML functions but their type must be fully known These last points are precisely the issues tack led by our contribution For instance our system makes it possible to avoid some type annotation on non recursive XDuce functions Another difference is that in our approach the XDuce CDuce type checker and back end compilation of pattern matching can be re used without any modifica tion whereas their approach requires a complete reengineer ing of the XDuce part because subtyping and pattern match ing relations must be enriched to produce ML code and it is not clear how some XDuce features such as the Any type can be supported in a scenario of modular compilation We believe our approach is more robust with respect to exten sions of XDuce and that the XDuce to ML translation can be seen as an alternative implementation technique for XDuce which allows some interaction between XDuce and ML the same kind of interaction as what can be achieved with the CDuce OCaml interface described above The Xtatic project GP03 is another example of the inte gration of XDuce types into a general purpose language namely C Since both C s and XDuce s type checkers op erate with bottom up propagation explicit types for func tions methods no type inference the structure of Xtatic type checker is quite simple The real th
25. lated work The CDuce language itself comes with a typed interface with OCaml The interface was designed to i let the CDuce pro grammers use existing OCaml libraries ii develop hybrid projects where some modules are implemented in OCaml and other in CDuce The interface is actually quite simple each monomorphic OCaml type t is mapped in a structural way to a CDuce type t A value defined in an OCaml mod ule can be used from CDuce the compiler introduces a nat ural translation t Similarly it is possible to provide an ML interface for a CDuce module the CDuce compiler checks that the values exported by the module are compati ble with the ML to CDuce translation of these types and pro duces stub code to apply a natural translation t t to these values This CDuce OCaml interface is used by many CDuce users and served as a basis to the to_ml and from_m1 oper ators described in Section 6 Sulzmann and Zhuo Ming Lu SLO5 pursue the same ob jective of combining XDuce and ML However their contri bution is orthogonal to ours Indeed they propose a compi lation scheme from XDuce into ML such that the ML rep resentation of XDuce values is driven by their static XDuce type implicit use of subtyping are translated to explicit coer cions Their type system supports in addition used defined coercions from XDuce types to ML types However they do not describe a type inference algorithm for their abstract 11 specificat
26. le abstraction and application TF fix t gt t gt t IHF fi x gt x gt x IT Fg t T HFz a I ga IH felgx x T F Ag Ax f c gx t gt t Tre a x I ke x On this principal derivation we observe three sub expressions of a foreign type Accordingly strengthening in troduces three instances of the id operator and thus rewrites the expression to e fix Ag Ax id f idj c idi g x The type checker which is then applied performs some uni fications 41 t4 t6 L2 L5 L3 T We can for instance assume that the computed most general pre solution maps 14 and ig to 41 and 5 to t2 The first and third instances of the id operator in e thus generate the dependencies 1 s lg and Lg 3 t1 Strictly speaking the constraint is cyclic but we can break the cycle simply by unifying 1 and t2 The small est valuation is then given by 119 T We would have ob tained the same solution if we had applied the type checker directly on e without the strengthening pass In this exam ple strengthening is useless and the relaxed acyclicity condi tion is just a way to break a cycle introduced by strenghten ing We can easily imagine more complex examples where strenghtening is really necessary but introduces cycles that can be broken by the relaxed condition 6 Integration in OCaml We have described a type system for basic ML expressions Of course OCaml is much more than an ML
27. lution if and only if o o o and if it maps foreign type vari ables in V to ground foreign types in such a way that IF Cod The minimal modification we need to bring to o to get a solution is to instantiate variables in V so as to validate Co Formally we define a valuation as a function p V gt T such that I Cop To any valuation p we can associate a solu tion defined by t top and any solution is less general than the solution obtained this way from some valuation In particular a solution exists if and only if a valuation exists So we are now looking for a valuation We won t give a complete algorithm to check for the exis tence of a valuation This would lead to difficult constraint solving problems which might be undecidable this of course depends on the extension X Even if they are decidable for a given extension they might be intractable in practice and so we prefer to stick to our design guideline that type infer ence shouldn t be significantly more complicated than both ML type inference and XDuce like type inference XDuce computes in a bottom up way for each sub expression a type which over approximates all the possible outcomes of this sub expression The basic operations and their typing discipline corresponds respectively to our foreign operators and their static semantics XDuce s type system uses sub sumption only when necessary e g to join the types of the branches of a pattern matc
28. m F lt Unfortu nately we don t have such an oracle We could try all the possible choices of sub expressions and this would give a complete type checking algorithm for the type system H lt We prefer to use a computationally simpler solution We also expect it to be simpler to understand by the program mer The idea is to use an incomplete oracle The oracle first runs a special version of an ML type checker on the ex pression to be type checked This type checker identifies all the foreign types together The effect is to find out which sub expressions have a foreign type in a principal derivation that is which sub expression have necessarily a foreign type in all the possible derivations The preprocessing pass con sists in adding an application of the identity operator above all these sub expressions and only those The important point here is that the oracle may be overly conservative Let us consider a type variable which has been generalized in the principal derivation In a non principal derivation it could have been instead instantiated to a for eign type If this derivation had been considered instead of the principal one the preprocessing pass would have added more applications of the identity operator Maybe this would have been necessary in order to make the resulting expres sion type check An example is given by the expression let h eg in h n gt n gt 7 gt 1 UT gt T U T3 gt t t where eg
29. nce we are not going to formalize dynamic semantics we don t need to distinguish between these two kinds of opera tors The monotonicity requirement on the abstract semantics en sures that our resolution strategy taking the union of lower bounds for each variables for constraints is complete We don t formalize in this paper the operational semantics of operators Instead we assume informally that it is given and compatible with the abstract semantics XDuce as an extension We now show how XDuce fea tures can be seen as an extension We consider here a simple version of XDuce with the following kind of expressions element constructor ale seen as a unary operator empty sequence concatenation e e2 and pattern matching match e with p gt e p e OCamlDuce is actually build on CDuce which considers for instance XML element constructors as ternary operators the tag and a specification for XML attributes are also considered as arguments The meta variable p ranges over XDuce patterns We don t need to recall here what they are We just need to know that for any pattern p we can define an accepted type lpf a finite set of capture variable Var p and for any type T and any variable x in Var p a type match x p 7 which represents the set of values possibly bound to x when the input value is in 7 and the pattern succeed Here is the formal definition of an extension X for XDuce We take for foreign typ
30. ons in the syn tax is just a way of simplifying the presentation The main technical contribution of the paper is an algorithm to infer ground foreign types for the foreign type variables The ML X fragment We call ML X the fragment of our calculus where all the foreign types are restricted to be ground Figure 2 defines a typing judgment T F e t for ML X It is exactly an instance of the ML type sys tem Mil78 Dam85 if we see ground foreign types as atomic ML types and ground annotated foreign type oper ators o as built in ML constants or constructors we also introduce explicit type annotation and type variable intro duction We recall classical notions of type scheme typing environment and generalization A type scheme is a pair of a finite set of type variables and of a type t written Va t The variables in amp are considered bound in this scheme We write o lt t if the type t is an instance of the type scheme o A typing environment is a finite mapping from program variables to type schemes The generalization of a type t with respect to a typing environment I written Genp t is the type scheme Va t where is the set of variables which are free in t but not in I Type soundness of the ML X fragment We assume that a sound operational semantics is given for the ML X cal culus This amounts to defining d reduction rules for the o2 operators which are coherent with the abstract semantics for the foreign operators Well
31. otations Once the set of con straints representing XML data flow and operations have been extracted by the ML type checker we could use tech niques which are more involved than simple forward com putation over types The static analysis algorithm used in Xact KMS04 could serve as a starting point in this direc tion Acknowledgments The author would like to thank Didier R my and Fran ois Pottier for fruitful discussion about the design and formalization of type systems References BCF03 V Benzaken G Castagna and A Frisch CDuce an XML friendly general purpose lan guage In ICFP 03 8th ACM International Con ference on Functional Pr ogramming pages 51 63 Uppsala Sweden 2003 ACM Press 12 Dam85 Fri04 GP03 HFC05 HM03 Hos00 Hos04 HP00 HP02 HP03 HVP00 KMS04 L 01 Luis Manuel Martins Damas Type assignment in programming languages PhD thesis University of Edinburgh Scotland April 1985 Alain Frisch Th orie conception et r alisa tion d un langage de programmatio n fonction nel adapt XML PhD thesis Universit Paris 7 December 2004 Vladimir Gapeyev and Benjamin C Pierce Reg ular object types In European Conference on Object Oriented Programming ECOOP Darms tadt Germany 2003 Haruo Hosoya Alain Frisch and Giuseppe Castagna Parametric polymorphism for XML In POPL 2005 Haruo Hosoya and Makoto Murata Boolean op
32. out puts let f x let y match y p x in y 3 y th let z1 f lt a gt lt b gt lt a gt lt b gt 1 let z2 List map f lt a gt lt a gt 17 Pio Sam leee Ly ee FE The regular OCaml type checker is then applied It gives to each XML operator an arrow type following the annotations and then proceeds as usual generalizes types of 1et bound identifiers instantiates ML type schemes when an identifier is used and performs unifications to make type compatible For instance the concatenation operator in our example is given the type t3 t4 Us and the type checker performs the following unifications t2 t3 v4 the type for y L1 le l7 lg the type for the argument of f It also produces the following types for the top level identifiers val f a gt b5 val zl 65 val z2 t5 list Of course we must still instantiate the XML type variables with ground XML types Each occurence of an XML op erator in the program gives one constraint on the instantia tion Indeed we can interpret each n ary operator as as n ary function from XML types to XML types If we choose 41 and v2 as representatives for their classes of equivalence modulo unification the program is let f x let y match y p 2 x in es ae ie aa rI let z1 fF lt a gt lt b gt lt a gt lt b gt let z2 List map f lt a gt lt a gt 1 lt a
33. r Similarly we can imagine a natural XDuce counterpart of an OCaml product type 7 X 72 namely 71 T2 and coercion functions However writing this kind of coercions by hand is tedious OCamlDuce comes with built in support to generate them automatically This au tomatic system relies on a structural translation of some OCaml types into XDuce types lists and array are translated to Kleene star types tuples are translated to finite length XDuce sequences variant types are translated to union types etc Some OCaml types such as polymorphic or functional types cannot be translated OCamlDuce comes with two magic unary operators to_ml from_ m1 both written inthe concrete syntax The first one takes an XDuce value and applies a structural coercion to it in order to obtain an OCaml value this coercion is thus driven by the output type of the operator The type checker requires this type to be fully known polymorphism is not allowed Similarly the operator from_m1 takes an OCaml value and apply a structural coercion in order to obtain an XDuce value Since the type of its input drives its behavior the type checker re quires this type to be fully known This system can be used to obtain coercions from complex OCaml types e g obtained from big mutually recursive def initions of concrete types to XDuce types whose values can be seen as XML documents This gives parsing from XML and pretty printing to XML for free 7 Re
34. rithm As a matter of fact the type system would not admit most general pre solutions anymore We can see this on a very simple exam ple with the typing problem x T x a We could argue that a more liberal definition of being more general should allow some dose of subtyping So let us consider the more complex example T3 f Va a a a and e3 Ax Ay Az Ag g f x y f x z In ML the inferred type scheme would be Va 3 a gt a gt a gt a gt a gt p B which forces the first three arguments to have the same type But if the arguments turn out to be of a foreign type another family of types for the function is possible namely YET gt Tn gt T gt Or Ete gt T Lg gt b gt and these types cannot be obtained as instances of the ML type scheme above we can obtain V3 7 gt T2 gt T3 gt m U T2 U T3 gt T U T2 U 73 gt 8 6 but this is less precise A practical solution We will now describe a practical so lution Instead of modifying the type system by adding a new subsumption rule we will formulate the extension as a rewriting preprocessing pass The rewriting consists in in serting applications of the identity foreign operator id The challenge is then to choose which sub expressions e should be rewritten to id e If we had an oracle to tell us so the composition of the rewriting pass and the type system of Sec tion 4 would be equivalent to the type syste
35. t exten sion of OCaml programs which don t use the new features will be treated exactly the same by OCaml and OCam1Duce It is thus possible to compile any existing OCaml library with OCamlDuce Also we believe our modifications to the OCaml compiler are small enough to make it easy to main tain OCamlDuce in sync with future evolutions of OCaml Our experience so far confirms that Another guideline in the design of OCamlDuce was that XDuce programs should be easily translatable to OCaml Duce in a mechanical way In XDuce all the functions are defined at the toplevel and comes with an explicit signa ture We can obtain an OCamlDuce program by some mi nor syntactical modifications the new constructions in the language are delimited to avoid grammatical overloading of notations Explicit function signatures are simply translated to type annotations The design goals pushed us into the direction of simplicity We choosed to segregate XDuce values from regular ML val ues Of course a constructed ML value can contain nested XDuce values but from the point of view of ML XDuce values are black boxes and similarly for types Also we de cided not to have parametric polymorphism on XDuce types A type variable can of course be instantiated to an XDuce type or to a type which contains a nested XDuce type but it is not possible to force a generalized variable to be instan tiated only to XDuce types or to use a type variable within an
36. t of the function and then type checked the body of the function with the union of all these types In general the OCaml type checker is used to infer the data flow of XML values in the programs The way to solve the resulting set of constraints by forward computation cor responds roughly to the structure of the XDuce type checker Implicit subtyping Let s see what happens if we add an explicit type constraint for z1 let z1 lt a gt _ lt a gt lt b gt lt a gt lt b gt The algorithm described above will infer a much less precise type for z2 as well which is unfortunate The reason is that the OCaml type checker unifies 15 with lt a gt _ Ba sically the unification based type system forgets about the direction of the data flow There is some dose of implicit subtyping in the algorithm but only for the result of XML operators because of the way we interpet them as subtyping not equality constraints In order to address this lack of implicit subtyping we use a preprocessing pass whose purpose is to detect which sub expressions are of kind XML and to introduce around them a special unary XML operator id which behaves semantically as the identity but allows subtyping This preprocessing pass would rewrite the definition for z1 as let z1 lt a gt _ ide lt a gt lt b gt lt a gt lt b gt The variable tg will then be unified with v5 and 419 with
37. the kernel of an ML like type system enriched with types and operators from the extension X Types and expressions The syntax of types and expres sions is given in Figure 1 We use a vector notation to repre sent tuples E g t stands for an n tuple t1 t We assume a set of ML type constructors ranged over by the meta variable P Each ML type constructor comes with a fixed arity and we assume all the types to be well formed with respect to these arities The arrow is considered as a distinguished binary type constructor for which we use an infix and right associative syntax We assume given two infinite families of type variables and foreign type variables respectively ranged over by the meta variables a and v In an expression da e the type variable a is bound in e Expressions are considered modulo a conversion of bound type variables The construction da e thus serves to introduce a fresh type variable a to be used in a type annotation somewhere in e Foreign operators are annotated with the type of their argu ments in subscript position and of their result in super script the number of type arguments is assumed to be co herent with the arity of the foreign operator However in practice the source language does not include the annota tions they are automatically filled with fresh foreign type variables by the compiler we also use this convention in this paper for some examples Putting the annotati
38. tion to the typing problem T e t if the assertion T Fx ed to holds Of course the new rule for foreign operators forgets the constraints that relates the input and output types of foreign operators In order to ensure type soundness we must also enforce these constraints Formally we define a constraint C as a finite set of annotated foreign operators o We write I C if all the elements of C are of the form o2 with 6 7 lt r For an expression e we collect in a constraint C e all the instances of foreign operators o that appear in e Note that for any substitution we have C e d C e We are ready to rephrase the notion of solution Lemma 1 A substitution is a solution to the typing prob lem T e t if and only if the following three assertions hold e T ed and t do not contain foreign type variables e isa pre solution to the typing problem e IH C ed Type soundness Type soundness for our calculus is a triv ial consequence of the type soundness assumption for the ML X fragment Indeed we can see a solution to a typ ing problem T e t as an elaboration into a well typed pro gram in this fragment Type inference Let us consider a fixed typing problem T e t We want to find solutions to this problem Thanks to Lemma 1 we will split this task into two different steps e find a most general pre solution o e instantiate the remaining foreign type variables so as to satisfy the r
39. ur approach to related works 2 Anexample In this section we illustrate the behavior of OCamlDuce s type checker on the following code snippet let f x match x with tell risa To SR Re ee ye Tt et z1 f lt a gt lt b gt lt a gt lt b gt 1 1 et z2 List map f lt a gt lt a gt lt a gt lt c gt The exemple is intended to illustrate the use of the OCaml type checker to perform a data flow analysis of XML values and also how OCaml features here higher order functions and data structures interact with XDuce features Double curly braces are used in OCamIDuce only to avoid ambiguities in the grammar they carry no typing information For instance the symbol used for list con catenation in OCaml is re used for denote XML sequence concatenation Similarly the square brackets are used both to denote OCaml list literals whose elements are separated by semi colons and XML sequences literals when used within double curly braces their elements are separated by whitespace XML element literals are written in the form lt tag gt content The first line of the program above declares a function f which consists of an XML pattern matching on its ar gument with a single branch The XML pattern p y lt a gt _ _ extracts from an XML sequence all the elements with a tag lt a gt and put them in order in the capture variable y The funct

Download Pdf Manuals

image

Related Search

Related Contents

Samsung E958 User Manual    Règlement d`Ordre Intérieur  HBC-70/200 Icelined refrigerator Service Manual  Mode d`emploi du médicament BISEPTINESCRUB - Para  Sony PCV-RZ40C Technical Specifications    Braun MILLENNIUM A5 User's Manual  LIBRATONE ZIPP OWNER`S MANUAL  SBS CO9TF2002  

Copyright © All rights reserved.
Failed to retrieve file