Home
Logical Expressions: Analyzing, Generalizing, Rewriting
Contents
1.
2. BK O 99 BK V 96 BK V 96 b BK V 98 BSV 97 BSV 98 J ohn75 CH P91 CO B Reasoning Systems Palo A Ito California Refine U ser s Guide 1992 M G J van den Brand A van D eursen P K lint A S K lusener and E A van der M eulen Industrial applications of ASF SDF In M Wirsing and M Nivat editors Algebraic M ethodology and Software Technology AMAST 96 volume 1101 of Lecture N otes in Computer Science pages 9 18 Springer Verlag 1996 H J Boehm Boehm Garbage C ollector Code and documentation can be found at http reality sgi com boehm_mti gc html J A Bergstra J H eering and P K lint editors Algebraic Specification ACM Press A ddison Wesley 1989 M G J van den Brand H A deJong P Klint and P A O livier Efficient Annotated Terms Submitted 1999 J A Bergstra and P K lint TheD iscrete Time ToolBus a software coordination architecture Science of Computer Programming 31 2 3 205 229 July 1998 MG J van den Brand T uipers L M oonen and P O livier Implemen tation of a prototype for the new ASF SDF M eta Environment Pro ceedings of the 2nd International Workshop on the T heory and Practice of Algebraic Specifications September 1997 M G JJ van den Brand P Klint and P A O livier Compilation and memory management for A SF SD F CC 99 M G JJ van den Brand P Klint and C Verhoef Core Technologies for System Renovation In G Jeffrey J
3. gical Expressions Analyzing G eneralizing R ewriting embedding a scientific implementation in a commercial environment E Alexander van den Bergh 1999 Preface ThePlan D ecember 1998 finished my last tests and was only a master thesis away from gradu ating A fter four and ahalf years at the U niversity of Amsterdam thought it was time to take a look in the corporate world Since had no idea where to apply for a scientific internship asked Paul K lint whether hehad any options A fter afew searches and some rejections ended up at TriLoc Software Engineering TriLoc had aclose relationship with the university and needed someoneto experiment with the technology they were buying from them It turned out the technology was mostly ASF SDF with which was already acquainted We formulated a project in which it was my task to see whether the A SF SD F rewrit ing capabilities could be useful for TriLoc and if it would be possible to embed them in their software renovation processes The university agreed with the plan and although nobody including myself thought the project would be completed in the given time we started the implementation TheO utcome Theenvironment at TriLoc turned out to bequite good They provided me with some nice equipment a Sun workstation Sparc 5 for the M eta Environment an IBM PC 266 Pentium II for C development and porting issues and of course had my own Apple Powerbook and 3 desktop t
4. GLF AriExpr GLF AriExpr pow GLF AriExpr GLF AriExpr GLF AriExpr 22 1 4 2 U ntyped Infix GLF Although G LF iseasy to interpret from acomputer parser perspective larger expres sionsarehard to read for humans H uman analysts who work with G LF need amore readablefor m To meetthosedemandsweintr oducethe U ntyped1nfix GLF or just UIGLF sinceitsmainly purposeisto improvereadability for usersit can beregarded User Interface GLF aswell UIGLF isGLF translated to an infix form withoutthetype tags with extra brackets to improvereadability and with asmall set of easy to interpret key words The U ntyped in UIGLF U ntyped of coursemeanstheremoval of thetags R emoval of tagsalso meansremoval of someinformation butinthiscaseweassumetheanalyst has knowledgeof thecontext of thecode such astheprogramming languageso heor shecan differentiate between variables and keywords O n theother hand theremoval of information makestheex pression easier to understand The Infix in UIGLF Wehavechoseninfixinstead ofpr efixnotationwhenitcomestor eadability Innatural aswell as mathematical notation thisistheway to go 4 2iswritten and pronounced in infix four minustwo Thealert reader will noticethat this also meansthat in somecases brackets haveto be inserted mul 4 add 2 3 can notbewritten as 4 2 3 it must bewritten with brackets around theaddition 4 2 3 The U ser Interface in UIGLF
5. env ReleaseStringUTFChars env 7 string c string W hen these issues have been addressed correctly C functions can be called from Java arguments can be passed to them and return values can be received from them theJN I provides even more functionality but we will not be using that in this project TheJN within our system is shown in Figure 9 2 1 Sglr Parser TRS subcomponent External System JNI Java part Y JNI C part A Y M eta Adapter A SF ix2T ext Figure 9 2 1 The external J ava system communicates with aJ ava package which has aC counterpart with which it communicates using the JN I The JNI C part communicates with the M eta adapter which in turn is connected to the subcomponents of our TRS component Error H andling N o C rashes Please At various points in therewriting process in the TRS component things can go wrong unparsable input terms can cause the system to crash illegal parse tables a TR S sub component with a slightly different signature than the parse table etc An error does not have to bea problem a crash is O ur system is embedded in a larger Java system that requires safe and secure error handling We do not want the whole system to crash just because our TRS component has encountered an error Fortunately the ATerm library provides a way to implement custom error functions With the ATsetE rrorH andler function we can pass our own error handl
6. informa tion logic from legacy systems The approach focuses on the business logic rather than the technical ground of the system It sees the legacy system as an important asset with reusable valuefor the future instead of amillstonearound one s neck This in depth analysis done by Logic M ining can be helpful in various situ ations such as isolating components that perform certain tasks for mainte nance codeslicing techniques identifying business processes and modelling or documenting the system Term rewriting A nalysis and transformation of existing code form the core functionality of software renovation systems When using formal methods liketerm rewriting for thesetasks we work with afirm mathematical basis for the system To describealanguagein an impera tive programming language like C we need to write a parser for it Directly writing a parser for a certain language can quickly become a large project and therefore hard to debug In a formalism like SDF we define a grammar without having to worry about parsing techniques and algorithms The usage of compiler construction techniques such as scanning parsing ty pechecking and analysis can be useful in the field of re engineering as described in BK V 96 b Sys tems that support specification formalisms nevertheless are not widely available A nd even if wehaveaformalism at our disposal we face serious technology transfer problems when wetry to sell term rewriting to the indu
7. the embedding of a scientific system within a commercial el system and theextending of asystem madewith conventional languages UINIVERSITEIT VAN AMSTERDAM with a system designed with an algebraic formalism E 1 3 Related Work There have been projects that focussed on rewriting CO BOL We will name a few here Control flow normalization in order to improvemaintainability of COBOL CICS systems which also makes use of normalized logical expressionsis described in BSV 98 N ormalization of COBOL conditions is also a subject in SV 98 In BK V 98 a global description regarding the rewriting of CO BOL systems is given Issues regarding C 0 BOL grammar in SDF disambiguation preprocessing and restricting can be found in BSV 97 In this thesis we will try to be more thorough We will introduce a language indepen dent format for logical expressions and normalizations on this language independent format In BK V 96 possible software re engineering applications of the M eta Environment technologies are described A software renovation factory is described in BSV 98 It contains a number of assembly linesthat perform anumber of tasks in afixed order O ur project is not quite the same since it integrates a component in a system on a more ob ject oriented way it can be called at any given time from another process Part A General Model for Logical Expressions In the field of software re engineering higher levels of abs
8. A BORA C AND A D OR A E butthat is not the case The dis abbreviation of logical expressionsismoreatextual than alogical process C AND D OR E A A A 0 AND A gt DORA gt B OR B OR Combiningsomeofthepr oper tiesof CO BOL wehaveseensofar thefollowingtwo expressionshavethesamemeaning A NOT NOT A B OR NOT C AND B B OR A C AND B D AND gt E OR F D AND B gt E OR B gt F A COBOL programmer can choose any stylein which heor she wants to program Stylescan also bemixed leadingto unreadableexpressions 16 COBOL alsoallows88 fieldswithin logical expressions T hesefields are actually tests whether avariable orfield within arecord hasacertain predefined value T hesystem wepresent doesnot pay attention to them in our casethey havebeen modified to rela tional expressionsby apreprocessing phase Consider therecord 01 TEST RECORD 03 STATUS PIC 9 05 END OF LINE VALUE 1 05 END OF PAGE VALUE 2 05 END OF FILE VALUE 3 Theconditional expression IF END OF LINE THEN istranslated by the preprocessor to the isarecord field notation record field IF TEST RECORD STATUS TEST RECORD STATUS END OF LINE Prioritiesin COBOL Tofor cethepriorities within arithmeticoperationsin C O BO L wehavedesigned our COBOL grammar in SDF insuch away thattheparsetreealready containsthesepriori ties A rithmeticoperations on operands aretyped O perands of an operation can beof th
9. D uring the porting and testing process we discovered a few incompatibilities that still need to be addressed Sun and Microsoft have different virtual machines As described earlier the TRS component in Windows throws an exception when an error is encountered Strange enough this still resulted in a crash when it was tested After reviewing the code and the input we decided to use a different virtual machine In our situation at TriLoc softwareengineering weareworking with an external system that contains a lot of Java and has an interface which is created with M icrosoft s Visual Basic In order to make the communication between the interface and J ava as smooth as possible we used M icrosoft s J ava virtual machine T his generated crashes when we were throwing exceptions Sun s virtual machine for Windows did not have this problem We hope this issue is addressed in future versions of M icrosoft s virtual machine Generated C code and O Id Meta E nvironment R eact Differently Atthetimeof this writing the development of A SF SD F specifications was still done using the old M eta Environment based on the C entaur system We noticed that in some cases some terms where not rewritten to their normal form under Windows when under the M eta Environment they were This was the fact be cause the generated C codeof the A SF SD F specification generated by the M eta E nvi ronment is more specific about the use of types than the old M eta E n
10. Expressions Another issueistheremoval of thenegations that wherewithin thecondition C on siderthefollowingrulethat addressesthecasein which an expression like A BORNOT gt C must beconverted to A lt BORNOTA gt C continuewith operand A andoperator N OT gt CUR2b CleanUpRel NOT Op RightOp Last LeftOp Last Op NOT Last LOp Op RightOp Last LeftOp NOT Op Thelasttwor ulesr etur n logical expr essionsthat begin with aN OT Thesecasescan produceredundant double negations O n ahigher level theU nary expression this problem issolved when weencounter aN OT befor eaU nar y wepr ocessthatU nar y first ifthatgeneratesasecondN OT weleavethem both out Remove multiple NOT s CU3c CleanUpUnary Unaryl Left 001 001 NOT Unary2 Left OP2 Op2 CleanUpUnary NOT Unaryl Left 001 001 Unary2 Left_OP2 Op2 E 5 2 COBOL to GLF TheCO 80 L toGLF translator is differ ent fr omtheC O BOL Cleaner T helatter uses only onegrammar for both thesourceand thetarget language whiletheformer really translatesonelanguageto another both specified intwo dif fer entgrammarsinA SF 4SDF So wearedealing with two grammarsand aset of equationsthattranslatestermsof one grammar into ther equivalentsin theother grammar Thiscan beseen when onelooksat themodular str uctur eofthetranslator Asabasis wehavemodulesthat areshared by both grammars i e characters and digits above thatthetwo grammarsaresep
11. O ther readability issues are addressed within U IG LF aswell Thepriority rules for instancearewritten down with bracketsto maketheanalyst awareofthem 4 3 2 iswritten down as 4 3 2 and iswritten down as AORBANDC AOR BANDC Thisfeatur ecan causeundesir edef fects T oo many bracketscandecr easer eadability A AND B AND C ANDD ORE Wefindthatthebracketsthat separatethebinar y and operatorsfr omeachotherar e redundant So aslongasweencounter thesameoperator or an operator with thesame priority weleaveoutthebrackets A AND BAND C AND D ORE 23 A General Model for Logical Expressions UIGLF isdeveloped for r eadability only Itisnot designed to work asan analysis format for computerised detailed analysis but it can behapful forthe understanding of themeaning of acertain G LF expression Wewill givean exampleofthedif ferencebetween G LF and U IG LF by pr ovidingtheir SDF definitionsof arithmetic expressions Highest Priority Notice how GLF does not allow brackets while UIGLF does GLF Identifier GLF AriExpr GLF Literal GLF AriExpr GLF FigLit GLF AriExpr UI Identifier UI PrimExpr UI Literal gt UI PrimExpr UI FigLit gt UI PrimExpr UI Avribxpr gt UI PrimExpr neg GLF AriExpr gt GLF AriExpr UI CHR MINUS UI UnaryExpr gt UI UnaryExpr Chain functions UI PrimExpr gt Ur UnaryExpr UI UnaryExpr gt UI AriExpr add GLF AriExpr GLF A
12. R and N OT threerules must be ap plied to getan expression in C N F movingthenegationsinwards D eM organ not A AND 8 lt NOT A OR NOT B not A OR B lt NOT A AND NOT B InASF SDF itiswritten likethis mn3 move negs not and GLF LogExprl GLF LogExpr2 or move negs not GLF LogExprl move negs not GLF LogExpr2 mn4 move negs not or GLF LogExprl GLF LogExpr2 and move negs not GLF LogExprl move negs not GLF LogExpr2 N ow that all thenegationsaremoved inwardsw eneed to distributethe or over and or A and B C and or A B or A C Sothe and sar emovedtotheoutsideandthe or sto theinside Thecoreofthisiswritten down likethisin A SF SDF dol distribute ors or GLF LogExprl and GLF LogExpr2 GLF LogExpr3 and distribute ors or GLF LogExprl GLF LogExpr2 distribute ors or GLF LogExprl GLF LogExpr3 When weencounter adisjunction with no direct conjunctionsin it thisis becauseit is adefault equation theformer equation has got ahigher priority and would match first in such acase wedistributethedisjunctionswithin thesub expressionsfirst W hen they aretransformed to CN F aconjunction could havebeen moved outsideto oneof the results in this casethesecond operand of theconjunction wastransfor med to CN F GLF LogE xpr4 and containsan conjunction This means that thew holeexpression hasto beprocessed again sincethis conjunction must bemoved ou
13. external system is hard to do since it also means that the signature of the A SF SD F specification should be parsed or the parse table from the M eta Environment should be interpreted 1 8 3 Wrapping up the design Theideaisto writecodethat usesthefunctionality of our threesubcomponents SG LR the TRS and AsFix2Text in order to achieve the desired effect We could also have cho sen to use these subcomponents as separate tools and thereby modifying the existing code as little as possible but that would mean we had to initialize and instantiate a new instance of the ATerm library for each subcomponent N ow we only instantiate the ATerm library once meaning we have minimal overhead and maximal sharing The code is stored in alibrary called the M eta A dapter TheM eta A dapter can be viewed upon as a kind of sequencer The components under U nix are mostly tools stand alone applications that communicate with each other us ingtheToolbus Wewantto modify thesetools in such a way that they become libraries independent of the Toolbus This mostly requires the removal of code such as main functions and Toolbus functionality In our approach theM eta adapter takes over thecommunication between the subcom ponents previously handled by the Toolbus An in depth look at the M eta adapter The M eta A dapter is the most interesting part of the Windows N T system when it comes to understanding the design We will now give a more detailed descr
14. have to modify The Toolbus is a software applica the whole system when we use a different A SF SD F specification tion that utilizes a Scripting i language based on process algebra Thanks to the U nix way of installing softwarewe have the source to describe communication between code of every component we need so we can modify it in away to software tools make it suitable for our purposes We prefer to leave all automati cally generated code of the TRS untouched because when we modify that we will have to do so each time we create a new TRS The complete system will look like figure 8 2 1 n ASF SD F2 specification ATerm C source and Parse T able gt External AsFix TRS lt System component Support WA 1 R Figure 8 2 1 The M eta Environment is still operating under U nix while the rest of the system is ported to Windows N T The TRS component that must be created can be a library a stand alonetool or something in between likeadynamic linked library that can beloaded and called from the external system A moretechnical thing wehaveto keep in mind istheuseof theATerm library A lot of the components we need make use of this library and if we want to fully exploit its maximal sharing capabilities we must try to makethem use the same ATerm environ ment instance of the ATerm library Initializing and using the library explicitly for each component would makethe syst
15. lates Pascal logical expr essionsto GLF and theGLF T oolbox as well astheU IGLF translator will work on them A larger extension of thesystem would betheextension of G LF TheGLF described in thisthesiscan not handlefunction callsor other user defined Boolean atomics Although lots of opportunitiesforfuturework areavailablewestressthat thecurrent system isenough to do someserious rewritingon CO BO L legacy systems 33 Part Il Embedding A Formal Specification in a Commercial Environment The ASF SDF M eta Environment is an interactive programming environment for developing specifications in the algebraic specifica tion formalism ASF SD F gt providesa variety of tools useful for the development of stand aloneterm rew riting systems based on thisfor malism Theuseof formal specificationscan enhance the development of soft ware especially in the area of software re reverse engineering but in practice it is used only rarely due to various practical issues This part of the master s thesis describes an industrial application of the ASF SD F M eta Environment especially focusing on porting it to and integrating it in an existing commercial system Though various applications of ASF SD F in commercial environ ments are known BD KM 96 there hasnever been an integration on thislevel where wetakethe functionality of the M eta E nvironment and totally adjust it to its new surroundings 7 Introduction In this chapte
16. theF OR G and defined by theabovethebinary logical connective and F G which abbreviates not or not F not G Allthefor mulasover Var L denotedas Form L ar edefined asfollows e Anelementof Var L isafor mulaover Var L e IfFand Garefor mulasover Var L then not F called negation andor F G called disjunction ar efor mulasover Var L W ecan definethew ell known semanticsof theseoperations with theBoolean set bools true false and avaluation function v Form bools which satisfies v not F false ifv F true v not F true ifv F false v or F G false ifv F falseandv G false v or F G true ifv F true or v G true Although wewill notbeevaluatingthelogical expressionsin thisthesis w ewill need to know their semanticsin order to maketheright decisionsin futurerewritings for in stanceitisnothar dto seethat not not F hasthesamesemanticsasF 21 A General Model for Logical Expressions Tagsin GLF GLF labelscertain tokenswith atype tag T hisisdoneto prevent lossof information H erearesomeexamples i 4 decimal number integer 4 h 4 hexadecimal number 4 r 4 real number 4 Tokenscan getmor ethan onetag becauseof theuseof chain functions within SD F typelisoftype2 vn BLA variablenameBL A id vn BLA identifier variablenameBL A Sinceoperatorsin G LF arewritten in prefix they do not get an extra tag add i 1 i 1 also knownas1 1 add mul i 1 i 1 i 1 can bewritten as 1 1 1
17. will translateour CO BOL to GLF on which wecan performvarious transformations A Il of theserewritings can bedoneusingan ASF SDF term rewriting system TR S 2 2 ASF SDF To specify thegrammarof our sour ceandtar getlanguagesand to specify thebehavior of our rewritingsystenswewill usetheA SF SD F formalism TheA Igebraic Specifica tion Formalism A SF BH K 89 providesuswith the ability to definethe abstract syntax i e typeand arity of functions rewriterules and writeconditional equationsdefining thesemanticsof thesefunctions T heSyntax D efinition For malism H K R 89 SD F al lowstheintegrated definition of lexical context freeand abstract syntax It contains enoughinfor mationto generateparsetablesfor thespecified syntax T ogetherthey for m ASF SD F specifications Why ASF SD F A number of formalismsexistin which itis possibleto defineprogramming languages Systems likethewell known Lex Y accduo L S86 ohn75 and R efine A Ito92 also offer parsing and rewriting techniques A SF SD F with itsM eta E nvironment BK M 97 hastheadvantageofpr oviding modularity Modularityisofgr eatimpor tanceconsider ingthesizeand extendibility of our project FurthermoretheL R parsing doneby Re fineand Lex4Y accislessthor ough than SG LR parsing V is97 2 doneby A SF SD F sincetheformer may offer shift reduce and reduce reduceconflicts B SV 97 Inpartll ofthisthesiswewill giveamoredetailed description of theA SF SD F M eta Envir
18. Brackets except for addexpr UIGLF AEb add1 UIGLF AriExpr b add add GLF AriExprl GLF AriExpr2 UIGLF AriExpr add GLF AriExprl GLF AriExpr2 UIGLF AEb add1 UIGLF AriExpr b add sub GLF AriExprl GLF AriExpr2 UIGLF AriExpr sub GLF AriExprl GLF AriExpr2 default UIGLF AEb add UIGLF AriExpr b add GLF AriExpr UIGLF AriExpr b GLF AriExpr UIGLF AriExpr b mul Brackets except for mulexpr UIGLF AEb add1 UIGLF AriExpr b mul mul GLF AriExprl GLF AriExpr2 UIGLF AriExpr mul GLF AriExprl GLF AriExpr2 UIGLF AEb add1 UIGLF AriExpr b mul div GLF AriExprl GLF AriExpr2 UIGLF AriExpr div GLF AriExprl GLF AriExpr2 69 A ppendices default UIGLF AEb add UIGLF AriExpr b mul GLF AriExpr UIGLF AriExpr b GLF AriExpr PrimExpr UIGLF PE1 UIGLF PrimExpr GLF Identifier UIGLF Identifier GLF Identifier UIGLF PE2 UIGLF PrimExpr GLF Literal UIGLF Literal GLF Literal UIGLF PE3 UIGLF PrimExpr GLF FigLit UIGLF FigLit GLF FigLit UnaryExpr UIGLF UE1 UIGLF UnaryExpr GLF AriExpr UIGLF PrimExpr GLF AriExpr UIGLF UE3 UIGLF UnaryExpr neg GLF AriExpr UIGLF UnaryExpr GLF AriExpr module GL F R elE xpressions equations oe By Alex van den Bergh july 1999 RelExpr UIGLF RE1 UIGLF Rel
19. C and B D or not B E DN F without negations A gt B or A gt D and A gt C and B D or B lt E 32 6 Concluding R emarks after Part N ow that we have presented our source C O BOL or Clean COBOL our target GLF or UIGLF and our rewriters wewill summarizethe results 1 6 1 Conclusions Inthispartwehavediscussed thenon intuitiveway logical expressions can bew ritten inCO BOL V havepr esentedar ewritingsystemthat enablesusto disabbr eviatethose expr essionsto acleanfor matfr om which wecantranslatethem to ageneral logical for mat GLF Thedesign ofther ewriting system andthelogicalfor matismodular so the system asawholecan easily beextended to different languages Wehaveshownthaton G LF varioustranslationsand transfor mationsar epossibleto improvethereadability and analysis of G L F expressions T his makes control flow analysiswithin legacy systemsan easier task The pluggingin ofthissysteminto an existing analysis system is described in part ofthisthesis E 6 2 Future work A lotof opportunitiesfor extendingthecurrent rewriting system exist T hemost dli giblepartto extend istheG LF toolbox O necan think of optimizations of logical ex pressionsor statistics about thesize theamount of operatorsand whether an expression can besatisfied if onerelational expression within that expression can not O ther languages can beadded to thesystem as well Simply makearewriter that trans
20. C are different languages not just when looking at the object oriented way Javais designed but also by means of implementation Java for instance has a built in garbage collector that keeps track of everything within the Java program that uses memory Fortunately with Java theJava N ative I nterface or JNI JN 199 is developed to make communication p with C possible eH TheJNI provides aset of C functions that take care of various translations and 8 memory incompatibilities It can take a Java class file with definitions of native lt methods which are the Java counterparts of C functions and generate a C JAVA header file with the definitions of the C functions ee 50 A Java native method with its C counterpart Application public static native String C Side Java Side DoItInC String istr String ptable String ssym JNIEXPORT jstring JNICALL init 3 Java DoItInC JNIEnv jclass jstring jstring jstring These definitions must be used as definitions of the actual C func The Java Native Interface talking tions In C several issues regarding Java s memory management and C and drinking Coffee different types have to be addressed For instance strings haveto be decoded before they can be used in C and simply freeing memory of variables declared in Javais not allowed Decode a javastring fo string env GetStringUTFChars env j string 0 Release a string so the Java garbage collector can remove it
21. Combining the Sources TheM eta A dapter has external references to functions that exist in the TR S subcom ponent the generated TRS code the latter is linked along with the M eta A dapter to a higher component which does the actual linking of the whole component This design enables us to rewrite terms in a clean independent and an efficient way The TRS component can be replaced by any other rewriting mechanism without the 47 Embedding a Formal Specification in a Commercial Environment external system being modified in any way whatsoever its functionality is well defined when the A SF SD F specification is known and clear and it makes use of the efficient ATerm memory management TheCompilers We deal with a system that needs to be ported from U nix to WindowsN T AnANSI C compiler like CC or GCC is used under Unix and we use Microsoft s Visual C compiler under Windows NT Operating System Sucks Rules O Meter P 5 y Ses woes roche a Solving Differences Pree SI Linn as wena 8 N ow that we have defined our current situation and 5 our desired result it istimeto actually implement it We will be discussing many of the global differences M between platforms and languages that have influ inani 0 EORUM enced our design strategies or are otherwise worth Platforms differences seem to reach further than mentioning O f course we will also present ways to just te
22. Expr GLF AriExpr UIGLF AriExpr b GLF AriExpr UIGLF RE2 UIGLF RelExpr Ilte GLF AriExprl GLF AriExpr2 UIGLF AriExpr b GLF AriExprl lt UIGLF AriExpr b GLF AriExpr2 UIGLF RE3 UIGLF RelExpr gte GLF AriExprl GLF AriExpr2 UIGLF AriExpr b GLF AriExprl gt UIGLF AriExpr b GLF AriExpr2 UIGLF RE4 UIGLF RelExpr lt GLF AriExprl GLF AriExpr2 UIGLF AriExpr b GLF AriExprl lt UIGLF AriExpr b GLF AriExpr2 UIGLF RE5 UIGLF RelExpr Ggt GLF AriExprl GLF AriExpr2 UIGLF AriExpr b GLF AriExprl UIGLF AriExpr b GLF AriExpr2 UIGLF RE6 UIGLF RelExpr neq GLF AriExprl GLF AriExpr2 UIGLF AriExpr b GLF AriExprl lt gt UIGLF AriExpr b GLF AriExpr2 UIGLF RE7 UIGLF RelExpr eq GLF AriExprl GLF AriExpr2 UIGLF AriExpr b GLF AriExprl UIGLF AriExpr b GLF AriExpr2 RelExpr b Brackets only when relExpr with relational operator 5 so no brackets around identifiers literals etc UIGLF RE b1 UIGLF RelExpr b GLF AriExpr UIGLF AriExpr b GLF AriExpr UIGLF RE default UIGLF RelExpr b GLF RelExpr UIGLF RelExpr GLF RelExpr 70 module GLF C onditions equations LogExpr UIGLF LE1 UIGLF LogExpr b or GLF LogExprl UIGLF LogExprl UIGLF LogExpr b or GLF LogExpr2 UIGLF LogExpr2 UIGLF LogExpr or GLF LogExprl GLF LogExpr2 UIGLF LogExprl or UIGLF LogExpr2 UIGLF LE2 UIGLF LogExpr b and GLF LogExprl UIGLF LogExprl UIGL
23. F LogExpr b and GLF LogExpr2 UIGLF LogExpr2 UIGLF LogExpr and GLF LogExprl GLF LogExpr2 UIGLF LogExprl and UIGLF LogExpr2 UIGLF LE2 UIGLF LogExpr GLF LogExpr UIGLF LogExpr UIGLF LogExpr not GLF LogExpr not UIGLF LogExpr UIGLF LE2 UIGLF RelExpr GLF RelExpr UIGLF LogExpr UIGLF LogExpr GLF RelExpr UIGLF LogExpr LogExpr b Expr with Brackets except for not because that already has brackets UIGLF LEb1 UIGLF LogExpr b not GLF LogExpr UIGLF LogExpr not GLF LogExpr default UIGLF LEb UIGLF LogExpr b GLF LogExpr UIGLF LogExpr GLF LogExpr LogExpr b or Expr with Brackets except for or UIGLF LEb or1 UIGLF LogExpr or GLF LogExprl GLF LogExpr2 UIGLF LogExpr UIGLF LogExpr b or or GLF LogExprl GLF LogExpr2 UIGLF LogExpr default UIGLF LEb or UIGLF LogExpr b GLF LogExpr UIGLF LogExpr UIGLF LogExpr b or GLF LogExpr UIGLF LogExpr LogExpr b and Expr with Brackets except for and UIGLF LEb and1 UIGLF LogExpr and GLF LogExprl GLF LogExpr2 UIGLF LogExpr UIGLF LogExpr b and and GLF LogExprl GLF LogExpr2 UIGLF LogExpr default UIGLF LEb and UIGLF LogExpr b GLF LogExpr UIGLF LogExpr UIGLF LogExpr b and GLF LogExpr UIGLF LogExpr Condition UIGLF LOE1 UIGLF LogExpr GLF LogExpr UIGLF LogExpr GLF2UIGLF Condition GLF LogExpr UIGLF LogExpr 71 A ppendices A ppendix D The Java part of the M eta adapter to i
24. GLF LogExprl GLF LogExpr2 distribute ors or GLF LogExprl GLF LogExpr3 All the other cases can have no ands directly within an or default do4 distribute ors GLF LogExprl GLF LogExpr3 is and GLF LogExpr3 NO distribute ors GLF LogExpr2 GLF LogExpr4 is and GLF LogExpr4 NO distribute ors or GLF LogExprl GLF LogExpr2 or GLF LogExpr3 GLF LogExpr4 default do5 distribute ors GLF LogExprl GLF LogExpr3 is and GLF LogExpr3 YES distribute ors GLF LogExpr2 GLF LogExpr4 distribute ors or GLF LogExprl GLF LogExpr2 distribute ors or GLF LogExpr3 GLF LogExpr4 default do6 distribute ors GLF LogExprl GLF LogExpr3 distribute ors GLF LogExpr2 GLF LogExpr4 is and GLF LogExpr4 YES distribute ors or GLF LogExprl GLF LogExpr2 distribute ors or GLF LogExpr3 GLF LogExpr4 do8 distribute ors and GLF LogExprl GLF LogExpr2 and distribute ors GLF LogExprl distribute ors GLF LogExpr2 since we have already moved the negations to the RelExprs SS with move negs they don t need to be processed further i e a not is allways around a GLF RelExpr do9 distribute ors not GLF RelExpr not GLF RelExpr do10 distribute ors GLF RelExpr GLF RelExpr 5 distribute ands 88 and p or q r or and p q and p r dal distribute ands and GLF LogExprl or GLF LogEx
25. Kral and M Bartosek editors 50 FSEM 96 Theory and Practice of Informatics volume 1175 of LN CS pages 235 255 Springer Verslag 1996 M G JJ van den Brand P Klint and C Verhoef R eengineering needs generic programming language technology Technical R eport P 9618 U niversity of A msterdam Programming R esearch G roup 1996 M G JJ van den Brand P Klint and C Verhoef Term Rewriting for Sale Electronic N otes in Theoretical Computer Science 15 1998 Available at http www elsevier nl locate entcs volumel5 html M G JJ van den Brand M P A Sellink and C Verhoef O btaining a COBOL grammar from legacy code for reenigineering purposes In M P A Sellink editor Proceedings of the 2nd International Workshop on the Theory and Practice of A Igebraic Specifications electronic Workshops in Computing Springer verslag 1997 Available at http adam wins uva nl x cfn cfn html M G JJ van den Brand M P A Sellink and C Verhoef Control flow normalization for CO BOL CICS legacy systems In Proceedings of the Second Euromicro C onference on M aintenance and R eengineering pages 11 19 1998 Available at http adam wins uva nl x ref ref html S C Johnson YACC Yet Another C ompiler C ompiler Technical Report Computer Science N o 32 Bell Laboratories M urray Hill N ew Jersey 1975 J R Cordy C D H alpern H amu and E Promislow TXL A rapid prototyping system for programming language dialects C omputer Langu
26. LF AriExpr GLF AriExpr 37 GLF AriExpr 9 GLF AriExpr p Ng GLF AriExpr GLF AriExpr 9 neq GLF AriExpr GLF AriExpr 5 eq GLF AriExpr GLF Arikxpr UIGLF RelExpr gt UIGLF RelExpr UIGLF FigLit UIGLF RelExpr UIGLF AriExpr UIGLF RelExpr UIGLF AriExpr UIGLF RelExprOperator UIGLF AriExpr UIGLF UnaryExpr GLF RelExpr GLF RelExpr GLF RelExpr GLF RelExpr GLF RelExpr GLF RelExpr UIGLF RelExpr 63 A ppendices module GLF C onditions context free syntax GLF RelExpr GLF LogExpr and GLF LogExpr GLF LogExpr gt GLF LogExpr NOE GLF LogExpr GLF LogExpr gt GLF LogExpr not GLE LogExpr GLF LogExpr GLF LogExpr GLF Condition UIGLF LogExpr UIGLF Condition UIGLF RelExpr UIGLF LogExpr not NN ULGLE LOgExpr y UIGLF LogExpr UIGLF LogExpr and UIGLF LogExpr UIGLF LogExpr UIGLF LogExpr or UIGLF LogExpr UIGLF LogExpr UIGLF LogExpr UIGLF LogExpr Appendix B Interesting parts of the GLF Toolbox rewriters module R emoveO perators equations 2 5 Romeove all Ors amp RO1 RemOr GLF RelExpr GLF RelExpr A not around an or continue RO2 RemOr not or GLF LogExprl GLF LogExpr2 not RemOr or GLF LogExprl GLF LogExpr2 A not around an and continue RO3 RemOr not and GLF LogExpr
27. M ining system or any other system for that matter whenever that isfound beneficial E xternal systems need only minor adjustments and can function prop erly without the use of the TRS as well Because of practical problems that where beyond my reach have not been able to do benchmarks on the system as it was implemented at TriL oc For statistics on the perfor mance of the M eta Environment under U nix please refer to BK O 99 The rewriting system has been tested on a real life system from a D utch government agency and it worked correctly without unacceptable performance problems T he system had several thousand lines of code containing several hundred conditional expressions that were rewritten to Clean CO BOL GLF and finally UIGLF in CN F without negations It ismy personal opinion that the system can be commercially competitive and can be applied in many other areas where rewriting is done within the Logic M ining system or other systems Although the technical and theoretical issues that we have described in this thesis are important we can not forget the practical side have had a great time working at TriL oc and staying in touch with theC WI and the U vA Thisshows that commercial and scien tific flavors can blend together in a successful project 57 Logical Expressions A nalyzing Generalizing R ewriting 58 R eferences A Ito92 BD K M 96 Boe99 BH K 89 BJK O 99 BK 98 BK M 97
28. Thisassures usthat every bit of codein G LF ispreceded by atag describing its mean ing Functions and Priorities in GLF To keep things compact and simple wehave chosen to usefunctionswith a maximal arity of 2 Because of thisand of theprefix notation wedo not haveto worry about associativity Thearithmetical expression 1 2 3 can beinterpreted as 1 2 3 or 1 2 3 wheress mul mul i 1 i 2 i 3 can only beinterpreted as 1 2 3 Prioritiesarealso implicitly added and bracketsareobsolete Theexpression 1 2 3 canonly bewrittenas mul add i 1 i 2 i 3 and 1 2 3 leadsto add i 1 mul i 2 i 3 IntheC 0 BOL grammar wedefined our priorities by means of types W ithGLF we cannot do that Sincetheprefix notation allowseach operator to haveoperands of any priority withouttheuseofbrackets W ework with arithmeticexpr essions only _ rather than with addition expressions multiplication expressions and power expressions ForinstancetheG L F multiplication can havetwo arithmetic arguments without any constraints on the operation that isin effect on its arguments A rithmetic operations form expressions of thesamety peand they can all beoperandsfor other arithmetic operators add GLF AriExpr GLF AriExpr GLF AriExpr sub GLF AriExpr GLF AriExpr GLF AriExpr mul GLF AriExpr GLF AriExpr GLF AriExpr div GLF AriExpr
29. ages 16 1 97 107 1991 Cobol Language reference on the internet at IBM http WW W s390 ibm com bookmgr cgi bookmgr cmd BO O K S IG Y LR 102 59 Logical Expressions Analyzing G eneralizing R ewriting DK V 99 Ebb90 E ck 98 Em98 H K R89 JN 199 J O 99 K li92 KN 97 K R88 L S86 RA 92 SAV91 SM 97 SV 98 Vis97 Vis97 2 Wirth71 CONTENTS A van D eursen P K lint and C Verhoef Research issues in the renova tion of legacy systems In J P Finance editor Fundamental A pproaches to Software Engineering LN C S Springer Verslag 1999 Available at http adam wins uva nl x etaps etaps99 html W B C Ebbinkhuijsen CO BOL vijfde druk ISBN 90 14 04560 3 B Eckel Thinking in Java Downloadable from http www bruceeckel com M ore technical descriptions can be found at http java sun com Emendo Software G roup theN etherlands Emendo Y 2K White paper 1998 Available at http www emendo com J H eering P R H H endriks P K lint and J R ekers The syntax definition formalism SD F Reference manual SIGPLAN N otices 24 11 43 75 1989 A Iso available at http www cwi nl gipe C ode and documentation regarding the Java N ative Interface can be found at http java sun com products jdk 1 1 docs guide jni H A deJong and P A Olivier ATerm U ser M anual 1999 P Klint A meta environment for generating programming environ ments ACM Transactions on S
30. al expressions Conditional expr essionsar eimpor tantwhenanalyzingthecontr ol flow whichisof interest when tryingto understand thebehavior of aprogram A ssoon asthereisany form of interaction with usersor datayou will find conditional expressionsthat act accordingto thevaluesof logical expressions A n exampleof abasic conditional expres Sion If Logical Expression then Statement 1 else Statement 2 endif In our casealogical expression consistsof oneor morerelational expressions separated by logical operators F or instance SALDO 0 AND PINCODE 3675 isavalid logical expression sinceit consists of therelational expression SALDO gt lt 6 relational expression PINCODE 3675 and thelogical operator Anp Clear and readablelogical expressions areof greatimportancefor softwareanalysts Logical expressionscan bewritten in variousways T hey can berewritten to other nota tionsto bemorereadable easier to analyzeor easier to classify without losing their infor mation and their behavior 11 A General Model for Logical Expressions The Problem too Many Different N otations Although they mostly stick to rather simple propositional logic many discrepancies betweenlogicalexpr essionsin variouspr ogramminglanguagesexist n Pascal W irth71 SAV91 wecanforinstancewrite a B which yields TRU E when the value of A equalsthevalueof B whereasin C KR88 a s meansthatthevalueof B is assigned to thevariableA T
31. and easy to understand system This might seem an option from a nonprofessional point of view but when looking closer at the state of affairs within the software crisis weseethatthis is not realistic First of all these systems are big not to say huge It will cost more then just a little money it will require enormous amounts of money time and manpower It means that theinvestment in the current system which could easily have reached billions of dollars is practically thrown away Some companies like banks might be able to finance at least the first part of the project however the time and manpower will often not be available A second thing that is overlooked by the start again from scratch option is the fact that virtually all company knowledgeis hidden within thesource code of theold system This means that the new system must support all features of the old one including the ones that were hidden U nfortunately there often is no other source of information such as reliable documentation describing the behavior of the old system which makes are implementation impossible Theonly option seemsto beto keep the current sources and to retrieve as much infor mation from them as possible Logic Mining TheLogicM iningtechniqueisinitiated developed and proclaimed by TriL oc Software Engineering This company proposes a new revolutionary approach on software reno vation T heir strategy is build around theidea of extracting mining
32. arated but with thesamestructureand in themiddlewe seethetranslation modules tisillustrated by Figure5 2 1 CUB Logical GLFA n G 5 8 Git Relizionali ic COG Acichraatikce GLI Aritboeatice Figure 5 2 1 Sincewear eworkingwith Clean CO BOL thetranslatingpr ocessispr ettystraightfor ward H ereyou seehow asubtraction istranslated translatetheleft operand translate therest and subtract it 26 AE2 COB2GLF MulExpr COB MulExprl COB2GLF AddExpr COB MulExpr2 COB AddSubExprList COB2GLF AddExpr COB MulExprl COB MulExpr2 COB AddSubExprList sub GLF AriExprl GLF AriExpr2 GLF AriExprl GLF AriExpr2 N otethedifferencebetween the C O BOL and theG LF sorts WhereC 0 BOL distin guishes between several arithmetic expressions such as M ultiply E xpressionsand A d dition E xpr essions G LF only workswithonesor t theArithmetical Expr essions Fur thermoretheC O BO L expression workswith two operands an operator and therest of theexpression whereasG L F justhasan operator on two A rithmetical expressions con tainingtherest Logical expr essionsar ehandled similarly Attheground levd of thetranslator several typeconversionsareneeded T heseinclude theaddingof labelsor thetranslation of typical C O BO L languageconstructsin amore genera way Thesetranslationsar enotexplicitly mentioned intheabovesincethey ar e of merely technical use 5 3 TheGLF Toolbox N ow that wehaveG LF wecan manipulate
33. bewritten without brackets A B C D A negation doesnot need extra brackets not A is written asnot A 29 A General Model for Logical Expressions e Withinrelational expr essionswedo not add extra brackets A B OR C will bewritten as A gt B OR C So when wetranslatean operator such astheaddition wetranslatetheoperandswith bracketsexcept when thefirst operator isan operator with thesamepriority astheaddi tion UIGLF AriExprib add such as subtraction or addition itself UI AE1 UIGLF AriExpr add GLF AriExprl GLF AriExpr2 UIGLF AriExpr b add GLF AriExprl T UIGLF AriExpr b add GLF AriExpr2 Thefunction that placesthebrackets UI GLF AriExpr b add isimplemented to con tinue without placing brackets when itencountersan addition or substraction and to continuewhileplacingbrackets UI GLF AriExpr b inall other cases With addition proceed without brackets UI AEb addl UIGLF AriExpr b add add GLF AriExprl GLF AriExpr2 UIGLF AriExpr add GLF AriExprl GLF AriExpr2 With subtraction proceed without brackets UI AEb addl UIGLF AriExpr b add sub GLF AriExprl GLF AriExpr2 UIGLF AriExpr sub GLF AriExprl GLF AriExpr2 In all other cases just add brackets default UIGLF AriExpr b add GLF AriExpr UIGLF AriExpr b GLF AriExpr The urarrF ariExpr b function just adds brackets around any expression Withlogicalexpr essionthesamer ulesapply exceptin thiscaseadding brackets
34. cations Performance however can become a problem with this implementation In the exter nal system the input is produced in some way and has probably already been parsed N ow it needs to be translated back to text which is being parsed again by our compo nent rewritten and converted back to text A sFix2Text to be parsed for the third time by the external system Interface 2 The Direct Interface The direct interface leaves the external system with the task of creating A sF ix terms This means that the system must have access to the ATerm library and preferably the AsFix library as well and must have some knowledge about the signature of the ASF SD F specification which must be embedded in the parsed input term The TRS just rewrites these terms and returns them in AsFix format The ATerms can be trans ferred in the compact binary format Figure 8 2 4 ATerm java gt External AsFix System trm notion of i i thesignatue AsFix TRS tof theASFASDF x Swim component Specification Figure 8 2 4 The parsing is done in the external system which means it can be dedicated and opti mized for the given TRS N o additional translations from parsed terms to text or vice versa is needed and with appropriate implementation some time is saved O f coursethere can be no independence between the external system and the TRSsince parsing has to be done according to the signature of the TR S The external system ha
35. ceeBus rewriting system TRS as a separate part within our component so its re writing capabilities can easily be changed without having to modify all the P url compte other technical parts of the system pem In the Meta Environment The M eta Environment exists of a large collection of tools that communi the components are cate with each other using the Toolbus BK 98 The tools are independent connected by the Toolbus executable applications Since we want to embed their functionality in an external system they should betranslated into libraries This changein design means we have to replace the Toolbus with a new tool that explicitly handles the communication between the other tools Platform and Language differences W hen wetry to embed a scientific system likethe M eta Environment in a commer cial system we run into the differences between scientific and commercial computer platforms The M eta Environment is developed on the U nix platform i e Sun So laris Linux A ppleM ac O S X Compaq Tru64 etc while in commercial cor porations the most widely used development computer platform is Wintel WindowsonlIntel processor based machines F ortunately in theU nix world i it is very common to distribute scientific applications and tools with their I ARTS sourcecode and makefiles This makes it more important to U nix pro Sun s Solaris isthe Qrammers to program platform independent at least U n
36. chnical issues as shown by the Operating solvethe problems System Sucks Rules O Meter which expresses opinions found on websites on the internet 9 Binary IO n W hen working with binary IO weran into a problem the compiler did not warn us for Within U nix there does not seem to be areal difference between text and binary filelO When using Windows it is very important to tell the system you are doing your IO in binary mode If you forget to do so strangethings will happen We for instance were confronted with the fact that the hexadecimal pattern 1A would simply make the pro gram crash as soon as it was encountered during the 0 When this pattern was read from a file that was open for non binary reading it was interpreted as an end of file causing the system to stop reading O f coursethis led to a variety of errors that where hard to trace TheATerm library was doing its binary IO correctly when it came to files but when stdin and stdout were used as files the library forgot to actually set them in the binary mode Within U nix this does not result in problems so we were thefirst to interpret this as an error Along with the ATerm library developers we decided to fix this problem within the binary file IO function of the ATerm library with the windows specific _setmode statement ifdef WIN32 if setmode fileno file O BINARY 1 perror Warning Cannot set file to binary mode fendif By solving this p
37. ded that on an error the T RS component just returns an empty string and that the ava adapter can check afterwards whether an error has occurred or not If one has occurred it can then explicitly get the error message and throw an exception if that is desired This way weonly call C functionality from Java and not the other way around meaning that the external system really is an independent tool IDs 01 with no links to the external system In Java the code looks likethis public String RewriteTerm String input str throws TrsException String output str DoItInC input str if CErrorTest 0 return output str else throw new TrsException GetCErrorMsg I call to aC function always returns a valid string 8 15 method which 6616126 even on an error Thecerrortest method checks whether an error has occurred and if so the cetcerrormsg method gets the error message that can then be thrown as a J ava exception This way the external system does not even know the TRS component is written in a different language 52 1 a Concluding R emarks after Part Adding it all up somethingscan besaid about our and other approaches on embedding ASF SDF functionality in commercial environments E Start We will also be describing a few practical technical issues After this porting and embedding process one should be able to start and run a TRS under Windows Sy 10 1 Practical Issues
38. ean important source to describe the processes within a corporation U nfortunately there is very little knowledge about these software sys tems which have become legacy systems over time therefore the adjustment of them can be atime consuming and painful process for software analysts Special task forces of analysts are full time operational to maintain these systems To help these analysts with their analysis and renovation specific softw are renovation systems have been developed Some platforms for software renovation factories areT L CH P91 CO SM O S E m98 and REFINE RA 92 The lack of knowledge about the legacy systems does not only cause large technical problems such as the Y 2K problem but it also enlarges the time to market In a fast moving market companies need to be able to react quickly to changes with unmanage able legacy systems it takes too much time it is too expensive and it involves taking too many risks Competitiveness means flexibility and speed when it comes to new services and products The situation referred to as the software crisis started when people experienced that correct software could not beengineered likebridges and buildings first designing it on paper and then creating it without any errors uncertainty about what is going to happen when the Logical Expressions Analyzing 6 eneralizing R ewriting What to do So why not invest a little money and start building a new clean modular
39. ed to be processed further i e a not is allways around a GLF RelExpr da9 distribute ands not GLF RelExpr not GLF RelExpr da10 distribute ands GLF RelExpr GLF RelExpr A ppendix C Interesting parts of the GLF to UIGLF translator module GL F AriE xpressions equations UIGLF AriExpr amp UIGLF AE1 UIGLF AriExpr add GLF AriExprl GLF AriExpr2 UIGLF AriExpr b add GLF AriExprl UIGLF AriExpr b add GLF AriExpr2 UIGLF AE2 UIGLF AriExpr sub GLF AriExprl GLF AriExpr2 UIGLF AriExpr b sub GLF AriExprl UIGLF AriExpr b sub GLF AriExpr2 UIGLF AE3 UIGLF AriExpr mul GLF AriExprl GLF AriExpr2 UIGLF AriExpr b mul GLF AriExprl UIGLF AriExpr b mul GLF AriExpr2 UIGLF AE4 UIGLF AriExpr div GLF AriExprl GLF AriExpr2 UIGLF AriExpr b div GLF AriExprl UIGLF AriExpr b div GLF AriExpr2 UIGLF AES5 UIGLF AriExpr pow GLF AriExprl GLF AriExpr2 UIGLF AriExpr b GLF AriExprl UIGLF AriExpr b GLF AriExpr2 UIGLF AE6 UIGLF UnaryExpr GLF AriExpr UIGLF UnaryExpr UIGLF AriExpr GLF AriExpr UIGLF UnaryExpr UIGLF AriExpr b amp no brackets around UnaryExpr UIGLF AEb1 UIGLF UnaryExpr GLF AriExpr UIGLF UnaryExpr UIGLF AriExpr b GLF AriExpr UIGLF UnaryExpr default UIGLF AEb UIGLF AriExpr b GLF AriExpr UIGLF AriExpr GLF AriExpr UIGLF AriExpr b add
40. einformation weneed theoperand A and theoperator lt is higher in thetreeup to thelevel ofthe L ogical A N D Expression 17 A General Model for Logical Expressions 1 3 2 Clean COBOL O ur first ter m rewriting system translated CO BOL dir ectly into G LF Although it worked correctly wedecided to take another approach sincethespecification had be cometoo complex to analyze verify and adjust M ostother programming languagesdo not allow abbreviated conditionsw hich makes the process of translating to G LF lesscomplex Fortunately all abbreviated conditions can be disabbreviated to theirnon abbr eviated counterpart To makethespecification of our rewriter moremodular and easier to understand wedecided to splittheC O BO L toG LF translation processin two phases We will first disabbreviate CO BOL to Clean CO BOL then we willtranslate C lean COBOL to theG eneralized L ogical Format Clean CO BOL doesnot allow abbrevi ated expressions negationswithin relational expressions redundant double N O T operatorsor English synonyms for relational operators C lean CO BOL is 10096 pure COBOL butwritten down in asimilar way modern languagesarewritten down Within Clean CO BOL wework with simpler elationalexpr essionsconsistingof two arithmetic expressions consisting of identifiers constant numbersor mathematical ex pressions with abinary relational operator such as gt or lt O nahigher level wework with U nary Logical Ex
41. em very inefficient Interfacing our TRS with the existing system As mentioned above we have several choices regarding the functionality of our TRS Wecan choose to leave this tool as small as possible thereby requiring the external sys tem to translate the in and output to and from AsFix We can also choose to integrate the translations within the TRS and hide the A sFix notation from the external system We will discuss three possible interfaces in some detail Interface 1 The Indirect Interface Theindirect interface leaves the external system unaware of theA sFix A textual repre sentation of aterm like true false can be fed to the TRS and it will produce the rewritten term in a textual representation like true as output Figure 8 2 2 true false gt true Figure 8 2 2 With this interface the TRS component consists of 4 subcomponents an adapter to handle communication and flow of control an SG L R parser subcomponent an A SF ix to text converter subcomponent and of course a TRS subcomponent Figure 8 2 3 External System A dapter AsFix2Text converter Text rw trm Figure 8 2 3 A benefit of this implementation is theindependence between the component and the external sy stem Terms are being rewritten and there is no need for further interaction or tuning between both systems A totally different TRS component can be linked to the system without any necessary modifi
42. equentstyle COBOL Logical Expressions COBOL COB doesnot put many restrictions on programmers to writeclear code L ogical expressions can be written in numerous ways which can bequiteconfusing With theE nglish languagein theback of their mindsthedesignersof CO BO L allowed theprogrammer to writeasimplerdational expression in which two operandsarecom pared in many differentways A B LS A A IS EQUAL B A IS EQUAL TO B O f course notationslikethesecan easily betranslated to onepreferred notation this can bedonein apreprocessing phasethereby reducing themany alternativesin produc tionrulesofthecontext fr eegrammar Inther emainder of thisthesisweshall only be usingthe mathematical notation A B rather than the E nglish notation Another somewhat special notation can exist becauseof thefact that CO BO L allows theembedding of thelogical operator N O T thenegation within relational expres sions 15 A General Model for Logical Expressions Thismeansthat all thefollowing expressionshavethesamemeaning A lt gt 5 A NOT NOT A B B this must be read like NOT A B N otethatthefirstexpression isarelational expression thelastisalogical expression negated relational expression andthemiddleoneisalogical expression wherethelogi cal operator isplaced within therelational expression Thebiggestpr oblem with CO BOL conditions however aretheabbr eviated condi tions T hisallowsprogrammersto leaveconsecutive
43. er to the ATerm library A n error message will be passed to this handler with optional arguments void ATsetErrorHandler void handler const char format va list args W hen we encounter an error it is important to stop the current processes to prevent more errors or even crashes from occurring We must return to a safe state in which we can report an error to the external system In Java atry catch statement can be used to this In C we must do it by hand with the functions setjmp and longjmp 51 Restart Sorry a serous system error occurred Resume Embedding a Formal Specification in a Commercial Environment if setjmp env 0 do something if something goes wrong call longjmp env 1 else 1 Longjmp has been called an error has occurred W hen we encounter the call to setjmp for the first time it returns zero and we continue with the code Within our custom error handler that is called every timean error occurs we have a call to 10ngjmp and when that is called the stack pointer is changed to the value it had when set jmp was called i e we travel back in time to the set jmp call This time however set jmp returns 1 and instead of performing the statements that caused the error again we proceed with the else block Throwing errors from C to Javais possible with JN I butin our case wedid not find it preferable We wanted to have as little Java C communication as possible We deci
44. esamety peor they haveto beof atypeof an arithmeticoperation with ahigher priority W ewill illustratethis by taking alook atthearithmetical expr essionsin CO BOL Forinstanceamultiplication can take other multiplications as operands it can takean involution asan operand but it can nottakean addition asan operand CB PowerOfExpr CB MulSubExpr CB MulExpr CB UnaryExpr CB PowerOfSubExpr CB PowerOfExpr MA CB PowerOfExpr CB MulSubExpr 4 CB PowerOfExpr CB MulSubExpr wen CB UnaryExpr CB PowerOfSubExpr A number or an identifier hasthehighest priority and asubstraction hasthelowest priority Rulessimilar to thisshould apply for relational and logical operatorsaswell U nfortu nately CO BO L sabbr eviated expr essionscausepr oblems becausetheparsetr eesfor theseabbreviated expressionscan not beevaluated or translated becausethey do not contain all theneeded information C onsider theC O BO L expression A gt B AND c When wewould beusingour priority forcingtypeswewould createtheparsetreeshown in Figure3 1 1 Logica AND Expression A BANDC Relational Logical Relational Expression O perator Expression A B AND nic Identifier Relation Identifier Identifier Relation Identifier O perator O perator A lt B 6 Figure 3 1 1 TherightR elational Expression subtreedoesnot contain all theinformation for trans lation into A C Th
45. esented in A sFix to AN SI C code Sincewewant to do our rewriting without an interactive environment and on a different platform we prefer to useC code that can becompiled to alibrary and used by our external systems to represent our TR S rather than the A SF SD F specification itself So without the com piler integration to the level we prefer would surely be impossible M ore information on the A SF SDF to C compiler can be found in BK V 98 The SGLR Parser Another important component in the A SF SDF M eta Environment is the parser Before we can rewrite our input it needs to be recognized and identified that is where a parser comes in The parser should support all kinds of grammars that can be specified in the M eta Environment with SDF It needs to be language independent and capable of parsing SD F specifications The parser that is used is the Scannerless G eneralized L eft to right Rightmo lt derivation parser or just SG LR parser Vis97 n many cases scanners are used to dividethe input into lexical tokens specified by thelexical syntax theregular grammar which arein turn parsed by aparser Dr Eelco Visser who returnsatree according to the context freesyntax A downsidetothis Expert on sglr parsing approach isthefact that no knowledge of the parsing context is availableto the scanner meaning that the context of a token can not be used to disambiguate its meaning The S in SGLR standsfor Scannerless this means amon
46. ewrite a term under Windows we have a memory leak that occurs when the term is parsed and the parse table is looked up in the parse table database This bug is known both at theM eta Environment development team as well as TriL oc but it has not been precisely corrected or localized yet 10 2 Conclusions Embedding the functionality of a scientific system like the A SF SD F M eta Environ ment in a commercial environment is possible To overcome system platform and language differences various choices have been made Some parts need to be rewritten some parts have to be grouped together to acquire a system which is as safe as possible without losing the necessary structure We have described the integration using a prototype of the M eta Environment since it is still in a development stage during the process we have given feedback to the devel opers which they have used to adjust the system to make future portings and projects likethis much easier HE 10 3 Future Work n the future when there is need for several term rewriting systems to work with an external system dynamically linking the TRS subcomponent can be very beneficial In the current situation the T R S subcomponent is fixed to the system at compiletime while for instance the parse table can be chosen at run time Future work might also includethe porting of the Toolbus and the M eta Environment itself but we suggest to wait with this until the development of the lat
47. ewrite method with paramters public String rewriteTerm String p func int p id String p cond String p ptablIERSOption p option throws TRSException Jf get input string write it in the right format for the rewriter String 1 buildInputString p func p id p cond p option get output string String l ostr _rewriteTerm l_istr p_ptable compress output string bf remove redundant spaces returns etc introduced by the TRS IL LOSE compressString l ostr if errorTest 0 no error has occured return 1 ostr throw new TRSException _getErrorMsg P Method that builds the input string for the TRS from the function name the id and the condition public static String buildInputString String p func int paid String p_comR Option p_option return p func pid p_cond p_option getTextRepr W Method that compresses the string parameter private String _compressString String p str StringBuffer l_result new StringBuffer for int qty 0 1 len p_str length Ia lt l len I E char desc p str charAt 1l i boolean l space false while l i lt l len amp amp 1c LI lc xp I lc Na lc NA BT qd LE 6 str charAt 1 i l space true if l space true l result append Nes l result append 1 c return l result toString trim 73 A ppendices 7A
48. g other things that the lexi cal and context freesyntax areboth handled by asinglecontext free analysis phasethereby providing the ability to takethe context of atoken into account during the parsing pro cess For further disambiguation reject rules and context free restrictions can be used providing ways to for instance force prefer longest match and prefer keywords over tokens The G istaken from Generalized G eneralized L R parsing Vis97 2 basically comes down to the fact that when an ambiguity is found the parser produces all possible parse trees of the input resulting in a so called parse forest In practice we found the forests very helpful while disambiguating our grammars it proved to be anice debugging facil ity The L stands for left to right scanning of input And finally the R stands for rightmost derivation in reverse The consequences of these properties are beyond the scope of this thesis Creating a Term Rewriting System using the M eta Environment Theporting process reaches beyond the need to just compilethe code provided by the compiler on a different platform Translations are needed as well as choices regarding design To clarify our needs we will now describe how a stand alone rewriting system that is operational under U nix is created with the M eta Environment The process is illus trated by Figure 8 1 1 42 AsFix Support ed gt i 0 0 M eta C source E
49. haveconjunctions of dis junctions wework with disjuncted conjunctionsin theD N F Thetransfor mation is very similar to theC N F transformation wewill not describeit here R emoval of operators W hen testing thesystem wefound it useful to add afew other operationsto theG L F Toolbox Theseoperationsr emovecer tainoperatorsfr omlogical expr essions TheG LF Toolbox canr emovenegations r emovetheminwar dsandinver tther elational opera tor disjunctions and conjunctions Disjunctionsareremoved using thefollowingrule D eM organ or A 8 not and not A not B Conjunctions can beremoved aswell D eM organ and A B 2not or not A not B 5 4 GLF toUIGLF Themostinteresting thing abouttheG LF to U ser I nterfaceG LF translator isits bracket policy A sdescribed befor eitintr oducesbrackets when operators haveahigher priority and whenthey arenecessary to forcepriorities Bracketsareleft out when thesame binar y operator isused afew times successively Sincein that caseit does not contribute tother eadability In GLF itisnotvery hard to seewhich sub expressions must be resolved first may need brackets W henever weencounter any kind of operator with two operandswe know thosetwo operandshaveahigher priority to beresolved and thus may beparen thesized Too many par enthesesdonotimpr over eadability so wedo apply afew extrar ules Whenanoperator of thesamepriority isencounter ed don tusebrackets A B C D can simply
50. he registers in a data structure which is automatically placed on the stack The stack can then be checked from top to bottom U nfortunately this functionality is not provided under Windows We need our own way to check the registers O n the internet we found the Boehm garbage collector Boe99 which is availablefor both U nix and Win dows from which we concluded that the only way to solve our problem was to use assembler T he assembler code copies the values of the registers into Hans Boehm local variables which are explicitly being fed to the garbage collector Ilus Aa trated by the following code this is not the exact code it merely serves as an collector extra explanation asm Get the registers into local variables mov r eax eax to check them for ATerms later mov r ebx ebx mov r ecx ecx reg 0 ATerm r_eax Put the register values into an array reg 1 ATerm r_ebx reg 2 ATerm r ecx First traverse the reg array to count the nr of ATerms that were in registers for i 0 i NR OF REGISTERS i if AT isValidTerm reg il AT markTerm reg il if AT isValidSymbol Symbol reg i AT_markSymbol Symbol reg i E 9 2 Language D ifferences The commercial environment in which we have to adapt our system is written in Java To make communication possible between the two systems an interface between C and Javais needed T his interfaceisfar from trivial sinceJava and
51. his expression yieldsthevaluestored in B which can beeither zero regar ded FALSE or non zero TRU E Thecorrect C equivalent of the Pascal state mentis A B Somelanguagesallow notationsthat areintuitivey hard to understand F or instancein COBOL CO B Ebb90 onecan writea 8 or cwhich can also bewritten in aclearer Way A B OR A C morecomplexexampleiS A equals B OR C OR NOT D AND is greater than E or F Whichisequivalentto A B OR A C OR not A D AND A gt E OR A F Analyzingandrewritingall these different notations isacomplex task For human analystsit ishard to interpretthoselargeC O BO L expres sionswithout making mistakes F or computer programmers asystem that can rewrite logical expressionsfor all theselanguages can becometoo largeto create verify and manage TheSolution a U niversal N otation Althoughthenotationsdif fer mostlogical expr essionsar ebasedonthesamepr oposi tional logic T hesolution wepresent isto translatethevarious notations of logical ex pressionsof alanguage or morelanguages into ageneralized form T hisnotation will beunambiguous compact languageindependent and easy to interpr etby acomputer Wecallthisnotationthe G eneralized L ogical Format orjust G LF Expressionswritten inGLF canthen betransformed into normal forms amore human readableform or perhapseven back to a different programming language The Approach Step by Step Theprocess wewill describeinthisthesis
52. iption of its behavior by focussing on some of its source code Along with someerror handling functions the M eta A dapter contains an initialization function in which the parsetableisloaded parsed and a rewrite function that starts the actual rewriting process Initialize TRS Theinitialization function gets the location of the parse table and tries to load it When the parse table is parsed successfully it initializes the rewriter The parsing of the parse table is the most time consuming process within the rewriter so it is very beneficial for the efficiency to parse it only once 46 In pseudo C it looks like this Pseudo code of Metadap c the implementation of the indirect interface The code is not clean C it represents only a fraction of the real code but it is more readable leaving out unnecessary details F Ej void initialize trs char ptable AFinit initialize the ATerm library ou d read parsetable ptable init rewriter initialize the TRS subcomponent wf The initialization of the ATerm library through our whole component only occurs in this function Since initialize trs is called only once this means that all our ATerm op erationsthat areinitiated by therewriter parser or other subcomponents will operateon the same ATerm environment This ensures us of maximal sharing within our system to keep our memory usage as efficient as possible by the current ATerm library standards Re
53. isthetranslation of C O BO L logical expres sionsto G LF from which they can betranslated back to areadablerepresentation T he benefit of translating different languagesto G LF first givesustheability to useonly one rewriter for thenext rewriteoperations T hisnotonly saves us alot of development work but it also providesuswith avery flexiblesystem as shown by Figure2 1 1 Language A Language A n GLF GLF Toolbox gt Readable Language B T to GLF gt Language B gt Figure 2 1 1 Figure2 2 1 showsthat only onepart of thesystem is input languagedependent T he other transfor mationssimply work on GLF BeforewetranslateGLF tor eadableex pressions which wewill later call UIGLF andwecan perform additional GL F trans formationson them T hosetransformationscan beadjusted extended or deleted with outhavinginfluenceontheoriginal input programming languages W hen asystem likethishas been developed for language A wesimply haveto make onerewriter that translates language B to G LF and all theexistingG L F operationscan bereused for thislanguageas well If atranslator to GL F for agiven languageexistthe rewritingson GLF canberegarded language parameterized or generic 12 G LF can also bevaluableto codeanalystssinceit clarifiesthe meaning of thelogical expression Terminology Weidentify thr eetypesof rewritersin our pr ocess When working with CO BOL we will seethat disabbreviation isneeded Afterthat a translation
54. ition and equations of a lan guage it can parse rewrite and pretty print terms programs over that language It con sists of a variety of components together forming a programming environment With the development of the new A SF SD F_M eta Environment BK M 97 a number of features have been added that open opportunities for integrating its rewriting capa bilities in other systems Existing software renovation systems can thereby take ad vantage of the A SF SD F specification formalism in some of their components without having to modify existing ones Still it is not just a matter of plugging in a rewriting system into an other system Problem D espitethefunctional benefits of formal specifications they are far from widely spread outside the scientific world There are few complete solutions like the M eta E nviron ment that providefunctionality for formal specifications and they are mostly developed 37 Embedding a Formal Specification in a Commercial Environment in scientific environments like universities rather than in commercial like corpora tions environments Several differences betw een a scientific and a commercial environ ment exist that have led to this unilateral development The development of such a system requires a lot of theoretical background and re search in order to give it its necessary firm basis It is a long term project that can take years to finish Companies often do not take the risk to in
55. itto makeit meet our demands A slongas westay withintheG LF grammar wecan perform all kindsof operations without modi fyingtheotherr ewriters W ehaveimplenentedtwonor malizersand afew other opera tions TheConjunctive N ormal Form CNF N ormal formscan bebeneficial for both humansaswell ascomputersin interpreting thevalueofalogicalexpr ession W ewill describethetransfor mingofGLF expr essions to their conjunctivenormal form or CNF A conjunctivenormal form is defined likethis jtistheconjunction and ingtogether of clauses eachdauseisadisjunction or ingtogether of blocks and e each block iseither an atomic expression or anegated atomic expression pre ceded by a NOT In our caseatomic expressionsarerelational expressions Translatingthisdefinitionto G LF wehavethefollowing components Anatomicexpression in our casethisisarelational expression such as eq A B meaningA equalsB A negated atomicexpression thisissimply anegated atomicrelational expression like not eq A B Disjunctionsof blocks Just apply thedisjunction to blocks for instance or disjunctions of blocks disjunctions_of_blocks Conjunctions of clauses i e and conjunctions of clauses conjunctions of clauses So if A B and C ar evalid atomics and A or A or B not C A AND A and A or A not or B C A AND A isin CNF is not in CNF 27 A General Model for Logical Expressions W hen working with theoperators AN D O
56. ix platform inde development pendent and make relatively easy to compile distributions of their sources platform on which Windows T has got quite a bit of U nix or U nix like core so porting the 2 code seems at least possible developed Still lots of differences arise between the platforms Like it or not source code on a U nix machine compiled with compiler A can have different be havior from the same source code compiled with compiler B on a Wintel machine if it even compiles Functionality that is available on the U nix platform might not be available within Windows N T some functions have slightly different side effects while others have different preconditions and so forth W hen wehavethe ideal situation of a complete system under W intel we discover a new challenge language differences A Il the source code we need from the M eta Environment is written in AN SI C KR88 C isa very common programming language that is used in alot of commercial environments H owever larger systems like the one in which we like to adopt our rewriting system often are written in more than T i one programming language In our case the core of the external iii system is mostly written in Java E ck98 whereas the graphical user TriLoc to develop most of their interface has been made using Visual Basic SM 97 tools O ur goal will beto filter out any vital platform incompatibilities and develop an interface adapter between our component and
57. l GLF LogExpr2 not RemOr and GLF LogExprl GLF LogExpr2 A not around another not remove both RO4 RemOr not not GLF LogExpr RemOr GLF LogExpr We also remove inner Not s RO5 RemOr not GLF RelExpr RemNot not GLF RelExpr RO6 RemOr and GLF LogExprl GLF LogExpr2 and RemOr GLF LogExprl RemOr GLF LogExpr2 Here he comes p or q lt not not p and not q Notice how the RemOr s are around the newly introduced nots in order to remove possibly added double nots RO7 RemOr GLF LogExprl GLF LogExpr2 not and RemOr not GLF LogExprl RemOr not GLF LogExpr2 Remove all nots amp RN1 RemNot GLF RelExpr GLF RelExpr RN2 RemNot not and GLF LogExprl GLF LogExpr2 or RemNot not GLF LogExprl RemNot not GLF LogExpr2 RN3 RemNot not or GLF LogExprl GLF LogExpr2 and RemNot not GLF LogExprl RemNot not GLF LogExpr2 RN4 RemNot not not GLF LogExpr RemNot GLF LogExpr RN5 RemNot and GLF LogExprl GLF LogExpr2 and RemNot GLF LogExprl RemNot GLF LogExpr2 RN6 RemNot or GLF LogExprl GLF LogExpr2 or RemNot GLF LogExprl RemNot GLF LogExpr2 Invert operators RN7 1 RemNot not St GLF AriExprl GLF AriExpr2 gte GLF AriExprl GLF AriExpr2 RN7 2 RemNot not gt GLF AriExprl GLF AriExpr2 lte GLF AriExprl GLF AriExpr2 RN7 3 RemNot not ste GLF AriEx
58. lexander van den Bergh Afstudeer docent prof dr Paul K lint Begdeiders dr M ark van den Brand dr Gert Veltink A pril September 1999 1 1 The Software Crisis 1 Introduction This chapter serves as a general introduction to the remainder of this thesis We will try to put thefollowing partsinto context by describing the need for this research D uring the past decade software renovation has become an important field in computer science D ueto large problems like Y 2K and the new Eu sw RT A mer ropean currency the Euro billions of dollars have d Mol Although one could argue that the scenario presented in rr vex t been and will beinvested in projects to analyzeand this cartoon is a little pessimistic There still is a lot of renovate existing systems Year 2000 Bug is going to hit our planet Legacy Systems The development of larger software systems started decades ago their source code grew over the years and because of that they have become an unmanageable chaos of code from a mixture of different dialects of programming languages T hese systems are referred to as legacy systems L egacy systems and their complexity have presented us with the undesirable situation of extremely hard to manage software systems which has resulted in a new Software Crisis Currently software has becomethe backbone of most companies Since almost every thing is computerized it has becom
59. llustrate how the TRS is connected to other sys tems and to show how it can be used FileTRS java package COM triloc javatrs Class TRS public class TRS public static final String public static final String COBOL_PARSE_TABLEc batch tbl TEST FUNCTZAONCB2UIGLF Condition publio static final String public static final int public static final String static JAVATRS DiL TEST ID 0 c iNNjavatrs TEST CONDIZIONECORD FIELD System loadLibrary JAVATRS DLL Native methods 5 actually implemented Ep in C private static native _iS amp wrinoe Term String private static native XonadParseTable String private static native emtrorTest private static native de t EmgorMsg ER Default constructor E7 public TRS Load a Parse Table ap public void loadParseTable String throws TRSException FileNotFoundException File l file new File p_ptable FileReader l filereader l filereader close Check whether an error while loading the xf if _errorTest 0 If an error has and throw it in loadParseTable p ptable has occured parse table occured get the an throw new TRSException 72 exception _getErrorMsg tiny p_ptable in native message di STRING String p_ptable IOException new FileReader 1 file code p_ptable R
60. mal sharing only new terms are created if aterm to be constructed already exists that term is reused and only a reference to that term is cre ated M emory management is doneby a built in mark sweep garbage collector Although alot of redundant terms are produced during the rewriting process the use of maximal sharing gives the ATerm library a sometimes spectacularly efficient memory usage as described in BK O 99 This is beneficial since ATerms can also be used as an internal data structure for data such as parse trees within running applications In order to make use of this feature the components have to share the same ATerm environment with the same stack the same garbage collector and the same memory manager To represent ASF SDF specifications and terms a special incarnation of ATerms is used the A SF SD F Fixed format or just AsFix ASFix terms can be regarded as parse trees of A SF SD F specifications They can be managed just like A Terms but are easier to use when working with A SF SD F An extra C library is available the support library to give support for creating and manipulating A sF ix terms Wewill mostly be working with A sFix but since every A sFix term is a valid ATerm as well they can be regarded ATerms 41 Embedding a Formal Specification in a Commercial Environment TheCompiler The compiler perhaps is the most important part of the system for our project It com piles the A SF SD F specification repr
61. means adding brackets except when anegation isencountered No we don t add brackets here UI LEb3 UIGLF LogExpr b not GLF LogExpr UIGLF LogExpr not GLF LogExpr In all other cases we do add brackets default UIGLF LogExpr b GLF LogExpr UIGLF LogExpr GLF LogExpr 1 5 5 Examples illustratetheuseof thesysten wepr esentsomeexamplesofr ewritings A Small expression Wewill beworking withthefollowinginputin CO BOL unr estricted A B OR NOT C AND D W hen disabbreviated to Clean C O BOL itlookslikethis noticehow the N O T doesnot propagatesinceit doesnot preceed an operator A gt B OR NOT A gt C AND A gt D 30 translated to G L F wege noticehow theconjunction bindsstronger or gt id vn A id vn B and not gt id vn A id vn C gt id vn A id vn D to UIGLF A gt B or not A gt C and A gt D to CNF A gt B or not A gt C and A gt B or A gt D to DN F noticethattheexpression wasalready written in DN F A gt B or not A gt C and A gt D to DN F without negations A gt B or A lt C and A gt D A large expression Theexpression in unrestricted CO BO L A gt B OR lt D AND NOT C AND B DOR BNOT gt E After acleanup wegg A gt B OR A gt AND NOT A gt C AND B OR NOT B gt E Itcan betranslated to thefollowingG L F expression pret
62. n wetraversealogical expression from left to right we alwaystakewith usthelastleft relational expression operand and thelast relational expression operator weencountered E very timeweencounter arelational expression with amissing operand or operator wefill in thegap Weaddr essthisissuewith thefunctionsthattakethr eear guments apar toftheexpr es sion that hasto becleaned thelast left operand weencountered and thelast operator we encounter ed CleanU p Expr ession_par t Last LeftO p L ast O p W ebr eak up theex pression into smaller piecesto eventually end up with relational expressionsthat haveto becleaned H ereareafew equations A complete relational expression no cleaning necessary Proceed with new arguments CURla CleanUpRel LeftOp Op RightOp Last LeftOp Last Op LeftOp Op RightOp LeftOp P Op No Left Operand but already an operator proceed with the new operator CUR2a CleanUpRel Op RightOp Last_LeftOp Last Op Last LeftOp Op RightOp Last LeftOp Op No Left Operand nor operator CUR3a CleanUpRel RightOp Last LeftOp Last Op Last LeftOp Last Op RightOp Last LeftOp Last Op N egations N egationsarehandled aswell they areof courseplaced beforethecondition CUR3b CleanUpRel ROp Last LOp NOT 285 Op NOT Last LOp Last Op ROp Last LOp NOT Last Op 25 AND 1 COBOL A KINO OF CADBAGI OR cae A General Model for Logical
63. nter Which choices haveto be made O ur goal will be to extract only the functionality we need from the M eta Environ ment to modify it in such a way that we can make a stand alone component of it and to embed it in an external system Lots of problems need to be investigated and solved in order to make the low level integration possible We encounter system design differences platform differences and language differences that have to be solved E 7 2 Environment D ifferences W hen trying to integrate the M eta Environment in a commercial environment prob lems on different levels arise O f course we will encounter various technical incompat ibilities but also on a higher level several issues regarding the design of both systems need to be addressed System D ifferences We could try to makethe whole environment part of the system but it is preferableto only usethefunctionality weneed and build this into atool or somekind of component that can be plugged into the existing system Such a component should be set up in a way that it is as independent as possible of both the renovation system and the M eta Environment It should be unpluggable at any time without messing up the other sys 38 tems We would like to have as little user interference as possible so it can Unparser o voee oe stay a clean black box with a clear input and output We would also prefer to leave the exact A SF SD F specification the term
64. nvironment us EM P sse en le AsFix AsFix2Text Text Tw em io wom The built in compiler of the M eta Environment translates our specification into ANSI C The obtained 6 source is fed to a C compiler along with the ATerm and A SF ix libraries as well as an extra support library which can then be compiled to an executable application the T RS executable Thisterm rewriting system executable cannot just rewrite strings of terms it re quires a parsed term in A sF ix format as input In order to obtain a parsed A sF ix term from our input string we need to parse it and embed the production rules of the input gathered from the signature of our ASF SDF specification into that term A textual representation of an input term can be parsed to an AsFix term by the SG L R parser given some parse table in ATerm format ThisAsFixterm can be used as input for our TRS executable which will produce arewritten term This rewritten AsFix term can be translated to text in various ways with some kind of A sFix2Text tool functionality is provided by the A sFix library Text SGLR AsFix trm parser trm gt TRS Executable Figure 8 1 1 With a design like this we can e g create an A SF SD F specification for the Booleans feed the term true false to SGLR rewrite it and receive true from our A sF ix2Text tool 8 2 A TRS component for WindowsNT N ow that we know the sequence of steps to rewri
65. o make the platform family complete must have gained at least ten pounds during the internship thanks to the very tasty and cheap lunches and the numerous birthday pies ate had the freedom to visit the university and the library whenever considered it useful After alittlemorethan four monthsthe practical sideof the project was completed We even went further than the actual plans As write this the ASF SDF technology is ported to Windows NT and practically ready to be used within the TriL oc system A very comprehensive C O BOL logical expression rewriter is implemented and tested on COBOL codefrom real life systems Acknowledgments O neof the main reasons that the project succeeded in such a small period of time is the tremendous feedback received from both the University of Amsterdam the N ational Re search nstitutefor M athematics and C omputer Science CW I and TriLoc Atall sidespeoplewerewillingto cooperate with the project Withouttheir feedback it would definitely haveturned into a time consuming frustrating project would like to thank a few of them by name First of all 1 would like to thank M ark van den Brand and zc G ert Veltink for their support on all domains G ert especially The Mats Environment support team Jeroen helped me get started and trace the hard to find bugs in the Tobias and Mark their ears still red from the ATerm library sources while M ark clarified numerous issues many phonecalls made
66. oftware E ngineering and M ethodology 2 2 176 201 1993 K N elte Formulas of First O rder Logic in Distributive N ormal Form D epartment of M athematics and A pplied M ethematics U niversity of CapeTown B Kernighan and D M Ritchie TheC programming language ISBN 90 6233 488 1 M E Lesk and E Schmodt LEX A lexical analyzer generator Bell Laboratories unix programmer s supplementary documents volume 1 PS1 edition 1986 Reasoning Systems Palo A Ito C alifornia R efine U ser s G uide 1992 W A Savitch Pascal third edition ISBN 0 8053 7450 7 E Stroo editor M icrosoft Corporation Visual Basic 5 0 Programmer s G uide M P A Sellink and C Verhoef An architecture for automated software maintainance Technical Report P9807 U niversity of Amsterdam Programming R esearch G roup 1998 Available at http adam wins uva nl x asm asm html E Visser Phd thesis Syntax D efinition for Language Prototyping ISBN 90 74795 75 7 Programming R esearch G roup U niversity of A msterdam E Visser Scanerless G eneralized L R Parsing Report P9707 A ugust 1997 Programming R esearch G roup U niversity of A msterdam N Wirth The design of a Pascal compiler Software Practice and E xperi ence 1 105 133 1971 A ppendices Appendix A Interesting parts of the SDF2 definitions of GLF and UIGLF module GL F AriE xpressions context free syntax Q LF AddExpr LF AddExpr Q Q LF MulExp
67. onment also describingitsA sFixto C compiler which wewill beneedingin the future Finally theM eta E nvironment is academic rather than commercial which makes itafreeproduct for universitiesand research institutes 13 A General Model for Logical Expressions 14 3 COBOL 3 1 COBOL Logical Expressions The main focus of our project will beto work on CO BOL logical expressions T his chapter explains why this language is especially eligible for our purposes Grace Murray Hopper a mathematician and pioneer in data pressing An exp ron Inthelate50 scomputer manufactur ersand userssaw the compilers and a key figerin the design of need for amoreuniversal way to program computers A group pee 3 of expertswasbroughttogether to develop aprogramming languagethat beeasy to write read and maintain ideal for writing businessapplications and becompatiblewith all computers T helanguage CO BOL Ebb90 which standsfor CO mmon BusinessO riented L anguage wasborn t indeed becamethe most commonly used programming languagefor dataprocessing which isoften donein business applica tions M any dif fer ent dialectshavebeen developed for CO BOL thr oughtheyearsN ow in thelate90 s many companiesar edealing with millionsof linesof old legacy CO BOL codewithin their software A nalysisand maintenanceof thiscodeishard sinceit often containscodewritten in outdated dialects with hardly any up to datedocumentation and without aclear and cons
68. operandsand or operators out of certain logical expressions T hiswill beillustrated by acoupleof examples Theabbreviated expression is quitereadable thevariableis being compared to aset of possiblevalues T heoperator and theoperand A areimplicitly applied to therest of theoperandsin theexpression A A B OR C OR B OR A CORA D D Inthefollowingexampletheoperand propagatesthrough theexpression to therela tional expr essionsC D andE Itcan however notpr opagatestraightfor war d because thethird relational expression gt D hasitsown operator which propagates to thelast relational expression W hen anew operator isencountered weproceed with that one A A B OR C OR gt D OR E B OR A 0 08 A gt DORA gt E When a N OT isencountered right beforean operand without an operator that rela tional expression isnegated T he N O T doesnot propagatefurther in the expression Thebracketsin theexpression areonly addedto improvereadability A A B OR NOT C OR B OR NOT A C ORA D D Whena NO T isencounter edbefor ean operator theoperator aswell asthe N OT propagatethrough therest of theexpression A A B OR NOT gt C OR D B OR NOT A gt C OR NOT A gt D A bbreviated expressionsdo not careabouttheprioritiesof logical operators O necan arguethat sinceconjunction bindsstronger than disjunction the operator should not propagateoutsidetheconjunction makingthefollowing expression equal to
69. pr2 GLF LogExpr3 67 A ppendices distribute ands and GLF LogExprl GLF LogExpr2 distribute ands and GLF LogExprl GLF LogExpr3 and or q r p or and p q p and p r da2 distribute ands and or GLF LogExpr2 GLF LogExpr3 GLF LogExprl or distribute ands and GLF LogExprl GLF LogExpr2 distribute ands and GLF LogExprl GLF LogExpr3 All the other cases can have no ors within an and default da4 distribute ands GLF LogExprl GLF LogExpr3 is or GLF LogExpr3 NO distribute ands GLF LogExpr2 GLF LogExpr4 is or GLF LogExpr4 NO distribute ands and GLF LogExprl GLF LogExpr2 and GLF LogExpr3 GLF LogExpr4 default da5 distribute ands GLF LogExprl GLF LogExpr3 is or GLF LogExpr3 YES distribute ands GLF LogExpr2 GLF LogExpr4 distribute ands and GLF LogExprl GLF LogExpr2 distribute ands and GLF LogExpr3 GLF LogExpr4 default da6 distribute ands GLF LogExprl GLF LogExpr3 distribute ands GLF LogExpr2 is or GLF LogExpr4 YES GLF LogExpr4 distribute ands and GLF LogExprl GLF LogExpr2 distribute ands and GLF LogExpr3 GLF LogExpr4 da8 distribute ands or GLF LogExprl GLF LogExpr2 or distribute ands GLF LogExprl distribute ands GLF LogExpr2 since we have already moved the negations to the oe oe oe 065 oe oe RelExprs with move negs they don t ne
70. pressions wherewecan takearela tional expr ession and putan optional N O T befor eit orwecantakeaLogicalOr Expression and put brackets around it with an optional N OT TheseU nary L ogical E xpressions are valid L ogical A nd Expressions which may be followed by thelogical operator A N D conjunction and another U nary L ogical Ex pression Clean CO BOL canisastandardized way to writedown logical expressions It can be regardeda canonical form of COBOL Finally wehaveL ogical Or Expressionswhich consist of aLogical And Expr ession optionally followed by thelogical operator O R disjunction and another L ogical A nd Expression A partial parsetreeof NoT A gt B AND A gt C OR a olookslikeFigure3 2 1 Logical OR Expression NOTA gt BANDA lt CORA 0 Logical AND L ogical Logical AND Expression O perator Expression NOTA gt BANDA C OR A 0 U nary Logic Logical U naryLogic Expression O perator Expression NOTA gt B AND A lt C Logical Relational O perator Expression NOT A gt B Arithmetic Relational Arithmetic Expression O perator Expression A gt B Figure 3 2 1 N ew priorities in Clean COBOL heparsetreeillustratesthe priority rulesthat apply to logic first thenegation then theconjunction and finally thedisjunction Sincewedo not work with N O T opera torswithin rdational expressions
71. prl GLF AriExpr2 gt GLF AriExprl GLF AriExpr2 RN7 4 RemNot not gte GLF AriExprl GLF AriExpr2 lt GLF AriExprl GLF AriExpr2 RN7 5 RemNot not eq GLF AriExprl GLF AriExpr2 neq GLF AriExprl GLF AriExpr2 RN7 6 RemNot not neq GLF AriExprl GLF AriExpr2 eq GLF AriExprl GLF AriExpr2 65 A ppendices Lo 6 6 Remove all 8 RA1 RemAnd GLF RelExpr GLF RelExpr A not around an or continue RA2 RemAnd not or GLF LogExprl GLF LogExpr2 not RemAnd or GLF LogExprl GLF LogExpr2 A not around an and continue RA3 RemAnd not and GLF LogExprl GLF LogExpr2 not RemAnd and GLF LogExprl GLF LogExpr2 A not around another not remove both RA4 RemAnd not not GLF LogExpr RemAnd GLF LogExpr We also remove inner not s 5 RemAnd not GLF RelExpr RemNot not GLF RelExpr RA6 RemAnd or GLF LogExprl GLF LogExpr2 or RemAnd GLF LogExprl RemAnd GLF LogExpr2 Here he comes p and q lt not not p or not q 5 otice how the RemAnd s are around the newly introduced nots in order to remove possibly added doubles RA7 RemAnd and GLF LogExprl GLF LogExpr2 not or RemAnd not GLF LogExprl RemAnd not GLF LogExpr2 module N ormalForms equations CNF1 move negs GLF LogExprl GLF LogExpr2 distribute ors GLF LogExpr2 GLF LogExpr3 2CNF GLF LogExp
72. r LF MulExpr Q LF PowerOfExpr LF UnaryExpr LF UnaryExpr UIGLF AriExpr UIGLF AriExpr UIGLF AriExpr UIGLF AriExpr UIGLF AriExpr GLF AddExpr GLF AriExpr GLF MulExpr GLF AriExpr GLF PowerOfExpr GLF AriExpr GLF UnaryExpr GLF AriExpr GLF PrimExpr GLF AriExpr GLF Identifier gt GLF AriExpr GLF Literal GLF AriExpr GLF FigLit GLF AriExpr add GLF AriExpr GLF AriExpr pos sub GLF AriExpr GLF AriExpr mul GLF AriExpr GLF AriExpr gt div GLF AriExpr GLF AriExpr 5 pow GLF AriExpr GLF AriExpr gt neg GLF AriExpr GLF PrimExpr GLF Identifier GLF PrimExpr GLF Literal GLF PrimExpr GLE FigLit GLF PrimExpr UIGLF AriExpr UIGLF CHR PLUS UIGLF AriExpr gt UIGLF AriExpr UIGLF CHR MINUS UIGLF AriExpr lt UIGLF AriExpr UIGLF CHR MUL UIGLF AriExpr gt UIGLF AriExpr UIGLF CHR DIV UIGLF AriExpr gt UIGLF AriExpr UIGLF CHR POW UIGLF AriExpr gt UIGLF UnaryExpr UIGLF AriExpr UIGLF CHR MINUS UIGLF UnaryExpr UIGLF PrimExpr UIGLF UnaryExpr UIGLF Identifier UIGLF PrimExpr UIGLF Literal UIGLF PrimExpr UIGLF FigLit UIGLF PrimExpr UIGLF AriExpr gt UIGLF PrimExpr module GL F R elE xpressions context free GLF FigLit gt syntax GLF RelExpr GLF AriExpr gt GLF RelExpr lite GLF AriExpr GLF AriExpr 97 gte G
73. r we will briefly describe the two environ ments between which we are working We will give glo bal information on the properties of these environments and issues we must address during the embedding pro cess E 7 1 TheCurrent Situation Software renovation systems often consist of many com ponents each with different functionality D uring thereno vation process various compilers or other translators rew rit ers play an important role In some of these phases a need for abstraction exists Formal specifications of program ming languages are a way to implement translators and re writers in an abstract way A bridge a way to connect two environments Since the systems that need to be renovated often are of that are separated by water or gaps vital importance to society such as systems at banks air ports hospitals government institutes or even military systems and nuclear plants the renovation system must bereliable A firm theoretical description of the software reno vation system can bevery beneficial to its reliability A n algebraic formalism like A SF SD F supported by the ASF SD F M eta Environment K li92 is very suitable for these pur poses ASF BH K 89 and SDF HK R89 together form a powerful way of describing specifications and term rewriting systems TheASF SDF Meta Environment TheASF SDF M eta Environment K li92 is a programming environment for writing ASF SDF specifications From a modular language defin
74. riExpr GLF AriExpr sub GLF AriExpr GLF AriExpr gt GLF AriExpr mul GLF AriExpr GLF AriExpr gt GLF AriExpr div GLF AriExpr GLF AriExpr gt GLF AriExpr pow GLF AriExpr GLF AriExpr gt GLF AriExpr UI AriExpr UI CHR PLUS UI AriExpr gt UI AriExpr UI AriExpr UI CHR MINUS UI AriExpr gt UI AriExpr UI AriExpr UI CHR MUL UI AriExpr gt UI AriExpr UI AriExpr UI CHR DIV UI AriExpr gt UI AriExpr UI AriExpr UI CHR POW UI AriExpr gt UI AriExpr Itishar dto comeup with anobjectivemeasur X ementforr eadability W hatmight appeal to oneanalyst could feel cumbersometo theother O neofthebenefitsof G LF and our modular approach is thefact that any kind of U IG LF can be connected to the system whenthatispreferred 24 TheRewriters The preceding chapters describe the core issues regarding the grammar of our various notations languages T his chapter will fo cus on the disabbreviations translations and transformations be tween these notations We will frequently be showing pieces of the rewriters in the form of ASF equations please keep in mind that t these are only small parts of the equations and that they merely serve as examples 5 1 Cleaning Up COBOL Beforewedo anything with C O BO L logical expressions weclean them up D isabbreviation Themost important issuein thecleaning up of C O BOL expressionsisremoval of abbreviated expressions W he
75. rl GLF LogExpr3 DNF1 move negs GLF LogExprl GLF LogExpr2 distribute ands GLF LogExpr2 GLF LogExpr3 2DNF GLF LogExprl GLF LogExpr3 move negs move negations inwards no removal mni move negs GLF RelExpr GLF RelExpr mn2 move negs not GLF RelExpr not GLF RelExpr mn3 move negs not and GLF LogExprl GLF LogExpr2 or move negs not GLF LogExprl move negs not GLF LogExpr2 mn4 move negs not or GLF LogExprl GLF LogExpr2 and move negs not GLF LogExprl move negs not GLF LogExpr2 mn5 move negs not not GLF LogExpr move negs GLF LogExpr mn 6 move negs and GLF LogExprl GLF LogExpr2 and move negs GLF LogExprl move negs GLF LogExpr2 mn7 move negs or GLF LogExprl GLF LogExpr2 or move negs GLF LogExprl move negs GLF LogExpr2 is and ial is and and GLF LogExprl GLF LogExpr2 yes default ia is and GLF LogExpr 66 no iol is or or GLF LogExprl GLF LogExpr2 yes default io is or GLF LogExpr no 5 distribute ors or p and q r and or p q or p r do1 distribute ors or GLF LogExprl and GLF LogExpr2 GLF LogExpr3 and distribute ors or GLF LogExprl1 GLF LogExpr2 distribute ors or GLF LogExprl GLF LogExpr3 or and q r p and or p q or p r do2 distribute ors or and GLF LogExpr2 GLF LogExpr3 GLF LogExprl and distribute ors or
76. roblems within the code of the ATerm library we make sure all the code that makes use of the functions is corrected It makes the library both easier and safer to use Memory Management G etting the registers O f course low level memory management brings lots of problems when porting code Weencountered variousincompatibilities Weeven discovered a bug in amemory initialise statement during the porting process A certain part of memory had to be initialized to zero but only part of it actually was set The function memset was called which sets a specified number of bytes to a certain value in this casezero unfortunately it was called to set anumber of ATerms to zero without providing it with the size of an ATerm the call was memset start pointer 0 nr of ATerms but it should have been memset start 49 Embedding a Formal Specification in a Commercial Environment pointer 0 nr of ATerms sizeof ATerm Within U nix this did not result into crashes because memory seems to be zero by default within Windows N T however the bug started to become a problem because memory is explicitly initialized to a different value and the library started to crash at unpredictable moments The garbage collector within the ATerm library uses the mark sweep tech nique BJK O 99 themark phasethememory must be checked for ATerms F in order to mark them This includes checking the stack and the registers In U nix the sigsetjmp function places t
77. s to have knowledge of the production rules of the TRS and the ATerm library If another TRSisneeded this meansthat extensive changes are needed within the external system in order to make communication possible 45 Embedding a Formal Specification in a Commercial Environment Interface 3 Total Integration Thethird possibility is total integration The key feature of this implementation is that the external system has the same ATerm environment as the TRS This is the solution for real maximal sharing Providing us with a very memory and time efficient solution To make this possible we need to haveas much ATerm functionality available within our external system as possible U nfortunately such functionality is not yet available in J ava and language differences especially the memory management differences between J ava and C make maximal sharing almost impossible T his implementation also puts alot of constraints on the external system making it totally dedicated to a specific TRS Our choice In our situation the independence between the components is of vital importance The M eta Environment the external system and the TRS areall under construction and we must be able to plug and unplug a specification at any desired moment In certain situations it is even desirable to use different TR S s from the external system T his leads us to the first type of interface Furthermore the construction of AsFix input terms in the
78. stry BK V 98 E 1 2 The Project TriLoc however has taken the challenge to investigate whether state of the art scientific technology can be adapted to be useful within the Logic M ining pro cess In close cooperation with the U niversity of Amsterdam and the N ational CWI 7 Research Institute for M athematics and C omputer Science C WI a project was initiated to accomplish this The project contains four subprojects 1 Describing a rewriter in the algebraic specification formalism A SF SDF that can be used to normalize CO BOL logical expressions 2 Defining other rewriters and grammars including a language independent logical format on which various normalization techniques can be applied 3 Porting the rewriting component that supports execution of A SF SD F rewrit ing systems of the A SF SD F M eta E nvironment to Windows N T 4 Embedding its functionality in the Logic M ining system We will describethe process in two parts which can be read independently 1 The analysis normalization generalization and rewriting of CO BOL logical expressions 2 Porting parts of the A SF SDF M eta Environment to a non U nix environment and embedding these parts in an external commercial software renovation sys tem A key feature of this project isthe communication between science and practice the cooperation between TriLoc on one side and the U ni versity and the CWI on the other the port of a U nix sys tem to Windows NT x
79. ter has been final ized at least to a certain degree The TR S functionality the ATerm and A SF ix libraries and the SG LR parser can already be used in cases like this thereby bridging the gap between this scientific and other commercial systems C oncluding R emarks 11 Concluding R emarks Returning to the ntroduction of thisthesis we look at which goals wehaveachieved 11 1 condusion Returning to the beginning of this thesis we will review what has been done 1 We have described a CO BOL rewriter in the algebraic specification formalism ASF SDF that can be used to disabbreviate CO BOL logical expressions This was the initial assignment for this master thesis 2 We have also defined a language independent logical format the GLF on which various rewritings are possible These rewritings include transformation to nor mal forms the removal of logical operators and the important translation to the readable U IGLF format This makes the system much more powerful and opens a world of opportunities for future extensions 3 We have ported major parts of the A SF SD F M eta Environment rewriting sys tem that supports A SF SD F to Windows NT This allows future projects to make use of ATerms A sF ix SGLR and the support library under N T We solved some problems and gave feed back to the M eta Environment developers for fu ture releases 4 Wehave created a stand alone TRS component that can be linked to the TriL oc Logic
80. teterms with an independent TRS we need to make decisions about which parts of the system we need to port how we will make them work together and in what way they should communicate with the external System What can we port We will leave the M eta Environment and the Toolbus untouched since these systems arevery complex still partly under construction hard to port and therefore beyond the scope of this project Besides it is our goal to create alibrary rather than aset of tools that we have to embed in an existing system T his means wewill still haveto create the source codeof our TRSon Unix but wecan rewrite terms on another platform Perhaps when this project succeeds and most of thetools and libraries are ported it will beinteresting to look at possibilities to port larger components like the Toolbus to Windows N T as well Embedding a Formal Specification in a Commercial Environment Without the Toolbus it is necessary to think about a way the re maining tools haveto communicate n our case we need to connect n theTRSto alarger system and the ideal situation is an independent pa fe stand alonelibrary that can beused in thecurrent system T his means i wehaveto modify thetoolsto libraries with functionality indepen nium dent of the Toolbus and other unused components A nother thing Pee 7 2 we have to keep in mind is that we want the system to be modular with at least the TRS as a separate part so we do not
81. the existing system to overcome the language differences Vorkstation 39 Embedding a Formal Specification in a Commercial Environment a T he Components In thischapter wewill givean in depth description of our existing source components and the component we are going to build the target E 8 1 An in depth look at the M eta E nvironment The M eta Environment is a complex system consisting of many differenttoolsthat work together n order to adaptthis system to its new surroundings wehaveto have detailed knowledge about its tech nical background and its design The following sections discuss the parts of the system that are important to our project To get a clear view on the whole picture it is necessary to understand its components ATerms and AsFix In order to exchange data between its components the M eta Environment uses Anno tated Terms BJK O 99 Annotated terms or just ATerms form a data structure that is ideal for communication between tools because they are platform and language inde pendent and simple to create compact and extend To makeATermsavailableto other applications a C library JO 99 has been developed that provides high performance operations on ATerms as well as input output and effi cient memory management Input output is possible in textual or in a more compressed binary format A minimal amount of memory is required when working with ATerms thanks to the technique of maxi
82. to them regarding the whole M eta Environment and everything that comes with it Furthermore alot of credits go to J eroen Scheerder for providing support on the parser Tobias K uipers for reacting on my phone calls and emails Pieter O livier for his help with the ATerm library and Pieter Bloemendaal for his cooperation from TriLoc s side FOA Paul Klint gave me general directions without him G er Bakker G ert Jan Tretmans and other people from the TriL oc management team this project never would have ex isted in the first place Lia Bos and Lucienne Evers provided me with information on Logic Mining and Jacob Brunekreef supported me on the A SF SD F development and this thesis Although all these people helped me a great deal and have got enough talent of my own none of this would have even come close to success if it wasn t for the uncondi tional support of my parents So last but definitely not least would liketo thank them G eorge and M ientje for supporting me W hen reading this thesis you might find the layout to be alittle different from what is expected from a master thesis have tried to make this thesis both pleasing for the mind as for the eye hope you enjoy reading it Yourstruly Alex Photography by Remko Schnorr Disce ut semper victurus vive ut cras moriturus Logical Expressions Analyzing G eneralizing R ewriting embedding a scientific implementation in a commercial environment A
83. traction are needed in order to analyze existing systems Source code must be analyzed and visualized in a dear and unambiguous way This part discusses a language independent format for logical ex pressions the G eneral L ogical Format GLF that is used in a re en gineering environment A rewriting system for translating CO BOL logical expressions into this independent format will be presented as well as various transformations on GLF 2 Introduction In this chapter we will explain why wefeel that rewrit ing of logical expressions is necessary We will describe some practical issues as well as some excentric properties of logical expressionsin COBOL CO B E 2 1 TheCode Animportant step in softwareanalysisand renovation is theanalysisof sourcecode A Ithough documentation ofthe codecan beof help itis often out of dateand notin sync DEA Vy with theactual programs T heonly way to becertain that 22222 youareindeed analyzingthemost up to dateinformation 22 isby analyzingtheactual sourcecode Softwae development should just like other large construction work be done by engineers Conditional and Logical Expressions Codecan bedivided into variousclasses W ecan identify data assignment statements input output code subroutines conditional expressionsand many more A Il theseitems havetheirown effect on thebehavior of theprogram n thisthesis wewill focuson the logical expressionsthat arethemost important part of condition
84. tsidethewholeex pression aswell default do6 distribute ors GLF LogExprl GLF LogExpr3 convert arguments distribute ors GLF LogExpr2 GLF LogExpr4 to CNF is and GLF LogExpr4 YES this arg has one or more and s distribute ors GLF LogExprl GLF LogExpr2 distribute ors or GLF LogExpr3 GLF LogExpr4 Ifwewanttotransform an expression to C N F wefirstmovethenegationsinside since that operation can generateadditional conjunctions and then distributethedisjunction CNF1 move negs GLF LogExprl GLF LogExpr2 distribute ors GLF LogExpr2 GLF LogExpr3 2CNF GLF LogExprl GLF LogExpr3 In practicewefound it useful to makeatransformation that removesthenegations whiletransfor mingan expr essionto CN F Whenthenegationsar emovedtotheinside oftheexpression i e up to therelational expressions wecan makethem disappear by invertingtherelational operator 28 not A gt B can bewritten as A lt B not A B can bewritten as A lt gt B or in GLF not gt A B Ite A B not eq A B neq A 8 Thismeanswewill removethenegationsinstead of just movingthem CNF1 remove negs GLF LogExprl distribute ors GLF LogExpr2 2CNF GLF LogExprl GLF LogExpr3 GLF LogExpr2 GLF LogExpr3 The Disjunctive N ormal Form DNF The DisjunctiveN ormal Form or just DNF isver y similar to theC N F Thedif fer ence liesin theuseof thelogical operators W herein theC N F we
85. ty printed by hand or or gt 0r gt id vn A id vn B gt BORA lt D AND NOTACCAND B D ORNOTBSE or andi ST dt id vn A 1 id vn D i A gt B A lt DANDNOTA lt CANDB DORNOTB gt E 20 mid PRESS not lt id vn A id vn C A lt D AND NOTACANDB D NOTBSE eq id vn B id vn D Mnom dn A gt D NOTA lt CAND B D not gt id vn B id vn E Partial parse tee of A gt BOR lt D AND NOTCAND B DORBNOT gt E after cleanup befertranslation to GLF N oticehow thedisjunction sinceithasalower priority isfirstin theG LF expression andthereforehigher in theparsetree W hen translated to U IG LF weseehow theexpression becomes readable A gt B or A lt D and not A gt C and B D or not B gt E N oticethatonly when anew logical operator isencountered brackesareused 31 A General Model for Logical Expressions Wecantransfor mitto CNF A gt B or not B gt E or A D and A gt B or not B gt E or not A gt C and A gt B or not B gt E or B D and to CN F withoutthenegations for improved readability A gt B or B lt E or A lt D and A gt B or B lt E or A gt C and A gt B or B lt E or B D to D N F wenoticethattheexpression wasalready in D N F A gt B or A D and not A
86. vest their money time and personnel into such a time consuming project especially in the fast IT business E ven purchasing a system likethat if acompletesystem isfound is not straightforward C om panies have different requirements and digital infrastructure than most scientific insti tutes Scientific institutes are willing to take chances and time to develop such a project be causetheir interest lies in theresearch as well astheresults T hey arealso willingto share and exchange knowledge with other institutes rather than trying to keep the competi tion ignorant of their projects The A SF SD F M eta Environment is no exception and is also developed in a scientific environment It has already been used in some commercial projects BD K M 96 but it has never truly been embedded in an external system O ur question our goal Weare going to describe the process of alow level integration of the A SF SD F formal ism using the functionality of the M eta Environment in a commercial environment T his means that the formalism will be a part of a commercial application rather than part of a process We will show how it is done which issues have to be considered and which choices have to be made This project can function as a pilot or an example for future projects The main questions of this part of the thesis are Can we embed the A SF SDF functionality of the M eta Environment in a com mercial environment Which problems do we encou
87. vironment Theissuecan be looked upon as atype casting problem In the old M eta Environment subsorts of sort A areimplicitly casted to work on rewrite rules of A The generated C code expects the A SF SD F programmer to be more specific about his or her type usage This is a known problem and will either be solved or correctly documented in the future by the M eta Environment development team Generated C code will not compile In rare cases the generated C code simply will not compile because of a ty peincompat ibility Since the developers of the compiler within the M eta E nvironment believe it will 53 Embedding a Formal Specification in a Commercial Environment be beneficial for the efficiency they do as little explicit typecasting as possible T he case we discovered has been reported to them and we expect this issue to be resolved in the next distribution of the system Not ALL C codeisGenerated by the Meta Environment Strictly the M eta Environment does not generate all of the source code needed to cre ateaTR Sunder Windows An important file called init c is not generated by the built in compiler but is implicitly defined in the generated M akefile So after the C code is generated it is necessary to compile it under U nix to get the init c file Thisisaknown fact but wearenot sureit is abug or afeature Atthistimethereisno information available on whether this issue will be addressed or not Memory L eak W hen we r
88. wecan makeour typesmoresimple T hisparsetreeis aclear top down representation of theactual logical expression A ny subtreeisavalid COBOL expression 18 Thepriorities of the logical operators in C lean CO BO L areforced by thehierarchi cal str uctur eofthegrammar Thesor tsar eor der ed in such away that they for cecer tain languageconstructs with ahigher priority to below in theparsetree whileothers exist onahigher level lower priority Clean COBOL in SDF Clean CO BOL hasno explicit definitioninSDF Itismer ely asubset of thenor mal COBOL grammar Wher eC O BOL allowsabbreviatedr elationalexpr essionsand NOT S in relational expressions C lean CO BO L only focuses on their clean counterparts context free syntax CB RelExprOperator NOT CB RelExprOperator Clean COBOL and COBOL CB AddExpr CB RelExprOperator CB AddExpr normal COBOL only CB AddExpr CB OpNot CB AddExpr CB OpNot CB AddExpr CB AddExpr TheL ogical expr essionshavethesamegrammar theuseof sorts and subsorts Highest priority CB RelExpr CB LogOrExpr NOT CB UnaryLE AND CB UnaryLE CB AndShort CB UnaryLE CB AndShortList Lowest priority OR CB LogAndExpr CB OrShort CB LogAndExpr CB OrShortList the disjunction CB OpNot CB OpNot CB RelExpr CB RelExpr CB RelExpr CB RelExpr N oticehow prioritiesar efor ced with relational expressions and brackets CB Unar
89. write Term Rewriting terms is the actual goal of our system O nce the initialize function has been called the rewriter will be executed many times In pseudo C it looks like this char rewrite_term char input_term char ptable ATerm at char output term at SG Parse SG LookupParseTable ptable input term at innermost at Rewrite AFsource at output term AsFix2Text return output term As seen above the rewrite term function gets an input term in string format and re turns the rewritten term also in string format The subcomponents are not called as separate tools with a script or something but a more low level approach has been taken by directly calling certain functions that exist within the sources of the components This is a major deviation from the Toolbus ap proach that is normally used within the M eta Environment The subcomponents are linked as static libraries Improvements of the M eta A dapter For the future it might be interesting to link the TRS sub component dynamically allowing the use of other TRS s without having to recompile the entire M eta A dapter This would mean that we would have to re initializethe new rewriter and perhaps close de initialize the old one A user can locate TRS dll s and their accompanying parse tables manually or through somerepository We however regard this extension as atask that lies beyond the scope of this thesis and leave it for future work
90. yLE CB UnaryLE CB UnaryLE CB AndShort CB AndShortList CB LogAndExpr AJ OR CB OrShort gt GH Dr5horthist CB LogOrExpr TheC leanup disabbreviater takes and returns C 0 BOL oe oe oe oe oe oe and removes double NOT s CleanUp fixes abbreviated relexpressions moves NOT s within relexpr to their UnaryLE s removes redundant comma s CB CleanUp CB LogOrExpr CB LogOrExpr 19 A General Model for Logical Expressions 20 4 Generalized L ogical Format eben Logical Expressions in programming languages are ond i n ine 0 A gt MOFLA much alike Therefore it is possibleto writethem in alan 70 00 04 ra mos guage independent format This chapter will introduce F o2 gt or a 4 theG eneralized L ogical Format G LF and its more read A fragment of in and outputofn the first able form U IG LF real system that waewritten by the TRS The input is COBOL the output is UIGLF in CNF without negations GLF 4 1 WithGLF wecanwritelogical expr essionsin alanguageindependent way that is easy to describe parseand rewrite TheLogic of GLF W hen describingthelogical operators used in G LF weregard it as a propositional language L containingthefollowingsymbols e asetofpr opositiona variables Var L thesear eourr elationd expr essions K N 97 theunary logical connective not F theN OT F thebinary logical connective or F G
Download Pdf Manuals
Related Search
Related Contents
電動ワイヤーアッパー Monnit Wireless Control User's Guide ASUS CM1745 TR8163 User's Manual Lamborghini - Thin 24 MCS LG Chocolate VX8575 User Guide User Manual - Lemmen Engineering CLIMATIZZATORE D`ARIA Hisense Corporation CC1010DK User Manual Moisture Content Meter - Fondriest Environmental Copyright © All rights reserved.
Failed to retrieve file