
An investigation of the methodology for software translation from


Contents

1. required by the selected translation methodology. The translated code is built directly from the design specifications. The result of this effort, the translated program, is supported by the design documentation, a programmer's guide, and a user's guide. The programmer's guide, including the design documentation, is a stand-alone document that describes the translated code to support maintenance and future development efforts. The user's guide is constructed for immediate use by program users. II. BACKGROUND A. RELATIONSHIPS AND DEFINITIONS 1. Software Reusability. Software reusability is a simple concept in theory. Since the word "reuse" means to use something more than once, software reuse should imply using software more than once. The immediate question then becomes: how is software defined in the context of reusability? Researchers and programmers have proposed a number of definitions of software reusability. There is no definitive description of what should be considered for reuse, and little consensus among researchers on terminology or methodology. A narrow definition of software reusability is the reuse of code. Code can be reused in a number of ways: using previously developed library routines in a new program, porting functions without major changes from one program or system to another, and translating a program or a portion of a program from one environment to
2. The Record of the 1983 Software Maintenance Workshop, IEEE Computer Society Press, 1984.
   8. Fay, S., and Holmes, D., "Help! I Have to Update an Undocumented Program," The Proceedings of the Conference on Software Maintenance 1985, IEEE Computer Society Press, 1985.
   9. Jensen, K., and Wirth, N., Pascal User Manual and Report, Springer-Verlag, 1974.
   10. Gehani, N., C: An Advanced Introduction, Bell Telephone Laboratories, 1988.
   11. Kernighan, B., and Ritchie, D., The C Programming Language, Prentice-Hall, 1978.
   12. Feuer, A., and Gehani, N., "A Comparison of the Programming Languages C and Pascal," Comparing and Assessing Programming Languages: Ada, C, and Pascal, Prentice-Hall, 1984.
   13. Sneed, H., and Jandrasics, G., "Inverse Transformation of Software from Code to Specification," Proceedings, IEEE Conference on Software Maintenance, IEEE Computer Society Press, 1988.
   14. Arango, G., Baxter, I., Freeman, P., and Pidgeon, C., "TMM: Software Maintenance by Transformation," Tutorial: Software Reusability, IEEE Computer Society Press, 1987.
   15. Marcotty, M., and Ledgard, H., Programming Language Landscape, Science Research Associates, 1986.
   16. Yellin, D., Attribute Grammar Inversion and Source-to-Source Translation, Lecture Notes in Computer Science, Springer-Verlag, 1988.
   17. Glockenspiel, Turbo Pascal to QuickC Translator Howto Documentation, Microsoft Corporation, 1987.
   18. Bergsten, P., PTC Imple
3. DTIC FILE COPY. NAVAL POSTGRADUATE SCHOOL, Monterey, California. AD-A229 033. THESIS: AN INVESTIGATION OF THE METHODOLOGY FOR SOFTWARE TRANSLATION FROM PASCAL TO C OF AN UNDOCUMENTED MICROCOMPUTER PROGRAM, by Charles W. Bell, March 1990. Thesis Advisor: LCDR Rachel Griffin. Approved for public release; distribution is unlimited. UNCLASSIFIED. SECURITY CLASSIFICATION OF THIS PAGE. REPORT DOCUMENTATION PAGE. Report security classification: Unclassified. Distribution/availability of report: Approved for public release; distribution is unlimited. Name of performing organization: Naval Postgraduate School (office symbol 037). Name of monitoring organization: Naval Postgraduate School. Address (city, state, and ZIP code) of both organizations: Monterey, CA 93943-5000. Name of funding/sponsoring organization: Defense Systems Management College. Address (city, state, and ZIP code): Director, DSS Directorate (DRI S), Defense Systems Management College, Fort Belvoir, VA 22060-5426. TITLE (include Security Classification): A
4. An attribute grammar is an extension of a context-free grammar and formally specifies context-sensitive rules. Attributes: Attributes are context-sensitive rules of grammar. Attributes are directly associated with productions and are expressed in the form of conditions which must be evaluated. Attribute values: Attribute values are determined by evaluating attributes and associated productions. A simple example illustrates the application of attribute grammar technology. The following are two statements in a PASCAL program. Statement 1 is a variable declaration, and statement 2 uses the declared variable in an assignment statement. Statement 1: X : char; Statement 2: X := 1; Assume that only a context-free grammar is available to analyze the two statements. Statement 1 is first scanned and parsed into tokens. A sequence of four tokens is recognized: variable name X, operator ":", keyword char, and operator ";". The analysis of this sequence of tokens determines that the statement is legal with respect to the grammar. The same is done with statement 2, without taking into account the first statement already analyzed. Statement 2 is also determined to be legal. However, compiling these two statements with a PASCAL compiler would cause statement 2 to be flagged as an error. Variable X was declared to be of type char (character) but was assigned an integer value, which is illegal in PASCAL. How did the compiler recogn
5. discussed in the following sections. 1. Structured English. Structured English [Ref. 4] is a tool that combines plain English with simple structured programming constructs to describe program routines. Structured English is written as short, precise sentences describing data transformations and flow of control. Structured English sentences are composed of imperative English verbs describing action, data dictionary terms as the subject of the action, and reserved words commonly used in structured programming to denote the logical flow of the program. There is no universally accepted formal dialect for structured English. This is an advantage because it allows the software maintainer to establish the compromise between rigid control and readability that is right for a specific project. Once that balance is reached, consistency of use is the most important factor to keep in mind. The structured English syntax suggested by Whitten, Bentley, and Ho [Ref. 4] provided the baseline for the dialect used in the case study. In structured analysis and design, data dictionary entries are used as the subject of structured English sentences. In the inverse transformation methodology, these terms are extracted directly from the source code. In the case study, terms used as the subject in structured English sentences were added to the data dictionary as they were introduced. Although this appears to conflict with the pattern of de
6. one type to another. PASCAL limits type conversions to explicitly called routines and mixed-mode expressions containing integer and real variables. C does not always require that variables be checked for type compatibility. For example, the language definition does not require that the types of actual and formal parameters be checked for compatibility. Strongly typed languages such as PASCAL improve program clarity and reliability. Loosely typed languages such as C encourage and support programmer flexibility [Ref. 12]. 2. Comparison of Features. This section addresses the main differences between the two languages [Ref. 12]. The purpose of this section is to highlight areas of concern for software translation. It is assumed that the reader has a basic familiarity with programming language concepts and the features of the two languages. A complete description of the languages is not intended. a. Data Types. PASCAL data types provide security from errors, readability, and reliability, primarily attributable to consistency checking not required by C. C data types allow addressing physical memory locations, multiple-precision arithmetic, no restrictions on where pointers can point, address arithmetic, and few restrictions on manipulating arrays. b. Statements. The C and PASCAL languages use the semicolon in slightly different ways. In C, the semicolon is used as a statement terminator. In PASCAL, the semicolon is used as a sta
7. the requirements analysis and design specifications SDLC phases, because of the previously noted ripple effect that errors and oversights have on later phases. Additionally, developers want to be able to maximize the quality of their high-level development effort by reusing successful early development phases during the maintenance phase and with other projects. The potential for software reusability can be improved by formalizing and standardizing the requirements and design phases of the SDLC. Specific examples of this process are discussed in Chapter IV and include inverse transformations, the transformation-based maintenance model, and attribute grammar technology. There are a number of problems related to software reusability. A software developer who desires to reuse software must be able to locate reusable products, appraise their usefulness, discover any modifications that are necessary to adapt the reusable product, and evaluate the impact of using a reusable product on later phases of development. None of these basic steps can be readily accomplished at present. Although numerous libraries of reusable code are available, there is no standardized method of identifying what the reusable product does or what restrictions it may have. Trying to figure out what reusable code might be useful and what it does is similar to perusing a computer bulletin board for microcomputer programs. There are thousands of programs available
8. Based Maintenance Model
   3. Attribute Grammar Technology
   4. Manual Re-implementation
   5. Automated Source Code Translation
      a. TPQC Features
      b. PTC Features
   D. COMPARISON AND SELECTION
   V. DESIGN STRATEGY AND TRANSLATION APPROACH
   A. OVERVIEW
   B. REQUIREMENTS ANALYSIS
   C. DESIGN STRATEGY
      1. Structured English
      2. Structure Chart
      3. Data Dictionary
   D. TRANSLATION APPROACH
      1. Step 1: Develop the Design Specification
      2. Step 2: Evaluate Screen Display/Data Entry Development Section
      3. Step 3: Program the Screen Display/Data Entry Development Section
      4. Step 4: Evaluate the Database Management Development Section
      5. Step 5: Program the Database Management Development Section
      6. Step 6: Connect Database Management and Screen Display/Data Entry Prototypes
      7. Step 7: Evaluate the Print Routines Development Section
      8. Step 8: Program the Print Routines Development Section
      9. Step 9: Connect the Print Routines Prototype
      10. Step 10: Test the Program
      11. Step 11: Review the Tested Program
      12. Step 12: Ongoing Translation Steps
   VI. CASE STUDY APPLICATION
   A. OVERVIEW
   B. TRANSLATION APPROACH APPLICATION
9. above, statement 1 would be recognized as legal and statement 2 would not, because of the type mismatch. However, in C the type char is only another representation of the type integer. In C, integers can be assigned to variables of type char without error. In reality, both statements are legal in C. The attribute grammar must reflect this properly. An attribute grammar is very specific to the language it describes. In order to use attribute grammars for language translation, an intermediate language is needed to bridge the differences between the languages. One attribute grammar is developed to translate the source language to this intermediate form, and another attribute grammar is developed to translate the common intermediate form to the target language. The intermediate form is devised in one of two ways: the greatest common divisor method and the least common multiple method. When using the greatest common divisor method, the translator attempts to create an intermediate form that retains as much of the higher-level functions of the two languages as possible. In order to represent functions that exist in one language and not the other, these high-level functions are rewritten as a series of lower-level functions that are common to both languages. This causes inefficiencies and loss of program structure if not used carefully. The greatest common divisor method works well with source languages that are closely related, such as C and PASCAL. It is
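The char/integer example can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the statement tuples, the `ASSIGNABLE` table, and the function names are hypothetical and do not come from the thesis or from any real compiler. It shows the idea that the same two statements pass a context-free check in both languages, while a language-specific type rule (the kind of condition an attribute grammar attaches to a production) rejects statement 2 for PASCAL and accepts it for C.

```python
# Hypothetical illustration; the rule table and all names are assumptions.

# Language-specific rule: which literal types may be assigned to a
# variable of a given declared type.
ASSIGNABLE = {
    "pascal": {"char": {"char"}, "integer": {"integer"}},
    # In C, char is an integer type, so an integer constant is accepted.
    "c": {"char": {"char", "integer"}, "integer": {"integer", "char"}},
}


def literal_type(token):
    """Return the type of a literal token ('integer' or 'char')."""
    if token.isdigit():
        return "integer"
    if len(token) == 3 and token[0] == token[2] == "'":
        return "char"
    raise ValueError(f"unsupported literal: {token}")


def check(language, statements):
    """Return one True/False per statement: legal under the type rule?"""
    rules = ASSIGNABLE[language]
    symbols = {}  # context carried between statements: declared types
    results = []
    for kind, name, operand in statements:
        if kind == "declare":
            symbols[name] = operand
            results.append(True)
        elif kind == "assign":
            declared = symbols.get(name)
            results.append(
                declared is not None
                and literal_type(operand) in rules[declared]
            )
        else:
            raise ValueError(f"unknown statement kind: {kind}")
    return results


# Statement 1 declares X as char; statement 2 assigns the integer 1 to X.
program = [("declare", "X", "char"), ("assign", "X", "1")]
print(check("pascal", program))  # statement 2 is rejected
print(check("c", program))       # both statements are accepted
```

The symbol table is what a context-free analysis lacks: without it, statement 2 is examined in isolation and looks legal. Keeping the rule table separate per language also hints at why one attribute grammar cannot serve both PASCAL and C unchanged.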
10. and more difficult to understand. The conditions below, if present in the source code, can cause unexpected results. PTC does not automatically flag potential problems. The conditions below should be reviewed as possible sources of problems in compiling or running the translated code. • Record Variants: PTC uses a complex formula for determining the size of memory to allocate for variant records. The memory allocated may not be adequate. • Pointers: A pointer defined recursively (e.g., type ptr = ^ptr) cannot be translated. • Procedure Scoping Rules: PASCAL scoping rules for nested procedures are ignored. Nested procedures dependent on PASCAL scoping rules must be modified [Ref. 18]. D. COMPARISON AND SELECTION. Five software translation methodologies were reviewed. Three of the methodologies were considered unsuitable for the case study for reasons cited below. These methodologies were the transformation-based maintenance model, attribute grammar technology, and manual re-implementation. The primary methodology selected for use in this case study was inverse transformation. Additionally, the automated source code translators were used on selected portions of the case study. The transformation-based maintenance model (TMM) is a complex methodology that requires a major investment in development time. For small programs, the development time of the DAG alone can be expected to exceed the time required to develop the program
11. another [Ref. 2]. Expand this limited definition to include application generators. An application generator is software that generates new code. Therefore, using an application generator more than once is software reuse. Restricting software reusability to code is still too limiting. The software development life cycle should not be excluded from consideration. Every phase of development, from requirements analysis to implementation and maintenance, should be examined. Methodologies have been developed to reuse phases from one development effort in another effort. This too is software reusability. Where is the line drawn? What is reusable and what is not? If it is reusable, then how and when should it be reused? There are no definitive answers. Given this broader scope, applications of software reusability have been categorized in a number of ways. Common categories are: • Commercial software packages • Code fragments • Application generators • Requirements analysis • Design specifications. The above list was compiled from articles by Horowitz [Ref. 2] and Jones [Ref. 1] and is not comprehensive. Any computer-language-based software development tool used more than once meets this broad definition of software reusability. Commercial software packages, also called off-the-shelf software, are not usually associated with the idea of software reusability. However, the use of off-the-shelf operating systems, compilers, and
12. any other automated system. All data used by the program is manually entered by the user. There is no requirement for a standardized database design or standard report formats. There is no requirement to provide, at the module level, any interface or link with the integrated PMSS environment. The single interface concern is between the GAT module source code and the supporting BTRIEVE database manager. This interface is clearly defined in the BTRIEVE manual and can support the translation of the source code to C. b. Users. The GAT module is not an operational module and is not provided as part of the integrated PMSS package. The GAT module is provided on request to defense acquisition activities desiring to beta test the module. The number of current users is unknown. c. Functionality. The GAT module maintains a database of tasking information which is keyed by a task number provided by the user. When a new task number is added, all information about that task is entered on the keyboard by the user. Task information may be edited and tasks deleted whenever required. Program commands are executed primarily by the use of function keys. Most function key commands are listed in a menu which appears across the bottom of each screen. Data about each task are displayed on a three-screen worksheet. The user can print a summary report of task information, single screens of the task worksheet, or the entire task worksheet using
13. establish the data paths between data entry screens and the database. Each prototype used unique data naming conventions to maintain clarity about the status and origin of the data. Write routines which assign data retrieved from the database to the data entry screen, and routines which assign data modified or added on the data entry screens to the database. Step 6B: Program Routines Involving the Linked List. Linked list connection routines include the display of the initialized and updated linked list, updating linked list data when the data changes, highlighting a specific record for selection, recognition when a specific record is selected, and managing varying numbers of records in the linked list. Step 6C: Test Connection Routines. Test for the ability to manage a number of records ranging from zero to more than can be displayed at one time on the selection screen. Test that the ordering of records on the linked list is maintained with the same criteria used by the database manager. Test for the smooth movement of the highlight bar from record to record and accurate selection of the highlighted record. 7. Step 7: Evaluate the Print Routines Development Section. The print routines development section was divided into two functional sections. These two sections were report generation and quick printing. The quick printing section required routines to print pre-defined reports of all or part of the data in a single
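The checks listed under Step 6C can be sketched in a few lines. The Python below is a hypothetical model, not the case study's code: the class name, the use of a sorted Python list in place of the linked list, and the ascending-key ordering are all assumptions made for illustration. It models a selection screen that shows a fixed number of rows, keeps the highlight bar inside the record range, and scrolls the visible window when the bar leaves it.

```python
class SelectionList:
    """Hypothetical scrolling selection list with a highlight bar."""

    def __init__(self, records, visible_rows):
        # Assumption: the database manager orders records by ascending key.
        self.records = sorted(records)
        self.visible_rows = visible_rows
        self.top = 0        # index of the first record shown on screen
        self.highlight = 0  # index of the highlighted record

    def move(self, delta):
        """Move the highlight bar by delta rows, scrolling when needed."""
        if not self.records:
            return  # nothing to highlight in an empty list
        last = len(self.records) - 1
        self.highlight = max(0, min(last, self.highlight + delta))
        if self.highlight < self.top:
            self.top = self.highlight
        elif self.highlight >= self.top + self.visible_rows:
            self.top = self.highlight - self.visible_rows + 1

    def visible(self):
        """Return the records currently shown on the selection screen."""
        return self.records[self.top:self.top + self.visible_rows]

    def selected(self):
        """Return the highlighted record, or None for an empty list."""
        return self.records[self.highlight] if self.records else None


# Five task numbers on a three-row screen: more records than rows.
tasks = SelectionList(["T3", "T1", "T5", "T2", "T4"], visible_rows=3)
tasks.move(3)
print(tasks.selected(), tasks.visible())
```

Exercising the model with zero records, with fewer records than rows, and with more records than rows covers the range named in Step 6C; the same cases would apply to a real linked-list implementation.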
14. executable program. The user's manual is a better-than-average product which effectively teaches the user the operation of each program option. The user's manual is straightforward and simple to use. Problems with the manual were minor, such as inaccurate information on the use of some keyboard keys and the lack of an index or a summary of available functions. User operations documentation is non-existent. The requirements for the original program were collected by interviewing potential users. No record of these interviews exists. Other documentation that is not available for the GAT module is the programmer's manual and any documentation relating to the SDLC of the program. Requirements analysis documentation and design specifications were not created when the program was developed, and there is no documented record of any subsequent changes made. 2. Users and Programmers. No GAT module users were available for interview. The programmers for the contracted software development organization which developed the GAT module were queried for general information on the development of the module. Conversations with the organization revealed the lack of documented support for the module and the dependence on third-party software to generate much of the code. No major insights on the development were revealed and, as expected, little detailed information could be provided. B. PROGRAMMER AIDS. Programmer aids refer to the s
15. from scratch. For large programs, capturing the information required from the source code to employ TMM matches the complexity and level of effort required to develop a compiler analyzer, and may not be worth such an effort for one-time use. The major advantage predicted for TMM is the possibility of using abstractions from the VAG developed from one application for other program recovery efforts [Ref. 14]. TMM is not suitable for one-time application on relatively small programs such as the case study. Attribute grammar technology also requires a significant investment in development time. Grammars are required for both the source and target languages. The intermediate language bridging the differences between the two languages is also required. The investment in development time is not the most important drawback, however. Applying attribute grammar technology as a software translation tool yields only translated source code. This methodology does not generate requirements or design documentation as output because neither is required as input. For this case study, the creation of this documentation is essential to support future maintenance efforts. Automated source code translators translate directly from source code to source code without reference to life cycle documentation. The translation problems, such as those noted previously with the two specific automated translators reviewed, illustrate further disadvantages. Automated sourc
16. function commands. Additionally, a report generator is provided for developing and printing reports. The report generator allows the user to select which data elements of the task will be included in the report. The created report heading can be saved as a report format, and the information requested can be printed for all tasks in the database. III. UNDERSTANDING THE SOFTWARE A. INFORMATION SOURCES. Software that is not understood cannot be maintained. Software maintainers commonly have little background or experience with the majority of programs they are tasked with maintaining. It is imperative that software maintainers acquire detailed information about the target program. Every source of information available must be examined in detail. SDLC documentation is a very important source of information, but the maintainer must exercise extreme care in reviewing this material. The maintainer must determine how closely the documentation reflects the actual program and identify those portions of the documentation that are no longer accurate. Programmer's manuals and user's manuals, if available, should be studied with a certain degree of skepticism. It is unusual for the manuals to be updated when changes and modifications are made to the program, and it is common practice for the manuals to be created after program completion with marginal regard for their accuracy. Program source code is also a good source of information provide
17. general applications such as spreadsheets, word processors, and database managers are intended to save development time, dollars, and programmer effort. The software is developed once, is centrally maintained, and is immediately compatible, to varying degrees, with a variety of systems. Any off-the-shelf software used in the development of more than one system can be considered reusable software [Ref. 1]. Code fragments include library subroutines, small and large subsystems, and entire programs. The use of subroutines and subsystems ranges from organization-specific code that is reusable only on a particular system to generic routines that are independent of their environment. High-level languages such as Ada and C were designed to encourage the development and use of generic routines. These routines can be included in any program written in that language. Many of these routines are part of the standard library of functions commonly provided with that language's compiler. Some organizations also maintain a database of local routines specific to that organization. These routines are unique to the organization's particular software and hardware architecture. Entire programs are reused when they are translated to a new environment. Environmental changes generally entail translating a program to a new language or recompiling it for execution on different hardware. The key is to preserve code in a form that can be reused [Ref. 3]. An application
18. range of programming habits, but the concern with this case study was the manner in which the program was organized. The original program was not organized poorly. It was organized consistently and was within the bounds of good structured programming practice. However, the form of the organization was different from the habits developed by the maintainer. Since structured English is just one step removed from the source code, this difference directly impacted the development of the structured English. The maintainer was required to make a choice between following the style of the developer and adjusting the style to something more familiar. Selecting the developer's style has the advantage of reinforcing the similarities between the original and translated programs, and the disadvantage of working with a programming style that is foreign to the maintainer. Selecting the maintainer's style has the advantage of familiarity and the disadvantages associated with departing from a strict translation. The decision made was to use the programming style of the maintainer. The general functionality of the case study was well understood by the maintainer, but there was uncertainty at the more detailed level about the advanced programming techniques used. For this reason, it was felt that maintaining a familiar programming style would yield more consistent, understandable source code and would not detrimentally affect the overall functionalit
19. record. Report selection is dependent on which display screen is currently visible when the print function key is used. The report generation section required routines to create, delete, and print user-defined reports. No off-the-shelf software packages were used in the original program to aid in programming the print routines. Routines specifically coded for the print routines were identifiable in the original program. Direct reuse of the routines was selected as the primary translation method. Instead of recoding the routines manually, the automated translator TPQC was selected as the means of translation. TPQC was selected over PTC because of insurmountable problems recompiling the Unix-based PTC program to run on the MS-DOS operating system. The use of a general automated code translator raised questions similar to those concerning the use of the more tailored Softcode code generator. The translated C code must be compilable with only minimal additional effort by the maintainer. The functionality of the translated C code must be identical to the functionality of the source code. 8. Step 8: Program the Print Routines Development Section. The following steps were defined to accomplish the programming of the print routines development section. Step 8A: Review the Source Code Documentation. Document all source code routines thoroughly before doing the translation. The automated translator adds no additional comments. Thorough d
20. <return> key is used to transfer a selected data heading to the report generator, but this is not shown in the menu. • A predefined report format filename must always be manually entered, even though a list of predefined report formats can be displayed. Recommendation: Completely redesign the report generator to correct the above inconsistencies. This redesign is a significant departure from the original program and may not be applicable to this translation effort. 4. Other. The following remaining general inconsistencies were noted: • Users are arbitrarily constrained to a limited number of lines describing the task. • Saving changes to the task worksheet can be done only when quitting the program. Recommendation: Develop a design to allow unlimited (except by available memory) descriptions, and include a save option in the bottom-line menu which can be executed during add or edit operations. IV. SOFTWARE TRANSLATION METHODOLOGIES A. OVERVIEW. The software translation methodologies discussed in this chapter consist of four software reusability applications and one SDLC implementation. Each methodology is described in terms of purpose, functionality, complexity, and applicability to the case study. An important aspect in determining the translation methodology which best fits the case study is the degree of commonality of the source and target languages. Languages are developed with certain strengths and weaknes
21. that the amount of code being produced was disproportionately large when compared with the size of the routines that actually used the data. This approach appeared to be inefficient, and an alternative was sought. The maintainer contacted technical support personnel at Novell, the makers of BTRIEVE, for advice. The Novell technical personnel could not provide a better method to streamline the conversion process or reduce the risk to the database. They strongly recommended that the PASCAL-created database be converted to the C format before being used by the translated C program. Since the conversion program would only have to be run once, as part of the installation of the translated code, and no user-entered data from the original database would be lost, the maintainer made the decision to convert the database. The database conversion program proved relatively simple to build. In retrospect, the database conversion was the better of the two options. No other significant departures from the planned steps were required for Step 6. 7. Step 7: Evaluate the Print Routines Development Section. The evaluation of the print routines development section was completed with no significant problems. 8. Step 8: Program the Print Routines Development Section. Two PASCAL programs were developed from the original source code, one for the quick print routines and the second for the report generator. PASCAL programs were necessary in order to
22. the source code and making lists of routines and variables. This method was lengthy and tedious, and resulted in many errors. Tree Diagrammer was then used. Tree Diagrammer thoroughly mapped all calling routines in graphical format, clearly described dependencies, and flagged anomalies. The level of nesting of other routines within each routine was also defined. Tree Diagrammer proved useful because it pulled essential data out of the source code and presented that data in an effective format. This information facilitated understanding the control flow of the program. Source Print aided the development of the data dictionary by providing a listing reporting the name and location of use of every variable. The use of software tools such as Tree Diagrammer and Source Print is not a panacea for the understanding of undocumented source code. These tools automate certain processes that the maintainer must otherwise accomplish manually when preparing information needed to understand the source code. Automated tools save valuable time. It is still up to the maintainer to interpret and clarify the information generated into a clear picture of the program processes. b. Automated Source Code Translation. Two automated translators were tried. Specific information on the features of the automated translators is discussed in Chapter IV. 2. Structured Systems Design. Structured systems design [Ref. 5] is a well-established metho
23. the Database Management Development Section. The original source code used the software package BTRIEVE from Novell to perform database management functions. BTRIEVE is a memory-resident program that manipulates the database based on instructions provided by the source program via a function call. The database management evaluation was divided into two separate sections: initial display and selection of database records, and updating the database. Updating the database required routines for adding, deleting, and updating database records. The use of the BTRIEVE software simplified the routines required for these functions. The best translation method for this section was determined to be direct reuse of original source code routines. Initial display and selection of database records was managed in the original source code by copying all information about every record into an array, which was modified concurrently with modifications to the database. The reason for the lack of consideration of memory limitations is unknown. It is possible that the number of records in the database was expected to remain small. Additionally, special routines were written to manage scrolling and highlighting among records, a departure from the screen display methods used for other screens. The translation method chosen for this section was a combination of use of the Softcode code generator to develop the selection screen and the addition of cer
24. use the automated translators, which required as input an executable PASCAL program. A prototype framework was built around the quick print routines and an executable PASCAL program was successfully developed. Numerous problems arose in attempting to translate the PASCAL program to C using TPQC. Minor idiosyncrasies, legal in PASCAL but confusing to TPQC, were changed to accommodate TPQC, and translation was attempted several times. TPQC continued to flag as unacceptable sections which were legal and compilable in PASCAL. Most frustrating was the fact that the translation process aborted following the identification of each translation error. There was no way to tell how many total errors would have to be corrected. Error messages were sparse and left the maintainer guessing as to what the problem might be. Due to these problems and fading confidence in the ability of TPQC to produce acceptable C code, the use of TPQC was abandoned for the quick print routines program. The PASCAL code was reused by direct manual recoding, which presented no difficulties. TPQC did not get a second chance with the report generator routines. The total size of the routines, not counting the framework required to make it a complete program, exceeded the 64 kilobyte program size limit imposed by Turbo PASCAL 3.0. An attempt was made to compile the code using version 4.0, but basic differences in the design of the two versions, primarily the change from in
25. Step 1: Develop the Design Specification. Step 2: Evaluate Screen Display Data Entry Development Section. Step 3: Program the Screen Display Data Entry Development Section. Step 4: Evaluate the Database Management Development Section. Step 5: Program the Database Management Development Section. Step 6: Connect Database Management and Screen Display Data Entry Prototypes. Step 7: Evaluate the Print Routines Development Section. Step 8: Program the Print Routines Development Section. Step 9: Connect the Print Routines Prototype. Step 10: Test the Program. Step 11: Review the Tested Program. Step 12: Ongoing Translation Steps. CORRECTION OF APPLICATION INCONSISTENCIES: 1. Screen Movement. 2. Function Key Use. 3. Report Generator. 4. Other. VII. CONCLUSION. LIST OF REFERENCES. INITIAL DISTRIBUTION LIST. I. INTRODUCTION. A. DISCUSSION. New software design and development costs are spiraling upward to the point where they will exceed the cost of the hardware. When the costs of maintaining the software are also considered, life cycle software costs constitute the largest portion of automated system costs [Ref 1]. The demand for more and increasingly complex software already outstrips the capability of programmers to produce it, and the gap is expected to widen in th
26. specification which supports the creation of the DAG and the identification of the LCA. In the DAG, the top node, or root node, represents the original specification, and subsequent nodes represent correct but partial design decisions of the specification. The DAG traces possible design decisions, beginning at the root and ending when the last design decision is made. The translator uses the DAG as a translation tool by searching backward up the nodes of the DAG toward the root until a node which encompasses both the original implementation and the desired implementation is found. This node is the LCA. The LCA becomes the new starting point on the DAG to trace the path to the desired implementation. As the translator traces the path to the LCA, he reverses the design decision at each node and identifies undesired portions of the original implementation. The translator collects the undesired portions together as a single component for re-implementation and traces a new path on the DAG from the LCA to the desired implementation. 3. Attribute Grammar Technology. The use of grammars to describe high level programming languages is an established instrument of programming language theory and shows promise as a tool for source to source language translation. Attribute grammar technology is an extension of grammar based methodologies. A synopsis of commonly used terminology [Ref 15] is provided below to support the d
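The backward search for the LCA described in this excerpt can be illustrated with a minimal sketch in C. It assumes, as a simplification, that every design decision node records a single parent, so the DAG degenerates to a tree; a true DAG node may have several parents and would need a more general search. The names `design_node`, `node_depth`, and `find_lca` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical design-decision node. The single parent link is a
   simplifying assumption; the root specification has parent == NULL. */
struct design_node {
    const char *decision;
    struct design_node *parent;
};

/* Number of design decisions between a node and the root. */
static int node_depth(const struct design_node *n)
{
    int depth = 0;
    while (n->parent != NULL) {
        n = n->parent;
        depth++;
    }
    return depth;
}

/* Search backward from both implementations toward the root until a
   node that lies above both is found. Returns NULL if the two nodes
   do not share a root. */
const struct design_node *find_lca(const struct design_node *a,
                                   const struct design_node *b)
{
    int da = node_depth(a);
    int db = node_depth(b);

    while (da > db) { a = a->parent; da--; }   /* level the two paths */
    while (db > da) { b = b->parent; db--; }
    while (a != b) {                           /* climb in step */
        a = a->parent;
        b = b->parent;
    }
    return a;
}
```

Each step up from the original implementation corresponds to reversing one design decision; the path down from the returned node to the desired implementation is the part that must be re-implemented.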
27. CKGROUND. A. RELATIONSHIPS AND DEFINITIONS: 1. Software Reusability; 2. Software Maintenance; 3. Software Translation; 4. Summary and Purpose. B. DESCRIPTION OF THE APPLICATION: 1. Application Sponsor and Customers; 2. Description of the Parent Application; 3. The Government Activity Tasking (GAT) Module: a. Technical Description: (1) Hardware, (2) Software, (3) Interfaces and Communications; b. Users; c. Functionality. III. UNDERSTANDING THE SOFTWARE. A. INFORMATION SOURCES: 1. Available Documentation; 2. Users and Programmers. B. PROGRAMMER AIDS: 1. Automated Tools: a. Deciphering Source Code; b. Automated Source Code Translation; 2. Structured Systems Design. C. PROGRAM DETAILS: 1. Structure; 2. Control Flow; 3. Variables; 4. Input Sources; 5. Output Destinations. D. APPLICATION INCONSISTENCIES AND RECOMMENDATIONS: 1. Screen Movement; 2. Function Key Use; 3. Report Generator; 4. Other. IV. SOFTWARE TRANSLATION METHODOLOGIES. A. OVERVIEW. B. COMPARISON OF C AND PASCAL: 1. Purpose and Goal of the Languages; 2. Comparison of Features: a. Data Types; b. Statements; c. Program Structure. C. METHODOLOGIES REVIEWED: 1. Inverse Transformation; 2. Transformation
28. Computer memory limitations or other code optimization needs may have led to the modifications. STEPS IN THE TRANSLATION APPROACH: 1. Develop the Design Specification. 2. Evaluate the Screen Display Data Entry Development Area. 3. Program the Screen Display Data Entry Development Area: a. Develop a Prototype; b. Identify Deficiencies; c. Weigh Deficiencies; d. Make Programming Decision; e. Test the Programming Effort. 4. Evaluate the Database Management Development Area. 5. Program the Database Management Development Area: a. Develop a Prototype; b. Develop the Linked List. 6. Connect the Database Management and Screen Display Data Entry Prototypes: a. Program Routines for a Single Record; b. Program Routines Involving the Linked List; c. Test Connection Routines. 7. Evaluate the Print Routines Development Area. 8. Program the Print Routines Development Area: a. Review the Source Code Documentation; b. Develop a Prototype Framework; c. Perform the Automated Translation; d. Test the Translated Code; e. Make any Necessary Modifications. 9. Connect the Print Routines Prototype. 10. Test the Program: a. Develop the Test Database; b. Exercise all Program Functions; c. Demonstrate Source Code Compilability; d. Demonstrate the Use of a User Database; e. Correct Discrepancies. 11. Review the Tested Program: a. Delete Unproductive Code; b. Review Source Code Format; c. Review Embedded Comments. 12. Ongoing Translation Steps: a. Revise Design Specifications as Necessary; b. Develop Upda
29. N INVESTIGATION OF THE METHODOLOGY FOR SOFTWARE TRANSLATION FROM PASCAL TO C OF AN UNDOCUMENTED MICROCOMPUTER PROGRAM. Personal author(s): Bet Charles W. Type of report: Master's Thesis. Date of report: March 1990. Page count: 120. Supplementary notation: The views expressed in this thesis are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government. Subject terms: Software Maintenance, Software Translation, Inverse Transformation Methodology, Undocumented Microcomputer Program, Software Reusability. Abstract: The purpose of this thesis is to investigate software reusability applications and the practical utilization of those applications in the performance of software maintenance. The translation of a functioning program from one high level language to another was selected as the type of software reusability effort to be explored. Five translation methodologies were investigated, and the inverse transformation methodology was chosen to exercise the practical application of software reusability for a specific case study. A design strategy and translation approach were developed based on the inverse transformation methodology. The translation approach was followed in performing the translation of the case study. The results of the applicatio
30. VIEWED. 1. Inverse Transformation. The inverse transformation methodology described by Sneed [Ref 13] is based on the strategy of reversing the normal software development cycle. Software is viewed at three levels, which are an abstraction of the output of the structured analysis and design methodology. The abstraction levels are physical, logical, and conceptual [Ref 13], and correspond to the source code, design specification, and requirements statement of structured analysis and design. There are two steps in the process. The first step applies reverse engineering techniques to retranslate the source code into an intermediate design schema. The result of the retranslation is design documentation based on the intermediate design schema. The second step applies standard software engineering principles to translate the intermediate design schema into a system specification. The objective of the inverse transformation methodology is the creation of the requirements statement. Proponents of reverse engineering claim that viewing the software at this conceptual level improves software maintenance and reusability [Ref 13]. Inverse transformation is not the same as software restructuring. Software restructuring is used to reduce maintenance costs by converting unstructured programs into structured programs [Ref 13]. The application of software restructuring does not require the recreation of the requirements statement or design specif
31. and processes of the system in existence at the time the new need was identified. • The constraints affecting system development, such as budgets, regulations, and policies. • The business objectives of the system, to include definitions of the expected performance level and prioritizing the objectives. • The criteria used to determine the degree of success of the development. • A general description of the inputs, outputs, and processes needed. Without the above information, the re-creation of documents in the requirements statement, such as the problem statement and data flow diagrams, would not be accurate. Therefore, to avoid misleading program maintainers, the requirements statement is not included in this case study. C. DESIGN STRATEGY. Structured analysis and design tools described by Page-Jones [Ref 5] were used to develop the design specification for the case study. The basic task of the inverse transformation methodology is to invert the normal design process by working backwards from the source code to the design specification. Structured analysis and design defines the order in which each tool is created. The inverse transformation methodology reverses that order, as follows: source code is used to produce structured English; structured English is used to produce the structure chart; the structure chart is used to produce the data dictionary. Each structured analysis and design tool is
32. anual coding were done, modification of the generated code was selected as the best option. The second problem concerned the method used by the code generator to calculate and display information computed from other fields on the screen. The code generator required that the position of the decimal in a number be permanently assigned and hard coded into the program. Variable decimal positions were not allowed. Calculations based on decimal numbers were dependent on the predefined position of the decimal. The design specifications required that the user be allowed to use numbers with variable decimal positions that could be changed at the discretion of the user. The problem was resolved by adding a new routine to handle decimal number data entry and revising the computation routines of the generated code. The third problem was the lack of generated routines to manage function key and special keyboard key selection by the user to move between screens and perform special functions. The code generator did provide shell routines to facilitate the manual coding process. The largest manual coding effort for this development section was devoted to writing these routines. In evaluating code generation deficiencies, the problem sections were not considered significant enough to warrant a decision to program the entire development section manually. All problems were satisfactorily resolved and the resulting code conform
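The thesis does not reproduce the decimal entry routine that was added. As a hedged illustration of what handling a variable decimal position can involve, the sketch below reads a typed entry and reports both its digits and the number of decimal places found, so that later computations need not assume a fixed position. The name `parse_decimal` and its interface are hypothetical, signs are not handled, and overflow of very long entries is not checked.

```c
#include <ctype.h>

/* Parse an unsigned entry such as "12.5" or "0.125".
   On success returns 1 and stores the digits as a whole number
   (125 in both examples) and the count of decimal places (1 and 3).
   Returns 0 if the entry contains anything other than digits and at
   most one decimal point, or contains no digits at all. */
int parse_decimal(const char *text, long *digits, int *places)
{
    long value = 0;
    int seen_point = 0;
    int seen_digit = 0;
    int fraction_digits = 0;

    for (; *text != '\0'; text++) {
        if (isdigit((unsigned char)*text)) {
            value = value * 10 + (*text - '0');
            seen_digit = 1;
            if (seen_point)
                fraction_digits++;
        } else if (*text == '.' && !seen_point) {
            seen_point = 1;
        } else {
            return 0;                  /* reject any other character */
        }
    }
    if (!seen_digit)
        return 0;

    *digits = value;
    *places = fraction_digits;
    return 1;
}
```

Keeping the digits and the decimal place count separate lets a computation routine align two operands with different decimal positions before combining them.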
33. areas of the screen; however, on some screens the up/down arrow combination is required and on other screens the left/right arrow combination is required. The reason for this is not logically evident. • Similarly, the use of the PageUp/PageDown keys also varies from screen to screen. Recommendation: Develop a consistent design which allows the use of all four arrow keys on every data entry screen. Use the logical meaning of the PageUp/PageDown keys for paging between worksheet screens. 2. Function Key Use. The following inconsistencies in function key use were noted: • The meaning of function keys, <F5> in particular, changes depending on the screen. • Function key <F9>, used to change screen color, does not appear in the bottom line menu and only works at certain places in the program. Recommendation: Develop a design that consistently applies the same meaning to function keys. Design bottom line menus which display all enabled function keys. Add the <ESC> key to the menu for incremental backtracking to the main menu. 3. Report Generator. The following inconsistencies in the report generator were noted: • The form generation routine assumes that 120 column print is always used, requiring the user to do manual calculations to accommodate other sizes. • The size of individual data items can only be displayed one at a time, complicating the process of creating a report heading. • The <
34. ch are not passed, and minimize passing composite data if little of the data is actually used. Revise the structure chart. 4. Input Sources. Input sources provide data not initialized or calculated by the program. Input sources for the GAT module are user input from the keyboard, database files, and report format files. Modules on the structure chart representing the retrieval of this data are not as detailed as other modules because they use routines external to the program. 5. Output Destinations. Output destinations receive data for storage or display. Output destinations for the GAT module are the monitor screen, printer, and disk drive. Modules on the structure chart representing this data are not as detailed as other modules because they use routines external to the program. D. APPLICATION INCONSISTENCIES AND RECOMMENDATIONS. Application inconsistencies noted in this section are the result of a review of the program strictly from a user friendliness and consistency of design point of view. The decision to implement any or all of the following recommendations is based on the consideration of which requirement, the status quo or the recommended change, is most consistent with the design strategy adopted and takes the best advantage of the target language, C. 1. Screen Movement. The following inconsistencies in screen movement were noted: • Arrow keys are the primary method of moving the cursor to different
35. clude files to the use of units, made this option infeasible. The source code for the report generator used heavily nested procedures, assembly language, and frequent calls to hardware registers. Manually recoding the original code was considered beyond the experience of the maintainer. The maintainer had a good understanding of the overall functionality of the report generator and could have coded the report generator manually from scratch, but time did not permit this. The translation of the report generator portion of the print routines development section was determined to be beyond the scope of this thesis. 9. Step 9: Connect the Print Routines Prototype. Skeleton routines were already available to link the quick print routines to the combined prototype. Only minor difficulties were encountered in completing this step. 10. Step 10: Test the Program. In accordance with the testing procedure, the test database was initialized with no records. Records were added to test the record selection process, and testing was conducted on function key and special keyboard key use. During the testing of individual fields for correct error checking, a major problem was discovered with the first worksheet screen, used for data entry and update of single records. No other screen was affected. Previous testing of this screen had revealed no problems, but only limited data entry into individual fields had been done. Exercising additio
36. d by manually recoding the PASCAL routines into C for all major specifications: open and close the database, and add, delete, and update records. Additionally, special routines were created to manipulate strings without the terminating null character. The linked list management routines were created manually because they did not exist in the original program. The linked list routines were included in the prototype and testing was completed satisfactorily. This approach failed during Step 6. The reasons for the failure are discussed in the next section. Due to the failure, the maintainer decided that it was not practical to use the original PASCAL created database. The next best option was to use the data stored in the original database to build a new C compatible database. Step 5 was repeated with one additional step added: a conversion program was written to convert the PASCAL database to its equivalent C compatible database. Only slight revision to the prototype was required to accommodate the converted database, and the special string management routines were deleted. 6. Step 6: Connect Database Management and Screen Display Data Entry Prototypes. After the initial completion of Step 5, conversion routines to manage the transfer of data between the database and the data entry screens were begun. As the coding process continued, the maintainer became aware that the conversion routines were taking up the bulk of the coding time and
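The string incompatibility behind this conversion can be shown with a minimal sketch in C. It assumes the length-prefixed layout that Turbo PASCAL uses for its string type (one length byte followed by the characters, with no terminating null); the actual record layout of the case study database is not given in the thesis, and the name `pascal_to_c` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copy a length-prefixed PASCAL string field into a null-terminated
   C string. field[0] holds the character count (an assumption about
   the stored format); the characters follow it. The result is
   truncated if it does not fit in out_size bytes. Returns the number
   of characters copied, not counting the terminating null. */
size_t pascal_to_c(const unsigned char *field, char *out, size_t out_size)
{
    size_t len = field[0];

    if (out_size == 0)
        return 0;
    if (len > out_size - 1)
        len = out_size - 1;            /* leave room for the null */
    memcpy(out, field + 1, len);
    out[len] = '\0';
    return len;
}
```

A conversion program of the kind described would apply such a routine to every string field of every record while writing the new database, after which the C code can use the standard string library directly.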
37. d the need for a variable cross reference list. Using this technique was possible because C supports variable naming conventions which are very similar to those of PASCAL; it might not have been possible with certain other language combinations. The original order of development for the design specifications should not be revised. In general, structured English is the first document that should be produced unless special circumstances, as in this case study, apply. The development of the structured English constructs proved to be much more difficult than anticipated. The maintainer's lack of experience with advanced programming techniques, such as the use of overlays, direct access of computer hardware registers, and the use of inline assembly language, was a large stumbling block. These techniques were heavily used in the original source code, and time constraints became a factor in researching and learning the techniques. Differences in personal programming style between the maintainer and the original developers were also a factor that was not initially considered. Individuals develop programming styles that are familiar and comfortable and that help develop a habit of consistency. In theory, personal programming habits should not be a factor at the design level of development [Ref 1], and even in practice may not be a problem for many programmers. However, it was a factor for the maintainer. Personal programming style encompasses a wide
38. d the source code listing available for use correctly represents the executable program. If possible, the source code listing should come directly from the source code used to compile the program; any other program listing may not reflect undocumented changes made to the executable program. Other documentation that can resolve confusion or help make cryptic code more understandable is user operations documentation. User operations documentation consists of all regulations, instructions, and policies of program users that pertain to the target program. Such documentation is particularly valuable if the requirements documentation is inaccurate or nonexistent. User operations documentation normally provides much of the information the original programmers used in the requirements analysis. Documentation is not the only source of information about a program. If available for interview, the original programmers and program users can provide important information. The maintainer should not expect the original programmers to remember details about the program. Typically, a significant amount of time has passed since the programmers were directly involved with the target program, and it is unlikely that the programmers can answer detailed questions about specific lines of code. However, questions about the general structure of the program and why certain decisions about that structure were made can be very revealing. Program users can add information to
39. de programmers with an efficient interface to computer hardware. PASCAL was designed by Niklaus Wirth in 1969 [Ref 9] for the following reasons: • To provide a systematic and precise expression of programming concepts, structures, and development. • To demonstrate that flexible language facilities can be implemented efficiently. • To provide a good vehicle to teach programming by the inclusion of extensive error checking facilities. The design goals of PASCAL and C were quite different. PASCAL's restrictions were intended to encourage the development of reliable programs by enforcing a disciplined structure. By strongly enforcing these restrictions, PASCAL helps the programmer detect programming errors and makes it difficult for a program, either by accident or design, to access memory areas outside its data area. In contrast, C's permissiveness was intended to allow a wide range of applicability. The basic language has been kept small by omitting features such as input/output and string processing. Ideally, C was to be sufficiently flexible so that these facilities could be built as needed. In practice, this philosophy has worked well [Ref 12]. A prominent difference between the two languages is their treatment of variable types. PASCAL is a strongly typed language; C is not. Strongly typed languages mandate that a variable can belong to only one type and that type conversion is accomplished by converting a variable value from
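The difference in type treatment can be seen in a short C sketch. Both functions below compile in C without any explicit conversion, whereas standard PASCAL would require a conversion function such as trunc or round in the first case and succ or chr(ord(...)) in the second. The function names are hypothetical, and the second function assumes a character set, such as ASCII, in which letters are consecutive.

```c
/* C silently converts a real value to an integer on assignment,
   discarding the fractional part. PASCAL rejects the equivalent
   assignment unless trunc or round is called. */
int whole_units(double amount)
{
    int units = amount;        /* implicit conversion, fraction dropped */
    return units;
}

/* C treats characters as small integers, so arithmetic on them is
   allowed directly. PASCAL keeps char and integer separate. */
char next_letter(char letter)
{
    return letter + 1;         /* assumes consecutive letter codes */
}
```

A translator moving code from PASCAL to C therefore loses the compile time checks that PASCAL provided, which is one reason manual review of the translated code remains necessary.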
40. ded. 4. Other. The constraints on the number of lines of task description could not be eliminated, but the number of lines allowed was increased. Changes to the task worksheet can be saved at any time upon exiting the task worksheet. The user is not required to exit the program to save changes to the task worksheet. Additionally, the user will always be asked if changes should be saved. VII. CONCLUSION. The purpose of this thesis was to investigate software reusability applications and the practical utilization of those applications in the performance of software maintenance. The translation of a functioning program from one high level language to another was selected as the type of software reusability effort to be explored. Five translation methodologies were investigated, and the inverse transformation methodology was chosen. A design strategy and translation approach were developed based on the inverse transformation methodology. The translation approach was followed in performing the translation of the case study. The results of the translation are encouraging. The inverse transformation methodology provided the high level framework necessary to develop the translation approach. From a practical viewpoint, no significant departures from the steps described by the translation approach were necessary to satisfactorily complete the translation. The additional advantage of this methodology was the creation of desi
41. des its definition, components, the elements which make up the components, and its physical format. Variable information is displayed in the data dictionary. The steps performed to determine variable information are: Step 1: List all variables used in the program. The utility program Source Print was used to create the list. Step 2: Determine variables with composite data. Identify the components and elements which make up the composite data. Step 3: Describe each variable's physical format. Physical format is a description of the values that a variable may take on, the number of characters allowed, and the type of characters allowed. This information is available in the declaration statement of the variable. Step 4: Determine where variable values are assigned and used. Add to module descriptions a list of variables used and variables changed by each module. Variables used by a module, unless created within the module, are the module's inputs. Variables changed by a module and subsequently used by other modules are the module's outputs. Step 5: Update the structure chart. Show communication between modules by tagging module connections with the variables input and output between modules. Step 6: Correct module coupling problems. Review the variables being passed between modules. The following should be considered: pass only variables essential to the module, minimize the use of global variables whi
42. dology introduced to improve the development of reliable and maintainable software systems. It is a methodology created specifically for software systems development and is the heart of the manual re-implementation methodology described in Chapter IV. The disciplined approach of structured design also provided a good framework for advancing the maintainer's understanding of the GAT module. Structured design components such as structure charts, pseudocode, entity relationship diagrams, and data dictionaries were used to represent the information gained from studying routines and variables. The construction of these components completed the source code analysis. C. PROGRAM DETAILS. 1. Structure. Program structure defines the composition of the program by modules. Modules are discrete blocks of code for which the inputs, outputs, and functionality can be described. Modules are made up of other modules in a chain that begins with the program as a whole and ends with simple modules which cannot be further divided. The division of a program into a modular structure and the relationship between modules are called partitioning and hierarchical organization [Ref 5]. The GAT module is constructed using overlays. The purpose of overlays is to allow the creation of programs larger than the maximum that can be accommodated in computer memory. The overlay procedure is complicated to execute but simple to explain. Pro
43. e, but there is no way to know which ones are best or even useful. The requirements analysis and design specification phases of development are lacking even rudimentary libraries of reusable products, although there is an abundance of successful software development efforts that would be invaluable if they could be effectively reused. 2. Software Maintenance. Software maintenance has been previously described as a critical phase of the software development life cycle that accounts for more than two thirds of the total life cycle cost. While software reusability applications are useful development tools in the earlier phases of the SDLC, it is during the maintenance phase that the most significant benefits can be gained. Since software maintenance presupposes an existing implemented program, by definition all software maintenance reuses software to some degree. There are three basic kinds of software maintenance: correcting program flaws, upgrading programs with improved capabilities, and translating programs into a new form. Correcting post implementation flaws removes problems that detract from the program's basic functionality as defined in the requirements analysis. The entire implemented program, minus the flaws, is reused. There are no changes in the requirements or the operating environment. Upgrading a program is adding functionality not addressed by the original requirements analysis. There is no change to
44. e code translators are unsuitable as the primary methodology but are potentially valuable to speed the coding of certain portions of the source code. The inverse transformation methodology is the only methodology reviewed that supports the evolution of life cycle documentation and permits unrestricted determination of the design development strategy. The inverse transformation methodology uses the SDLC as the model for the software translation. The output of this methodology includes both the translated source code and life cycle documentation to support future maintenance. The methodology selected for the case study was the inverse transformation methodology. Within the framework of this methodology, automated source code translators were also used on portions of the source code as part of the design strategy. Details of the design strategy employed are in Chapter V. V. DESIGN STRATEGY AND TRANSLATION APPROACH. A. OVERVIEW. This chapter describes the specific approach taken to develop the design strategy used with the inverse transformation methodology. The design schema selected was structured analysis and design. Step one of the inverse transformation methodology was the creation of the design specification. The structured analysis and design tools used to create the design specification were structure charts, data dictionary, and structured English constructs. The second step in the inverse transfo
45. e foreseeable future. This bleak picture is the major motivating factor behind finding ways to reduce costs and make the most efficient use of limited programmer resources. Software reusability addresses cost and resource limitations. The reuse of already developed software has become an important area of research for software developers and is receiving more attention from software application purchasers. B. METHODOLOGY. This thesis approaches the software translation case study in three steps: • Understanding the program. • Determining the translation methodology. • Establishing the design specification. This case study adheres to the strictest definition of software translation. The case study does not include the correction of program flaws or upgrades to the program. The need to correct program flaws or make program upgrades is often an overriding factor in decisions to initiate maintenance, but for this case study it was assumed that the target program is both functional and useful in its present form. The software translation was performed due to a change in the operating system requirements. Further upgrade or modification of the program that may be desired is defined as a separate maintenance effort and is not addressed by this case study. The first step in software translation is understanding the program. This step assumes that the translator has no prior knowledge of or experience with the original application, a common circum
46. e language for the modules. The PMSS interface was written in C and is functionally capable of linking PMSS modules independently written in C without further modification of the module. A standardized data format has not been formally addressed but is presently under consideration. The PMSS is composed of twenty one modules which have reached at least the prototype stage and ten more modules in development or being planned. The functions of these modules fall into one of seven categories: program overview status program impact advisor functional analysis support information category data independent modules executive support and utilities. The category of independent modules includes all PMSS modules which have not been integrated. The Government Activity Tasking module is an independent module that has been chosen as the target application for translation. 3. The Government Activity Tasking (GAT) Module. The phrase Government Activity Tasking refers to procedures for providing funding from one government agency to another government agency for the performance of specified project tasks. The purpose of the GAT module is to provide the capability to track and manage project milestones, tasks, and funds assigned to other agencies. It is intended as an executive or senior level manager module [Ref 6]. a. Technical Description. (1) Hardware. The GAT module was programmed to run on the Zenith 248 microcomputer. It requ
47. early understandable, consist of modular routines, and be documented with comments. The code should perform the basic functions required by the design specification for the screen display data entry development section.

Step 3B: Identify Deficiencies. Compare the functionality of the prototype with the requirements of the design specification. Identify as deficiencies any design requirements that are not achieved by the prototype.

Step 3C: Weigh Deficiencies. Compare the programming effort required to make the prototype conform to the design specification with the effort of manual programming. Take into account factors besides the time required to do the programming. These factors are difficulty in maintaining the code, added complexity, coupling and cohesion considerations, and efficiency.

Step 3D: Make Programming Decision. Based on the evaluation of the code generation deficiencies, make the decision either to modify the prototype or to program the development section manually. Complete the initial programming effort.

Step 3E: Test the Programming Effort. Test the program for conformance with the functionality required by the design specification. For example, numeric fields should not accept non-numeric data entries, fields which display computations based on other fields should be verified for correctness, display-only fields should not be modifiable, etc. Correct errors and retest until the program works correctly.

4. Step 4: Evaluate
48. efinition to still fail the acceptance test. Incomplete or poorly defined requirements and inaccurate design specifications lead to problems during acceptance testing.

Implementation involves the completion of user manuals, the training of users, and installation of the new program in the actual operating environment.

Software maintenance functions consist of correcting program flaws, upgrading programs with improved capabilities, and translating programs into a new form because of changes in hardware, operating systems, technology, or language. Software maintenance once was considered independent of the development life cycle. The functions of maintenance were rarely addressed during development. Once implementation was complete, the products of development other than the completed program were not used to aid maintenance.

Software maintenance is now included as part of the system development life cycle for two reasons. Foremost is the fact that more than two-thirds of the total cost of a system, from inception to scrap heap, is spent on maintenance [Ref. 1]. The inclusion of software maintenance as part of the development life cycle focused attention on these costs. Second, valuable information (documentation and lessons learned from the other phases of development) is being lost in the maintenance phase. This information has proven useful in lowering the huge cost of maintenance. Software developers are spending increasing time on
49. el. These levels are further broken down into a number of more specific specification levels defined as entities, structures, and relationships. The purpose of this breakdown is to reach a level of detail comparable to that of the design schema. Once this is accomplished, the translator links design elements and specification elements together into a set of assignment criteria that guide the retranslation from the design schema to the system specification. The end result is a system specification that is an exact conceptual representation of the original source code. The system specification serves as the baseline for system maintenance and module reuse.

2. Transformation-Based Maintenance Model

The Transformation-based Maintenance Model (TMM) is a methodology that allows practitioners to recover abstractions and design decisions that were made during implementation [Ref. 14]. TMM relies on the use of a prototype tool called Draco. The Draco paradigm is based on the idea of a domain-specific super language that would map onto a real software language. Draco provides the methodology to abstract language-dependent design decisions into a more generic form represented by nodes on a graph. Design decisions that are dependent on prior design decisions are linked together, and alternate methods of achieving the same design decision are shown as alternate paths on the graph. This graph, called a Directed Acyclic Graph (DAG), beco
50. ers [Ref. 4]. For the case study, the structure chart was developed from the structured English constructs. This method was chosen for consistency with the design strategy. Additionally, a draft structure chart, which excluded specific details of communication between modules, was developed during the study of the source code. The development of a draft structure chart is a technique that will improve the software maintainer's understanding of the program. It is recommended, but the maintainer should expect significant changes in the final product.

3. Data Dictionary

The data dictionary records information about data used in the program. Each piece of data is given a name. Each name is associated with specific information about the range of values it may acquire and its physical format. In structured analysis, data dictionary entries are drawn largely from data flow diagrams in the requirements statement. In the inverse translation methodology, both the structure chart and structured English constructs are used as sources for data dictionary entries.

Data dictionary data can be one of two types: composite data or data elements. Composite data is data that can be divided into simpler components. Composite data is defined in the data dictionary as the sum of its components. Components of composite data can be either composite data or data elements. Data elements are data which cannot or should not be subdivided into simple
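The distinction between composite data and data elements can be sketched in C as a small recursive record. This is only an illustration under stated assumptions: the `dd_entry` layout, the function `count_data_elements`, and the sample entries (`task`, `funding`, and so on) are hypothetical and are not taken from the GAT data dictionary.

```c
#include <stddef.h>

#define MAX_COMPONENTS 8

/* One data dictionary entry. An entry with no components is a data
 * element; an entry with components is composite data. */
struct dd_entry {
    const char *name;
    const char *format;          /* physical format (data elements only) */
    long min_value;              /* range of values (data elements only) */
    long max_value;
    size_t component_count;      /* 0 means this entry is a data element */
    const struct dd_entry *components[MAX_COMPONENTS];
};

/* Counts the data elements that a dictionary entry ultimately consists of.
 * Components may themselves be composite, so the count is recursive. */
size_t count_data_elements(const struct dd_entry *entry)
{
    if (entry->component_count == 0)
        return 1;

    size_t total = 0;
    for (size_t i = 0; i < entry->component_count; i++)
        total += count_data_elements(entry->components[i]);
    return total;
}

/* Hypothetical sample entries, invented for this sketch. */
static const struct dd_entry task_id =
    { "task_id", "integer", 1, 9999, 0, { NULL } };
static const struct dd_entry amount =
    { "amount", "integer", 0, 1000000, 0, { NULL } };
static const struct dd_entry fiscal_year =
    { "fiscal_year", "integer", 1980, 2099, 0, { NULL } };
static const struct dd_entry funding =
    { "funding", NULL, 0, 0, 2, { &amount, &fiscal_year } };
static const struct dd_entry task =
    { "task", NULL, 0, 0, 2, { &task_id, &funding } };
```

Here `task` is composite data made of one data element and one further composite, so `count_data_elements(&task)` walks down to three data elements.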
51. esented by the hierarchical arrangement of the modules.

Control flow in the GAT module was traced by studying program execution and manually walking through the source code. It is not required that control flow be represented in the structure chart. For this case study, however, control flow is defined in the structure chart and was determined concurrently with understanding program structure. The following steps were performed concurrently with the same step number used to determine structure above.

Step 3: Bond related modules together by order of execution. Define the order in which modules at the same level are executed.

Step 4: Define the order in which routines are executed and the dependencies between routines.

Step 6: Correct module coupling problems. Coupling is the degree of interdependence between modules [Ref. 5]. Low coupling is desirable because modules should be independent of each other. The following should be considered: modules should not branch into the inside of another module, and modules should not alter the statements of other modules.

Step 7: Develop the initial structure chart. Show the modules and the connections between the modules. Do not include the data communicated between modules at this point.

3. Variables

Variables are names used to refer to stored data. The data stored may be a single element or composite data made up of more than one component. Information about a variable inclu
52. eutenant Commander, United States Navy. B.S., United States Naval Academy, 1978. Submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE IN INFORMATION SYSTEMS from the NAVAL POSTGRADUATE SCHOOL, March 1990.

Author: Charles W. Bell
Approved by: Rachel Griffin, Thesis Advisor; Daniel R. Dolk, Second Reader; David Whipple, Chairman, Department of Administrative Sciences

ABSTRACT

The purpose of this thesis is to investigate software reusability applications and the practical utilization of those applications in the performance of software maintenance. The translation of a functioning program from one high-level language to another was selected as the type of software reusability effort to be explored. Five translation methodologies were investigated, and the inverse transformation methodology was chosen to exercise the practical application of software reusability for a specific case study. A design strategy and translation approach were developed based on the inverse transformation methodology. The translation approach was followed in performing the translation of the case study. The results of the application of the methodology to the case study are described, and the methodology is evaluated on its usefulness as a tool for software reuse.

TABLE OF CONTENTS

I. INTRODUCTION
A. DISCUSSION
B. METHODOLOGY
II. BA
53. f the SDLC are performed.

5. Automated Source Code Translation

Automated source code translators take the source code of the original program as input and output source code for the translated program. Automated translators are rated in four areas:
• Effectiveness of syntactic conversion of like functionality
• Degree to which unique functional differences are addressed
• Efficiency in converting unique functions to similar constructs
• Overall effectiveness of the translation

Automated translators vary in the degree to which language differences are addressed. A minimally successful automated translator must correctly convert all like functions between the source and target language and should flag code that the translator could not convert. For example, converting the assignment statement in PASCAL to the assignment statement in C requires changing the operator ":=" to "=". These simple translators are effective only between very similar languages and on uncomplicated source programs. For example, in C there is an equivalent function to the PASCAL repeat loop. The do while loop in C can be substituted by the automated translator whenever the repeat loop is encountered, provided the loop condition is negated: repeat exits when its condition becomes true, whereas do while exits when its condition becomes false. Other differences, such as the inability in C to pass parameters by reference, are much more difficult to handle with an automated translator. Differences that are not addressed should be flagged by the translator when such code is enco
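The repeat loop conversion mentioned above can be sketched as follows. The Pascal fragment in the comment and the function name `count_digits` are invented for this illustration and do not come from the GAT source code.

```c
/* Hypothetical Pascal original:
 *
 *   digits := 0;
 *   repeat
 *     n := n div 10;
 *     digits := digits + 1
 *   until n = 0;
 *
 * A sketch of the C equivalent. Both loops run the body at least once.
 * The Pascal loop stops when "n = 0" becomes true, so the C loop must
 * continue while that condition is false. */
int count_digits(int n)
{
    int digits = 0;

    do {
        n = n / 10;              /* ":=" becomes "=", "div" becomes "/" */
        digits = digits + 1;
    } while (!(n == 0));         /* "until c" becomes "while (!c)" */

    return digits;
}
```

A translator that copied the condition across unchanged would produce a loop that stops after one pass for any multi-digit input, which is why the negation is part of even this simple conversion.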
54. generator is a software product used to generate other software programs. Originally, application generators were too complicated for non-programmers and had very limited usefulness. The code produced by early application generators was extremely inefficient and required additional manual programming effort before the code could be used. Application generators are becoming increasingly sophisticated, using non-procedural languages to provide a non-technical interface with the user. They are also becoming more versatile in being able to create programs for a variety of requirements. The user enters information into the system as prompted by the generator, and then the application generator produces an executable program. The created program is bug free, eliminating the usual debugging effort, and future modifications can be made using the application generator. Programs created with application generators are still inefficient, and there are few commercial systems capable of handling large, complex software requirements.

The most important categories of software reusability are requirements analysis and design specification, the first two phases of the software development life cycle. To appreciate the importance of the reusability of these two phases, a description of the software development life cycle is necessary.

The software development life cycle (SDLC) defines the steps to develop a software program, beginning when a need is reco
55. gn specifications for the translated program, which can be used in future maintenance efforts. The use of one tool for software reusability, the inverse transformation methodology, created a second tool for software reusability, the design specification. Finally, the versatility of the inverse transformation methodology, which allows unrestricted determination of the design strategy, permitted the use of additional reusability tools such as code generators. Significant development time was saved despite the documented problems in using these tools.

LIST OF REFERENCES

1. Jones, T. C., Reusability in Programming: A Survey of the State of the Art, Tutorial: Software Reusability, IEEE Computer Society Press, 1987.
2. Horowitz, E., and Munson, J., An Expansive View of Reusable Software, Tutorial: Software Reusability, IEEE Computer Society Press, 1987.
3. Freeman, P., Reusable Software Engineering: Concepts and Research Directions, Tutorial: Software Reusability, IEEE Computer Society Press, 1987.
4. Whitten, J., Bentley, L., and Ho, J., Systems Analysis and Design Methods, Times Mirror/Mosby College Publishing, 1986.
5. Page-Jones, M., The Practical Guide to Structured Systems Design, Yourdon Press, 1988.
6. GAT User's Manual, EG&G Washington Analytical Services Center, 1988.
7. Phillips, J., Creating a Baseline for an Undocumented System, Or What Do You Do With Someone Else's Code
56. gnized. There is no standard, universally accepted SDLC. The SDLC presented in this thesis represents one approach. The SDLC phases are:
• Requirements Analysis
• Design Specifications
• Coding and Testing
• Implementation
• Maintenance

The specific steps within each phase are listed in Figure 1 [Ref. 4].

[Figure 1. Software Development Life Cycle: a diagram of the five phases (requirements analysis, design specifications, coding and testing, implementation, and ongoing maintenance through correction, upgrade, and translation) and the numbered steps within them.]

The requirements phase is initiated by identifying a need. This need can address a problem, an opportunity, or a directive. It can come from a user-specified request, a mandate by the organization or higher-level authority, or another source. Once identified, the need becomes a requirement. The requirement must be carefully defined in terms of exactly what functions are required, without getting into specifics on the type of hardware or software. The requirement definition includes background on why it is needed, the advantages development of this requirement would provide, the resources that would have to be committed, and the i
57. gram routines are collected together into subprograms. Routines within a subprogram cannot call another routine in a different subprogram because only one subprogram can be present in memory at a time. The GAT module has a main program and four subprograms. The main program is always present, and it ensures the appropriate subprogram is available in memory when required. Ideally, subprograms are functionally self-sufficient and do not require any of the routines needed by other subprograms. In the GAT module, however, several routines are needed by all the subprograms. This need is accommodated by duplicating the desired routines in every subprogram that requires them. The end result is effective but inefficient. The GAT module runs successfully and stays within the memory limits imposed by the Borland PASCAL compiler, but it wastefully duplicates code and increases the disk space required.

Learning the structure of the program was accomplished using the following steps.

Step 1: Define the overall function of the program. This module is the top-level module of the structure chart. "Track Tasks" was defined as the overall function of the GAT module.

Step 2: Describe obvious high-level functions as modules. Ask the question "What does this do?" of user decision points in the program. Menu items and function key selections are the best clues to use to gain a general idea of the main functions of the program.

Step 3: Describe
58. grammed with skeleton routines for calling pre-defined reports and the report generator. Insert print routines into the combined prototype and add print routine function calls to the skeleton routines. Test the function calls.

10. Step 10: Test the Program

Acceptance test criteria for the case study are limited to the following requirements:
• Retain, as a minimum, the level of functionality existing in the original program.
• The translated program must be compilable by the Microsoft C optimizing compiler.
• Current users must be able to utilize existing databases without requiring re-entry of data.

The development of a test database was required to exercise the functionality of the translated program. No user or sponsor test database was provided. Therefore, the test database was developed by the software maintainer, which restricted the effectiveness of the functionality test. The following steps were defined to test the translated program.

Step 10A: Develop the Test Database. Develop the test database in concert with exercising program functions. Begin with a database with no records. The target size for the test database is twenty records.

Step 10B: Exercise All Program Functions. Add, delete, and modify records. Exercise all function key options available for each display screen. Test the use of keyboard keys not defined as options to check for unexpected results. Note discrepancies.

Step 10C: Demonst
59. h development section will be independently evaluated to determine the best method of programming.

Rule 6: The priorities for determining the best programming method are, from highest to lowest priority: direct reuse of source code modules, use of a software tool to generate code, and manual programming from scratch.

Rule 7: Program coding must accurately reflect the design specification.

Within the scope of the transformation rules, the translation process was developed into a sequence of specific steps. Steps in the translation process are summarized in Figure 4. The purpose and expected results of each step are described in the following sections. The actual results and difficulties in executing each step are described in Chapter VI.

1. Step 1: Develop the Design Specification

See section C of this chapter.

2. Step 2: Evaluate Screen Display Data Entry Development Section

The original source code for this section was developed using a code generator to produce a skeletal framework. The framework underwent major modifications which profoundly reduced the usefulness of the generated code. This appears to be a duplication of effort, for reasons which can only be surmised given the lack of development information available. Possible reasons include:
• The developers may have been unaware of the limitations of the code generator.
• User acceptance of the unmodified displays and data entry processes may have been poor.
•
60. ication. The extent of restructuring done as part of the inverse transformation process is dependent on the rigor with which the original development was conducted. Poorly designed and unstructured programs require more restructuring than well designed programs.

Defining the transformation rules required to accomplish the first step in the inverse transformation methodology is dependent on the structure of the programming language as input and the structure of the design schema as output [Ref. 13]. In other words, the translator starts with the software language of the source code, defines the design schema to be used, and then determines the transformation rules. Transformation rules are built by inverting the process of generating code from design documentation. For example, if the design schema defined by the translator requires relational tables to describe a database, then relational tables should be created from any database described in the source code. The specific process to accomplish the first-step transformation is left to the translator.

In the second step, the translator takes the design documentation from the first step and creates the system specification based on the Entity-Relationship (E-R) model. E-R models are described by Whitten [Ref. 4]. Two levels of abstraction are defined by the inverse transformation methodology, micro and macro, which represent the two levels of detail required in the E-R mod
61. ication. This step did validate the appearance of the display screens and data entry checking routines and should have been included as an independent step within Step 3. This step represents the maintainer's only departure from the steps defined within Step 3.

There were no cases where the generated code incorrectly implemented a design specification. There were, however, three specifications that were beyond the capability of the code generator. A description of how the code generator manages data fields is required to explain the problem. In general, data fields are defined by the code generator as one of two types: display-only fields and fields that can be modified by the user. Modifiable fields are highlighted when the cursor is placed on that field. Display-only fields, which cannot be accessed by the user, are coded in such a way that they were not very accessible to the maintainer.

The design specifications required that on one screen the user would highlight the field desired and select various options for action on the highlighted field. The specifications further required that these fields could not be modified by the user. The code generator was unable to produce a field that could be highlighted but not modified. Two options were considered to resolve the problem: modifying the generated code and manually coding the problem screen. Since a significant amount of useful generated code would be discarded if m
62. ires a minimum of 384 kilobytes of random access memory (RAM) and one floppy disk drive. The module was designed for use with an Enhanced Graphics Adapter (EGA) graphics hardware card with a color monitor; the use of the module with other graphics adapters or a monochrome monitor is not guaranteed. An Epson-compatible printer is required to print reports.

(2) Software. The GAT module was written in PASCAL and compiled on the Borland Turbo PASCAL compiler, version 3.0. The program is broken into four chain files called in as overlays during program execution. The purpose of breaking the program into smaller sections was to keep the size of each program segment below 64 kilobytes, the maximum size limit of a PASCAL program written for the version 3.0 Borland compiler.

All database support for the GAT module is provided by an off-the-shelf software product called BTRIEVE, by SoftCraft, Inc. BTRIEVE is executed as a RAM-resident utility program which is called by the GAT module whenever access to the database is required. BTRIEVE is automatically invoked when the GAT module is executed and is transparent to the user. In addition to the database managed by BTRIEVE, there are program-generated files which contain the information on report formats created by the report generator.

(3) Interfaces and Communications. Although the GAT module is one of many modules that make up PMSS, there is no data sharing or other communication with PMSS or
63. iscussion.

Grammars: Grammars formally specify the syntax of the language with a set of rules describing the set of all statements that are legal and correct in the language. A grammar imparts no meaning to the constructs it describes, only what is syntactically legal.

Statement: A statement is a source code fragment. For example, the PASCAL fragment in brackets C A B is a statement. A statement is comprised of a sequence of tokens.

Tokens: A token is a string of characters that makes up a portion of a statement. Tokens are normally keywords, arithmetic operators, variable names, etc.

Parsing: Parsing is the process of analyzing a sequence of tokens and identifying the sequence with the correct language construct described by the grammar.

Productions: Productions are the rules of grammar used when parsing to describe all the statements of the language.

Parse Tree: A parse tree is a graphical representation of the grammar of the language and is used in the analysis of a program or any portion of a program, such as a statement. See Figure 3.

[Figure 3 content. Sentence to be analyzed: THE CAPTAIN COMMANDS THE SHIP. The parse tree drawing is not recoverable. Grammar listed in the figure:
sentence: subject predicate (root node)
subject: article noun (nonterminal)
predicate: verb direct object (nonterminal)
direct object: article noun (nonterminal)
article: THE (termi
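To make the notion of a token concrete, the following is a minimal sketch in C of a scanner that splits a statement into tokens. It assumes a Pascal-like statement such as `C := A + B;` (the operators of the example statement in the text did not survive, so this input is a stand-in), and the names `tokenize`, `MAX_TOKENS`, and `MAX_TOKEN_LEN` are hypothetical. It performs no parsing: it only separates names and numbers, the two-character `:=` operator, and single-character symbols.

```c
#include <ctype.h>

#define MAX_TOKENS 16
#define MAX_TOKEN_LEN 16

/* Splits src into tokens and stores them as strings in tokens[].
 * Returns the number of tokens stored (at most max_tokens).
 * A run of letters and digits longer than MAX_TOKEN_LEN - 1 characters
 * is split into several tokens in this simplified sketch. */
int tokenize(const char *src, char tokens[][MAX_TOKEN_LEN], int max_tokens)
{
    int count = 0;

    while (*src != '\0' && count < max_tokens) {
        if (isspace((unsigned char)*src)) {
            src++;                               /* white space separates tokens */
            continue;
        }

        int len = 0;
        if (isalnum((unsigned char)*src)) {
            /* variable name, keyword, or number */
            while (isalnum((unsigned char)*src) && len < MAX_TOKEN_LEN - 1)
                tokens[count][len++] = *src++;
        } else if (src[0] == ':' && src[1] == '=') {
            /* Pascal assignment operator is a single two-character token */
            tokens[count][len++] = *src++;
            tokens[count][len++] = *src++;
        } else {
            /* any other symbol is a one-character token */
            tokens[count][len++] = *src++;
        }
        tokens[count][len] = '\0';
        count++;
    }
    return count;
}
```

With the stand-in input, the scanner yields the six tokens `C`, `:=`, `A`, `+`, `B`, and `;`. A parser would then match that sequence against the productions of the grammar.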
64. ize the error. This is a context sensitivity issue.

Using attribute grammar technology, attributes are inserted into the grammar which cause additional analysis of the sequence of tokens. The analysis then includes steps that recognize statement 1 as a variable declaration, check a symbol table, and return a value that indicates whether X has been previously declared. In statement 1, X has not yet been used; variable X is added to the symbol table and statement 1 is accepted as a legal construct. When statement 2 is analyzed, it is recognized as an integer assignment. Evaluation steps are performed which check the symbol table and return a value indicating that variable X is defined, but not as an integer as required by the attribute grammar. Statement 2 is flagged as an error. The important difference is that, in order for statement 2 to be evaluated properly, prior knowledge about statement 1 was necessary.

An attribute grammar will not work with a language for which it was not specifically constructed. Revising the above example, the same two statements are written in C as follows:

statement 1: char X;
statement 2: X = 1;

Although the strings of characters are largely the same, the PASCAL operator ":=" has been replaced by the C equivalent operator "=" and the sequence of tokens has changed. A different context-free grammar and attribute grammar are necessary to describe the language. Using the same chain of logic described
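The symbol table check described above can be sketched in C. This is a simplified illustration and not an attribute grammar evaluator: the type names, the `declare` and `check_assignment` functions, and the fixed table sizes are all assumptions made for the sketch. The rule it models is the strict one the text describes, in which the assigned value's type must equal the declared type; an actual C compiler would accept assigning an integer constant to a char variable through implicit conversion.

```c
#include <string.h>

#define MAX_SYMBOLS 32
#define MAX_NAME 16

enum var_type { TYPE_INTEGER, TYPE_CHAR };

enum check_result {
    CHECK_OK,
    CHECK_UNDECLARED,
    CHECK_REDECLARED,
    CHECK_TYPE_MISMATCH,
    CHECK_TABLE_FULL
};

struct symbol {
    char name[MAX_NAME];
    enum var_type type;
};

struct symtab {
    struct symbol entries[MAX_SYMBOLS];
    int count;
};

/* Returns the index of name in the table, or -1 if it is not declared. */
static int lookup(const struct symtab *table, const char *name)
{
    for (int i = 0; i < table->count; i++)
        if (strcmp(table->entries[i].name, name) == 0)
            return i;
    return -1;
}

/* Handles a declaration such as statement 1: records the variable and
 * its type, rejecting a second declaration of the same name. */
enum check_result declare(struct symtab *table, const char *name,
                          enum var_type type)
{
    if (lookup(table, name) >= 0)
        return CHECK_REDECLARED;
    if (table->count == MAX_SYMBOLS)
        return CHECK_TABLE_FULL;

    struct symbol *slot = &table->entries[table->count];
    strncpy(slot->name, name, MAX_NAME - 1);   /* long names are truncated */
    slot->name[MAX_NAME - 1] = '\0';
    slot->type = type;
    table->count++;
    return CHECK_OK;
}

/* Handles an assignment such as statement 2: the result depends on what
 * an earlier declaration put into the symbol table. */
enum check_result check_assignment(const struct symtab *table,
                                   const char *name,
                                   enum var_type value_type)
{
    int index = lookup(table, name);
    if (index < 0)
        return CHECK_UNDECLARED;
    if (table->entries[index].type != value_type)
        return CHECK_TYPE_MISMATCH;
    return CHECK_OK;
}
```

Declaring X as a character and then checking an integer assignment to X returns `CHECK_TYPE_MISMATCH`, which mirrors how statement 2 can only be judged with knowledge carried over from statement 1.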
65. less successful with languages in which the syntax is disparate, because many low-level constructs are needed to commonly represent the two languages. A second method, the least common multiple method, addresses this issue by requiring the development of attribute grammars for both the high-level function and its low-level constructs for every disparate function. Although the least common multiple method minimizes translation inefficiencies in dissimilar languages, there is a corresponding increase in the complexity and level of effort required to develop the attribute grammars [Ref. 16].

There are some language constructs which cannot be represented by attribute grammars. One example of this is complex pointer arithmetic, commonly used in C. Such non-representable constructs are flagged without translation. A different translation methodology for these constructions is necessary to complete the translation [Ref. 16].

4. Manual Re-implementation

Manual re-implementation is the development of the program as if no previous program existed. The full software development life cycle is performed. The requirements statement is generated from user-defined requirements and a study of the current environment. The source code of the previous program and all implementation decisions and other information arising from the development of the earlier program are ignored. Based on the new requirements analysis, the remaining steps o
66. m. This ordering helps ensure one-pass compilation of the program but reduces program readability. In C, order of appearance is much more flexible.
• Variable Visibility. C provides very flexible methods of expanding or restricting the scope of variables, encouraging the use of shared private variables to improve reliability. PASCAL requires the use of non-local variables or strict parameter passing to get information between routines.
• Passing Parameters. In PASCAL, parameters can be passed between routines by either value or reference. C parameters can be passed only by value. In C, the address of a variable must be passed to achieve the same effect as passing by reference. PASCAL requires that the number of variables passed equal the number of variables expected by the called routine. C does not check that the number of actual parameters equals the number of formal parameters expected by the called routine.
• Entry and Exit Points. PASCAL routines must be entered and exited from the beginning of the routine and its end, respectively. In C, specific control statements such as break and continue allow entry and exit from arbitrary places within a control structure.
• External Routines and Variables. C allows the use of external routines and variables, encouraging the development of libraries of routines. The version of PASCAL used in the case study does not support external routines or variables.

C. METHODOLOGIES RE
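The parameter passing difference can be shown with a short C sketch. The Pascal declarations in the comments and the function names are invented for illustration. A Pascal `var` parameter has to become a pointer parameter in C, and every call site has to pass an address.

```c
/* Pascal (hypothetical): procedure AddOneByValue(n: integer);
 * The C function receives a copy, so the caller's variable is unchanged. */
void add_one_by_value(int n)
{
    n = n + 1;      /* modifies only the local copy */
}

/* Pascal (hypothetical): procedure AddOne(var n: integer);
 * The C translation takes the address of the caller's variable. */
void add_one(int *n)
{
    *n = *n + 1;    /* modifies the caller's variable through the pointer */
}

/* Pascal (hypothetical): procedure Swap(var a, b: integer); */
void swap(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

A Pascal call such as `Swap(x, y)` would be written `swap(&x, &y)` in C. Because the change affects both the routine and each of its callers, it is harder for a simple statement-by-statement translator to handle than an operator substitution.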
67. m translation to be viable, although these reasons are often the impetus for consideration.

4. Summary and Purpose

Figure 2 summarizes the relationship between software maintenance, reusability, and translation. Software maintenance is the final phase of the software development life cycle. Software maintenance receives particular attention because of the disproportionate percentage of life cycle costs associated with performing this phase. Software reusability applications hold promise to reduce software maintenance costs. In particular, software reusability applications can be used to support software translations.

[Figure 2. Relationships. Labels in the diagram: Software Maintenance; Software Correction; Software Upgrade; Software Translation; Manual Re-implementation; S/W Reusability Applications; Commercial Packages; Code Fragments; Application Generators; Requirements Analysis; Design Specifications.]

The purpose of this thesis is to investigate software reusability applications and the practical utilization of those applications in the performance of software maintenance. Of particular interest is the use of the design specifications phase of the SDLC as the primary vehicle for reuse. The software translation of a previously developed microcomputer program from one high-level language to another was chosen for the case study. Of critical importance in the reuse of early SDLC phases is the thoroughness of req
68. mentation Note, Holistic Technology AB, 1987.

INITIAL DISTRIBUTION LIST

Defense Technical Information Center, Cameron Station, Alexandria, Virginia 22304-6145
Library, Code 0412, Naval Postgraduate School, Monterey, California 93943-5002
Director, DSS Directorate DRI S, Defense Systems Management College, Fort Belvoir, Virginia 22060-5426
LCDR Charles Bell, 326 Valley Road, Etters, Pennsylvania 17319
LCDR Rachel Griffin, Code CS gr, Naval Postgraduate School, Monterey, California 93943-5002
69. mer manuals.

VI. CASE STUDY APPLICATION

A. OVERVIEW

The design strategy and translation approach described in the preceding chapter were applied to the case study. The practical application of the case study is intended to test the validity of the approach. Departures from the translation approach during the application of the case study are evaluated. The results of the actual execution of each step and any difficulties encountered are described.

B. TRANSLATION APPROACH APPLICATION

Each step is numbered and titled exactly as in Chapter V.

1. Step 1: Develop the Design Specification

The development of the design specification required the creation of three documents in the following order: structured English, structure chart, and data dictionary. However, it was more practical to produce the data dictionary first using the sof