- HSR - Institutional Repository
Contents
1. 3.1.6 Traversing the AST
   3.1.7 Modifying and Rewriting the AST
   3.1.8 Dealing with global variables
   3.1.9 Two step transformation
   3.1.10 Default Refactoring
   3.1.11 Extracting common code
   3.2 Problems and Decisions
   3.2.1 std::string vs const std::string
   3.2.2 std::string member functions vs algorithm functions
   3.2.3 Multiple rewrites in the same AST subtree
   3.2.4 [title illegible in source]
   3.2.5 Checking if a variable name exists
   3.2.6 Exception and error handling
   3.2.7 Marker position calculation
   4 Refactoring real-life code
   4.1 Statistics
   4.2 Refactoring XBMC
   4.2.1 First real-life test
   4.2.2 Second real-life test
   4.3 Where the plug-in needs manual corrections
   4.3.1 How to refactor C string definitions
   4.3.2 How to refactor C string assignments
   4.3.3 How to refactor C string parameters
   4.3.4 Known issues
   5 Conclusion
   5.1 Achievements
   5.2 Future Work
   A User manual
   A.1 Installation
   A.2 Usage and configuration
2. Applying the Char pointer refactoring to the code in Listing 3.4 results in the code shown in Listing 3.6.

   Listing 3.6: After step 1
       int main() {
         const std::string str = "my string";
         char *found = strstr(&*str.begin(), "ing");
         if (found != nullptr) {
           int index = found - str.c_str();
           std::cout << "Found substring at " << index << std::endl;
         }
       }

   Step 2: Char pointer cleanup refactoring
   In the second step, the Char pointer cleanup refactoring searches for C string function calls such as strstr, strchr, etc. that are executed on std::string objects. These calls are mostly the result of executing the Char pointer refactoring, as in Listing 3.6. The CharPointerCleanupChecker marks such function calls. The programmer can then trigger the corresponding quick fix via the marker, which starts the Char pointer cleanup refactoring.
   The main job of the refactoring is to replace the C string function with a suitable std::string member function. Often the member function doesn't have the same return type as the C string function. Thus the variable that holds the return value of the function call, and its subsequent occurrences, have to be refactored as well. In the case of Listing 3.6, applying the Char pointer cleanup refactoring would lead to the code shown in Listing 3.5.
   Sometimes the Char pointer cleanup refactoring isn't as straightforward as in this example. For exam…
3. … and rend() can be used instead of the normal iterators to get the same behavior as the strrchr function. There would be more benefit if the plug-in also refactored the resulting char pointer. This could be difficult because pointers can be used in many different ways.

   Task 1: Handling Null Values
   If a programmer uses strchr or strrchr to find out whether a character is inside a string, he or she checks whether the result is a null pointer. The corresponding std::string member function returns std::string::npos if the given character was not found in the string. The plug-in should therefore scan the code for the corresponding null checks and change them. For more details see the listings below.

   Listing 2.24: Before the refactoring
       int main() {
         char s[] = "@mail";
         if (strchr(s, '@') != nullptr) {
           // contains @ sign
         }
       }

   Listing 2.25: After the refactoring
       int main() {
         std::string s = "@mail";
         if (s.find_first_of('@') != std::string::npos) {
           // contains @ sign
         }
       }

   This refactoring can also be done with the std::find function. This function returns an iterator to the end of the string if the character is not found.

   Listing 2.26: Before the refactoring
       int main() {
         char s[] = "@mail";
         if (strchr(s, '@') != nullptr) {
           // contains @ sign
         }
       }

   Listing 2.27: After the refactoring
       int main() {
         std::string s = "@mail";
         if (std::find(s.begin(), s.end(), '@') != s.end()) {
           // contains @ sign
         }
       }
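The three ways of writing the "character is present" check discussed above can be put side by side. This is only a minimal sketch: the function names contains_c, contains_member and contains_algorithm are hypothetical and are not part of the plug-in.

```cpp
#include <algorithm>
#include <cstring>
#include <string>

// C style: strchr returns a null pointer when the character is absent.
bool contains_c(const char* s, char ch) {
    return std::strchr(s, ch) != nullptr;
}

// Member function: find_first_of returns std::string::npos instead of a null pointer.
bool contains_member(const std::string& s, char ch) {
    return s.find_first_of(ch) != std::string::npos;
}

// Algorithm: std::find returns the end iterator instead of a null pointer.
bool contains_algorithm(const std::string& s, char ch) {
    return std::find(s.begin(), s.end(), ch) != s.end();
}
```

In all three variants only the "not found" sentinel differs, which is why the null check has to be rewritten together with the call.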
4. [Tail of the strlen listings, partly illegible in the source. Before: char s[] = "Hello"; size_t l = strlen(s); std::cout << l; After: std::string s = "Hello"; size_t l = s.size(); std::cout << l;]

   2.4.2 strchr / strrchr
   The functions strchr and strrchr have the following signatures:
       const char* strchr(const char* str, int character);
       char* strchr(char* str, int character);
       const char* strrchr(const char* str, int character);
       char* strrchr(char* str, int character);
   They return a pointer to the first (strchr) or last (strrchr) occurrence of a given character in the C string str. If the character cannot be found in this string, both functions return a null pointer.
   The functions can be replaced with the member functions find_first_of and find_last_of of the std::string class. Both functions are overloaded several times. Listing 2.18 shows the versions that best match the signatures of the strchr and strrchr functions.

   Listing 2.18: Signatures of the member functions find_first_of and find_last_of
       size_type find_first_of(CharT ch, size_type pos = 0) const;
       size_type find_last_of(CharT ch, size_type pos = npos) const;

   These std::string member functions have a different return type. Instead of a pointer, they return an index of type size_type that denotes the position of the character. A simple approach is to convert the index back to a pointer and leave the rest of the program unchanged. An example can be found in the listing below.
   Listing 2.19: Before the refactoring / Listing 2…
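As a small illustration of the different return types, the following sketch compares the pointer offsets produced by strchr and strrchr with the indices returned by find_first_of and find_last_of. The helper names are hypothetical, and the two C helpers assume that the character actually occurs in the string.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Index of the first occurrence, or std::string::npos if there is none.
std::string::size_type first_index(const std::string& s, char ch) {
    return s.find_first_of(ch);
}

// Index of the last occurrence, or std::string::npos if there is none.
std::string::size_type last_index(const std::string& s, char ch) {
    return s.find_last_of(ch);
}

// C style: the offset has to be computed from the returned pointer.
// Assumption: ch occurs in s, otherwise the subtraction is invalid.
std::ptrdiff_t first_offset_c(const char* s, char ch) {
    return std::strchr(s, ch) - s;
}

std::ptrdiff_t last_offset_c(const char* s, char ch) {
    return std::strrchr(s, ch) - s;
}
```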
5. However, if the input value is invalid, they behave differently. Therefore it may not be possible to simply replace the std::atof function with the std::stod function. For example, it may be necessary to catch the exception and adapt the error handling accordingly.

   2.6.2 atoi / atol / atoll
   The functions atoi, atol and atoll are very similar. See their function signatures below:
       int atoi(const char* str);
       long atol(const char* str);
       long long atoll(const char* str);
   These functions take a C string and convert it into the data type int, long or long long, respectively. The converted value is returned if the conversion was successful. If the conversion fails, the integer value 0 is returned. If the converted value is out of range, the return value is undefined.
   Similar functions can also be found in the <string> header. They are called stoi, stol and stoll. The signatures of these functions are shown in Listing 2.106.

   Listing 2.106: Signatures of the functions stoi, stol and stoll
       int stoi(const std::string& str, size_t* pos = 0, int base = 10);
       long stol(const std::string& str, size_t* pos = 0, int base = 10);
       long long stoll(const std::string& str, size_t* pos = 0, int base = 10);

   These functions also return the same value as their corresponding C string function if the conversion was successful. However, if the conversion could not be performed, an std::invalid_argu…
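The difference in error handling can be made concrete with a small wrapper. This is a sketch under the assumption that the surrounding code relies on the atoi convention of returning 0 on failure; the name to_int_or_zero is hypothetical and not something the plug-in generates.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper: keeps the "0 on failure" convention of atoi
// while using std::stoi, which reports errors through exceptions.
int to_int_or_zero(const std::string& text) {
    try {
        return std::stoi(text);
    } catch (const std::invalid_argument&) {
        return 0;  // no conversion could be performed
    } catch (const std::out_of_range&) {
        return 0;  // value does not fit into int (atoi leaves this case undefined)
    }
}
```

Whether mapping both exceptions to 0 is appropriate depends on the calling code; in many cases a dedicated error path is the better choice.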
6. [Figure residue: byte layout of the char array.] After the concatenation it again looks like Figure 2.1.

   2.1.3 Char buffer on the heap
   Sometimes the size of a string is not known at compile time. Such strings can be dynamically allocated on the heap using malloc, as shown in Listing 2.5.

   Listing 2.5: String allocation on the heap
       char* duplicateString(const char* str) {
         char* copy = (char*) malloc(strlen(str) + 1);
         strcpy(copy, str);
         return copy;
       }
       int main() {
         char* str = duplicateString("A string");
         // do something with str
         free(str);
       }

   In this case, clients of the function duplicateString have to free the resulting string after they are done with it, because strings that are allocated with malloc aren't freed automatically.

   2.2 C strings vs. std::string
   2.2.1 Memory management
   If a programmer wants to concatenate two C strings, he or she has to make sure that there is enough space reserved in the destination buffer to hold the contents of both strings as well as the terminating '\0' character. If the sizes of the strings are known at compile time, this can be done by defining a char array on the stack, as shown in Listing 2.6.

   Listing 2.6 (string literals partly illegible in the source)
       const char* str1 = "Hello, ";
       const char* str2 = "World!";
       char str3[14];
       strcpy(str3, str1);
       strcat(str3, str2);
       // do something with str3

   However, the sizes are often unknown at compile time. In the book The C++ Programming Language by Bjarne Stroustrup [Str97] there is…
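The buffer-size bookkeeping described above can be illustrated as follows. This is a minimal sketch, not code from the thesis: concat_c and concat_std are hypothetical names, and a std::vector stands in for the manually sized destination buffer so that the example does not leak memory.

```cpp
#include <cstring>
#include <string>
#include <vector>

// C style: the destination must hold both strings plus the terminating '\0'.
std::string concat_c(const char* a, const char* b) {
    std::vector<char> buffer(std::strlen(a) + std::strlen(b) + 1);
    std::strcpy(buffer.data(), a);
    std::strcat(buffer.data(), b);
    return std::string(buffer.data());
}

// std::string: the class computes and allocates the required size itself.
std::string concat_std(const std::string& a, const std::string& b) {
    return a + b;
}
```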
7. [Tail of a before/after listing pair, mostly illegible in the source. Both versions end by printing the domain part of the email address: the original calls print(domain_part), the refactored version calls print(domain_part.c_str()).]
   With a call to the member function c_str, a std::string can be converted back to a C string. However, this C string is const and cannot be modified.

   2.4.4 strcmp
   The C string function strcmp has the following signature:

   Listing 2.44: Signature of function strcmp
       int strcmp(const char* str1, const char* str2);

   The function compares the strings str1 and str2. If both strings are equal, the return value is zero. A return value greater than zero indicates that the first C string comes alphabetically after the second one; otherwise the return value is less than zero.
   This function can be replaced with the compare member function of the std::string class. The function signature that matches best can be found below.

   Listing 2.45: Signature of member function compare
       int compare(const CharT* s) const;

   See an example of this refactoring in the code below.

   Listing 2.46: Before the refactoring
       int main() {
         char a[] = "Apple";
         char b[] = "Banana";
         std::cout << strcmp(a, b);
       }

   Listing 2.47: After the refactoring
       int main() {
         std::string a = "Apple";
         char b[] = "Banana";
         std::cout << a.compare(b);
       }

   2.4.5 strncmp
   The function strncmp has the following signature:
   Listing 2.48: Signature of functio…
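Both strcmp and compare only guarantee the sign of the result, not its magnitude, so a comparison of the two should normalise the values first. The sketch below does this with a hypothetical sign helper; compare_c and compare_std are likewise illustrative names.

```cpp
#include <cstring>
#include <string>

// Reduces a comparison result to -1, 0 or 1, because only the sign is specified.
int sign(int value) {
    return (value > 0) - (value < 0);
}

int compare_c(const char* a, const char* b) {
    return sign(std::strcmp(a, b));
}

int compare_std(const std::string& a, const char* b) {
    return sign(a.compare(b));
}
```

Code that tests the result against zero (as in == 0, < 0 or > 0) keeps its meaning after the replacement; code that depends on the exact returned number would not.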
8. Listing 3.23: Example code with macro
       #include <iostream>
       #define HI "Hello World"
       int main() {
         char* s = HI; // char pointer with macro
         std::cout << s << std::endl;
       }

   [Figure 3.11: IASTNodeLocation array of Listing 3.23. The debugger view shows two entries: an ASTFileLocation (fLength 5, fOffset 94) and an ASTMacroExpansionLocation (fOffset 0).]

   When a macro is used in the middle of a node, one can just use the offset of the first IASTNodeLocation object to get the start position of the node. The end position of the node can be calculated as the sum of the offset and the length of the last IASTNodeLocation object. But this calculation does not work if the macro is at the end of the node. In this case the last IASTNodeLocation object cannot be used to calculate the correct end position, because it has wrong offset and length values.
   A workaround to calculate the correct end position of the node is to take the offset of the first location and add to it the length of the node's RawSignature. That way the node itself is marked, and not the whole line that contains the node. The code for this workaround is shown in Listing 3.24.

   Listing 3.24: Calculate positions of node
       IASTNodeLocation[] nodeLocations = node…
9. [Tail of the preceding listing:] … .equals("test")) { reportProblem(PROBLEM_ID, name); } return PROCESS_CONTINUE; }

   3.1.7 Modifying and Rewriting the AST
   Eclipse CDT comes with a set of classes that build the infrastructure for modifying code by describing changes to AST nodes. The AST rewriter collects descriptions of modifications to nodes and translates these descriptions into text edits that can then be applied to the original source. It is important to note that this does not actually modify the original AST. That makes it possible, for example, to show the programmer the changes that will be made by a quick fix. Listing 3.3 shows a bit of sample code that replaces a node in the AST, collects the description of the changes in a Change object and finally applies the change to the original source. [AST14]

   Listing 3.3: AST rewrite example
       ASTRewrite rewrite = ASTRewrite.create(ast);
       rewrite.replace(oldNode, newNode, null);
       Change c = rewrite.rewriteAST();
       try {
         c.perform(new NullProgressMonitor());
         marker.delete();
       } catch (CoreException e) {
         e.printStackTrace();
       }

   3.1.8 Dealing with global variables
   The C string refactoring has to be able to deal with global variables. Their node structure in the Abstract Syntax Tree differs from that of local variables. A local variable is defined as a DeclarationStatement node in the AST. Inside this DeclarationStatement is a nested SimpleDeclaration…
10. …erminator plug-in, but there is still room for improvement. Further optimization would be worthwhile, and there are other refactorings that could be added to the existing ones, such as:
   - Refactoring of strings that are allocated on the heap
   - Refactoring of string parameters
   - Refactoring of string return values

   Declaration of Authorship
   We declare that this bachelor thesis and the work presented in it was done by ourselves and without any assistance, except what was agreed with the supervisor. All consulted sources are clearly mentioned and cited correctly. No copyright-protected materials are used without authorization in this work.
   Place and date, Toni Suter
   Place and date, Fabian Gonzalez

   Contents
   1 Task description
   1.1 Previous work
   1.2 Problem
   1.3 Solution
   1.4 Our goals
   1.4.1 Features
   1.4.2 Additional refactorings
   1.5 Time management
   1.6 Final release
   2 Analysis
   2.1 The structure of C strings
   2.1.1 Const string literal
   2.1.2 Char array on the stack
   2.1.3 Char buffer on the heap
   2.2 C strings vs. std::string
   2.2.1 Memory management
   2.2.2 Performance
   2…
11. … node. Global variables do not have a DeclarationStatement node. Their SimpleDeclaration node is a direct child of the root node, the TranslationUnit. See Figure 3.5 for an example.

   [Figure 3.5: AST structure, global vs. local variable. The recoverable part of the tree view shows ICPPASTTranslationUnit (MyTest.cpp), ICPPASTFunctionDefinition (main), ICPPASTSimpleDeclSpecifier (int), ICPPASTFunctionDeclarator, IASTCompoundStatement, IASTDeclarationStatement and IASTSimpleDeclaration (pi).]

   3.1.9 Two step transformation
   Consider the code in Listing 3.4.

   Listing 3.4: Before refactoring
       int main() {
         const char *str = "my string";
         char *found = strstr(str, "ing");
         if (found != nullptr) {
           int index = found - str;
           std::cout << "Found substring at " << index << std::endl;
         }
       }

   When a programmer uses the plug-in to convert the C string str into a std::string object, this would ideally result in the code shown in Listing 3.5.

       const std::string str = "my string";
       std::string::size_type found_pos = str.find("ing");
       if (found_pos != std::string::npos) {
         int index = found_pos;
         std::cout << "Found substring at " << index << std::endl;
       }

   This refactoring would involve a lot of changes, some of which the programmer might not expect. For example, the refactoring of the strstr function means that the type of the variable…
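The start and end points of the two step transformation compute the same index in different ways. The following sketch wraps both forms in functions so that they can be compared directly; index_c and index_std are hypothetical names, and returning -1 for "not found" is an assumption made only for this example.

```cpp
#include <cstring>
#include <string>

// Original form: pointer returned by strstr, index obtained by pointer subtraction.
long index_c(const char* str, const char* needle) {
    const char* found = std::strstr(str, needle);
    return found != nullptr ? static_cast<long>(found - str) : -1;
}

// Fully refactored form: std::string::find returns the index directly.
long index_std(const std::string& str, const std::string& needle) {
    std::string::size_type found_pos = str.find(needle);
    return found_pos != std::string::npos ? static_cast<long>(found_pos) : -1;
}
```

Note how the variable holding the result changes from a pointer to an index, which is exactly the kind of follow-up change that motivates splitting the refactoring into two steps.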
12. [End of the table of contents, partly illegible in the source:]
   A.2.1 [title illegible in source]
   A.2.2 Configuration
   A.3 De-installation

   1 Task description
   This section outlines our bachelor thesis and our goals for it.

   1.1 Previous work
   This bachelor thesis builds on the results of our term project Pointerminator [Gon13]. The main goal of that project was to improve the quality of existing C++ code by getting rid of pointers. First we did an analysis of the various ways pointers can be used in C++. Then we developed an Eclipse CDT plug-in that refactors and replaces pointers automatically. Specifically, the plug-in is capable of doing the following refactorings:
   - Replace C strings with std::string objects
   - Replace C arrays with std::array objects
   - Replace pointer parameters with reference parameters

   1.2 Problem
   The Pointerminator plug-in refactors C-style strings to std::string objects. However, it doesn't do much more than that. There are several standard C functions that are commonly used to analyze and modify C strings. For example, the function strcat can be used to append one C string to another. These functions tend to perform badly. This is because C strings are just pointers to an array of ASCII characters that is terminated with a '\0' character, and the size of the string isn't stored anywhere. Because of that, the size has to be recalculated each time s…
13. With forty occurrences in the top 100 C++ projects, strpbrk is not used very frequently. The function is typically used inside assignments. The following example from the file xbmc/filesystem/iso9660.cpp shows an assignment and a condition that could be refactored successfully with the plug-in.

   [Listing 4.23: Before the refactoring, and Listing 4.24: After the refactoring. Both listings are garbled in the source. Recoverable fragments: before, a while condition that calls strpbrk(pointer, …) and an assignment pointer = strpbrk(pointer, …); after, a while condition that compares pointer.find_first_of(…) with std::string::npos and a declaration std::string::size_type pointer_pos = pointer.find_first_of(…).]

   strcspn
   The strcspn function is also used sparingly in the top 100 C++ projects. There are two occurrences inside the code of XBMC. Neither of them could be refactored correctly, because in both cases there are pointer operators that modify the content of the C string pointer.

   strspn
   strspn is only used fourteen times in the top 100 projects, typically inside an assignment. Only one occurrence of the function strspn could be found inside the XBMC code. Because the pointer variable is manually modified using pointer arithmetic, the plug-in was unable to handle this case.

   memchr
   With a bit more than a hundred occurrences in the top 100 repositories, m…
14. … const void* ptr2, size_t num);
   The memcmp function compares the first num bytes of the memory blocks that the two pointers refer to. The function returns zero if both blocks are identical. Otherwise it returns a value greater or less than zero, depending on the lexicographical order of the first differing value. The compare member function of the std::string class has the same behaviour. The function signature of Listing 2.49 can be used for this refactoring. Because both functions have the same return value, the refactoring just needs to change the function call. An example can be found in the listings below.

   Listing 2.53: Before the refactoring
       int main() {
         char a[] = "google.co";
         char b[] = "google.ch";
         std::cout << memcmp(a, b, 6);
       }

   Listing 2.54: After the refactoring
       int main() {
         std::string a = "google.co";
         char b[] = "google.ch";
         std::cout << a.compare(0, 6, b, 0, 6);
       }

   2.4.7 strpbrk
   The function strpbrk has the following signature. It finds the first character in the C string dest that is also in the C string str and then returns a pointer to that position in dest. If no such character exists, the function returns NULL.
   In the standard header <algorithm> there is a function find_first_of that works similarly.

   Listing 2.56: Signature of function find_first_of
       template<class InputIt, class ForwardIt>
       InputIt find_first_of(InputIt first, InputIt last,
                             ForwardIt s_first, ForwardIt s_last);
   Inst…
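To illustrate how std::find_first_of from <algorithm> relates to strpbrk, the sketch below converts both results into an index so that they can be compared. The function names are hypothetical, and using std::string::npos as the common "not found" value is an assumption made for this example only.

```cpp
#include <algorithm>
#include <cstring>
#include <string>

// C style: strpbrk returns a pointer into text, or a null pointer.
std::string::size_type first_of_c(const char* text, const char* set) {
    const char* p = std::strpbrk(text, set);
    return p != nullptr ? static_cast<std::string::size_type>(p - text)
                        : std::string::npos;
}

// Algorithm style: std::find_first_of returns an iterator, or the end iterator.
std::string::size_type first_of_algorithm(const std::string& text,
                                          const std::string& set) {
    auto it = std::find_first_of(text.begin(), text.end(),
                                 set.begin(), set.end());
    return it != text.end()
               ? static_cast<std::string::size_type>(it - text.begin())
               : std::string::npos;
}
```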
15. memmove
   memmove is a function that is often used in separate statements. In the XBMC code, the memmove function is mostly used with buffers that don't represent strings. These cases can't be handled by the CharWars plug-in.

   memcpy
   This function is also used mostly as a separate statement. One occurrence that is used to copy C strings can be found in the file lib/libmodplug/src/sndfile.cpp. See an example of the refactoring below.

   Listing 4.15: Before the refactoring
       char sztmp[40];
       memcpy(sztmp, m_szNames[nSample], 32);

   Listing 4.16: After the refactoring
       std::string sztmp;
       sztmp.reserve(40);
       sztmp.replace(0, 32, m_szNames[nSample], 0, 32);

   strchr
   The strchr function is typically used inside assignments or if statement conditions. The following example, which could be refactored successfully, can be found inside the file xbmc/lib/timidity/timidity/m2m.cpp.

   Listing 4.17: Before the refactoring (character argument illegible in the source)
       char program_str[20];
       if (strchr(program_str, …))

   Listing 4.18: After the refactoring (character argument illegible in the source)
       std::string program_str;
       program_str.reserve(20);
       if (program_str.find_first_of(…) != std::string::npos)

   strrchr
   The strrchr function is also often used inside assignments. An occurrence that shows the typical usage and could be refactored correctly is inside the following file: xbmc/linux/LinuxTimezone.cpp. The char pointer cleanup refactoring has not been performed because the variable p is afterwards modified wit…
16. If the pointer is passed to a function, or in other special cases where the pointer cannot be replaced, the plug-in should still be able to produce a valid pointer. The first example shows how this is done with the find_first_of member function of the class std::string.

   Listing 2.28: Before the refactoring
       int main() {
         char s[] = "@mail";
         const char* p = strchr(s, '@');
         print(p);
       }

   Listing 2.29: After the refactoring
       int main() {
         std::string s = "@mail";
         size_t pos = s.find_first_of('@', 0);
         const char* p = pos != std::string::npos ? &s[pos] : nullptr;
         print(p);
       }

   The following example uses the std::find function to refactor the same code.

   Listing 2.30: Before the refactoring
       int main() {
         char s[] = "@mail";
         const char* p = strchr(s, '@');
         print(p);
       }

   Listing 2.31: After the refactoring
       int main() {
         std::string s = "@mail";
         auto pos = std::find(s.begin(), s.end(), '@');
         const char* p = pos != s.end() ? &*pos : nullptr;
         print(p);
       }

   2.4.3 strstr
   The function strstr has the following signature:

   Listing 2.32: Signature of function strstr
       const char* strstr(const char* str1, const char* str2);

   It returns a pointer to the first occurrence of the substring str2 in the string str1. If str2 is not a substring of str1, the function returns a null pointer.
   The class std::string has several overloads of a member function called find that does a similar…
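The conversion from an index or an iterator back to a pointer can be isolated in two small functions. This is a sketch with hypothetical names (find_char_member, find_char_algorithm); it relies on the fact that std::string stores its characters contiguously.

```cpp
#include <algorithm>
#include <string>

// Index back to pointer: take the address of the character at that index.
const char* find_char_member(const std::string& s, char ch) {
    std::string::size_type pos = s.find_first_of(ch);
    return pos != std::string::npos ? &s[pos] : nullptr;
}

// Iterator back to pointer: dereference the iterator and take the address.
const char* find_char_algorithm(const std::string& s, char ch) {
    auto it = std::find(s.begin(), s.end(), ch);
    return it != s.end() ? &*it : nullptr;
}
```

The returned pointer is only valid as long as the std::string is alive and not modified in a way that reallocates its buffer, which is one reason why replacing the pointer entirely is preferable where possible.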
17. …2.3 Readability
   2.3 Pointers vs. iterators
   2.4 Analyzing C string functions
   2.4.1 strlen
   2.4.2 strchr / strrchr
   2.4.3 strstr
   2.4.4 strcmp
   2.4.5 strncmp
   2.4.6 memcmp
   2.4.7 strpbrk
   2.4.8 [title illegible in source]
   2.4.9 [title illegible in source]
   2.4.10 [title illegible in source]
   2.5 Modifying C string functions
   2.5.1 strcat / strncat
   2.5.2 [title illegible in source]
   2.5.3 [title illegible in source]
   2.5.4 strncpy
   2.5.5 memmove
   2.5.6 memcpy
   2.6 Converting C string functions
   2.6.1 atof
   2.6.2 atoi / atol / atoll
   2.6.3 strtol / strtoll
   2.6.4 strtoul / strtoull
   2.6.5 strtof / strtod / strtold
   2.6.6 strtoimax / strtoumax
   2.7 Refactoring example
   3 Implementation
   3.1 Overall architecture and functionality
   3.1.1 The refactoring cycle
   3.1.2 Parser and Abstract Syntax Tree (AST)
   3.1.3 [title illegible in source]
   3.1.4 The index
   3.1.5 The plug-in components
18. … C code, some markers cannot be resolved. These markers were omitted when creating the statistics. Because resolving all markers would exceed the scope of this thesis, only the first 150 have been checked. All markers have been tested without changing anything manually. If the code compiled without errors afterwards, the marker counted as solved; otherwise it counted as unsolved. Table 4.2 shows the number of resolved and unresolved markers.

   Table 4.2: Refactoring statistics
       Markers set | Markers tested | Solved   | Unsolved
       776         | 150            | 72 (48%) | 78 (52%)

   The following subsections contain some examples of C string functions that have been found inside the XBMC code and could be refactored correctly with the CharWars plug-in. To provide an example for as many functions as possible, small code changes were sometimes made before applying the refactoring.

   strlen
   The strlen function is used in a wide variety of contexts. Many calls are inside if statement conditions and assignments. The function is also often used for index calculations, asserts and function arguments. If strlen is used to calculate the length of a string literal, it cannot be refactored with our plug-in. The code of the following example, which could be refactored successfully, can be found in the file lib/UnrarXLib/pathfn.cpp inside XBMC's code.
   Listing 4.1: Before the…
19. [Figure A.6: Deactivate marker. Screenshot of the Eclipse Code Analysis preference page; the visible checkers include "Use reference parameters instead of …", "Use std::array instead of C Array", "Use std::string instead of C Strings" and "Use std::string member functions inst…", each with severity Warning.]

   A.3 De-installation
   To de-install the plug-in, the following steps need to be performed. First, press on Help and select About Eclipse.
   [Screenshot of the Eclipse Help menu.]
20. … a good example that shows how much code can be involved to achieve a relatively simple thing. The example is shown in Listing 2.7.

   Listing 2.7: Before the refactoring
       char* address(const char* iden, const char* dom) {
         int iden_len = strlen(iden);
         int dom_len = strlen(dom);
         char* addr = (char*) malloc(iden_len + dom_len + 2);
         strcpy(addr, iden);
         addr[iden_len] = '@';
         strcpy(addr + iden_len + 1, dom);
         return addr;
       }
       int main() {
         char* email = address("someone", "gmail.com");
         // do something with email
         free(email);
       }

   The function address returns a new C string that contains the email address built from the identifier and the domain part. If the programmer uses std::string objects instead, the code becomes much more elegant and readable. This is shown in Listing 2.8.

   Listing 2.8: After the refactoring
       std::string address(const std::string& iden, const std::string& dom) {
         return iden + '@' + dom;
       }
       int main() {
         std::string email = address("someone", "gmail.com");
         // do something with email
       }

   The class std::string takes care of memory management and releases the memory once the variable email goes out of scope. Therefore the call to the function free is no longer necessary.

   2.2.2 Performance
   As shown in section 2.1, C strings have a compact structure and take up very little space. While this can be an advantage in computing environments where memory is scarce (e.g. in embedde…
21. [Tail of the strtol listings, partly illegible in the source. Before: char s[] = "42"; char* pEnd; long n = std::strtol(s, &pEnd, 10); std::cout << n; After: std::string s = "42"; long n = std::stol(s); std::cout << n;]

   2.6.4 strtoul / strtoull
   Both of these functions are similar to strtol and strtoll. They also set the out parameter str_end to the position up to which the conversion could be performed successfully. Only the return type is different.

   Listing 2.112: Signatures of the functions strtoul and strtoull
       unsigned long strtoul(const char* str, char** str_end, int base);
       unsigned long long strtoull(const char* str, char** str_end, int base);

   These functions can be refactored with the stoul and stoull functions from the <string> header. The signatures of both functions are listed below.

   Listing 2.113: Signatures of the functions stoul and stoull
       unsigned long stoul(const std::string& str, size_t* pos = 0, int base = 10);
       unsigned long long stoull(const std::string& str, size_t* pos = 0, int base = 10);

   The following listings show how the function strtoul could be refactored.

   Listing 2.114: Before the refactoring
       int main() {
         char s[] = "42";
         char* pEnd;
         unsigned long n = std::strtoul(s, &pEnd, 10);
         std::cout << n;
       }

   Listing 2.115: After the refactoring (arguments of stoul partly illegible in the source)
       int main() {
         std::string s = "42";
         unsigned long n = std::stoul(s);
         std::cout << n;
       }

   2.6.5 strtof / strtod / strtold
   The strtof, strtod and…
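The out parameter str_end of strtoul and the pos parameter of stoul carry the same information in different forms: a pointer in one case, an index in the other. The sketch below makes that correspondence explicit; consumed_c and consumed_std are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// C style: str_end points to the first character that was not converted.
std::size_t consumed_c(const char* s) {
    char* end = nullptr;
    std::strtoul(s, &end, 10);
    return static_cast<std::size_t>(end - s);
}

// std::string style: pos receives the number of characters that were processed.
// Note: unlike strtoul, std::stoul throws std::invalid_argument if nothing can be converted.
std::size_t consumed_std(const std::string& s) {
    std::size_t pos = 0;
    std::stoul(s, &pos, 10);
    return pos;
}
```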
22. … functions using a combination of std::string member functions and functions from the standard header <algorithm>.

   Analyzing C string functions
   - strlen: Determines the length of a C string
   - strcmp: Compares two C strings
   - strncmp: Compares n characters of two C strings
   - memcmp: Compares two blocks of memory
   - strstr: Searches a substring inside a C string
   - memchr: Searches a byte inside a block of memory
   - strchr: Searches a character inside a C string
   - strrchr: Searches a character inside a C string in reverse order
   - strpbrk: Returns a pointer to the first occurrence of any character from the second C string inside the first C string
   - strcspn: Returns the length of the initial part of the first C string not containing any of the characters that are part of the second C string
   - strspn: Returns the length of the maximum initial segment of the first C string that contains only characters from the second C string

   Modifying C string functions
   - strcat: Appends one C string to another
   - strncat: Appends n characters of one C string to another
   - strcpy: Copies a C string into an existing char buffer
   - strncpy: Copies n characters of a C string into an existing char buffer
   - memcpy: Copies one block of memory into another. If the blocks overlap, the behaviour is undefined
   - memmove: Copies one block of memory into another. The blocks may overlap
   - strdup: All…
23. [Continuation of Listing 3.24:]
       … getNodeLocations();
       IASTNodeLocation firstLoc = nodeLocations[0];
       int start = firstLoc.getNodeOffset();
       int end = firstLoc.getNodeOffset() + node.getRawSignature().length();

   4 Refactoring real-life code
   This section describes how the plug-in performs in real-life situations and which C string functions are frequently used. It also shows in which context the functions are normally used.

   4.1 Statistics
   The top 100 C++ repositories on GitHub [Git14c], as of May 2014, have been used to create the statistics. The repositories have been sorted according to their GitHub star rating. This list of repositories contains well-known projects such as node-webkit, TextMate, MongoDB, XBMC and fish-shell.
   The repositories were scanned to find occurrences of the various C string functions that the plug-in supports. Afterwards, the context in which each function is used was analyzed and categorized according to certain patterns. The CharWars plug-in only supports these functions if they are used with C string arguments. If a function like memchr is used to search a byte in something other than a C string, it cannot be refactored.
   As shown in Table 4.1, we differentiated between the following contexts:
   - If statement: The function call happens directly inside an if statement condition
   - Assignment: The return value of the function call is assigned to a variable
   - Return value: The result of the functio…
24. … mostly as a single statement. Of the three occurrences that could be found inside the XBMC source code, none could be refactored correctly.

   strdup
   This function is frequently used inside assignments and as a return value. In the XBMC source code it is often used as a return value, which can't be handled correctly by the CharWars plug-in.

   strcpy
   With more than a thousand occurrences in the top 100 repositories, the strcpy function is used primarily on its own in a separate statement. The following example, which can be found inside lib/libmodplug/src/load_pat.cpp, shows how this function is refactored by the plug-in.

   [Listing 4.11: Before the refactoring, and Listing 4.12: After the refactoring. The declarations of the static variable timiditycfg are illegible in the source. Recoverable statements: before, strcpy(timiditycfg, p); after, timiditycfg = p;]

   strncpy
   Like the strcpy function, this function is also used mainly as a separate statement. The following occurrence, which could be refactored successfully, is located inside tools/TexturePacker/SDL_anigif.cpp.

   Listing 4.13: Before the refactoring
       char version[4];
       strncpy(version, (char*)buf + 3, 3);
       version[3] = '\0';
       if ((strcmp(version, "87a") != 0)
           && (strcmp(version, "89a") != 0)) {

   Listing 4.14: After the refactoring
       std::string version;
       version.reserve(4);
       version.replace(0, 3, (char*)buf + 3, 0, 3);
       version[3] = '\0';
       if ((version != "87a")
           && (version != "89a")) {
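The version check from the strncpy example can be condensed into a pair of functions. This is a sketch that deliberately swaps in a different technique for the std::string side: it uses substr instead of the reserve and replace calls produced by the plug-in. The names is_known_version_c and is_known_version_std are hypothetical, and both functions assume that the buffer holds at least six characters.

```cpp
#include <cstring>
#include <string>

// C style: copy three characters, terminate manually, compare with strcmp.
bool is_known_version_c(const char* buf) {
    char version[4];
    std::strncpy(version, buf + 3, 3);
    version[3] = '\0';
    return std::strcmp(version, "87a") == 0 || std::strcmp(version, "89a") == 0;
}

// std::string style (substr variant, not the plug-in's output):
// extract the three characters and compare with operator==.
bool is_known_version_std(const std::string& buf) {
    const std::string version = buf.substr(3, 3);
    return version == "87a" || version == "89a";
}
```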
25. … pooling
       const char* str1 = "Hello World";
       int main() {
         const char* str2 = "Hello World";
         std::cout << std::boolalpha << (str1 == str2) << std::endl;
       }

   The above program outputs true. Because the strings are immutable and stored in global static memory, the compiler can optimize by storing strings that have the same value only once. All char pointers that are initialized with the same string literal then point to the same location in memory. However, GCC does have an option, -fwritable-strings, to disable string pooling. This option also makes the strings mutable.

   2.1.2 Char array on the stack
   To create a mutable C string, the programmer can declare a char array and initialize it with a string literal, as shown in Listing 2.3.

   Listing 2.3: Char array on the stack
       int main() {
         char str[] = "Hello World";
         // do something with str
       }

   This string has the same representation as shown in Figure 2.1. However, the string is mutable and stored on the stack. Therefore the allocated memory is automatically freed at the end of the array's scope.
   Char arrays can also be partially initialized, leaving room to append another string to the first one, as shown in Listing 2.4.

   Listing 2.4 (array size illegible in the source)
       char str[…] = "Hello ";
       strcat(str, "World");
       // do something with str

   Before the call to the function strcat, the array buffer looks like this:
   [Figure 2.2: Structure of a C… (caption and byte layout partly illegible in the source)]
26. std::string searchStr = "ch"; auto found = std::search(url.begin(), url.end(), searchStr.begin(), searchStr.end()); if (found != url.end()) { ... } std::cout << url << std::endl; [The body of the if statement is not legible in the extraction.] Whether it is better to use a std::string member function or a function from the standard header <algorithm> depends on what the char pointer is used for in the original code. 2.4 Analyzing C string functions. This section contains the analysis of different C string functions. Most of the analyzed refactorings can also be used to refactor wchar_t strings. 2.4.1 strlen. The function strlen has the following signature. Listing 2.13: Signature of function strlen: size_t strlen(const char *str); This function returns the length of a C string. The length is calculated from the beginning of the string to the null character, without including it. All C strings are terminated with a null character. The class std::string has a member function called size that also returns the length. The signature of this member function can be found in Listing 2.14. Listing 2.14: Signature of member function size: std::string::size_type size() const; Most of the time size_type is the same as size_t, so the two functions are very similar. The following example shows how a simple use of the strlen function could be replaced. Listing 2.16: After refactoring: int main
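The strlen-to-size mapping described in section 2.4.1 can be sketched as follows. This is a minimal illustration, not the thesis's own listing; the function names length_c and length_cpp are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// C version: strlen walks the buffer until it reaches the null character.
// (length_c is a hypothetical helper name used only for this sketch.)
std::size_t length_c(const char* str) {
    assert(str != nullptr);  // strlen has undefined behaviour for a null pointer
    return std::strlen(str);
}

// Refactored version: size() reports the length kept by the std::string object.
// (length_cpp is likewise a hypothetical name.)
std::size_t length_cpp(const std::string& str) {
    return str.size();
}
```

One difference worth keeping in mind: a std::string may contain embedded null characters, so size() and strlen can disagree on such data.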
27. that holds the return value of that function call changes. The refactoring may then also change the name of that variable to reflect its new type and adapt subsequent occurrences of that variable. Since the programmer initially just wanted to convert the C string into a std::string object, this can be confusing. Thus, the plug-in performs this refactoring in two steps, each of which has to be triggered by the programmer. Step 1: Char pointer refactoring. In the first step, the CharPointerChecker marks C string variables that can be refactored into std::string objects. When a programmer applies the refactoring through a marker, the CharPointerQuickFix starts by replacing the C string definition with the definition of a std::string variable. Then it uses an ASTVisitor to find subsequent occurrences of the variable. In order to handle the different refactoring cases, there is a set of subclasses of the abstract StringRefactoring class. Each subclass can perform a different refactoring. For example, there is a StrlenRefactoring class that can replace a call to the strlen function with a call to the size member function. Table 3.1 shows all the StringRefactoring subclasses and how the C string functions are mapped to functions from the <string> and <algorithm> headers. For each occurrence of the variable, the visitor tries to find an instance of an applicable StringRefactoring subclass and then
28. then responsible for solving the problem by modifying and rewriting the AST. After the refactoring is done, the quick fix deletes the marker and returns. 3.1.6 Traversing the AST. Checkers need to be able to traverse the AST in order to find problems in the code. Similarly, quick fixes traverse the AST to find all occurrences of the refactored variable and make additional adjustments. The AST is built to be easily traversable using the Visitor pattern [Gam94]. Eclipse CDT comes with a few predefined visitors that can be subclassed to override the visit methods. Only the visit methods whose behaviour should differ from the base class need to be overridden. Here is an example of a simple checker that uses a visitor to find variables with the name test and marks them with a marker. When the user edits a file, Codan automatically calls the checker's processAst method, which starts the traversal of the AST using the visitor implemented as an inner class. For more details, see the example in Listing 3.2. Listing 3.2: Visitor example: class MyChecker extends AbstractIndexAstChecker { public final static String PROBLEM_ID = "ch.hsr.pointerminator.problems.ExampleProblem"; @Override public void processAst(IASTTranslationUnit ast) { ast.accept(new ExampleVisitor()); } class ExampleVisitor extends ASTVisitor { public ExampleVisitor() { shouldVisitNames = true; } @Override public int visit(IASTName name) { if (name.toString()
29. thing. The signature of the overload that is the closest match to strstr is shown in Listing 2.33. The main difference between the two functions is the type of the return value. While strstr returns a pointer, find returns the index of the substring within str1. A conservative way of dealing with this problem would be to immediately convert the index back to a pointer and leave the rest of the program unchanged. Listing 2.34 and Listing 2.35 show an example. Listing 2.34: Before the refactoring. Listing 2.35: After the refactoring. [Listing bodies largely garbled in the extraction; the legible fragments show char s[100] being replaced by a std::string, and const char *p being initialized from s.c_str() plus the result of s.find(...) instead of from strstr.] The index can be converted back to a pointer by adding it to the char pointer returned by the member function c_str. However, because the pointer returned by c_str is const, this only works if the pointer is not used to modify the contents of the string. Ideally, the plug-in would refactor not only the call to strstr but also the resulting char pointer and the subsequent code that uses this pointer. This can be difficult because pointers can be used to do a lot of different things. Often it is easier to use a function from the standard header <algorithm> that returns an iterator, as described in section 2.3 Pointers vs. iterators. In the context of the strs
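The conservative strstr replacement described above (call find, then turn the index back into a pointer via c_str) could look roughly like the following sketch. The helper name find_as_pointer is hypothetical, and the explicit npos check is an addition of this sketch so that the "not found" case still yields a null pointer, as strstr does.

```cpp
#include <cassert>
#include <string>

// Sketch of the conservative strstr replacement: find() yields an index,
// which is converted back into a const char pointer via c_str().
// (find_as_pointer is a hypothetical helper name.)
const char* find_as_pointer(const std::string& s, const char* needle) {
    assert(needle != nullptr);
    const std::string::size_type pos = s.find(needle);
    if (pos == std::string::npos) {
        return nullptr;  // mirror strstr, which returns NULL when nothing is found
    }
    return s.c_str() + pos;  // const pointer: fine for reading, not for modifying
}
```

Because the returned pointer is const and refers to the string's internal buffer, it is only valid as long as the std::string is not modified or destroyed.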
30. uses it to refactor that occurrence. Finally, after all occurrences have been refactored, the quick fix adds the necessary include statements and completes the refactoring by performing a rewrite of the AST. The process of the Char pointer refactoring is shown in Figure 3.6 in the form of a flow chart.
Table 3.1: StringRefactoring subclasses
StringRefactoring subclass       C string function          <string> / <algorithm> function
StrlenRefactoring                strlen                     size
StrcmpRefactoring                strcmp                     compare
StrncmpRefactoring               strncmp                    compare
MemcmpRefactoring                memcmp                     compare
StrcatRefactoring                strcat                     (not legible)
StrncatRefactoring               strncat                    append
StrcpyRefactoring                strcpy                     replace
StrncpyRefactoring               strncpy                    replace
MemcpyRefactoring                memcpy                     replace
MemmoveRefactoring               memmove                    (not legible)
StrstrRefactoring                strstr                     find
StrchrRefactoring                strchr                     find_first_of
StrrchrRefactoring               strrchr                    find_last_of
MemchrRefactoring                memchr                     std::find
StrcspnRefactoring               strcspn                    find_first_of
StrspnRefactoring                strspn                     find_first_not_of
StrdupRefactoring                strdup                     (not legible)
StrpbrkRefactoring               strpbrk                    find_first_of
ConvertingFunctionRefactoring    atof, atoi, atol, atoll    std::stod, std::stoi, std::stol, std::stoll
NullRefactoring
DefaultRefactoring
Figure 3.6: Flow chart of the Char pointer refactoring
31. void *destination, const void *source, size_t num); This function copies the first num bytes from the source to the destination. Source and destination can be overlapping. The destination buffer has to be large enough to hold num bytes. The memmove function can be replaced with the std::string member function replace, which has the following signature. Listing 2.90: Signature of member function replace: basic_string& replace(size_type pos, size_type count, const basic_string& str, size_type pos2, size_type count2); While using the memmove function, one has to manually make sure that a '\0' is also copied. The replace function always ensures that the resulting string is valid. An example of this refactoring can be found below. Listing 2.91: Before the refactoring: int main() { char s[] = "good goal"; memmove(s, s + 5, 4); std::cout << s; } Listing 2.92: After the refactoring: int main() { std::string s = "good goal"; s.replace(0, 4, s, 5, 4); std::cout << s; } 2.5.6 memcpy. The function memcpy has the following signature. Listing 2.93: Signature of function memcpy: void *memcpy(void *destination, const void *source, size_t num); This function copies the first num bytes of the source to the destination. Source and destination must not overlap; otherwise the behaviour is undefined. The size of each of them needs to be at least as big as the given paramet
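The memmove-to-replace mapping can be tried out with the "good goal" example from Listings 2.91 and 2.92. The sketch below wraps both variants in functions; the names move_within_c and move_within_cpp are hypothetical. As a cautious design choice of this sketch (not something the thesis prescribes), the std::string variant copies the source range with substr before calling replace, so the call never reads from the string it is modifying.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// C version: move `count` bytes from offset `from` to offset `to`
// inside the same buffer; memmove permits overlapping regions.
void move_within_c(char* s, std::size_t to, std::size_t from, std::size_t count) {
    assert(s != nullptr);
    std::memmove(s + to, s + from, count);
}

// std::string version: overwrite `count` characters at `to` with the
// characters found at `from`. The range is copied first (an assumption
// of this sketch) so source and destination cannot alias during replace.
std::string move_within_cpp(std::string s, std::size_t to, std::size_t from, std::size_t count) {
    assert(to <= s.size() && from <= s.size());
    const std::string piece = s.substr(from, count);
    s.replace(to, count, piece);
    return s;
}
```

With the input "good goal", moving 4 characters from offset 5 to offset 0 turns the text into "goal goal" in both variants.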
32. 20: After the refactoring. [Listing bodies garbled in the extraction; the legible fragments show char s[] = "Hello" being replaced by a std::string, and const char *p being computed as s.c_str() plus the index returned by a std::string member function.] By calling the member function c_str, a const pointer to the first char of the string is returned. By adding the index to the pointer, it points to the correct position of the character. However, this refactoring doesn't take into account that the character may not be part of the string, in which case this calculation would be wrong. Instead of using a std::string member function, it is also possible to use the std::find function from the standard header <algorithm> to find the first or last position of the located character. This function takes iterators as input and returns an iterator. The following listing shows its signature. Listing 2.21: Signature of function std::find: InputIt find(InputIt first, InputIt last, const T& value); Using this function, we benefit from the iterator return type, which allows a simpler conversion to a pointer. An example can be found in the listings below. Listing 2.22: Before the refactoring. Listing 2.23: After the refactoring. [Listing bodies garbled in the extraction; the legible fragments show char s[] = "World" and char *ptr = strchr(s, ...) becoming a std::string and auto ptr = std::find(s.begin(), s.end(), ...), with the result printed via std::cout << &*ptr.] The reverse iterators rbegin
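A small sketch of the std::find approach, including the "character not found" case that the index-plus-c_str variant overlooks, might look as follows. The function names first_occurrence and last_index_of are hypothetical; the second one illustrates the reverse-iterator idea for a strrchr-like search.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>

// strchr-like search using std::find. Returns nullptr when the character
// is absent instead of computing a pointer from an invalid index.
// (first_occurrence is a hypothetical helper name.)
const char* first_occurrence(const std::string& s, char c) {
    assert(c != '\0');  // strchr can locate the terminator; this sketch does not cover that
    const auto it = std::find(s.begin(), s.end(), c);
    return it == s.end() ? nullptr : &*it;
}

// strrchr-like search using reverse iterators. Returns the index of the
// last occurrence, or std::string::npos if there is none.
// (last_index_of is a hypothetical helper name.)
std::string::size_type last_index_of(const std::string& s, char c) {
    const auto rit = std::find(s.rbegin(), s.rend(), c);
    if (rit == s.rend()) {
        return std::string::npos;
    }
    // The distance from rit to rend() counts the elements up to and including the match.
    return static_cast<std::string::size_type>(std::distance(rit, s.rend())) - 1;
}
```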
33. 28: Code to refactor. Original code: char *name = person.getName(); std::cout << "Welcome " << name; Code to refactor: char *name = ""; name = person.getName(); std::cout << "Welcome " << name; 4.3.3 How to refactor C string parameters. To be able to refactor C string parameters, one also needs to make some manual changes. First, one has to make sure that the function is never called with a NULL argument. After that, one needs to temporarily rename the parameter and add a local C string variable with the original parameter name. The refactoring is then performed on this new variable. After the refactoring, the new variable can be removed and the parameter can be turned into a std::string object with its original name. Listing 4.29: Original code: void printString(char *s) { ... } Listing 4.30: Code to refactor: void printString(char *tmp_s) { ... } [The function bodies are not legible in the extraction.] 4.3.4 Known issues. Problems that may occur while using this plug-in are described in this section. Position of includes: The correct position of the includes that will be added during the refactoring cannot be calculated correctly if the code contains preprocessor directives like #if, #else or #endif. The position will also not be calculated correctly if there are includes between the code. In such cases it is recommended to add the includes manually before the refactoring is performed. The plug-in ch
34. BACHELOR THESIS, SPRING TERM 2014. CharWars: Replace C String Library calls with C++ std::string Operations. AUTHORS: Toni Suter & Fabian Gonzalez. SUPERVISOR: Prof. Peter Sommerlad. INSTITUTE FOR SOFTWARE, HSR Hochschule für Technik Rapperswil, FHO Fachhochschule Ostschweiz. Bachelor Thesis: CharWars, Rise of the fallen strings. Fabian Gonzalez, Toni Suter. Spring Term 2014. Supervised by Prof. Peter Sommerlad. Abstract. C strings are still in heavy use in C++ programs. Additionally, standardized C functions such as strcpy and strstr are often used to modify or analyze the content of the strings. Unfortunately, because a C string is just a pointer to a zero-terminated character array, those functions have a lot of drawbacks regarding performance, safety and readability. The std::string class from the C++ standard library and its member functions provide a lot of the same functionality without these downsides. Building on previous work from our term project Pointerminator, we extended the existing Eclipse CDT plug-in so that it helps a programmer to find and automatically refactor pieces of code that use C strings in an unfavorable way. We started with an analysis of the various ways C strings and their related C functions are used in practice. Based on that analysis, we defined possible refactorings for a subset of the standardized C string functions. We th
35. [Three-column list of helper methods, column headings not part of this excerpt:]
MemberFunctionCallExpression    replaceNode     isStringLiteral
newNposExpression               replace         isFunctionCallArgument
newDeclarationStatement         remove          hasCStringType
newConditionalExpression        insertBefore    hasStdStringType
newCompoundStatement            transformToPointerOffset
3.2 Problems and Decisions. This section lists the various problems that occurred during the implementation of the refactorings and describes how we solved them. 3.2.1 std::string vs. const std::string. Whenever the plug-in replaces a C string definition with a std::string definition, it has to decide whether to make the variable const or not. The main goal is to preserve the constness of the original code. Since C strings are actually pointers, they can have four states of constness. char* strings: A C string variable that is defined as char* is not const in any way. The characters of the string can be changed and the variable can be repointed to another array of characters. Thus, it only makes sense to make the variable a non-const std::string. const char* / char const* strings: On the other hand, if a variable is defined to be either a const char* or a char const*, this means that the pointer can be repointed to another array of characters but that the characters themselves can't be changed. Therefore, the decision whether to make the std::string const or not is not as straightforward as before. However
36. [Screenshot residue: Eclipse installation details dialog listing the CharWars feature, version 1.0.0.201406050943, id ch.hsr.charwars.feature...] Figure A.9: De-install plug-in. Bibliography
[AST14] Class ASTRewrite. July 2014. https://www.cct.lsu.edu/~rguidry/eclipse-doc36/org/eclipse/cdt/core/dom/rewrite/ASTRewrite.html
[cdt14] cdttesting: ch.hsr.ifs.cdttesting. July 2014. https://github.com/IFS-HSR/ch.hsr.ifs.cdttesting
[fC14] Static Analysis for CDT. July 2014. https://wiki.eclipse.org/CDT/designs/StaticAnalysis
[Fel14] L. Felber. Howto: Develop CDT Refactorings. 2014.
[Fin14] FindBugs: Find bugs in Java programs. July 2014. http://findbugs.sourceforge.net
[Gam94] R. Helm, R. Johnson, J. Vlissides, E. Gamma. Design Patterns: Elements of Reusable Object-Oriented Software. 1994.
[Git14a] Git. July 2014. http://git-scm.com
[Git14b] HSR Git: SCM-Manager. July 2014. https://git.hsr.ch
[Git14c] GitHub. July 2014. https://github.com
[Gon13] T. Suter, F. Gonzalez. Pointerminator. 2013.
[Jen14] Jenkins: Jenkins CI. J
37. an testing framework, CDT Testing seems to be more stable and reliable. Testing checkers. All unit tests for the checkers inherit from an abstract base class that defines the two methods configureTest and runTest. The first method loads the value of the markerPositions property, which is defined in a separate .rts file (see below). This property contains the positions of the markers that ought to be set by the checker. In the runTest method, the unit test checks whether the markers at these positions have actually been set. Listing 3.16 shows the implementation of the runTest method. Listing 3.16: A unit test for a checker: @Override @Test public void runTest() throws Throwable { if (markerPositions != null) { assertProblemMarkerPositions(markerPositions.toArray(new Integer[markerPositions.size()])); } else { assertProblemMarkerPositions(); } } The unit test classes load the corresponding .rts files, which contain the actual unit tests, using a Java annotation. They also override the method getProblemId to determine which checker should be tested. An example of a unit test class for a checker can be found below. Listing 3.17: A unit test class for testing a checker: @RunFor(rtsFile = "resources/Checkers/CharPointerChecker.rts") public class CharPointerCheckerTest extends BaseCheckerTest { @Override protected String getProblemId() { return CharPointerChecker.PROBLEM_ID; } } Inside
38. Figure A.2: Install plug-in. Press Next to go through the wizard and install the plug-in. At the end, a prompt will ask you whether you want to restart Eclipse. Click Yes. After the restart, you should be able to use the CharWars plug-in. A.2 Usage and configuration. This section shows how the plug-in can be used and how parts of it can be deactivated. A.2.1 Usage. The CharWars plug-in sets problem markers inside Eclipse. Markers can be selected with a left click on the bug icon or with the corresponding shortcut (Ctrl+1 or Cmd+1, depending on your operating system) when the cursor is inside the marked code. This opens a popup that shows the possible quick fixes that can be applied. [Screenshot: Eclipse editor with PluginTests.cpp, which includes <iostream> and <cstring>, defines char *s = "example@hsr.ch" in main and tests it with strstr(s, "hsr.ch"); the quick fix "Refactor C String into std::string" is offered for s.]
39. bout the drawbacks of C strings or because they have to work with an existing code base that already uses C strings. Goal. The main goal of this bachelor thesis is to extend the functionality of the Pointerminator plug-in so that C strings and their related C functions can be replaced with std::string objects and their member functions. We first analyze the various ways C strings are used in practice and define possible refactorings. It is important that these refactorings cover all sorts of edge cases, so that the tool is reliable enough to be used in an existing C++ code base. In the implementation phase, we add the new functionality to the Pointerminator plug-in. Finally, the plug-in is tested with an existing C++ project. This helps us to find problems and optimize the refactorings. Results. The results of our bachelor thesis can roughly be divided into three parts. First, we analysed the different use cases of C strings and their related C functions. Based on these use cases, we decided to put our focus on the C string functions shown in the following picture: strstr, strlen, strchr, strcat, strrchr, strncat, memchr, strcmp, strspn, strncmp, strcspn, memcmp, strpbrk, strcpy, strdup, strncpy, memmove, memcpy. (Caption: C string functions that can be refactored by the CharWars plug-in.) In the second phase, we extended the functionality of the Pointerminator plug-in so that it can replace calls to those C
40. but important difference in the return values of the two functions. When the string dest only consists of characters that are also contained in the string src, the function strspn returns the length of dest. The function find_first_not_of returns the constant value std::string::npos instead. Listing 2.65 and Listing 2.66 show how the refactoring could still be done. Listing 2.66: After the refactoring. [Listing bodies garbled in the extraction; the legible fragments show size_t n = strspn(s, ...) being replaced by size_t found = s.find_first_not_of(...) followed by size_t n = (found == std::string::npos) ? s.size() : found; // do something with n.] 2.4.10 memchr. The function memchr has the following signatures. Listing 2.67: Signatures of function memchr: const void *memchr(const void *ptr, int value, size_t num); void *memchr(void *ptr, int value, size_t num); The function memchr searches through the first num bytes of the memory pointed to by the ptr argument for occurrences of the given value. The function returns a pointer to the first occurrence of the value, or a null pointer if the value is not found. With the std::find function a similar behaviour can be achieved. By adding the num value to the begin iterator, we make sure that only the given characters are passed to the function. For more details, see the example below. const char
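The npos handling for the strspn refactoring can be sketched in a few lines. The function names span_c and span_cpp are hypothetical; the conditional expression follows the pattern that is still legible in Listing 2.66.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// C version: length of the initial segment of dest that consists only of
// characters contained in src.
std::size_t span_c(const char* dest, const char* src) {
    assert(dest != nullptr && src != nullptr);
    return std::strspn(dest, src);
}

// std::string version: find_first_not_of returns npos when every character
// of dest is contained in src, so npos is mapped to dest.size() to keep the
// strspn semantics. (span_cpp is a hypothetical helper name.)
std::size_t span_cpp(const std::string& dest, const std::string& src) {
    const std::string::size_type found = dest.find_first_not_of(src);
    return found == std::string::npos ? dest.size() : found;
}
```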
41. checker reports it back to Codan. Codan in turn sets a marker in the editor to make the programmer aware of the problem. Figure 3.1: Refactoring cycle. [Figure labels: Programmer, Source code, Refactoring; 1 Modifies code; 2.1 Detects changes; 2.2 Notifies checker; 3.1 Reports problem; 3.2 Sets marker; 4.1, 4.2, 4.3 Triggers quick fix; 5.1, 5.2 Fix code.] 4. The programmer can then select the marker and trigger the corresponding quick fix. 5. Finally, the triggered quick fix modifies the code in order to fix the problem. Codan writes those changes back to the editor. 3.1.2 Parser and Abstract Syntax Tree (AST). When a .cpp file is opened in an Eclipse CDT editor, the parser creates a tree representation of the code, which is called the Abstract Syntax Tree (AST). The AST consists of nodes that all implement the IASTNode interface. Each node has one parent node and an array of child nodes. The AST can be used by static analysis tools such as the CharWars plug-in to traverse the code and find problems. Most refactorings can be done by simply modifying and rewriting the AST. Listing 3.1 and Figure 3.2 show an example of what the AST looks like for a short program. Listing 3.1: AST example: int main() { int side = 5; int area = side * side; } Figure 3.2: AST tree of Listi
42. cker. This way we can be sure that the pointer isn't just a pointer to a single character. With a small change, one can also refactor a C string that is initialized later. First, one needs to be sure that the pointer does actually point to a C string. Then the definition can temporarily be changed into an initialization with an empty string literal. After that, the plug-in marks the string and the automated refactoring can be performed. Finally, the manual changes can be undone. Listing 4.25: Original code. Listing 4.26: Code to refactor. [Listing bodies garbled in the extraction; the legible fragments show a definition char *gender followed by an if (isMasculine) branch that assigns "masculine" and an else branch that assigns "feminine".] 4.3.2 How to refactor C string assignments. If a C string is initialized with a function call or another variable, it won't be marked, because the assigned value could be NULL or a pointer to a character instead of a C string. If the programmer feels certain that the C string is always initialized with a valid string, the plug-in can still be used. To be able to refactor such variables, one needs to do the following. First, add a statement that defines and initializes the variable with an empty string literal. Change the old definition into an assignment below the new definition. Now the code can be refactored with the plug-in. After the refactoring, the temporary changes can be removed again. Listing 4.27: Original code. Listing 4
43. colon: char *componentStart = environmentValue; char *match = strstr(componentStart, searchValueWithColon); bool foundAnyMatches = (match != NULL); Because the componentStart pointer is used afterwards for iteration over the characters, it can be replaced with an iterator. Also, the strstr function call can be replaced with a std::search function call that takes iterators as arguments. The calculation of the bool value needs to be changed because the std::search function returns an iterator and not a pointer. Listing 2.132: Possible refactoring: auto componentStart = environmentValue.begin(); auto match = std::search(environmentValue.begin(), environmentValue.end(), searchValueWithColon.begin(), searchValueWithColon.end()); bool foundAnyMatches = (match != environmentValue.end()); Listing 2.133: Example code to refactor: while (match != NULL) { // Update componentStart to point to the colon immediately preceding the match. char *nextColon = strstr(componentStart, ":"); while (nextColon && nextColon < match) { componentStart = nextColon; nextColon = strstr(componentStart + 1, ":"); } The strstr function calls can be replaced with calls to the corresponding std::find function that takes iterators as arguments. Because the variables match and nextColon are now iterators and not pointers anymore, the checks have to be adapted accordingly as well. Listing 2.134: Possible refa
44. ctoring: while (match != environmentValue.end()) { auto nextColon = std::find(componentStart, environmentValue.end(), ':'); while (nextColon != environmentValue.end() && nextColon < match) { componentStart = nextColon; nextColon = std::find(componentStart + 1, environmentValue.end(), ':'); } Listing 2.135: Example code to refactor: // Copy over everything right of the match to the current component start, and search from there again. if (componentStart[0] == ':') { // If componentStart points to a colon, go ahead and copy the colon over. strcpy(componentStart, match + searchLength); } else { // Otherwise, componentStart still points to the beginning of environmentValueBuffer, so don't copy over the colon. The edge case is if the colon is the last character in the string, so match + searchLengthWithoutColon + 1 is the null terminator of the original input, in which case this is still safe. strcpy(componentStart, match + searchLengthWithColon); } match = strstr(componentStart, searchValueWithColon); strcpy calls can be replaced with the replace member function of the std::string class. The std::search function can be used for the strstr call. Listing 2.136: Possible refactoring: if (componentStart[0] == ':') { environmentValue.replace(componentStart, environmentValue.end(), match + searchLength, environmentValue.end()); } else { environmentVal
45. d systems, it also comes with a performance penalty. String functions like strlen or strcat have to find out the length of the string to perform their task. This is shown in a blog post by Joel Spolsky [Spo14], in which he shows how strcat, the function which appends one string to another, may be implemented. Listing 2.9: Example from Joel on Software, Back to Basics: void strcat(char *dest, char *src) { while (*dest) dest++; while (*dest++ = *src++); } It is easy to see that this code has O(n) complexity and therefore isn't very efficient. Since the length isn't stored anywhere and there is no information about the buffer size, the function has to walk through the string looking for its null terminator every time it is called. Sometimes compilers may be able to optimize performance for literals at compile time, but often this is not possible, e.g. if a string is read from std::cin. The std::string class has a member function size that has constant complexity according to the C++11 standard, indicating that the size of the string is stored in internal state. 2.2.3 Readability. The examples in subsection 2.2.1 Memory management show how much the readability can be improved under certain circumstances. This not only makes the code easier to read but also lowers the risk for a programmer to introduce bugs when he or she has to modify the code. 2.3 Pointers vs. iterators. C strings are of
46. d version of the refactoring is easier to read and easier to implement, the plug-in uses mostly std::string member functions to refactor C string functions. As shown in Table 3.1, the only refactoring that uses functions from the <algorithm> header instead is the MemchrRefactoring, which replaces calls to the memchr function with calls to std::find. 3.2.3 Multiple rewrites in the same AST subtree. As mentioned above, after the Char pointer refactoring replaces the C string definition, it loops through all the occurrences of the variable and tries to find an applicable StringRefactoring for each occurrence. However, this sometimes led to an issue if there were multiple occurrences in the same AST subtree. For example, consider the code in Listing 3.15: char filename[] = "myfile.txt"; strncpy(filename + strlen(filename) - 3, "doc", 3); Figure 3.8 shows a compact version of the Abstract Syntax Tree of the second statement in Listing 3.15. Figure 3.8: Abstract Syntax Tree of Listing 3.15. The first occurrence of the string variable is handled by the StrncpyRefactoring and the second one is handled by the StrlenRefactoring. The plug-in uses the built-in ASTRewrite class to modify the Abstract Syntax Tree. This class lets you record changes to the AST and then performs them all at once when its rewriteAST method is called. In the above exampl
47. e, the StrlenRefactoring would first record a change in which the call to strlen is replaced with a call to the size member function. Then the StrncpyRefactoring would record a second change in which the call to strncpy is replaced with a call to the replace member function. Unfortunately, it turned out that the ASTRewrite class can't handle this refactoring correctly, because the subtree at the strlen node is affected by both recorded changes, which caused one change to overwrite the other. To work around this limitation, the plug-in now changes the nodes in each statement directly, without using the ASTRewrite. Once all occurrences of the variable in the statement have been refactored, the ASTRewrite class is used to replace the complete statement at once. 3.2.4 Testing. The Codan [fC14] testing framework was used to test the Pointerminator plug-in, which was the result of our term project. Unfortunately, there were problems with randomly failing tests, even when no changes had been made to the code. This seems to happen due to race conditions in the Codan testing infrastructure. Because of that, an alternative testing framework called CDT Testing [cdt14] has been used to test the CharWars plug-in. CDT Testing has the following benefits:
- The tests check the entire program code, not just certain parts of it.
- The code that will be tested is separated from the unit test for better readability.
- In comparison to the Cod
48. e name of the new variable is found_pos. However, it could be that a variable with the same name already exists in the same block. This would cause an error after the refactoring is done, because two variables with the same name can't be defined in the same block. If a variable with the same name is just used but not defined within the same block, this would also lead to problems, because the new variable would shadow the old one. Therefore, the plug-in has to scan the current block to find out whether a variable with the same name is used or defined in it. It does so using a visitor, as shown in subsection 3.1.6. If the variable name is already in use, the plug-in modifies the name by appending an index number to it and then scans the block again. If the new name is taken as well, it increments the index number and tries again until it finds a free name for the variable. So, for example, found_pos first becomes found_pos2, then found_pos3, and so on. 3.2.6 Exception and error handling. If a known exception occurs that cannot be corrected by our plug-in, it will be logged to the internal error log of Eclipse. This can be done with the built-in logger functionality. An example of such code can be found in Listing 3.21. Listing 3.21: Logging to internal error log: Activator activator = Activator.getDefault(); activator.getLog().log(new Status(Status.ERROR, Activator.PLUGIN_ID, Statu
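The naming scheme described above (found_pos, found_pos2, found_pos3, ...) can be illustrated with a short sketch. The plug-in itself is written in Java and scans the block with an AST visitor; the C++ function below only models the retry loop, and both the name free_name and the set names_in_block (standing in for the names the visitor would collect) are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>

// Returns `base` if it is unused in the block, otherwise the first of
// base2, base3, ... that is not yet taken.
// names_in_block is a hypothetical stand-in for the names found by the
// block scan; it is not part of the plug-in's actual API.
std::string free_name(const std::string& base, const std::set<std::string>& names_in_block) {
    assert(!base.empty());
    if (names_in_block.count(base) == 0) {
        return base;
    }
    for (unsigned index = 2;; ++index) {
        const std::string candidate = base + std::to_string(index);
        if (names_in_block.count(candidate) == 0) {
            return candidate;  // first free name wins
        }
    }
}
```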
49. eactivate these four markers individually. The following needs to be done to deactivate or reactivate a marker. First, open the Window menu and select Preferences. Figure A.5: Deactivate marker. In the settings window, open the section C/C++ in the left panel. After that, select Code Analysis. This view shows a list of all markers that are set by plug-ins or by CDT itself. All problems listed there can be deactivated and reactivated individually. The markers of the CharWars plug-in are activated by default, so there is no need to activate them when you use the plug-in for the first time. The four highlighted problems in Figure A.6 are the ones that come from the CharWars plug-in. To deactivate one of these problems, one just needs to uncheck the corresponding checkbox. To reactivate a deactivated problem, one just needs to check the checkbox again. By clicking Apply and then OK, the settings are saved. [Screenshot: Preferences dialog, Code Analysis page]
50. ead of a pointer, it returns an iterator. Listing 2.57 and Listing 2.58 show an example refactoring.

Listing 2.57: Before the refactoring. Listing 2.58: After the refactoring. (Side-by-side listings, only partly legible in this copy. Before: char *nr = strpbrk(s, "02468"); After: std::string search = "02468"; auto nr = std::find_first_of(s.begin(), s.end(), search.begin(), search.end()); followed by a comparison of nr with s.end().)

In order to be able to use the find_first_of function, the string "02468" needs to be assigned to a separate std::string variable. In practice, the plug-in needs to make sure that the name of that variable doesn't interfere with other variables in the same scope.

2.4.8 strcspn

The function strcspn has the following signature:

Listing 2.59: Signature of function strcspn

size_t strcspn(const char *dest, const char *src);

Its functionality is very similar to that of strpbrk. It returns the length of the initial segment of the C string dest that consists only of characters that are not in the C string src.

This C string function can be replaced by the std::string member function find_first_of, which does a similar thing. The signature of the member function find_first_of is shown in Listing 2.60. There is a small difference in the return values of the two functions. When the string dest only consists of charact
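A minimal sketch of the strpbrk refactoring, written as two small functions so that both variants can be compared directly. The function names and the choice to return -1 when nothing is found are assumptions for this example, not part of the thesis.

```cpp
#include <algorithm>
#include <cstring>
#include <iterator>
#include <string>

// C string variant: index of the first even digit in s, or -1 if none.
long firstEvenDigitC(const char* s) {
    const char* nr = std::strpbrk(s, "02468");
    return nr ? static_cast<long>(nr - s) : -1;
}

// std::string variant: std::find_first_of returns an iterator instead of
// a pointer, so "not found" is signalled by s.end() instead of NULL.
long firstEvenDigitCpp(const std::string& s) {
    const std::string search = "02468";
    auto nr = std::find_first_of(s.begin(), s.end(),
                                 search.begin(), search.end());
    return nr != s.end() ? static_cast<long>(std::distance(s.begin(), nr))
                         : -1;
}
```

Both functions return the same index for the same input; only the "not found" test changes from a NULL check to a comparison with the end iterator.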
51. ecause there still is a pointer variable named found.

3.1.10 Default Refactoring

As described in 3.1.9, the Char pointer refactoring tries to find a StringRefactoring subclass that is applicable for every occurrence of the string variable. More precisely, a for loop iterates over an array that contains an instance of each StringRefactoring subclass. The method isApplicable is called on each instance. The corresponding StringRefactoring then checks whether it is able to handle the occurrence of the string variable and returns an integer. The return value is an integer rather than a boolean because a single StringRefactoring can have multiple sub-refactorings, each of which is denoted by a different integer value. Internally, each class defines an enum that describes its specific sub-refactorings. However, since the StringRefactoring classes have different enums, they return an integer instead. A return value of 0 means that the StringRefactoring is not applicable. Every other value means that the StringRefactoring can be applied.

Once the for loop has found an applicable StringRefactoring, it calls its apply method and breaks out of the loop. The order in which the StringRefactoring subclasses are tested doesn't matter, because they are mutually exclusive. That means that it isn't possible for two StringRefactoring subclasses to be applicable for the same
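The dispatch loop can be pictured with the following sketch. It is a toy model in C++, not the plug-in's Java code: occurrences are plain strings instead of AST nodes, and the two concrete classes (StrlenRefactoring, EmptyCheckRefactoring) are invented for illustration. Only the names StringRefactoring, isApplicable and apply, and the convention that 0 means "not applicable", come from the text.

```cpp
#include <memory>
#include <string>
#include <vector>

// Stand-in for the plug-in's StringRefactoring base class.
struct StringRefactoring {
    virtual ~StringRefactoring() = default;
    // 0 means "not applicable"; any other value names a sub-refactoring.
    virtual int isApplicable(const std::string& occurrence) const = 0;
    virtual std::string apply(const std::string& occurrence, int sub) const = 0;
};

// Invented example with one sub-refactoring: strlen(x) becomes x.size().
struct StrlenRefactoring : StringRefactoring {
    int isApplicable(const std::string& o) const override {
        return (o.size() > 8 && o.compare(0, 7, "strlen(") == 0 &&
                o.back() == ')') ? 1 : 0;
    }
    std::string apply(const std::string& o, int) const override {
        return o.substr(7, o.size() - 8) + ".size()";
    }
};

// Invented example with two sub-refactorings, described by its own enum.
struct EmptyCheckRefactoring : StringRefactoring {
    enum Sub { None = 0, IsEmpty = 1, IsNotEmpty = 2 };
    static bool endsWith(const std::string& s, const std::string& suffix) {
        return s.size() >= suffix.size() &&
               s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
    }
    int isApplicable(const std::string& o) const override {
        if (o.size() < 10 || o[0] != '*') return None;
        if (endsWith(o, " == '\\0'")) return IsEmpty;
        if (endsWith(o, " != '\\0'")) return IsNotEmpty;
        return None;
    }
    std::string apply(const std::string& o, int sub) const override {
        const std::string name = o.substr(1, o.size() - 9);
        return sub == IsEmpty ? name + ".empty()" : "!" + name + ".empty()";
    }
};

// The loop: ask each refactoring in turn, apply the first applicable one.
std::string refactorOccurrence(const std::string& occurrence) {
    std::vector<std::unique_ptr<StringRefactoring>> refactorings;
    refactorings.emplace_back(new StrlenRefactoring());
    refactorings.emplace_back(new EmptyCheckRefactoring());
    for (const auto& r : refactorings) {
        const int sub = r->isApplicable(occurrence);
        if (sub != 0) {
            return r->apply(occurrence, sub);  // corresponds to the break
        }
    }
    return occurrence;  // nothing applicable in this toy model
}
```

Returning an int lets each class report which of its own enum values matched, without the loop having to know the individual enum types.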
52. ecks if the includes already exist and will not include them.

Global variables
Global variables that are defined as extern inside header files will also not be refactored correctly, because the data type of the external definition needs to be changed as well. This change has to be performed manually. It can be done before or after the refactoring.

Pointer operators
The plug-in fails to correctly refactor C string pointers that are manipulated with pointer operators. In these cases, a manual rewrite of the program logic is necessary.

Resource allocation
If a C string is allocated on the heap and is used across multiple blocks as a shared resource, the CharWars plug-in can't refactor it correctly. In this case, the refactoring has to be performed manually.

C files
Files containing C code are automatically scanned by Codan. Therefore, these files may also contain markers from the CharWars plug-in. Because std::string only works in C++, the refactoring doesn't work and these markers can't be resolved. In this case, the markers can be ignored or some components of the plug-in can be deactivated.

NULL checks
While a C string can be a nullptr and it makes sense to compare it against NULL, a std::string cannot be a nullptr. Therefore, NULL checks on the string are no longer needed. The programmer may need to change some parts of the logic or use std::optional to achieve the same b
53. ehaviour as the original program.

5 Conclusion

This chapter describes the results of the CharWars bachelor thesis. It also describes how this project can be continued and how the plug-in can be extended and improved.

With 65 percent of the tested C strings inside XBMC [xG14] refactored successfully, many uses of the C string functions are covered by the plug-in. With some manual changes before or after triggering the refactoring, even more C strings could be refactored. There are only a few cases where the code can't be refactored even after making some manual changes.

5.1 Achievements

The following achievements were made during the bachelor thesis:

- The C string functions have been analyzed and compared to the corresponding std::string member functions.
- Refactorings for the C string functions have been implemented and continuously tested with unit tests.
- For special C string functions, a second refactoring has been programmed to provide more flexibility and compatibility.
- A refactoring for a subset of the converting C functions (e.g. atol) has been programmed.
- The plug-in has been tested with a real-life project and the results have been documented.

5.2 Future Work

The CharWars plug-in is an improvement over the existing Pointerminator [Gon13] plug-in. It provides a lot more functionality and is well tested. However, there is still plenty of room for improvement. Here are some of the features that co
54. emchr is used more often. It can mainly be found inside assignments. None of the three occurrences in the XBMC project could be refactored, mainly because the function wasn't used to search inside a C string.

4.2.2 Second real life test

In the first round of tests, many occurrences could not be refactored because the string variables were defined at namespace or class level, which the CharWars plug-in did not support. Therefore, we improved the plug-in to support these cases and created the statistics a second time. Again, 150 occurrences were tested, and the number of occurrences successfully refactored by the CharWars plug-in increased by 17 percent. The result can be found in Table 4.3.

Table 4.3: Refactoring statistics

Markers set    Markers tested    Solved      Unsolved
776            150               98 (65%)    52 (35%)

4.3 Where the plug-in needs manual corrections

This section describes cases in which the plug-in doesn't have enough information to determine whether a variable is a C string or not. Sometimes it is then possible to make some manual adjustments that cause the plug-in to behave correctly. The section also describes in which cases the plug-in may fail to produce a correct result.

4.3.1 How to refactor C string definitions

To avoid producing code that doesn't work, only C strings that are defined and initialized in the same statement are marked by the che
55. en added this functionality to the existing plug-in, wrote corresponding unit tests and documented its architecture. Finally, we tested the plug-in on the code base of an open source C++ application called XBMC. The results of these tests allowed us to optimize the plug-in and to fix some of the problems that we discovered during testing.

Management Summary

This bachelor thesis builds on the results of our term project Pointerminator [Gon13]. The main goal of the term project was to write an Eclipse CDT plug-in that is able to eliminate pointers in existing C++ code. In our bachelor thesis, we want to extend the functionality of the Pointerminator plug-in to allow the replacement of C strings and their related C functions (strcpy, strcat, etc.) with std::string objects and their member functions.

Motivation

In C, a string is just a pointer to a zero-terminated array of characters. Many existing C++ projects still use C strings along with standard C functions such as strcpy and strstr to manipulate and analyze the string contents. Unfortunately, extensive use of C strings can lead to unreadable, inefficient and unsafe code.

The std::string class from the C++ standard library is a modern alternative to C strings. Replacing C strings with std::string objects can improve the safety, performance and readability of the code. However, programmers often don't use std::string objects, either because they don't know a
56. er num. There is a replace member function in the std::string class that provides similar functionality. The signature of this function is shown in Listing 2.94.

Listing 2.94: Signature of member function replace

basic_string& replace(size_type pos, size_type count, const basic_string& str, size_type pos2, size_type count2);

Listing 2.95 and Listing 2.96 show how a call to the memcpy function can be refactored into a call to the replace member function.

Listing 2.95: Before the refactoring. Listing 2.96: After the refactoring. (Side-by-side listings, only partly legible in this copy. Before: char a[] = "Hello"; memcpy(a, "Ha", 2); After: std::string a = "Hello"; a.replace(0, 2, "Ha", 0, 2);)

If memcpy is just used to copy a complete C string, one can simply initialize a new std::string with the same value as the source string. The example below demonstrates this case.

Listing 2.97: Before the refactoring. Listing 2.98: After the refactoring. (Side-by-side listings, illegible in this copy apart from the call memcpy(r, s, 4) and the output via std::cout.)

2.6 Converting C string functions

This section contains possible refactorings for C string functions that convert a string into another data type. Because all of these functions take a const char * as parameter, they can also be used with std::string objects, because there is a member function called c_str which converts the std str
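As a small sketch of the memcpy to replace refactoring, the two functions below overwrite the first two characters of "Hello" with "Ha". The function names are chosen for this example only; the replace overload is the one from Listing 2.94, with the source wrapped in an explicit std::string.

```cpp
#include <cstring>
#include <string>

// C string variant: overwrite the first two bytes in place.
std::string overwritePrefixC() {
    char a[] = "Hello";
    std::memcpy(a, "Ha", 2);
    return a;
}

// std::string variant: replace 2 characters starting at position 0 with
// 2 characters of the source string, starting at its position 0.
std::string overwritePrefixCpp() {
    std::string a = "Hello";
    a.replace(0, 2, std::string("Ha"), 0, 2);
    return a;
}
```

Unlike memcpy, replace adjusts the length of the string if the replaced range and the inserted range differ in size, so it cannot write past the end of the buffer.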
57. ers that are not contained in the string src, the function strcspn returns the length of dest. The function find_first_of returns the constant value std::string::npos instead. Listing 2.61 and Listing 2.62 show how the refactoring could still be done.

Listing 2.61: Before the refactoring. Listing 2.62: After the refactoring. (Side-by-side listings, only partly legible in this copy. The refactored code stores the result of s.find_first_of(...) in a variable found and compares it with std::string::npos before n is used.)

2.4.9 strspn

The function strspn has the following signature:

Listing 2.63: Signature of function strspn

size_t strspn(const char *dest, const char *src);

It searches for the first character in dest that isn't contained in src and then returns the length of the prefix up to that character. For example, if dest is "123hello" and src is "0123456789", then strspn returns 3, because the first 3 characters in dest are all contained in src.

The class std::string has several overloads of a member function called find_first_not_of that does a similar thing. The signature of the overload that is the closest match to strspn is shown in Listing 2.64.

Listing 2.64: Signature of member function find_first_not_of

size_t find_first_not_of(const char *s, size_t pos = 0) const;

Unfortunately there is a subtle
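The difference in the "nothing found" case can be bridged by mapping std::string::npos to the length of the string. The following sketch does this for both strcspn and strspn; the helper names spanWithout and spanWithin are made up for this example.

```cpp
#include <cstddef>
#include <string>

// Equivalent of strcspn(dest, src): length of the prefix of dest that
// contains no character from src.
std::size_t spanWithout(const std::string& dest, const std::string& src) {
    const std::size_t found = dest.find_first_of(src);
    return found == std::string::npos ? dest.size() : found;
}

// Equivalent of strspn(dest, src): length of the prefix of dest that
// contains only characters from src.
std::size_t spanWithin(const std::string& dest, const std::string& src) {
    const std::size_t found = dest.find_first_not_of(src);
    return found == std::string::npos ? dest.size() : found;
}
```

For dest "123hello" and src "0123456789", spanWithin yields 3, which matches the strspn example in the text.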
58. ething with name something with name

The function strncat can be used to append just a part of src to dest. The programmer can specify the start index by adding a number to the argument for the src parameter, and the number of characters using the count parameter. Listing 2.73 and Listing 2.74 show how the refactoring can be done using the append member function.

Listing 2.73: Before the refactoring. Listing 2.74: After the refactoring. (Side-by-side listings, only partly legible in this copy. Before: const char *url = "www.google.com"; strncat(s, url + 10, 4); After: const std::string url = "www.google.com"; s.append(url, 10, 4);)

2.5.2 strdup

The function strdup creates a mutable copy of an existing C string. Listing 2.75 shows the signature of the function.

Listing 2.75: Signature of the function strdup

char *strdup(const char *s);

First, it allocates enough memory to hold the contents of the C string s and the terminating '\0' character. Then it copies the contents of s to the new string and returns it. The code that uses this function has to make sure that the memory for the new string is freed once it is no longer used.

Listing 2.76 shows how strdup is used as a simple way of creating a mutable copy of a const C string. The same thing can be achieved by simply creating a std::string and initializing it with the const C string as
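The strncat refactoring with a start offset can be sketched as follows. The initial content "example" and the function names are assumptions for this illustration; the url value and the arguments 10 and 4 follow the listing.

```cpp
#include <cstring>
#include <string>

// C string variant: the start index is expressed as pointer arithmetic
// on the source argument (url + 10).
std::string appendPartC() {
    const char* url = "www.google.com";
    char s[100] = "example";
    std::strncat(s, url + 10, 4);
    return s;
}

// std::string variant: append takes the start index and the count as
// separate arguments.
std::string appendPartCpp() {
    const std::string url = "www.google.com";
    std::string s = "example";
    s.append(url, 10, 4);
    return s;
}
```

Characters 10 to 13 of "www.google.com" are ".com", so both functions produce "example.com".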
59. eturn setenv(environmentVariable.c_str(), environmentValue.c_str(), 1);

3 Implementation

In the Analysis section, we described the disadvantages and the use cases of C strings. We also looked at ways to refactor C strings and the standardized functions that are commonly used to analyze or modify them. In this section, we describe how we built an Eclipse CDT plug-in that can apply those refactorings automatically, and the problems we had to solve along the way.

3.1 Overall architecture and functionality

The following subsections describe the functionality and architecture of the CharWars plug-in. The subsections 3.1.1 to 3.1.8 have been taken from the Pointerminator [Gon13] documentation.

3.1.1 The refactoring cycle

To implement its functionality, the CharWars plug-in relies heavily on Codan [fC14]. Codan is a C/C++ static analysis framework for Eclipse CDT. It provides basic components to build and test a plug-in that does static analysis. Each refactoring in turn consists of a checker and a quick fix. The typical refactoring cycle is illustrated in Figure 3.1.

1. The programmer modifies the source code.
2. Codan [fC14] detects those changes and notifies all active checkers.
3. Each checker is responsible for a specific problem (e.g. unused variables). After a checker is notified by Codan, it analyzes the code. If it finds an occurrence of its problem the
60. h pointer operators, which can't be handled by the CharWars plug-in.

Listing 4.19: Before the refactoring. Listing 4.20: After the refactoring. (Side-by-side listings, only partly legible in this copy. Before: char timezoneName[255]; followed by a call char *p = strrchr(timezoneName, ...). After: std::string timezoneName; timezoneName.reserve(255); the call to strrchr remains and receives a pointer into timezoneName.)

strstr

The strstr function is frequently used inside if statement conditions and assignments.

To get a working example, one needs to manually change an if statement that does a NULL check. The code is located inside xbmc/xbmc/cores/dvdplayer/DVDInputStreams/DVDInputStreamHTSP.cpp.

Listing 4.21: Before the refactoring. Listing 4.22: After the refactoring. (Side-by-side listings, only partly legible in this copy. Before: const char *method = ...; if (strstr(method, "channelAdd")) CHTSPSession::ParseChannelUpdate(msg, m_channels); else if (strstr(method, "channelUpdate")) CHTSPSession::ParseChannelUpdate(msg, m_channels); else if (strstr(method, "channelRemove")) CHTSPSession::ParseChannelRemove(msg, m_channels); After: method is a std::string and each condition has the form method.find("channelAdd") != std::string::npos.)

strpbrk
61. he following signature:

Listing 2.83: Signature of function strncpy

char *strncpy(char *destination, const char *source, size_t num);

It is similar to the strcpy function. In addition, it takes a num argument that specifies the number of characters that should be copied from source into destination. The strncpy function can best be replaced with the std::string member function replace. The signature of this function is shown in Listing 2.84.

Listing 2.84: Signature of member function replace

basic_string& replace(size_type pos, size_type count, const basic_string& str, size_type pos2, size_type count2);

An example of how a call to strncpy could be refactored into a call to replace is shown in the following listings.

Listing 2.85: Before the refactoring. Listing 2.86: After the refactoring. (Side-by-side listings, only partly legible in this copy. Before: char a[] = "Hello"; followed by a strncpy call. After: std::string a = "Hello"; followed by a call to a.replace.)

Another way to refactor this code is to use the std::copy_n function.

Listing 2.87: Before the refactoring. Listing 2.88: After the refactoring. (Side-by-side listings, only partly legible in this copy. After: std::copy_n(s.begin(), 2, std::back_inserter(r));)

2.5.5 memmove

The function memmove has the following signature:

Listing 2.89: Signature of function memmove

void *memmove
62. imax strtoumax

The C functions strtoimax and strtoumax have the following signatures:

Listing 2.120: Signatures of functions strtoimax and strtoumax

std::intmax_t strtoimax(const char *nptr, char **endptr, int base);
std::uintmax_t strtoumax(const char *nptr, char **endptr, int base);

The functions take as many characters as possible from a byte string and convert them into a signed or unsigned integer. The base argument defines the numeric base in which the integer is written in the byte string. The out parameter endptr returns the position up to which the conversion could be performed successfully.

Both functions can be refactored with stoll or stoull. The signatures of these functions can be found in Listing 2.113 and Listing 2.106. An example of this refactoring can be found in the following listings.

Listing 2.122: After the refactoring. (Shown side by side with the listing before the refactoring, only partly legible in this copy. Before: char s[] = "123456"; char *pEnd; std::intmax_t n = ...; After: std::string s = "123456"; long long n = ...;)

2.7 Refactoring example

This section contains a possible refactoring of a function from the WebKit Open Source Project [Pro14b]. More information about this project can be found under www.webkit.org. This example shows how the C strings in this function could be refactored to std::string objects.

Listing 2.123: Example c
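A sketch of how strtoimax maps to std::stoll, including the position up to which the conversion succeeded. The struct Parsed and the function names are assumptions for this example. Note one behavioural difference that the sketch does not hide: std::stoll throws std::invalid_argument when no conversion can be performed, whereas strtoimax returns 0.

```cpp
#include <cinttypes>
#include <cstddef>
#include <string>

// Result of a conversion: the value and the number of characters consumed.
struct Parsed {
    long long value;
    std::size_t consumed;
};

// C variant: endptr is returned as a pointer into the input.
Parsed parseC(const char* s) {
    char* end = nullptr;
    const std::intmax_t n = std::strtoimax(s, &end, 10);
    return {static_cast<long long>(n), static_cast<std::size_t>(end - s)};
}

// std::string variant: the position is returned as an index via pos.
Parsed parseCpp(const std::string& s) {
    std::size_t pos = 0;
    const long long n = std::stoll(s, &pos, 10);
    return {n, pos};
}
```

For the input "123abc", both variants report the value 123 and three consumed characters.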
63. ing into a const char *. Listing 2.99 shows an example.

Listing 2.99: Before the refactoring. Listing 2.100: After the refactoring. (Side-by-side listings, only partly legible in this copy. Before: double n = std::atof(s); After: double n = std::atof(s.c_str());)

2.6.1 atof

The function atof has the following signature:

Listing 2.101: Signature of function atof

double atof(const char *str);

This function converts a given C string into a double and returns the converted value. If the converted value is out of range, the return value is undefined. If the string can't be converted into a double, the function returns 0.0.

In the C++ standard library, there is a function called stod that converts a std::string into a double. If no conversion can be done, a std::invalid_argument exception is thrown. A std::out_of_range exception is thrown if the converted value falls out of range. If a valid input value is provided, the function returns the converted double. The signature of this function can be found below.

In the case of a successful conversion, the two functions behave the same. An example of a simple refactoring can be found below.

Listing 2.103: Before the refactoring. Listing 2.104: After the refactoring. (Side-by-side listings, only partly legible in this copy. Before: double n = std::atof(s); After: double n = std::stod(s);)
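Because std::stod reports failures through exceptions while atof silently returns 0.0, refactored code may need a small wrapper when invalid input is expected. The following is one possible sketch; the function name parseDouble and the bool return convention are assumptions, not part of the plug-in.

```cpp
#include <stdexcept>
#include <string>

// Converts s to a double. Returns false instead of throwing when the
// input is not a number or does not fit into a double.
bool parseDouble(const std::string& s, double& out) {
    try {
        out = std::stod(s);
        return true;
    } catch (const std::invalid_argument&) {
        return false;  // no conversion could be performed
    } catch (const std::out_of_range&) {
        return false;  // value does not fit into a double
    }
}
```

With atof, the inputs "0" and "abc" both yield 0.0 and cannot be told apart; with the wrapper, the second case returns false.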
64. izardOpenOperation op = new RefactoringWizardOpenOperation(refactoringWizard); (the remainder of the listing, which calls op.run, is illegible in this copy)

3.2.7 Marker position calculation

To set a marker, a checker needs to pass a problem location back to Codan. Based on this location, the problematic code is marked in the editor. An example is shown in Figure 3.10.

Figure 3.10: Problem marker (screenshot of a short program that declares a C string s with the value "Hello World" and prints it with std::cout)

IASTNode objects have a method called getNodeLocations that allows a programmer to get the location of a node. This method returns an array of IASTNodeLocation objects. Each IASTNodeLocation consists of an offset and a length. Normally, the array only contains one IASTNodeLocation object, which fully describes the location of the node.

In special cases, more than one IASTNodeLocation is needed to describe the full location of the node. For example, if macros are used inside a node, there is one IASTNodeLocation object that describes the location of the code before the macro, another one that describes the location of the code after the macro, and a third one that describes the location of the macro itself. Unfortunately, this last IASTNodeLocation object always has an offset of 1 and a length of 0. An example of this case is illustrated in Figure 3.11. It represents the locations of the node s HI in Listing 3.23
65. ling its c_str member function. Otherwise, it uses the iterator returned by the begin member function and converts it to a char pointer. Therefore, refactoring the str variable in Listing 3.9 leads to the code in Listing 3.10.

Listing 3.10: After refactoring

    void print(const char *s) { std::cout << s << std::endl; }
    void makeUppercase(char *s) { for (...) s[i] = std::toupper(s[i]); }
    int main() {
        std::string str = "Hello world";
        print(str.c_str());
        makeUppercase(&*str.begin());
        print(str.c_str());
    }

3.1.11 Extracting common code

The checkers, quick fixes and the StringRefactoring classes of the CharWars plug-in require a lot of common code. This code can be divided into three main categories. For each of those categories, there is a separate class that consists solely of public static methods:

- ASTAnalyzer to analyze a node or a subtree of the AST
- ExtendedNodeFactory to create new nodes or trees of nodes
- ASTModifier to modify the AST

Figure 3.7 is a class diagram of those three classes with some of their methods. Since a lot of these methods are used both by checkers and quick fixes, which don't belong to the same class hierarchy, it wasn't possible to just put them in a common base class.

Figure 3.7: Class diagram (ASTAnalyzer, ExtendedNodeFactory and ASTModifier with some of their methods, e.g. isCString, newFunctionCallExpression and includeHeaders)
66. ment exception is thrown. A std::out_of_range exception is thrown if the resulting value is out of range. The following listings show an example.

Listing 2.107: Before the refactoring. Listing 2.108: After the refactoring. (Side-by-side listings, only partly legible in this copy. After: the string is a std::string s and the value is obtained with std::stoi(s).)

2.6.3 strtol, strtoll

The function signatures of the strtol and strtoll functions are shown in the listing below.

Listing 2.109: Signatures of functions strtol and strtoll

long strtol(const char *str, char **str_end, int base);
long long strtoll(const char *str, char **str_end, int base);

The functions strtol and strtoll convert a byte string into a long or long long. The integer value 0 is returned if no conversion can be done. The out parameter str_end returns a pointer to the position in the string up to which the conversion could be performed successfully. For example, if the input string is "123abc", this pointer will point to the position of the letter a.

It is possible to refactor these functions with the stol or stoll functions from the <string> header. The signature of these functions can be found in Listing 2.106. The listing below shows an example of this refactoring.

Listing 2.110: Before the refactoring. Listing 2.111: After the refactoring.
67. n

3.1.4 The index

Parsing and binding resolution is a slow process. Therefore, Eclipse CDT stores the binding information in an on-disk cache called the index. To build the index, all the code has to be parsed and all the bindings have to be resolved. The index is then updated every time the programmer edits a file. Figure 3.3 shows how everything fits together [oP14].

Figure 3.3: How everything fits together (diagram: the programmer modifies code, the changes are detected, the AST is updated, the bindings are resolved and the index is updated)

3.1.5 The plug-in components

The CharWars plug-in consists of a set of checkers and quick fixes. Each time a file is changed by the programmer, Codan starts the checkers. Each checker traverses the AST and searches for a specific problem. For example, there is a CharPointerChecker that searches for C strings that could be refactored to std::string. If a checker reports a problem, a marker is placed in the editor. When the programmer hovers over the marker with the mouse, a description of the problem appears.

Figure 3.4: Plug-in components (screenshot: a marker on a char pointer declaration with the problem description "Char pointer found")

The programmer can choose to apply the refactoring or ignore it. If the programmer applies the refactoring, Codan triggers the corresponding quick fix in the CharWars plug-in. The quick fix is
68. n call is returned from another function.
- Single statement: The function is just called in a separate statement. The return value is not captured.
- Other: Everything that is not recognized by a pattern.

Table 4.1: Occurrence statistics

Function name   If statement   Assignment   Return value   Single statement   Other
strlen          164            155          4              0                  349
strcmp          1507           39           105            0                  283
strncmp         559            53           50             1                  158
memcmp          447            90           137            36                 387
strcat          6              1            0              383                23
strncat         1              0            0              67                 1
strdup          8              349          34             0                  85
strcpy          18             4            1              1168               56
strncpy         22             1            16             594                12
memmove         3              0            6              403                72
memcpy          8              7            7              1446               108
strchr          133            613          17             0                  192
strrchr         3              254          0              0                  24
strstr          292            250          24             2                  121
strpbrk         9              27           0              0                  11
strcspn         0              13           0              2                  5
strspn          2              9            0              0                  3
memchr          T (illegible)  59           4              8                  42

For the functions that have a star next to their name in the table, there exists a two-step refactoring as described in subsection 3.1.9.

4.2 Refactoring XBMC

The XBMC repository has been used to test the CharWars plug-in. We took a snapshot of the application's source code in May 2014 from GitHub [xG14] and tried to apply as many C string refactorings as possible. More information about XBMC can be found under xbmc.org.

4.2.1 First real life test

The plug-in added 776 std::string markers in total. Because the XBMC source code also contains C code and the plug-in can't differentiate between C and
69. n strncmp

int strncmp(const char *str1, const char *str2, size_t num);

The function compares the first num characters of the strings str1 and str2. If the compared characters are equal, the return value is zero. Otherwise, the return value is greater or less than zero, depending on the alphabetical order of the strings.

This function can also be replaced with the compare member function of the std::string class. One of its overloads takes arguments that define the range of characters to be compared. The function signature can be found below.

Listing 2.49: Signature of member function compare

int compare(size_type pos1, size_type count1, const basic_string& str, size_type pos2, size_type count2) const;

Both functions follow the same convention for the return value (zero, negative or positive), so only the function call needs to change. The parameters pos1 and pos2 are always zero in this case, so the comparison starts from the beginning of the strings. An example is shown in the listings below.

Listing 2.50: Before the refactoring. Listing 2.51: After the refactoring. (Side-by-side listings, only partly legible in this copy. Before: char a[] = "google.co"; char b[] = "google.ch"; std::cout << strncmp(a, b, 6); After: std::string a = "google.co"; char b[] = "google.ch"; std::cout << a.compare(0, 6, b, 0, 6);)

2.4.6 memcmp

The function memcmp has the following signature:

Listing 2.52: Signature of function memcmp

int memcmp(const void *ptr1,
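The strncmp to compare refactoring can be checked with a short sketch. The function names are made up for this example. Only the sign of the result is compared, because neither function guarantees a particular non-zero value.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// C string variant: compare the first n characters.
int comparePrefixC(const char* a, const char* b, std::size_t n) {
    return std::strncmp(a, b, n);
}

// std::string variant: pos1 and pos2 are 0, count1 and count2 are n.
// Counts that exceed the string length are clamped by compare.
int comparePrefixCpp(const std::string& a, const std::string& b,
                     std::size_t n) {
    return a.compare(0, n, b, 0, n);
}
```

For "google.com" and "google.ch", the first six characters are equal, so both variants return 0. With n set to 9, both return a positive value because 'o' sorts after 'h'.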
70. ng 3.1

(Figure: AST tree view. The root is an ICPPASTFunctionDefinition for main with an ICPPASTSimpleDeclSpecifier (int), an ICPPASTFunctionDeclarator with the IASTName main, and an IASTCompoundStatement. The compound statement contains two IASTDeclarationStatement nodes, each with an IASTSimpleDeclaration: one declares side with an IASTEqualsInitializer, the other declares area (int), whose IASTEqualsInitializer holds an ICPPASTBinaryExpression over two IASTIdExpression nodes with the IASTName side.)

3.1.3 Bindings

Every C++ identifier (e.g. variable, function, class) is represented as a node of type IASTName in the Abstract Syntax Tree. Each such node has a reference to its binding object. Each occurrence of that identifier references the same binding object. For example, if a program has a function called func, then there is a single binding object that represents func. This binding object stores all the information about the func identifier, including the locations of the declaration, the definition and all the places where the function is called.

The algorithm used to compute the bindings is called binding resolution. Binding resolution is performed on the AST after the code has been parsed.
71. ocates a new buffer and copies a C string into that buffer.

1.4.2 Additional refactorings

If there is enough time at the end of the project, the plug-in will also include the following refactorings:

- atof: Converts a C string into a double
- atoi: Converts a C string into an int
- atol: Converts a C string into a long
- atoll: Converts a C string into a long long
- strtol: Converts a byte string into a long
- strtoll: Converts a byte string into a long long
- strtoul: Converts a byte string into an unsigned long
- strtoull: Converts a byte string into an unsigned long long
- strtof: Converts a byte string into a float
- strtod: Converts a byte string into a double
- strtold: Converts a byte string into a long double
- strtoimax: Converts a byte string into std::intmax_t
- strtoumax: Converts a byte string into std::uintmax_t

1.5 Time management

Our project started on February 17th, 2014. It will end on June 13th, 2014 at 12:00 p.m., which is when the final release has to be submitted completely.

1.6 Final release

The following items will be included in the final release of the project:

- 4 printed copies of the documentation
- 1 colored poster for presentation
- Management Summary and Abstract
- 2 CD/DVD with update site that contains the plug-in, project resources, documentation, and a virtual machine with operational Eclipse CDT with the plug-in installed
- 1 CD for archi
72. occurrence of the string variable.

However, there is one exception. The DefaultRefactoring is a special StringRefactoring subclass that must always be the last one checked in the for loop. It never returns 0 from the isApplicable method and therefore acts as a fallback refactoring for string variable occurrences that can't be refactored by any of the other StringRefactoring subclasses. In those cases, the DefaultRefactoring has to convert the std::string variable back to either a char pointer or a const char pointer, depending on the context in which the variable is used. For example, in Listing 3.9 the string variable is passed as an argument to two custom functions. The print function simply prints the string on the standard output. The makeUppercase function, on the other hand, modifies the contents of the string.

Listing 3.9: Before refactoring

    void print(const char *s) { std::cout << s << std::endl; }
    void makeUppercase(char *s) { for (...) s[i] = std::toupper(s[i]); }
    int main() {
        char str[] = "Hello world";
        print(str);
        makeUppercase(str);
        print(str);
    }

The DefaultRefactoring checks whether the function to which the string variable is passed as an argument expects a char pointer or a const char pointer and adapts the variable accordingly. If the corresponding parameter is a const char pointer, the std::string variable can be converted by cal
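The two conversions used by the fallback can be tried out with a compact sketch. The functions length and upperCased are invented for this example; makeUppercase follows the listing. The mutable case uses &*str.begin(), which is only valid for a non-empty string, so the sketch guards against the empty case.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Read-only access: expects a const char pointer.
std::size_t length(const char* s) {
    return std::strlen(s);
}

// Modifying access: expects a mutable char pointer.
void makeUppercase(char* s) {
    for (std::size_t i = 0, n = std::strlen(s); i < n; ++i) {
        s[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(s[i])));
    }
}

std::string upperCased(std::string str) {
    if (!str.empty()) {
        // Mutable char pointer obtained from the begin iterator.
        makeUppercase(&*str.begin());
    }
    return str;
}
```

A call such as length(str.c_str()) covers the const char pointer case, while upperCased shows the char pointer case.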
73. ode to refactor

    #include "config.h"
    #include "EnvironmentUtilities.h"
    #include <wtf/text/CString.h>

    void stripValuesEndingWithString(const char* environmentVariable, const char* searchValue) {
        ASSERT(environmentVariable);
        ASSERT(searchValue);

The C string parameters can be replaced with const references to std::string objects, since the parameters are not modified inside the function body. The ASSERT statements can be removed, because it is not possible to pass NULL as an argument to a function that expects a reference parameter.

Listing 2.124: Possible refactoring

    #include <cstdlib>
    #include <string>
    #include <algorithm>

    void stripValuesEndingWithString(const std::string& environmentVariable, const std::string& searchValue) {

Listing 2.125: Example code to refactor

    // Grab the current value of the environment variable
    char* environmentValue = getenv(environmentVariable);
    if (!environmentValue || environmentValue[0] == '\0')
        return;

The function getenv can return NULL. In C++, constructing a std::string object from a char pointer that is NULL is undefined behaviour. Therefore, the variable environmentValue can't be directly converted into a std::string object.

Listing 2.126: Possible refactoring

    char* tmp = getenv(environmentVariable.c_str());
    if (!tmp || tmp[0] == '\0')
        return;
    std::string environmentValue = tmp;

Listing 2.127: E
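The NULL handling around getenv can be isolated in a small helper, sketched below. The names fromNullable and readEnvironment are assumptions for this illustration and do not appear in WebKit or in the plug-in.

```cpp
#include <cstdlib>
#include <string>

// Converts a possibly-NULL C string into a std::string without
// triggering undefined behaviour; NULL becomes the empty string.
std::string fromNullable(const char* p) {
    return p ? std::string(p) : std::string();
}

// Reads an environment variable. Returns false if the variable is unset
// or empty, mirroring the early return in the listing.
bool readEnvironment(const std::string& name, std::string& value) {
    const char* tmp = std::getenv(name.c_str());
    if (!tmp || tmp[0] == '\0') {
        return false;
    }
    value = tmp;
    return true;
}
```

The temporary pointer is checked before any std::string is constructed from it, which is the same order of operations as in Listing 2.126.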
74. Figure A.3: Resolving a problem marker

Pressing the corresponding quick fix starts the refactoring process of the CharWars plug-in. After the refactoring is done, one can review the code and save the changes. Sometimes the code can still be improved by making some manual changes. The changes can be reverted by pressing Undo.

Figure A.4: Resolving a problem marker

A.2.2 Configuration

The CharWars plug-in contains four checkers. One is used to set markers on C arrays, one for reference parameters and two for C strings. You can deactivate and r
75. ore the refactoring

    char magic[4];
    if (strncmp(magic, XBTF_MAGIC, sizeof(magic)) != 0)
        return false;

Listing 4.6: After the refactoring

    std::string magic;
    magic.reserve(4);
    if (magic.compare(0, sizeof(magic.c_str()), XBTF_MAGIC, 0, sizeof(magic.c_str())) != 0)
        return false;

memcmp

memcmp is a function that is often used inside if statements. It is also frequently used as a return value or in an assignment to a variable. The following example can be found in xbmc/guilib/AnimatedGif.cpp. To refactor it successfully, one needs to change the definition of the string into an initialization. After the refactoring has been done, one can remove the initialization again.

Listing 4.7: Before the refactoring

    char szSignature[6];
    if (memcmp(szSignature, "GIF", 2) != 0)

Listing 4.8: After the refactoring

    std::string szSignature;
    szSignature.reserve(6);
    if (szSignature.compare(0, 2, "GIF", 0, 2) != 0)

strcat

This function is typically used on its own in a separate statement. An occurrence that can be refactored with the CharWars plug-in can be found in lib/libmodplug/src/load_pat.cpp.

Listing 4.9: Before the refactoring

    static char timiditycfg[128];
    strcat(timiditycfg, "/timidity.cfg");

Listing 4.10: After the refactoring

    static std::string timiditycfg;
    timiditycfg += "/timidity.cfg";

strncat

This function is used sparingly. It is used
76. ple consider the code in Listing 3.7.

    const std::string str = "my string";
    char *found = strstr(&*str.begin(), "ing");
    func(found);

The main problem is that the strstr function and the find member function behave differently when the second string is not a substring of the first one. While the strstr function returns a nullptr, the find member function returns the constant std::string::npos. In Listing 3.6 the code had an if statement that verified that the return value captured in the variable found was not NULL. This meant that the refactoring was able to directly convert from the index returned by the find member function back to a pointer that is equivalent to the pointer returned by strstr. Unfortunately, the code in Listing 3.7 doesn't contain such a NULL check. Therefore the refactoring has to make sure that the pointer passed to the function func stays the same after the refactoring, even if the second string is not a substring of the first one. This leads to the code shown in Listing 3.8.

    const std::string str = "my string";
    std::string::size_type found_pos = str.find("ing");
    char *found = found_pos != std::string::npos ? &str[found_pos] : nullptr;
    func(found);

The refactoring added a temporary variable that holds the result of the find function call and uses it to immediately convert back to a pointer. Thus the subsequent code can be left unchanged b
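The index-to-pointer conversion from Listing 3.8 can be packaged as a helper to make the nullptr case explicit. This is a sketch only; the name findOrNull is hypothetical, and the helper returns a const char pointer because it reads from a const std::string.

```cpp
#include <string>

// Returns a pointer to the first occurrence of needle inside haystack, or
// nullptr when there is none, mirroring what strstr would have returned.
const char *findOrNull(const std::string &haystack, const std::string &needle) {
    const std::string::size_type found_pos = haystack.find(needle);
    return found_pos != std::string::npos ? &haystack[found_pos] : nullptr;
}
```

The returned pointer refers into the buffer of haystack, so it is only valid as long as that string is alive and unmodified.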
77. r consider the code in Listing 3.11.

    const char *email = "example1@hsr.ch";
    // ...
    email = "example2@hsr.ch";

This is valid code, which makes it clear that the resulting std::string object can't be const, because the reassignment of a const std::string is not possible.

char * const strings

If a variable is defined as char * const, this means that the variable cannot be pointed to another array of characters. However, the characters within the string can be changed because the variable is a const pointer to char. Therefore the resulting std::string object can't be const, because it is not possible to change the characters of a const std::string.

const char * const / char const * const strings

Lastly, a C string that is defined as const char * const or char const * const cannot be repointed to another string and its characters can't be changed either. Therefore this is the only situation in which the variable can safely be refactored into a const std::string.

3.2.2 std::string member functions vs. algorithm functions

As described in section 2.3, both std::string member functions and functions from the standard header <algorithm> could be used to refactor C string functions. However, during the implementation it became clear that std::string member functions are usually the better choice. For example, consider the code in Listing 3.12.

Listing 3.12: Before refactoring

    int main con
78. refactoring

    char cIllegalChars[] = "...";
    unsigned int iIllegalCharSize = strlen(cIllegalChars);

Listing 4.2: After the refactoring

    std::string cIllegalChars = "...";
    unsigned int iIllegalCharSize = cIllegalChars.size();

strcmp

This function is mostly used inside if statement conditions. The following code, located in xbmc/linux/PosixMountProvider.cpp, contains several strcmp calls that can be refactored correctly with our plug-in.

Listing 4.3: Before the refactoring

    const char* fs = fsStr.c_str();
    if (strcmp(fs, "fuseblk") == 0 || strcmp(fs, "vfat") == 0
        || strcmp(fs, "ext2") == 0 || strcmp(fs, "ext3") == 0
        || strcmp(fs, "reiserfs") == 0 || strcmp(fs, "xfs") == 0
        || strcmp(fs, "ntfs-3g") == 0 || strcmp(fs, "iso9660") == 0
        || strcmp(fs, "exfat") == 0
        || strcmp(fs, "fusefs") == 0 || strcmp(fs, "hfs") == 0)

Listing 4.4: After the refactoring

    std::string fs = fsStr;
    if (fs == "fuseblk" || fs == "vfat"
        || fs == "ext2" || fs == "ext3"
        || fs == "reiserfs" || fs == "xfs"
        || fs == "ntfs-3g" || fs == "iso9660"
        || fs == "exfat"
        || fs == "fusefs" || fs == "hfs")

strncmp

Like strcmp, this function is mostly used inside if statements. It is not used as frequently as strcmp. Below is a successfully refactored example that can be found in the file xbmc/guilib/XBTFReader.cpp. To be able to refactor this code, one needs to change the declaration of the C string into an initialization. After applying the quick fix, this initialization can be removed again.

Listing 4.5: Bef
79. s

    @RunFor(rtsFile = "resources/QuickFixes/CharPointerQuickFix.rts")
    public class CharPointerQuickFixTest extends BaseTest {
        @Override
        protected String getProblemId() {
            return CharPointerChecker.PROBLEM_ID;
        }

        @Override
        @Test
        public void runTest() throws Throwable {
            IMarker firstMarker = getFirstMarker();
            runQuickFix(firstMarker, new CharPointerQuickFix());
            assertEquals(getNormalizedExpectedSource(), getNormalizedCurrentSource());
        }
    }

All tests are defined inside the rts file that is referenced in the quick fix unit test class. A test is identified by its name. First there is a section that contains the code before the refactoring. After that there is a section with the code that is expected after the refactoring is done. An example is shown below in Listing 3.20.

Listing 3.20: A quick fix test

    //!CharPointerString
    //@main.cpp
    int main() {
        char *str = "Hello World";
    }
    //=
    #include <string>
    int main() {
        std::string str = "Hello World";
    }

3.2.5 Checking if a variable name exists

In the description of the Char pointer cleanup refactoring (3.1.9), Listing 3.7 and Listing 3.8 showed that it is sometimes necessary to introduce a new variable. Since the new variables hold position values, the plug-in takes the name of the original pointer variable and appends _pos to it. So, for example, in Listing 3.7 the pointer variable is called found, which means that in Listing 3.8 th
80. s DK Unable to delete marker e

If an exception doesn't impact the process of the refactoring, like a failed removal of a marker, only this logging will take place. For exceptions that cause the refactoring to fail, an error dialog is shown so that the user knows that something went wrong. A screenshot of this dialog can be found in Figure 3.9.

Figure 3.9: Error dialog box ("Failed to run C-String quick fix")

Because quick fixes don't have a way of showing a popup to the user, the class Refactoring is used. This class shows user feedback automatically when an error has occurred. In our case the Refactoring class is only used to show the error dialog box. Therefore it only creates an error message during the initial condition check, which is then automatically shown to the user. A Refactoring object cannot be created without a RefactoringWizard. Because the RefactoringWizard will not be shown if the initial condition check of the refactoring fails, it doesn't need to have any content. The RefactoringWizard can be started with a RefactoringWizardOpenOperation [Fel14]. The code that is used to create the error dialog box can be found in Listing 3.22.

Listing 3.22: Show error dialog box to user

    ErrorRefactoring refactoring = new ErrorRefactoring(getErrormsg());
    ErrorRefactoringWizard refactoringWizard = new ErrorRefactoringWizard(refactoring, 0);
    RefactoringW
81. shown in Listing 2.77. The call to the function free at the end of the program is not necessary anymore.

Before the refactoring:

    char *str = strdup("Hello");
    // do something with str
    free(str);

After the refactoring:

    std::string str = "Hello";
    // do something with str

2.5.3 strcpy

The function strcpy has the following signature:

    char *strcpy(char *dest, const char *src);

The strcpy function copies the characters from a source string into a destination buffer. The destination buffer needs to be at least as large as the source string, including its terminating \0 character. One way to get the same behaviour with std::string is to initialize the destination string directly with the contents of the source string. A simple refactoring example is shown in Listing 2.79 and Listing 2.80.

Listing 2.79: Before the refactoring

    int main() {
        char s[] = "HSR";
        char r[4];
        strcpy(r, s);
        std::cout << r;
    }

Listing 2.80: After the refactoring

    int main() {
        std::string s = "HSR";
        std::string r = s;
        std::cout << r;
    }

It is also possible to use the std::copy function to refactor this code. Keep in mind that std::back_inserter is inefficient when it is used to insert very long strings.

Listing 2.81: Before the refactoring

    int main() {
        char s[] = "HSR";
        char r[4];
        strcpy(r, s);
        std::cout << r;
    }

Listing 2.82: After the refactoring

    int main() {
        std::string s = "HSR";
        std::string r;
        std::copy(s.begin(), s.end(), std::back_inserter(r));
        std::cout << r;
    }

2.5.4 strncpy

The function strncpy has t
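The two strcpy replacements discussed above can be compared side by side in a short sketch. The function names are hypothetical, and the reserve call is an added assumption meant to soften the inefficiency of std::back_inserter mentioned in the text; it is not part of the original listings.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Variant 1: initialize the destination directly from the source.
std::string copyByInitialization(const std::string &s) {
    std::string r = s;
    return r;
}

// Variant 2: copy character by character through std::back_inserter.
std::string copyWithBackInserter(const std::string &s) {
    std::string r;
    r.reserve(s.size());  // assumption: reserve up front to avoid repeated reallocation
    std::copy(s.begin(), s.end(), std::back_inserter(r));
    return r;
}
```

Both variants produce an equal string. The first one is shorter and lets the library copy the whole buffer at once.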
82. sil Worial t string s World auto v std find s begin Sloe 0 char ptr char ptr v s begin 3 char memchr s o 3 amp v nullptr jonealinis oer 3 print ptr p

2.5 Modifying C string functions

This section contains possible refactorings of C string functions that modify a string.

2.5.1 strcat / strncat

The functions strcat and strncat have the following signatures:

Listing 2.70: Signature of functions strcat and strncat

    char *strcat(char *dest, const char *src);
    char *strncat(char *dest, const char *src, std::size_t count);

They append the content of C string src to C string dest. The buffer for dest must have enough space to hold dest, src and the terminating null character. Both functions return a pointer to dest. However, in practice the return value is often ignored. The std::string class has an append member function to concatenate strings, but it also overloads the + and += operators to do basic concatenation, which leads to more concise code. See Listing 2.71 and Listing 2.72 for a simple refactoring example.

Listing 2.71: Before the refactoring

    int main() {
        char name[100];
        char last_name[100];
        std::cin >> name >> last_name;
        strcat(name, " ");
        strcat(name, last_name);
        // do som

Listing 2.72: After the refactoring

    int main() {
        std::string name;
        std::string last_name;
        std::cin >> name >> last_name;
        name += " " + last_name;
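As a small illustration of the concatenation options named above, the following sketch uses both the += operator and the append member function. The function name fullName is hypothetical.

```cpp
#include <string>

// Joins a first name and a last name with a single space, the std::string
// counterpart of two consecutive strcat calls.
std::string fullName(std::string name, const std::string &last_name) {
    name += " ";             // operator+= for the separator
    name.append(last_name);  // append member function for the second part
    return name;
}
```

Unlike strcat, neither call requires the caller to size a destination buffer in advance, because the string grows as needed.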
83. st char *email = "example@hsr.ch";
        if (strstr(email, "@hsr.ch")) {
            std::cout << "HSR email address" << std::endl;
        }
    }

One possibility to refactor this code would be to use the search function from the standard header <algorithm>. This function takes 4 iterators. The first two iterators delimit the string to be searched through, while the other two define the string to search for. In most cases the second argument to strstr will either be a C string variable or a literal, as in Listing 3.12. Therefore the plug-in would have to either refactor that C string variable into a std::string object or create a new std::string variable from the literal that is passed to strstr. The resulting code is shown in Listing 3.13.

Listing 3.13: After refactoring with search

    int main() {
        const std::string email = "example@hsr.ch";
        const std::string str = "@hsr.ch";
        if (search(email.begin(), email.end(), str.begin(), str.end()) != email.end()) {
            std::cout << "HSR email address" << std::endl;
        }
    }

In contrast, the same refactoring could be accomplished in a much simpler way using the std::string member function find. This is shown in Listing 3.14.

Listing 3.14: After refactoring with find

    int main() {
        const std::string email = "example@hsr.ch";
        if (email.find("@hsr.ch") != std::string::npos) {
            std::cout << "HSR email address" << std::endl;
        }
    }

Because this secon
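The two approaches from Listing 3.13 and Listing 3.14 can be wrapped in functions to make the difference in call-site complexity visible. This is a sketch with hypothetical function names; for a non-empty pattern both functions return the same answer.

```cpp
#include <algorithm>
#include <string>

// Variant with std::search: four iterators and a comparison against end().
bool containsWithSearch(const std::string &text, const std::string &pattern) {
    return std::search(text.begin(), text.end(),
                       pattern.begin(), pattern.end()) != text.end();
}

// Variant with the find member function: one call and a comparison against npos.
bool containsWithFind(const std::string &text, const std::string &pattern) {
    return text.find(pattern) != std::string::npos;
}
```

The find variant also accepts a string literal directly at the call site, whereas the std::search variant needs a second std::string object to obtain the pattern iterators.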
84. string functions with calls to corresponding std::string member functions. The CharWars plug-in analyzes the code that is being written. If it finds a problem, it sets a marker in the editor. The programmer can then trigger an appropriate refactoring through the marker, which causes the plug-in to apply this refactoring. The following page shows screenshots of the CharWars plug-in in action.

Figure: Refactoring the C string function strstr (TestProject.cpp before and after the quick fix "Refactor C-String into std::string")

Finally, to optimize the plug-in, we tested it with an existing open source C++ project called XBMC [xG14]. In total, the CharWars plug-in found 776 C strings and marked them accordingly. To check if the plug-in works correctly, we applied the refactoring to 150 of those C strings and verified the results. The CharWars plug-in was able to correctly refactor 65% of the C strings, as shown in the following table.

    Markers set    Markers tested    Solved      Unsolved
    776            150               98 (65%)    52 (35%)

Further work

The CharWars plug-in is a nice improvement over the existing Point
85. strtold functions have the following signatures:

Listing 2.116: Signatures of functions strtof, strtod and strtold

    float strtof(const char *str, char **str_end);
    double strtod(const char *str, char **str_end);
    long double strtold(const char *str, char **str_end);

They convert a byte string into a corresponding floating-point data type. If the conversion fails, they return an error value when the result is out of range and the value 0 when no conversion can be performed. The out parameter str_end returns a pointer to the position up to which the conversion could be performed successfully. These functions can be refactored with the corresponding conversion functions from the <string> header. Those are called stof, stod and stold.

Listing 2.117: Signatures of functions stof, stod and stold

    float stof(const std::string& str, size_t *pos = 0);
    double stod(const std::string& str, size_t *pos = 0);
    long double stold(const std::string& str, size_t *pos = 0);

While the return value of a successful conversion remains the same when using these functions, their behaviour differs if the conversion fails. See an example refactoring below.

Listing 2.118: Before the refactoring

    int main() {
        char s[] = "...";
        char *pEnd;
        double n = std::strtod(s, &pEnd);
        std::cout << n;
    }

After the refactoring:

    int main() {
        std::string s = "...";
        double n = std::stod(s);
        std::cout << n;
    }

2.6.6 strto
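The difference in failure behaviour mentioned above can be made concrete: std::strtod reports a failed conversion through its return value, while std::stod throws std::invalid_argument when no conversion can be performed and std::out_of_range when the value does not fit. The wrapper names in this sketch are hypothetical.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// C-style conversion: yields 0 when no conversion can be performed.
double parseWithStrtod(const char *s) {
    return std::strtod(s, nullptr);
}

// std::string-based conversion: failures arrive as exceptions, which this
// wrapper turns into a boolean result.
bool tryParseWithStod(const std::string &s, double &result) {
    try {
        result = std::stod(s);
        return true;
    } catch (const std::invalid_argument &) {
        return false;  // no conversion could be performed
    } catch (const std::out_of_range &) {
        return false;  // converted value is out of range for double
    }
}
```

Code that previously inspected the return value or the end pointer after strtod therefore needs exception handling, or a wrapper like the one above, once it is switched to stod.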
86. t Eclipse

Figure A.7: De-install plug-in

In the newly opened window, press Installation Details to open the details about the current Eclipse installation.

Figure A.8: De-install plug-in (About Eclipse dialog: Eclipse IDE for C/C++ Developers, Version: Kepler Service Release 2, Build id: 20140224-0627)

Under Installed Software in the Installation Details window, all installed plug-ins are shown. Select the CharWars plug-in and then press the Uninstall button. For more information see Figure A.9. Navigate with the Next button through the de-installation wizard and finish the de-installation.
87. ten used along with functions that can be used to analyze or modify the string's contents. Some of those functions return a char pointer that points to a position inside the string. For example, the function strstr takes two C strings and returns a pointer to the first occurrence of the second string inside the first string. Listing 2.10 shows an example.

Listing 2.10: C string function that returns a pointer

    int main() {
        char url[100];
        std::cin >> url;
        char *found = strstr(url, ".ch");
        if (found) {
            found[1] = 'd';
            found[2] = 'e';
        }
        std::cout << url << std::endl;
    }

Once the C string has been refactored to a std::string, the function strstr also needs to be replaced by some other means. One way is to use one of std::string's member functions, as shown in Listing 2.11.

Listing 2.11: Example with std::string member function

    int main() {
        std::string url;
        std::cin >> url;
        std::size_t found = url.find(".ch");
        if (found != std::string::npos) {
            url[found + 1] = 'd';
            url[found + 2] = 'e';
        }
        std::cout << url << std::endl;
    }

Sometimes it is better to use one of the functions from the standard header <algorithm>, because they often return an iterator, which is conceptually similar to a pointer. Listing 2.12 shows an example using the search function.

Listing 2.12: Example with function from standard header <algorithm>

    int main() {
        std::string url;
        std::cin >> url;
88. the rts file one provides the code that will be used to test the checker. An entry is identified by its test name. First there is a config section that is used to define the markerPositions property. Then there is a section that contains the actual code. Listing 3.18 contains an example.

Listing 3.18: A rts file entry for a checker test expecting a marker in line two

    //!CharPointerString
    //@.config
    markerPositions=2
    //@main.cpp
    int main() {
        const char *str = "Hello World";
    }

Testing quick fixes

The quick fix unit tests also inherit from a base class. The base class contains a method that returns the first marker that was found in the code. It also has two methods to remove all line breaks from the actual and the expected code inside the assert call. This workaround is used because it is hard to configure the formatter to add the line breaks at the correct position. Also, if the project is imported into another Eclipse instance, one would need to configure the formatter correctly before running the tests, because otherwise some tests may fail.

The unit test classes have one method to get the problem id of the corresponding checker and another method that runs the test by executing the corresponding quick fix with the marker. The path to the rts file that contains the test cases is defined as well. In Listing 3.19 an example of a quick fix unit test is shown.

Listing 3.19: A quick fix unit test clas
89. ther componentStart points to the original string or the last colon, putting the null terminator there will get us the desired result.
        componentStart[0] = '\0';
        foundAnyMatches = true;
    }

In these two strstr calls only one character is searched inside the string. Therefore they can be replaced with std::find function calls that search for a single character. The corresponding conditions need to be adapted as well.

Listing 2.140: Possible refactoring

    if (match != environmentValue.end()) {
        auto nextColon = std::find(componentStart, environmentValue.end(), ':');
        while (nextColon != environmentValue.end() && nextColon < match) {
            componentStart = nextColon;
            nextColon = std::find(componentStart + 1, environmentValue.end(), ':');
        }
        componentStart[0] = '\0';
        foundAnyMatches = true;
    }

Listing 2.141: Example code to refactor

    // If we found no matches, don't change anything.
    if (!foundAnyMatches)
        return;

    // If we have nothing left, just unset the variable
    if (environmentValue[0] == '\0') {
        unsetenv(environmentVariable);
        return;
    }

    setenv(environmentVariable, environmentValue, 1);

Because setenv and unsetenv take C string parameters, the std::string objects are converted back into C strings using the c_str member function.

Listing 2.142: Possible refactoring

    if (!foundAnyMatches)
        return;
    if (environmentValue[0] == '\0') {
        unsetenv(environmentVariable.c_str());
        r
90. there is no need to calculate it.

Task 3: Manipulating the string

If str1 is not const, it is possible to modify it through the pointer returned by the function strstr.

Listing 2.40: Before the refactoring

    int main() {
        char url[100];
        std::cin >> url;
        char *tld_ptr = strstr(url, ".de");
        tld_ptr[1] = ...;
        tld_ptr[2] = ...;
        // do something with url
    }

Listing 2.41: After the refactoring

    int main() {
        std::string url;
        std::cin >> url;
        std::string s = ".de";
        auto tld_ptr = std::search(url.begin(), url.end(), s.begin(), s.end());
        tld_ptr[1] = ...;
        tld_ptr[2] = ...;
        // do something with url
    }

Listing 2.41 shows how the same thing can be achieved using the search function from the standard header <algorithm>. This function returns an iterator, which can be used in the same way as the pointer. The subsequent code didn't have to be changed because iterators can be used just like pointers to modify the contents of a string. However, an additional variable to hold the value of the search string had to be introduced.

Task 4: Passing the pointer to a function

Listing 2.42 shows how the pointer could also be passed to a function.

Listing 2.42: Before the refactoring

    int main() {
        char email[100];
        std::cin >> email;
        char *domain_part = strstr(email, "@");

Listing 2.43: After the refactoring

    int main() {
        std::string email;
        std::cin >> email;
        auto const found = email.find("@");
        std::string domain_part
91. tr function, the pointer is often used to perform one or more of the following tasks.

Task 1: Performing a null check

Often the programmer uses the strstr function to find out whether str2 is a substring of str1. The exact value of the pointer is of no interest. All the code does is check whether it is null or not. Listing 2.36 shows an example.

Listing 2.36: Before the refactoring

    int main() {
        char url[100];
        std::cin >> url;
        if (strstr(url, ".com")) {
            // ...
        }
    }

Listing 2.37: After the refactoring

    int main() {
        std::string url;
        std::cin >> url;
        if (url.find(".com") != std::string::npos) {
            // ...
        }
    }

The same thing can be achieved using the find member function, but because it returns an index and not a pointer, the return value has to be compared with the constant std::string::npos instead of null.

Task 2: Calculating the index

Sometimes the programmer is interested in the index of substring str2 inside of str1. This value can be calculated by doing pointer arithmetic, as shown in Listing 2.38.

Listing 2.38: Before the refactoring

    int main() {
        char email[100];
        std::cin >> email;
        int prefix_len = strstr(email, "@gmail.com") - email;
        // do something with prefix_len
    }

Listing 2.39: After the refactoring

    int main() {
        std::string email;
        std::cin >> email;
        int prefix_length = email.find("@gmail.com");
        // do something with prefix_len
    }

The find member function returns the index directly so that
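The index calculation from Task 2 can be written as two small functions for comparison. This is a sketch with hypothetical names; unlike the listings, both functions also handle the case where the substring is missing and return -1 then, which is an assumption added for illustration.

```cpp
#include <cstring>
#include <string>

// C-string version: the index is the distance between two pointers.
long indexWithStrstr(const char *text, const char *part) {
    const char *found = std::strstr(text, part);
    return found != nullptr ? static_cast<long>(found - text) : -1;
}

// std::string version: find returns the index directly.
long indexWithFind(const std::string &text, const std::string &part) {
    const std::string::size_type pos = text.find(part);
    return pos != std::string::npos ? static_cast<long>(pos) : -1;
}
```

Without such a check, subtracting from a null pointer in the C version is undefined behaviour, and storing std::string::npos in an int in the refactored version yields an implementation-dependent value.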
92. uch a function is called. Additionally, these functions have difficult-to-understand names such as strpbrk and strchr, which lead to code that is hard to understand. The Pointerminator plug-in did not improve that situation. Instead of replacing the string functions, it just tries to make the new std::string object work with the existing code.

1.3 Solution

Objects of the class std::string store the size of the string in internal state. Therefore it should be possible to improve the performance and the readability of the code by replacing C string functions with a combination of std::string member functions and functions from the standard header <algorithm>.

1.4 Our goals

In our bachelor thesis we will first analyze the various C string functions and how they are used in existing C++ code. Then we try to define refactorings for each function that allow us to replace the C string function with a std::string member function or a function from the standard header <algorithm>. After that we extend the existing Pointerminator [Gon13] Eclipse CDT plug-in to add the new functionality. The overall goal is to develop a plug-in that can improve the quality of existing C++ code by performing a set of well-defined refactorings. In the end we test the plug-in with a well-known C++ open source project and try to optimize it as much as possible.

1.4.1 Features

The plug-in will replace the following C string
93. ue.replace(componentStart, environmentValue.end(), match + searchLengthWithColon, environmentValue.end());
    match = std::search(componentStart, environmentValue.end(), searchValueWithColon.begin(), searchValueWithColon.end());

Listing 2.137: Example code to refactor

    // Search for the value without a trailing colon, seeing if the original input ends with it.
    match = strstr(componentStart, searchValue);
    while (match != NULL) {
        if (match[searchLength] == '\0')
            break;
        match = strstr(match + 1, searchValue);
    }

Again, the strstr calls to search for the corresponding variable can be replaced with calls to the std::search function. The check in the while statement needs to be adapted as well.

Listing 2.138: Possible refactoring

    match = std::search(componentStart, environmentValue.end(), searchValue.begin(), searchValue.end());
    while (match != environmentValue.end()) {
        if (match[searchLength] == '\0')
            break;
        match = std::search(match + 1, environmentValue.end(), searchValue.begin(), searchValue.end());
    }

Listing 2.139: Example code to refactor

    // Since the original input ends with the search, strip out the last component.
    if (match) {
        // Update componentStart to point to the colon immediately preceding the match.
        char* nextColon = strstr(componentStart, ":");
        while (nextColon && nextColon < match) {
            componentStart = nextColon;
            nextColon = strstr(componentStart + 1, ":");
        }
        // Whe
94. uld be added to the plug-in in a future project:

- Refactoring of strings that are allocated on the heap
- Refactoring of string parameters
- Refactoring of string return values

A User manual

This chapter describes how to (de-)install the CharWars plug-in, how to use it and how some parts of it can be deactivated.

A.1 Installation

The CharWars plug-in requires the Eclipse CDT IDE (preferably the Kepler release or newer) and at least Java 1.6 installed on the system. To install the plug-in, first click on Help and select Install New Software.

Figure A.1: Install plug-in

Enter the plug-in URL under Work with and check the check box that is shown next to the plug-in name. In the screenshot, the Work with field contains http://sinv-56051.edu.hsr.ch:8080 and the list shows the CharWars plugin in version 1.0.0.201406050943.
95. uly 2014. http://jenkins-ci.org

Overview of Parsing. Overview of parsing, July 2014. http://wiki.eclipse.org/CDT/designs/Overview_of_Parsing

Apache Maven Project. Maven: welcome to Apache Maven, July 2014. http://maven.apache.org

[Pro14b] The WebKit Open Source Project. EnvironmentUtilities.cpp, March 2014. https://github.com/WebKit/webkit/blob/e7207313fed4b7a2140c39f65d45e0f441731735/Source/WebKit2/Platform/unix/EnvironmentUtilities.cpp

[Red14] Redmine. Overview: Redmine, July 2014. http://www.redmine.org

[Spo14] Joel Spolsky. Back to basics, July 2014. http://www.joelonsoftware.com/articles/fog0000000319.html

[Str97] Bjarne Stroustrup. The C++ Programming Language. 1997.

[xG14] xbmc/xbmc, GitHub. Xbmc main repository, May 2014. https://github.com/xbmc/xbmc
96. ve with the documentation and abstract without personal information.

2 Analysis

This chapter contains an analysis of C strings and shows their drawbacks in comparison to std::string objects. It also contains a description of several standard functions that are often used to analyze or manipulate C strings and demonstrates different refactorings that could be applied by the plug-in.

2.1 The structure of C strings

In C, a string is just a pointer to an array of characters that is terminated by a \0 character. No additional information about the length of the string is stored anywhere. There are several ways to create a C string, which have different effects on the mutability and the memory location of the string.

2.1.1 Const string literal

One way to create a C string is to initialize a char pointer with the address of a string literal, as shown in Listing 2.1.

Listing 2.1: Const string literal

    int main() {
        const char *str = "Hello, World!";
        // do something with str
    }

By default, the GCC compiler allocates 14 bytes (13 ASCII characters + one \0 character) in the global/static section of the memory. This is shown in Figure 2.1.

Figure 2.1: Structure of a C string (characters at indices 0 to 12, followed by the terminating \0)

In addition, the string is read-only. This allows the compiler to do an optimization called string pooling. Listing 2.2 shows an example.

Listing 2.2: String
97. xample code to refactor

    // Set up the strings we'll be searching for.
    size_t searchLength = strlen(searchValue);
    if (!searchLength)
        return;

Because we changed the type of the searchValue variable, the size member function of the std::string class can be used to get the length of the string.

Listing 2.128: Possible refactoring

    auto searchLength = searchValue.size();
    if (!searchLength)
        return;

Listing 2.129: Example code to refactor

    Vector<char> searchValueWithColonVector;
    searchValueWithColonVector.grow(searchLength + 2);
    char* searchValueWithColon = searchValueWithColonVector.data();
    size_t searchLengthWithColon = searchLength + 1;

    memcpy(searchValueWithColon, searchValue, searchLength);
    searchValueWithColon[searchLength] = ':';
    searchValueWithColon[searchLengthWithColon] = '\0';

Because the vector is just used for the initialization of a C string, there is no need for it when using the class std::string. The whole content of the string searchValue is copied into this C string, so a direct initialization of a std::string with the correct value does the same.

Listing 2.130: Possible refactoring

    std::string searchValueWithColon = searchValue;
    auto searchLengthWithColon = searchLength + 1;
    searchValueWithColon.append(":");

Listing 2.131: Example code to refactor

    // Loop over environmentValueBuffer, removing any components that match the search value ending with a
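The replacement of the Vector, memcpy and manual terminator handling by a single std::string can be checked in isolation. The following sketch uses a hypothetical function name and assumes the appended separator is a single colon, as in Listing 2.129.

```cpp
#include <string>

// Builds "<searchValue>:" without a manually sized buffer or an explicit
// terminating null character.
std::string withTrailingColon(const std::string &searchValue) {
    std::string searchValueWithColon = searchValue;
    searchValueWithColon.append(":");
    return searchValueWithColon;
}
```

The length that the original code tracked separately as searchLengthWithColon is then simply the size() of the result.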