        Reading and Modifying Code
         Contents
1.   duction system, unscathed by any code modifications, can also be thought of as a last-resort code backup for your test system.

3.2 Before the Change

Like baking, sex, and brain surgery, code modification requires some preparation to ensure a successful outcome.

Back up the Code

It's impossible to overstate the importance of backing up the code. Before you make any modifications, save a copy of the original as insurance. It's possible that your code modification may not work as planned, and by the time you realize this, you may have made a lot of changes to the code or, worse, you may not remember what things you've changed.

The same principle applies when you're making a series of modifications to code, too. Once one part is functioning and stable, saving a snapshot leaves you with a fallback position in case of trouble later on.

As a bonus, by comparing the current code and a saved copy, you can easily determine what has been changed. Tools are available that compare files and directories and output a summary of the changes. This is useful both for remembering where you left off and for constructing source code patches.

There are several ways of performing backups. The crudest is to simply make a copy of the code, file by file or en masse using an archiving program. A more sophisticated way is to use a revision control system to track changes; this approach has a learning curve, but allows
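The comparison of current code against a saved copy, mentioned above, can be sketched with Python's standard difflib module. This is only an illustration, not a tool the book prescribes: the function name summarize_changes, the labels saved_copy and current, and the sample code strings are all hypothetical.

```python
import difflib


def summarize_changes(original: str, modified: str) -> list[str]:
    """Return unified-diff lines describing how `modified` differs from `original`."""
    return list(
        difflib.unified_diff(
            original.splitlines(),
            modified.splitlines(),
            fromfile="saved_copy",  # hypothetical label for the backup
            tofile="current",       # hypothetical label for the working code
            lineterm="",
        )
    )


# Hypothetical snapshot taken before a modification, and the code after it.
saved = "total = 0\nfor x in items:\n    total += x\n"
current = "total = 0\nfor x in items:\n    total += 2 * x\n"

for line in summarize_changes(saved, current):
    print(line)
```

Lines starting with "-" exist only in the saved copy and lines starting with "+" only in the current code, which is the same information a source code patch carries.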
2.   take code as input and reformat it without changing its operation, by adding and deleting whitespace. Good formatters are highly configurable, and will permit you to tailor the code style to one which you will find easy to read.

Editors are also tools that can assist with the readability of code. "Syntax highlighting" editors will automatically highlight parts of the source code, like comments and reserved words, using color, brightness, and font changes. For code with complicated nestings of braces, brackets, and parentheses, editors can show how pairs of these symbols match up.

Searching

Unlike normal prose, code is a very nonlinear form of communication to read: it jumps around from place to place. Fortunately, it usually does so in a fairly controlled, logical fashion because of how people tend to write code.

For this reason, search tools are the bread and butter of the code reader. Most editors have some search capability, but often you will want to search for a word in the entire code body, not just the files that happen to be currently open in the editor. Some useful tools are:

Multi-file search tools. Tools to search through a set of files for a specified term usually come standard with an operating system, because they are generally useful even to users who don't read code.

The primary distinguishing characteristic of these search tools is how sophisticated a term they will look for. Some will be limited to fixed te
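A multi-file search of the kind described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name search_tree and the default list of file suffixes are assumptions made for the example, and a real search tool offers far more options.

```python
import re
from pathlib import Path


def search_tree(root, pattern, suffixes=(".py", ".c", ".h")):
    """Search every matching file under `root` for a regular expression.

    Returns a list of (path, line number, stripped line) tuples.
    The suffix filter is an arbitrary choice for this sketch.
    """
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the search
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling search_tree(".", r"print_\w+") would list every line under the current directory that mentions a name beginning with print_. Passing a plain word instead of a pattern gives a fixed-term search.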
3.  40 Okay, Sir Arthur Conan Doyle really wrote it, but it's still a good quote. From The Sign of the Four.
41 Bourne (2004).
42 Jeffries et al. (1981) examines the role experience plays in design, and how experts and novices differ in their approach to code design.
43 In case it's not immediately obvious, the code breaks long input lines. This was Paul Heckbert's 1987 winner in the International Obfuscated C Code Contest. Don Dodson's English to Pig Latin translator was an artistic 1995 contest winner.
44 Unfortunately, as summarized by Oman and Cook (1990), formal studies of indentation effects have produced mixed results.
45 The optimum line length for readability is one and a half times the length of the lowercase alphabet (Arnold, 1981, pages 33-34; Turnbull and Baird, 1975, page 67). Assuming a monospace font, which is commonly used for code, the optimum line length for code would be 39 characters, not including leading whitespace.
46 "Meaningful" names do not imply excessively long names, however.
47 An idea first proposed by Knuth (1984).
48 Studies indicate that adding comments of any sort, even good comments, decreases the readability of code (Brooks, 1995, page 224).

Bibliography

E. C. Arnold. Designing the Total Newspaper. Harper & Row, 1981.

R. C. Bell. Monte Carlo debugging: a brief tutorial. Communications of the ACM, 26(2):126-127, February 1983.

J. Bentley. More Programm
4.  It also makes a reader laboriously figure out what differences, if any, exist between various copies of the code.

Laziness

Give a reader less code to read; sometimes the best code is no code at all. A keen sense of laziness, the desire to avoid writing lots of code, is key to identifying and exploiting self-similar structure. Factor out the dissimilar parts of otherwise similar code into a table, and have one piece of code do the job of many pieces by simply indexing into the table. In more complicated cases, the table may require a little code "engine" to interpret it properly, but even the combination of code plus table can be much shorter than the naive, brute-force code.

Or, give a reader simpler code. Be lazy and solve a problem in multiple simple steps rather than one complex step. There are also times when it's easier to write code to do something poorly and then write code to fix up the result. For example, writing a compiler that produces good output in one step would be next to impossible. The code would be far too complicated. It's far easier for a compiler to generate bad but correct output, then apply multiple simple transformations to fix up the bad output.

6.4 Documentation

Part of documentation is in the code itself. Meaningful variable names, constant names, and subroutine names are all important cues to someone reading code.

The use of "magic values", numbers and other litera
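The table-driven approach described under Laziness above can be sketched as follows. The unit table, its contents, and the function name convert are hypothetical, chosen only to show dissimilar parts factored into a table and one piece of code indexing into it.

```python
# The dissimilar parts (one scale factor per unit) live in a table.
# Values are millimeters per unit; the table contents are illustrative only.
MILLIMETERS_PER_UNIT = {
    "mm": 1,
    "cm": 10,
    "m": 1_000,
    "km": 1_000_000,
}


def convert(value, from_unit, to_unit):
    """Convert a length between any two units listed in the table.

    One expression replaces what would otherwise be a separate
    branch for every (from_unit, to_unit) pair.
    """
    return value * MILLIMETERS_PER_UNIT[from_unit] / MILLIMETERS_PER_UNIT[to_unit]


print(convert(250, "cm", "m"))
print(convert(3, "km", "cm"))
```

Supporting a new unit means adding one table entry rather than writing new conversion code, which is the sense in which the table gives a reader less code to read.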
5.  M. E. Atwood. The processes involved in designing software. In J. R. Anderson, editor, Cognitive Skills and Their Acquisition, pages 255-283. Lawrence Erlbaum Associates, 1981.

D. E. Knuth. Literate programming. The Computer Journal, 27(2):97-111, May 1984.

D. E. Knuth. The errors of TeX. Software: Practice and Experience, 19(7):607-685, July 1989.

J. R. Levine, T. Mason, and D. Brown. lex & yacc. O'Reilly, second edition, 1992.

D. C. Littman, J. Pinto, S. Letovsky, and E. Soloway. Mental models and software maintenance. In Soloway and Iyengar (1986), pages 80-98.

B. Maas. Using Palm OS Emulator. PalmSource, 2003.

K. B. McKeithen, J. D. Reitman, H. H. Rueter, and S. C. Hirtle. Knowledge organization and skill differences in computer programmers. Cognitive Psychology, 13:307-325, 1981.

M. K. McKusick, K. Bostic, M. J. Karels, and J. S. Quarterman. The Design and Implementation of the 4.4 BSD Operating System. Addison-Wesley, 1996.

B. Meyer. Object-Oriented Software Construction. Prentice Hall, second edition, 1997.

B. P. Miller, L. Fredriksen, and B. So. An empirical study of the reliability of UNIX utilities. Communications of the ACM, 33(12):32-44, December 1990.

G. A. Miller. The magical number seven, plus or minus two: Some limits on our capacity for processing information. The Psychological Review, 63(2):81-97, March 1956.

A. Mohan and N. Gold. Programming style changes i
6.  estate are location, location, and location. In code reading, writing, and modifying, the three most important things are practice, practice, and practice. The advice in this book doesn't magically help, unfortunately; it's only a starting point for developing your skills.

Happy reading.

Notes

1 For example, anti-virus researchers may need to partially reconstruct legitimate code when determining how malicious code operates.
2 Understanding the design when reading code has some overlap with the design of new code (Section 6.2). What separates the two is that, when reading code, you're looking at the end product, not the means used to get there.
3 There is a famous quote by Brooks (1995, page 102):
    Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious.
4 The standard design pattern reference is Gamma et al. (1995), known as the "Gang of Four" or "GoF" book.
5 The BSD filesystem code, for example, is an object-oriented design trapped in a body of C code. See Vahalia (1996, page 236) and McKusick et al. (1996, page 205).
6 Brooks (1983) theorizes that programmers understand programs using a top-down approach, making and refining hypotheses. He suggests that evidence for hypotheses is gathered by looking for "beacons" in the code whose presence signals certain data structures or operat
7.  fine-grained version tracking, even in the final executable code, and, depending on the package, can scale to permit multiple programmers to work on the same body of code concurrently.

Build the Code

The next step is to figure out how to build the code, converting it into some executable form. This tends to be a very language- and operating system-specific task, and in extreme cases may require a great deal of arcane system administration knowledge. Ultimately the code must be run, however, so this step is a necessary evil.

Some common problems at this stage include:

Different tools. Commonly used tools for building programs include compilers, assemblers, and linkers. It's easy to recognize a tool which is completely absent, but more subtle problems can arise if the tools you have installed are not the same as those used by the code's author. For example, a different version of a compiler may accept a slightly different language or contain different bugs. In extreme cases, building the code may require installing a different version of tools first.

Different environment. Your environment may be different from that of the code's author in other ways besides the tools you have. Often pathnames need to be changed or environment variables need to be set. These are usually quite easy to fix.

A much more difficult problem is where your version of the operating system is different from one the code supports. Obviously,
8.  for it will simplify both testing and debugging. For example, imagine a program using a pseudo-random number generator, whose initial seed is the current time. The program can have code allowing the generator's initial seed to be specified, yielding the same pseudo-random sequence each time.

The Impossible

Some conditions simply "can't happen" in code. While the impossible is rather trying to test, it is wise to at least guard such cases using assertions. An assertion is a check placed in code that causes it to fail in a controlled fashion should the condition ever arise when the program is executing. Program conditions thought impossible when the code is written are known to arise occasionally as a result of code modifications.

4.4 Tools

Various tools exist to help with code testing:

Code coverage. Some code profiling tools can dynamically determine code coverage when a program executes.

Memory. Languages prone to memory problems can benefit from testing with memory analysis tools. Such tools may watch for allocated memory areas being exceeded, look for memory leaks, or spot memory which goes unused for suspicious amounts of time.

Noise generators. Often the best test input is not a human-devised one. Noise generators produce long, random program inputs which can be fed to programs to watch their behavior under unusual circumstances. More sophisticated methods being researched also includ
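The pseudo-random number generator example at the start of this passage can be sketched in Python. The function name simulate_rolls and the dice-rolling task are hypothetical; the point is only that the seed defaults to the current time but can be specified to reproduce a run.

```python
import random
import time


def simulate_rolls(count, seed=None):
    """Roll a six-sided die `count` times.

    By default the seed is the current time, so runs differ. Passing an
    explicit seed yields the same pseudo-random sequence each time, which
    makes a failing run repeatable during testing and debugging.
    """
    if seed is None:
        seed = int(time.time())
    rng = random.Random(seed)  # private generator; global state is untouched
    rolls = [rng.randint(1, 6) for _ in range(count)]
    return seed, rolls


# Report the seed so that an interesting run can be reproduced later.
seed_used, rolls = simulate_rolls(5)
print(f"seed={seed_used} rolls={rolls}")
```

Returning the seed alongside the result is a design choice for this sketch: a run that exposes a bug can then be repeated exactly with simulate_rolls(5, seed=seed_used).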
9.  i jj1   m  u v   e amp  amp A 10     gCj m char j    m   if j gt  r B m j  2 s   isdigit  j      j  46 amp  amp isdigit  j 1     for  h j  h lt r h   if    isalnum  h  amp  amp  h  958 amp   s     h  46  amp  amp   s   h  1    101 amp  amp h    1    69     cc    h   break if h gt j B m h  0  x 1   jjh lt r amp  amp C    t n    h   h     if h gt j h   x 3 if      39  for  h j 1 h lt r amp  amp     j sh    if   h  92 h    for  y amp  amp strncemp y  j 2   yt  2   if  y h j 1 if  strncmp      2    h j 2 while    h  42        h  47   x 4   m h 1 B x

Hard style guidelines. Although more the exception than the rule, some programming languages enforce certain style guidelines. Python, for instance, groups statements together using indentation.

There may be other constraints which are not fixed, but may be difficult or time-consuming to work around. Some programming tools, like editors, may support a specific code layout by default which may not be ideal. However, especially when working with a group of people, there may be a tradeoff involved between the perfect layout and having everyone reconfigure their tools.

Soft style guidelines. Established languages are likely to have established coding style guidelines. (More likely, they will have several competing guidelines.) If you devise a "better" coding style, you run the risk of rendering your code unreadable by others, simply by virtue of being different. Anot
10.  idea is nicely captured by the aphorism "Each new user of a new system uncovers a new class of bugs" (Kernighan, as quoted in Bentley, 1988, page 60).
27 Brooks (1995, page 55) and Myers (1976, page 191) make the argument that testing is an inherently destructive process, and the creator of some code isn't really going to want to destroy it, especially when finding flaws in the code may reflect on the skill and ego of the programmer.
28 Or handles an error in some other way, like catching an exception.
29 Some programmers use assertions only for testing and debugging, then disable them in the production version of a program. Whether or not this is wise can be debated at length.
30 For example, a noise generator has been used to give Unix utilities a workout (Miller et al., 1990). This technique is also referred to as "Monte Carlo" debugging (Bell, 1983), "fuzzing" (Sutton et al., 2007) or, touchingly, Gremlins (Maas, 2003).
31 Chan et al. (2004) describes such a system being used to test a commercial computer game, for instance.
32 This record can also be analyzed to gain insight into program design, bugs, and debugging. Knuth (1989), for instance, dissects the log book he kept for ten years' worth of TeX development.
33 One spurious bug in a program was found by running the program repeatedly with a script. The bug, on average, showed up once every 100 times the program was executed.
34 The author once found a c
12.  nications of the ACM. A more critical look is provided by Constantinides et al. (2004).
18 For example, the Lex and Yacc compiler tools (Levine et al., 1992).
19 For a more complete list, see Collberg et al. (1997).
20 As immortalized by the slogan "There is no 'I' in 'team'." There is no "I" in "moose" either, but this is probably coincidence.
21 Mohan and Gold (2004) have done a study of how code style changes over time with maintenance programming, i.e., code modification.
22 Like diff on Unix systems.
23 A graduate student at the author's university did this in the early 1980s: he changed the output of the "ls" program, which caused a backup script to quietly fail; the problem was not discovered for several months afterwards, when a file needed to be restored.
24 Littman et al. (1986) studies two strategies used by programmers for a code maintenance task: systematic, where the programmer would study the code extensively before making changes; and as-needed, where the programmer would take a lazy approach to studying the code. In their study, only systematic programmers were successful. They point out, however, that the key is constructing a strong mental model of what's happening in the code.
25 The "best" way to design and implement code often depends on the context. For instance, engineering tradeoffs are commonly made between simplicity and efficiency, or between time and space.
26 This
13.  of forming and testing a hypothesis.

One Change at a Time

Complex pieces of code can interact in complex ways. When you make a change to code, you need to ensure that it has the desired effect, and that any change in the code's behavior is due to the change you just made in the code. If you make multiple changes to the code, there is always the danger that the changes will interact in some unpredictable and hard-to-debug fashion. Part of taking a scientific approach to code modification is that you must understand exactly the effect of each change.

Check Context

You should always be aware of the context in which your modifications will take place. For example, if you change the output format of a program, and other programs rely on that format, then you can break a lot of code in one fell swoop. No code exists in a vacuum.

X Marks the Spot

Code modification should be a precise operation. Using your code reading skills, carefully pinpoint the areas you must change to get the desired effect. Take your time. A carpenter is supposed to measure twice and cut once; your advantage is that, unlike the carpenter, your changes can almost always be undone. The tradeoff is the amount of time you spend thinking ahead of time versus the amount of time you spend debugging afterwards. For hard-to-understand code, it may be helpful to use a debugger ahead of time to step through the code and unravel its meaning.

W
14.  or constants.

• "Input" is used in a general sense to include all sources of input, like files, keyboards, and network connections, as well as event sequences in a windowing environment.

• "Editor" includes both text editors and editing facilities in integrated development environments.

Only technical issues facing individual programmers are considered. Situations like programming in groups also involve communication and social issues which are outside the scope of this book.

You won't understand everything in this book the first time through. This is intentional. As you grow as a programmer, this book will grow with you, and increasingly more of the advice will make sense. Just like code, it is meant to be read and re-read.

2 Reading Code

Code is a specialized form of communication from human to computer, but also from human to human. Just like other types of specialized communication (legal documents, recipes, patent applications), code takes practice and experience to properly interpret.

2.1 Have a Purpose

When you read a book or magazine, you have a specific goal in mind. This may include entertainment, education, reference, or simply killing time. Your goal determines what details you focus on and retain while reading.

You should have a goal in mind when reading code too, for the same reason. You may, for instance, be interested in the flow of control in the program, or you may be acutely
15.  these shared resources will require special attention to fully understand what the code is doing in relation to other threads. Apart from those trouble spots, it's safe to begin with the assumption that the code you're reading operates independently of all other code, the assumption you would usually make when reading code.

Interrupts

Code using interrupts (especially asynchronous interrupts, which can happen at unpredictable times) can also be difficult to read. Code in an interrupt handler can cause the program state to suddenly change in ways which are not obvious from reading the rest of the code. It's a good idea to identify interrupt handlers when reading code, to determine what they do and when they are triggered.

2.6 Practice

Good code reading skills are developed only through practice. A good way to start is by reading code for design comprehension. Fortunately, there is lots of source code readily available via the Internet; you can pick some application of interest to you and begin reading.

Different types of application and different programming paradigms will read differently. Graphical user interface code will be different from operating system code; functional programs will be different from imperative ones. A good code reader will be experienced in them all to some degree.

3 Modifying Code

Good code modification is a disciplined, scientific process, which can be approached
16. Reading and Modifying Code

John Aycock

Any trademarks used in the text are the property of their respective owners. The code on pages 49 and 62 is used with permission of the IOCCC.

Copyright © 2008 John Aycock. All rights reserved.
ISBN 978-0-9809555-0-7

For Cliff

Contents

Preface
1 Introduction
2 Reading Code
3 Modifying Code
4 Testing Modified Code
5 Debugging Modified Code
6 Writing Readable Code
7 Summary
Notes
Bibliography
Index

Preface

If you already know how to read and modify code, this book is not for you. Go buy a good novel instead.

This book is intended for people who already know how to program, primarily at the university level. Code reading and modification is not a skill which is always taught, even in higher-level computer science courses. There are few good resources on this topic. In any case, pointing students to some mighty tome is often counterproductive. This book is meant to fill the gap, by providing a language-independent, low-cost, easy-to-carry guide, which can be used as a supplementary course text for programming courses.

Thanks to Darcy Grant, Nigel Horspool, Shannon Jaeger, Cliff Marcellus, Joe Newcomer, Craig Schock, and Jim Uhl for reading and commenting on various drafts. Rob Walker was the friendly neighborhood authority on aspect-oriented programming, and Margaret Nielsen pointed me to some interest
17. alue upon an increment. Implicit boundary conditions should be tested as well.

Ask for Help

Have other people test your code. Other programmers as well as ordinary users are all valuable in terms of testing, because they bring a fresh perspective which may be wildly different from your own. It is also possible, as a programmer, to become unable or unwilling to see obvious flaws in code, especially where fixing the flaws is hard to do.

4.3 Test-Friendly Coding

Error Conditions

Many system calls and library subroutines return an error status. You cannot properly test your code unless it checks for errors, because otherwise parts of your code may be failing silently. All error return values should be checked and handled appropriately.

When an error is detected, a detailed, unique diagnostic should be produced. Certain programs, especially concurrent programs and programs which interact with others in complicated ways, may only produce an error under unusual conditions which are hard to duplicate. The more information available in these situations, the better.

Determinism

Code being tested with the same inputs, in the same environment, should do the same thing each time it's run. Unfortunately this is not always possible; concurrency, for instance, may be a necessary part of a program's design. If there are sources of nondeterminism that can be disabled temporarily, writing code to allow
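The advice under Error Conditions above can be sketched in Python, where a failing system call surfaces as an OSError exception rather than a return value. The function name file_size and the wording of the diagnostic are hypothetical; this is one possible shape for a checked call with a detailed, unique message, not a prescribed one.

```python
import os
import sys


def file_size(path):
    """Return the size of `path` in bytes, or None after reporting the failure."""
    try:
        return os.stat(path).st_size
    except OSError as exc:
        # Detailed, unique diagnostic: names the subroutine, the failing call,
        # the argument, and the system's own error code and description.
        print(
            f"file_size: os.stat({path!r}) failed: "
            f"errno={exc.errno} ({exc.strerror})",
            file=sys.stderr,
        )
        return None
```

Because the message names both the subroutine and the failing call, a single line in a log is enough to locate the check that produced it, which matters most for errors that are hard to duplicate.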
18. an't be coerced by an attacker into doing anything it's not supposed to. This requires specialized skills and is beyond the scope of this book.

Reverse engineering. Again requiring specialized skills, reverse engineering takes an existing piece of executable code and works backwards to reconstruct how it works. Reverse engineering typically relies upon tools, like disassemblers and decompilers. It is somewhat of a legal quagmire, because some software licenses strictly prohibit reverse engineering, yet there are often compelling reasons to do so.

Design comprehension. Understanding code design means reading the code with a high-level perspective: you want to discover how all the different pieces of the code fit together and call one another. Design comprehension is often a prelude to other types of code reading. It can also be used for "design recovery," when dealing with old, legacy code whose original design has been lost or altered beyond recognition.

Documentation. Code may need to be read while writing documentation in order to verify details of its operation. Internal documentation, like comments, tends to be closely linked to the code; external documentation, on the other hand, may require reading the code for behavioral rather than implementation details. For example, an external document describing an API would probably omit implementation-specific information.

Maintenance. Reading for code maintenan
19. ast winners have included PostScript, Forth, and APL.

This is a somewhat unfair designation for a programming language, for three reasons:

1. Code written in such a language is quite meaningful to an expert who is regularly immersed in the intricacies of the language. Such experts are not the norm, however.

2. Some languages require different ways of thinking about programs. Going from one language paradigm to another, for example, is not necessarily an easy task.

3. It's possible to write bad code in any language. There are even contests to write bad and/or obfuscated code; here is one prize winner for a C obfuscation contest:

include  lt ctype h gt     include  lt stdio h gt     define _ define    _ A putchar    _ B return    _ C index   char r  c  300001     d   gt   lt          amp  amp   gt      gt  gt  lt  lt     i  1  j    m  k  n  h  y e u 1 v w  f 1 p s x main a  b  cha    p   a gt 1 atoi b 1    79 r c read 0  j l i c   300000    v g j  amp m      for  k m  v  2  j k m n  v w  k m   w g k   amp n    if  v  1 amp  amp m j    1 amp  amp   j  35 e amp  amp A  10    e f 0  if    f amp  amp v  3 amp  amp  char  C j   10    lt m A 10   e 0 f 1l else if v gt 2 amp  amp  u   w  amp  amp       u  amp  amp  1 i gt 1      i  61   n k gt 1   CC    amp    k   continue else if v  3 if   f amp  amp e 1 n k gt p amp  amp e A 10   e 0 else A 32  e   else if f amp  amp e   m j gt p amp  amp e  A 10    e 0  e  m j  k j  while  kam  A  k
20. ce nightmare. Also, the chance of your changes being adopted by the original code author diminishes considerably if you don't adopt their coding style.

Tools immediately enter into coding style debates. A common argument is that a particular editor doesn't support the code's style by default; the counterargument is that a professional should learn how to operate and configure their tools. Pretty-printers can reformat code, and in theory code can be written in any style, then automatically reformatted to the project's coding style. Unfortunately, pretty-printers are not always able to perfectly reformat and may make a mess of code in certain circumstances; it is safest not to rely on them.

A related issue is coding consistency. Your modified code should be consistent with the original code in terms of the libraries and subroutines it calls to perform specific tasks, and the idioms it uses, for the same reasons that you follow the project's coding style.

Production vs. Test Systems

A production system is a system which is installed, running, and relied upon by people. Never directly modify a production system. Instead, you should set up a private test system which you can modify with impunity without affecting anyone else. The test system should mimic the production system as closely as possible. Eventually, once your changes have been made and tested, they can become part of the production system. The pro
21. ce purposes is done with a specific question in mind: where do I need to change the code so that it does X? Maintenance may involve debugging too: where do I need to change the code so that it stops doing X? You need to read the code to find the target location, as well as to understand the target location's context and connection with the rest of the code.

Some types of reading are naturally more nebulous than others. The difference depends on whether you're looking for the known (e.g., a reproducible bug) or the unknown (e.g., any potential bug).

2.2 Understanding the Design

Even if you're not reading for design comprehension purposes, a basic understanding of the code's design will be of tremendous use. Generally, you will be trying to identify three things:

Modules. You need to find the largest basic "chunks" or building blocks in the code. This is an initial level of abstraction when reading code.

Dependencies. Once you've found the modules, you must determine how they fit together. In other words, how do modules use and interact with one another? There are actually two types of dependency: inter-module dependencies are between modules; intra-module dependencies are within a single module.

Key data structures. Discovering the type and role of important data structures can allow the code manipulating them to be abstracted away. For example, finding a table that encodes all the commands a pro
22. cially when you make mistakes; there is nothing to cement a design lesson like working with a flawed design of your own making.

There are some standard approaches to good design which are worth considering:

Isolating dependent code. Ideally, any code that is dependent upon something else should be separated out. Code can be dependent on many things: target architecture, operating system, windowing system, specific libraries. Identifying and isolating this dependent code helps abstract your design away from minute details, and makes your code more portable.

Directional design. There are three "directional" design methods. A top-down approach starts from a very high level and progressively breaks the programming task down into smaller and smaller pieces. Bottom-up design starts with the low-level building blocks of a program which actually do the work, piecing them together until the program is complete. Finally, a middle-out design strikes a balance between the two approaches, building and breaking down.

The design method may vary with the programming task. Creating a good set of building blocks for a bottom-up design comes through experience. Top-down designs are useful for prototyping, where you may not yet know how to construct the building blocks; they are also useful for undesirable programming tasks, because they allow the "real" work to be deferred as long as possible.

Coupling and cohesion. The parts of a
23. code can be rebuilt. Then, run the code to test your hypothesis. Did you predict the outcome correctly? If you did, you should proceed to test the changed code extensively to ensure that you haven't introduced any bugs. If the code has a test suite, then it's good to add new test cases to it that exercise your code modification.

The other case is where your hypothesis failed. As part of the scientific process, you need to find out why this happened. Remember that you've modified a large piece of code which you may not fully understand; always start by assuming that the error is yours.

1. Examine your modified code for bugs. Does it behave the way your hypothesis said it should?

2. Re-read the code. Verify that you have correctly understood how the code you're modifying interacts with other code. Is it possible that you have chosen the wrong spot to modify?

Once these errors have been ruled out, you can start expanding the search.

3. Look for bugs in the original code. Your modification may be taxing the code in some new way that reveals a previously hidden bug.

Finally:

4. Re-examine your hypothesis. If everything else checks out, then you may simply have incorrectly predicted the outcome of your modification. It's best to leave this possibility until last, because it's very easy to be lazy and change your hypothesis out of hand, potentially missing some problems.

At the very least, an inspecti
24. coping. Most languages have "static scoping", which means that it's always possible, given a name in the code, to decide what that name refers to just by looking at the code. With dynamic scoping, what a name refers to may change depending on how the program executes. In other words, determining what a name refers to in a dynamically scoped language is undecidable.

Dynamic typing. In dynamically typed programming languages, the type of a name depends on the type of what was last assigned to it as the program executes. As with dynamic scoping, it's not always possible to determine the exact type of a name.

Overloading. Some languages support overloading of subroutines or operators. This means that the exact code used in any given context may be dependent on the types of variables involved, and the number of arguments. For example, if the + operator is overloaded, the expression a + b may add a and b together, or it may post your credit card information to the Internet. When reading code in the presence of overloading, you must work out exactly what code will be executed.

Inheritance. Object-oriented programming languages allow classes to inherit variables, constants, and subroutines from one another. Like overloading, reading code with inheritance means that it can be difficult to determine what code will be executed.

For both inheritance and overloading, code is often spread across multiple files, comp
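The point about overloading and dynamic typing can be made concrete with a small sketch. It is written in Python purely for illustration; the class `Money`, its `audit_log`, and the subroutine `total` are hypothetical names invented here, not anything from the text. The expression `a + b` looks identical in both calls, yet different code runs depending on the types involved.

```python
class Money:
    """Hypothetical value type whose + operator does more than add."""

    audit_log = []  # illustrative side effect: a record of every addition

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        # Overloaded +: adds, but also quietly records the operands.
        Money.audit_log.append((self.cents, other.cents))
        return Money(self.cents + other.cents)


def total(a, b):
    # Which code executes here depends on the runtime types of a and b.
    return a + b


plain = total(2, 3)                      # built-in integer addition
fancy = total(Money(100), Money(250))    # Money.__add__, with its side effect
```

A reader who sees only `total` cannot tell which of the two behaviors applies without tracing where its arguments come from.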
25. de and the original code can be found automatically using tools, by comparing the current code against a backup copy.

Internal State Information

It is essential when debugging to have information about the internal state of an executing program. There are several ways to gather this information:

Output. Any visible form of output can be used to relay state information from a program. This includes print statements and log messages, as well as low-bandwidth outputs like LEDs and foreground/background colors; all these can be used to convey information.

The idea is to add debugging code into the program in places where you want to query its state. Debugging code is often "quick and dirty" code added in haste, but care should be taken:

• The program's normal operation must not be changed by adding the debugging code.

• Double-check that the state information being output is in fact the information you think is being output.

• Make sure that potential error conditions in the debugging code are handled.

Carelessly written debugging code can waste lots of time with wild goose chases.

It's good practice to flag debugging code, using specially marked comments, or by outdenting it, or by conditionally compiling it in, so that it can be found and removed easily once the bug is fixed.

Debuggers. A good debugger is an invaluable tool. Among other things, it allows program execution to be stopped at specified breakpo
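As an illustration of the cautions about debugging output listed above, here is a minimal sketch in Python. The `DEBUG` switch, the `debug` helper, and the `average` subroutine are all hypothetical names chosen for this example. The debugging output goes to the error stream so normal output is untouched, failures inside the debugging code are swallowed, and the call site carries a searchable marker.

```python
import sys

DEBUG = True  # hypothetical switch; set to False to silence debugging output


def debug(label, value):
    """Report state on stderr without disturbing normal output or control flow."""
    if not DEBUG:
        return
    try:
        print(f"DEBUG {label}: {value!r}", file=sys.stderr)
    except Exception:
        # An error in debugging code must never change the program's behavior.
        pass


def average(values):
    debug("len(values)", len(values))  # DEBUG: remove once the bug is fixed
    return sum(values) / len(values)
```

Using `repr` formatting (`!r`) is a deliberate choice here: it makes otherwise invisible characters such as trailing whitespace show up in the output.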
26. due to the effort involved, it's preferable to avoid changing your operating system, but in some situations it may be the only choice. With luck, the difference can be smoothed over with some minor code changes; essentially, this amounts to porting the code. Often a good compiler is your guide, its error messages pointing you to the differences you need to patch.

Code dependencies. One piece of code may depend upon some other code being built first. Typically, the build instructions for code will take this into account, but in case of build problems it is worthwhile to keep an eye out for this.

Missing pieces. As well as dependencies within the code, there may be dependencies on external things. Some code relies on third-party libraries and packages which must be installed to complete the build.

Test the built code to make sure that it works. Ideally, the code will come equipped with a test suite which can be run to verify its correct operation. (Practically, such test suites are more the exception than the rule.)

3.3 Making the Change

What constitutes a "change"? When modifying code, you are making a logical change, such as adding support for a new feature. Making this logical change may require multiple lines of code in multiple files to be added, changed, or deleted.

The process for modifying code emphasizes being careful and methodical. One change at a time is made, using a scientific approach
27. e learning algorithms to automatically develop and learn input sequences that cause program malfunctions.

Debuggers. The primary purpose of debuggers is debugging, obviously. However, their ability to stop an executing program at a specific spot and modify its state can be used to force code into places which are otherwise hard or impossible to reach.

5 Debugging Modified Code

Debugging modified code is like testing modified code: the techniques for modified code are much the same as you would use for a whole program.

The base assumption when debugging modified code is that new changes are responsible for new behavior. Your code modifications are likely suspects for any new deviant behavior, using the behavior of the unmodified code as a basis for comparison. If you've made only one change at a time, this further narrows down the culprit. A bug may be deceptive, though: it may not manifest itself directly in the modified code, but may cause other code to break.

5.1 Vital Information

To debug effectively, you need information about the state of the code and the internal state of the executing program.

Know What Has Changed

You should ensure that you know exactly what code has been changed, since any of the changes may be contributing to the problem. In some cases, the changed code will be obvious, but in others it may be scattered throughout the body of code. The differences between your modified co
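Finding what has changed by comparing against a backup copy can be sketched with Python's standard `difflib` module. The file names and contents below are made up for the example; in practice the two lists of lines would be read from the backup and from the current version of the same file.

```python
import difflib

# Hypothetical contents of the backup copy and of the modified copy.
original = ["total = 0\n", "for x in items:\n", "    total += x\n"]
modified = ["total = 0\n", "for x in items:\n", "    total += x * 2\n"]

# unified_diff yields header lines, then lines prefixed with ' ', '-' or '+'.
diff = list(difflib.unified_diff(original, modified,
                                 fromfile="backup/sum.py",
                                 tofile="current/sum.py"))
print("".join(diff), end="")
```

Lines prefixed with `-` exist only in the backup and lines prefixed with `+` exist only in the modified copy, which is usually enough to list every suspect change.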
28. e parameters. Situations where arbitrary values are used in code should be noted. These values, while correct, may present later opportunities for tuning and optimization.

Better algorithms. Better choices for algorithms may come to mind when writing code, like the possibility of using a binary search instead of a linear search, but you may not have the chance to implement them. It's always a good idea to add a note about what algorithm should be used; at the very least, it tells people reading your code that you did know what you were doing.

When writing about such problem areas in comments, it's good practice to mark them so that they may be easily searched for later. The strings "XXX" and "TODO" are often used for this purpose:

    /* XXX - find an algorithm to see if this code terminates */

6.5 Practice

Coding and design skills improve with practice. It's wise to start small, with coding problems you can finish in one sitting. Programming language textbooks often have short exercises in them which are suitable, or use problems from programming competitions. For larger projects, choose something you're interested in, or a program you need that doesn't exist. If you don't want to start coding from scratch, there are a seemingly infinite number of open source projects which are both available and in dire need of major coding contributions.

7 Summary

The three most important things in real
29. earch all files for init, being the common part of "init", "initialize", and "initialization". A case-insensitive search will also find instances with different capitalizations, like "doInit".

2. Filter out extraneous results, if necessary. A good way to do this is by searching the search results themselves, but negating the result; most search tools permit this. In other words, search the results for everything except some term.

3. Expand the search to include logical synonyms. In this case, you might also try "start" and "main".

4. Start looking through the code for clues. Initialization code is usually called early on, so you can start reading the code from the place where it would normally start executing. The idea is to look for likely search terms that you may have omitted; a call to a "setup" subroutine, for instance, might be the vital clue.

An alternative sequence:

1. Try to first narrow down the search to the joystick-related code, by searching for "joystick" in the code body or by simply looking for files with "joystick" or some related term in the filename.

2. Look at the volume of code you've discovered. For relatively small amounts of code, it can be faster to page through the code manually, skimming it for subroutines of interest. Otherwise, this (smaller) set of files can be searched using the usual tools.

2.4 Vital Information

Obviously, wh
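The first two search steps above (a case-insensitive search for "init", then a negated search over the results) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the code body is a made-up dictionary of file names and contents, and `search` and `exclude` are hypothetical helper names, standing in for whatever multi-file search tool is actually at hand.

```python
import re

# Hypothetical code body: filename -> contents.
files = {
    "joystick.c": "void doInit(void) { }\nint initialValue = 0;\n",
    "main.c": "int main(void) { doInit(); return 0; }\n",
    "notes.txt": "uninitialized data is a common bug\n",
}


def search(files, term):
    """Case-insensitive fixed-string search; returns (filename, line number, text)."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [(name, number, line)
            for name, text in sorted(files.items())
            for number, line in enumerate(text.splitlines(), start=1)
            if pattern.search(line)]


def exclude(results, term):
    """Negated search over earlier results: keep lines that do NOT contain term."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [hit for hit in results if not pattern.search(hit[2])]


all_hits = search(files, "init")        # step 1: broad, case-insensitive
hits = exclude(all_hits, "uninit")      # step 2: filter out extraneous results
```

Step 3 (synonyms such as "start" and "main") would simply be further calls to `search` with different terms.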
31. en reading code, the code itself is an excellent source of information. There is other information to draw upon, however: some is ignored by the computer, some is written in a shorthand way, and some isn't there at all.

Comments

Comments (and more generally, external documentation) appeal to humans reading code because the computer does not look at them. Comments are an aside directed solely to humans. Unfortunately, this is also the downfall of comments. There is nothing to ensure that the comments are correct and that they are in synch with the code. Where comments are present, there are four cases with respect to the correctness of code:

                        code incorrect    code correct
    comments incorrect        ✗                ✗
    comments correct          ✗                ✓

You are, needless to say, only interested in the case where both comments and code are correct. The tricky part is deciding when that is. You should use comments as a guide to your reading, giving them the benefit of the doubt for efficiency's sake, but always remember that the comments may be misleading.

Idioms

Programming languages have idioms just as human languages do. Recognizing an idiom when reading code can give immediate understanding about a piece of code and what it's doing. Idioms are learned through the process of reading and writing code, and so require a certain amount of expertise in a given language. Fortunately, unlike human languages, the rigid na
32. ength; typically, 80 columns is advisable.

First, a relatively short line length improves readability. Newspapers, for instance, still use narrow columns to allow a good reader to simply read down the column with no wasted eye movement. The same principle applies to computer code. A long line (or worse, a long line wrapped around the screen or a printout) means extra work for a reader to put all the pieces together.

Second, when combined with good spacing and indentation, a fixed line length is a good heuristic measure of code complexity. If you can't express a line of code in 80 columns using tabs for indentation, then it's a strong indicator that you should examine what you're doing. A subroutine may be needed, or it may suggest that the code needs restructuring or a completely different approach. If the code is hard to write, it will likely be hard to read too.

Cut and Paste

"Cut and paste" coding is the derogatory term used to describe copying code from one place to another in a body of code, possibly making a small number of changes to the copied code. This sends a strong signal that code restructuring opportunities are present. It also makes code less maintainable, because bugs are also copied; fixing a bug fully means tracking down all similar copies of the buggy code.

From the readability point of view, copying code burdens the code reader by forcing them to read the same code again and again 
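A small before-and-after sketch may help show the restructuring opportunity that cut-and-paste code signals. The subroutines below are invented for this example (Python is used only for illustration): the first pair repeats the same range check with a small change in each copy, and the second version moves the shared logic into one subroutine, here called `clamp`.

```python
# Before: the same range check pasted twice, with a small change in each copy.
def scale_width_copied(width):
    if width < 0:
        width = 0
    if width > 100:
        width = 100
    return width * 2


def scale_height_copied(height):
    if height < 0:
        height = 0
    if height > 100:
        height = 100
    return height * 3


# After: the copied logic lives in one subroutine, so a bug is fixed once.
def clamp(value, low=0, high=100):
    """Limit value to the range low..high."""
    return max(low, min(high, value))


def scale_width(width):
    return clamp(width) * 2


def scale_height(height):
    return clamp(height) * 3
```

The restructured version is also shorter to read: once `clamp` is understood, each caller is a single line.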
33. gitimately lies in the input or an incorrect interpretation of the output. Double-check inputs and outputs, keeping in mind that some things (e.g., whitespace, control characters, nul characters) may not be visible to the naked eye. Tools that overtly "dump", or print, input and output may be helpful; such tools can be quickly constructed if they are not readily available.

Another thing to check is the resources that the program needs. Is any required hardware attached and operational? Is there enough disk space, and are file permissions set correctly? Is the program executing in the correct environment and location?

Hypothesize and Test

Internal state information is used to probe the state of a malfunctioning program. A scientific approach can be taken, just like the one used when modifying code. Make a specific hypothesis about the program's state that can be verified by gathering internal state information. For example: "at line 452, the pointer variable p should point to an element of the array A." Then, gather information to test your hypothesis. If the hypothesis is wrong, then you are on the trail of the bug (or your understanding of the code is incorrect, but arguably, you're still on the trail of the bug).

Instead of probing a specific point, another approach is to hypothesize how the program's state should be changing as it executes. Here, you would form preconditions and postconditions ab
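One way to state such hypotheses so that the program checks them itself is with assertions. The sketch below is an assumption-laden illustration in Python: `insert_sorted` is a hypothetical subroutine, and the precondition and postcondition are simply the hypotheses "the list is sorted on entry" and "the list is still sorted, and contains the new value, on exit".

```python
def is_sorted(items):
    """True if items are in ascending order."""
    return all(a <= b for a, b in zip(items, items[1:]))


def insert_sorted(items, value):
    """Insert value into an ascending list, checking a hypothesis at each end."""
    # Precondition: hypothesis about the program's state on entry.
    assert is_sorted(items), "hypothesis failed: input is not sorted"

    index = 0
    while index < len(items) and items[index] < value:
        index += 1
    items.insert(index, value)

    # Postcondition: hypothesis about the state after the subroutine has run.
    assert is_sorted(items), "hypothesis failed: output is not sorted"
    assert value in items, "hypothesis failed: value was not inserted"
    return items
```

If an assertion fires, the hypothesis was wrong at that point, which narrows the search to either the code before it or the understanding behind it.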
34. good design (call them modules) should exhibit a high degree of cohesion and a low degree of coupling. High cohesion means that a module does one specific task, like implementing a data structure, and everything in that module is used toward that end. Low coupling means that a module is not intimately connected with the inner workings of another module.

Design patterns. Object-oriented designs have a wealth of "design patterns" to draw upon. Effectively, this creates a shorthand vocabulary for describing certain designs. The drawback is that a person reading the code must understand this same vocabulary for the shorthand communication to be useful. At the very least, design pattern bestiaries can act as a helpful source of design inspiration.

Ultimately, good code design is a black art. As a heuristic, try to imagine whether your design will make the code easy to read and modify using the approach of the last few chapters; in other words, is your design rational and logical?

6.3 Code

Name Your Poison

The programming language you write your code in will undoubtedly bring coding style constraints with it. Some constraints are more subtle than others:

Write-only languages. Some languages are referred to as "write-only" languages, because code is fully understood only once, when it is written, and it is next to impossible to read afterwards. Perl is the current frontrunner in this category, p
35. gram understands probably means that you don't need to thoroughly read the code that interprets that table.

Modules and their dependencies may be looked for in a "directional" fashion: top-down, following the way the code would be executed; bottom-up, reading the code linearly and trying to piece it together; middle-out, using a combination of top-down and bottom-up reading.

In object-oriented code, you may also be looking for:

Design patterns. A design pattern is just that: a code design which can be applied in a specific situation that matches the pattern.[4] Recognizing such patterns in the code can quickly give you a high-level view of the code's design. In theory, design patterns aren't limited to object-oriented code, but they have found their widest usage there to date.

Class relationships. How are classes in the code related to one another? For example, they may be arranged in a hierarchy, and extend and be extended by other classes in various ways. Understanding class relationships is critical to understanding an object-oriented design.

Less frequently, you may read code whose actual design cannot be expressed well using the implementation language. The code author may have made Herculean efforts to implement the design, and a deep understanding of the code can require abstracting away the excess implementation details.

You may find it helpful to construct hypotheses about the code design as 
36. hen modifying code, you want to be a surgeon with a scalpel, not a monkey with three sizes of hammer.

Form Hypothesis

What do you expect to happen? Before changing any code, mentally form a hypothesis stating what you think will happen when you make your change to the code. Phrase it in terms of some observable, verifiable effect. For example:

    When I add this "print" statement, I will see the size of the list printed to the screen just before the error message box pops up.

Forming a hypothesis gives you a way to test both your understanding of how the code operates, and the efficacy of your code modification. It's important to do this before you make the change, since it's too tempting to fudge it after the fact ("yeah, that's what I thought would happen...").

Make and Mark

Now, make the modification to the code. It's good practice to mark the change with a comment which briefly describes who made the change, when it was made, and why it was made. If you use your initials to record "who", then it gives you a mechanism to easily search for changes you made to the code. Also, you can think of marking your modifications as a professional courtesy to the original code author, so that they aren't held responsible for your modifications, and vice versa. (Some code licenses may legally require changes to be marked, too; always read the fine print.)

Test Hypothesis

Once the change is made, the 
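A marked modification of the kind described under "Make and Mark" might look like the following sketch. Everything specific in it is hypothetical: the subroutine `load_scores`, the initials "JQP", and the date are placeholders invented for illustration. The added line tests the example hypothesis that the list size is reported just before the error appears.

```python
import sys


def load_scores(lines):
    """Parse one integer score per non-blank line."""
    scores = [int(line) for line in lines if line.strip()]
    # JQP 2024-03-01 (hypothetical initials and date): print the list size
    # just before the error check, to test the hypothesis that the list is
    # empty whenever the error is reported.
    print(f"JQP size of scores: {len(scores)}", file=sys.stderr)
    if not scores:
        raise ValueError("no scores found")
    return scores
```

Searching the code body for the initials then finds both the marking comment and the output line it explains.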
37. her tradeoff to consider.

Idioms. Experienced code readers will be expecting language-specific idioms to be present and used appropriately in the code. Using code idioms can impart a lot of information very quickly.

Spacing and Indentation

Youwillprobablyfindthissentencehardtoread. Spacing plays the same role in code as it does in prose. Or imagine your favorite music, played without any rests. In music, when you don't play is as important as when you do play, and the same concept is true for readable code.

There is no advantage to writing reams of code with insufficient space. Your code doesn't run any faster, and you don't save any substantial amount of disk space. As a concrete example, for many languages you can indent code with tabs, where a tab is eight spaces, and use spaces liberally elsewhere. Visually, your code should look like it has "elbow room"; it shouldn't look cramped.

Having said this, the need for too many levels of indentation may indicate a design flaw. The code may need restructuring with subroutines, or perhaps there are an excessive number of special cases that can be generalized.

Line Length

Line length is obviously tied in with code spacing and indentation. It may seem like a holdover from the dark ages of computing, from punched cards and character-only video displays, and to a certain extent it is. However, there are some good reasons to strictly adhere to a certain fixed line l
38. hodical, scientific process. As with code modification, it's a good idea to record your work. This helps avoid duplicating work by keeping track of what you've done throughout a complicated debugging session; it also leaves a record which can be referred to later if a similar bug arises ("how did I fix that before?").

Reproduce the Problem

If you can't observe a problem, you can't fix it. The first step when debugging is to reproduce the problem. This may also be the hardest step: some bugs only crop up under unusual circumstances, like high loads or complex interactions with other programs. If you're not able to reproduce the problem, then you're reduced to blindly reading the code for bugs.

Ideally, you want to not just reproduce the problem, but reproduce it in the simplest, shortest way. Any inputs should be pared down to the bare minimum necessary; this reduces the amount of code to wade through before reaching the suspect parts.

Sometimes, spurious bugs may be reproduced by stress testing: repeatedly testing the suspect area of code until a failure occurs.

The Obvious

Always start debugging by looking for obvious problems. Although it may seem silly, it's possible to waste a great deal of time looking for a complicated answer to a problem when a simple one suffices.

One obvious thing to verify is whether or not you're actually seeing a bug. Sometimes, the code is correct, and the error le
39. in a step-by-step manner. The basic assumption is that the code has been designed and written in a logical, rational way, in which case it isn't necessary to fully understand the whole body of code in order to make small, localized changes.

3.1 Good Practice

Take Notes

Good code modification is like conducting a scientific experiment. As scientists do, you should keep notes while making code modifications to keep track of what you've done. Not all the things you did and attempted will be reflected in the code or its backups. For instance, the way you build and install the code will not be there, nor will any modification dead ends that you backed out of. Careful notetaking also allows you to record the rationale for making certain coding choices; this may be obvious at the time you're immersed in the code, but obscure later.

Time and interruptions cause details to vanish. A good rule of thumb to start with is to write down anything for which you think "oh, I'll remember that" or "I can figure that out again."

Coding Style

When modifying code, you have informally joined a pre-existing team. Part of being on a team is conforming to certain team standards in preference to individual ones, which in the case of code modification means that you must abide by the project's coding style even if you don't like it. A project involving ten different programmers and ten different coding styles is a maintenan
40. ing Pearls. Addison-Wesley, 1988.

S. Bourne. A conversation with Bruce Lindsay. ACM Queue, 2(8):22-33, 2004.

F. P. Brooks, Jr. The Mythical Man-Month: Essays on Software Engineering, Anniversary Edition. Addison-Wesley, 1995.

R. Brooks. Towards a theory of the comprehension of computer programs. International Journal of Man-Machine Studies, 18:543-554, 1983.

B. Chan, J. Denzinger, D. Gates, K. Loose, and J. Buchanan. Evolutionary behavior testing of commercial computer games. In Proceedings of the 2004 Congress on Evolutionary Computation, pages 125-132, 2004.

C. Collberg, C. Thomborson, and D. Low. A taxonomy of obfuscating transformations. Technical Report 148, University of Auckland, Department of Computer Science, 1997.

C. Constantinides, T. Skotiniotis, and M. Stoerzer. AOP considered harmful. In European Interactive Workshop on Aspects in Software, 2004. Position paper for panel session.

T. M. R. Ellis. A Structured Approach to FORTRAN 77 Programming. Addison-Wesley, 1982.

E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns. Addison-Wesley, 1995.

J. D. Gould. Some psychological evidence on how people debug computer programs. International Journal of Man-Machine Studies, 7:151-182, 1975.

L. Gugerty and G. M. Olson. Comprehension differences in debugging by skilled and novice programmers. In Soloway and Iyengar (1986), pages 13-27.

R. Jeffries, A. A. Turner, P. G. Polson, and
41. ing references.

I hope you find the advice in here useful.

1 Introduction

To become a good writer, you practice writing. A lot. You also read the works of great writers. And study them: how is the plot developed, what words are selected and why? You also read a lot of work that isn't so great, and figure out why, so you don't make the same mistakes. You edit works in progress to improve their presentation.

Becoming a good programmer requires the same process. You must practice programming. You need to read and study the code of great programmers, as well as not-so-great programmers. You must determine how to modify and improve code.

Code is read many more times than it is written, so it makes sense to look at ways to create readable code. Maintenance programming is also a mainstay of programming, for better or worse.

This book is a guide to reading code, modifying code, testing and debugging modified code, and writing readable code. It does not include much code, on purpose. The ideas and advice in here are largely independent of constantly changing programming languages and tools. For this reason, generic terms are used where possible:

• "Subroutine" is used to mean a function, procedure, or method.

• "Module" refers to some discrete program unit, like a module, class, interface, or a file.

• "Name" means any identifier in a program. This may include names of variables, subroutines, modules 
42. interested in the details of one particular subroutine.

Some common reasons for reading code are:

Testing. When testing, you're interested in locating potential problem areas you need to test. This is discussed further in Chapter 4.

Debugging. Reading code to track down a bug. As a programmer, you have a "mental model" of the code in your head, modeling what you think the code should be doing. A bug may indicate that your model is incorrect, and you need to discover where the code diverges from your model so that you can correct the code. Another possibility is that both the code and your mental model are correct by themselves, but there are complicating external factors to consider, like concurrency. When debugging, you need to read the code exactly as the computer would read it, which requires meticulous attention to detail. Debugging is the subject of Chapter 5.

Code review. Code review might imply some amount of software engineering, such as reading code to verify that a formal software specification is met. Less rigorously, a code review may just involve your code being read by another programmer as a secondary check against bugs.

Security auditing. Security auditing is a very specialized form of code reading. Roughly speaking, a code review verifies that code is doing what it's supposed to. A security audit goes beyond that to verify that code isn't doing anything it's not supposed to, and that code c
43. ints, internal state to be easily queried and modified, and execution to be stepped through with fine granularity. The time invested learning how to use a debugger will be repaid many times over. The only caveat is that a debugger focuses attention on a very small area of code, and it's easy to not see the forest for the trees.

Core dumps. Some systems take a snapshot of a program's memory when it fails in some unrecoverable way; for historical reasons, these are often called core dumps. A good debugger can take a program's core dump and effectively reconstruct the program's state at the point at which it malfunctioned. Using the debugger, you can gather a lot of useful information which often leads right to the bug: where exactly did the program fail, what values did its variables have, what sequence of subroutine calls led to the failure?

Tracing tools. Sometimes tools are available that are able to track a program's interaction with another part of the system. For example, a tool may print out all the system calls or API calls a program makes as it executes. This doesn't give a fine-grained look inside the program, but may give enough insight to help pinpoint a problem.

5.2 The Debugging Process

Collecting debugging information is only part of debugging. The debugging process involves using debugging information, along with a variety of other techniques, to track down bugs.

Take Notes

Debugging is a met
44. ions. Wiedenbeck (1986) gives some experimental evidence for the existence of beacons.

7 It can be overwhelming at second, too.

8 For example, Microsoft Windows includes FIND, and Unix systems have the grep family of tools. Some visual programming environments have multi-file search tools as well.

9 For example, see Rigi and SHriMP (Wong, 1998; Wu and Storey, 2000).

10 Specially marked compiler directives and JCL notwithstanding.

11 Archaeologists take note: incorrect comments may indicate the original intent of code which has since evolved.

12 The same isn't true in human languages. No amount of training in English will help decipher "Bob's your uncle."

13 Humans naturally group, or "chunk", related information together (Miller, 1956). McKeithen et al. (1981) and Shneiderman (1976) have verified experimentally that programmers chunk program code, and that experts are better at doing this than novices. Idioms may play a role in the effectiveness of chunking.

14 The Story of Mel is an epic programming tale which brilliantly takes advantage of implicit side effects. It can be found online (Raymond, 2003). Also, not just languages have invisible side effects. Sometimes the library subroutines called from a language have them too.

15 Ellis (1982), page 15.

16 Wall et al. (1996), page 72.

17 A number of introductory articles on aspect-oriented programming can be found in the October 2001 issue of Commu
45. l values that are used in code whose meaning is not immediately apparent, should be avoided.

Beyond the code, you can have external documentation, like user manuals or manual pages, or code comments. There is always the danger of the code, comments, and external documentation getting out of synch, and there are a variety of ways to manage this:

Ignore the problem. Maintain the code, comments, and external documentation separately.

Embedded documentation. Some systems permit external documentation to be embedded in the code, marked using specially denoted comments. This documentation is then automatically extracted to create the external documentation. Currently, the Javadoc system for Java is the prime example of this technique. The theory is that, by merging code and documentation in this way, programmers will find it easier to write and update documentation.

Embedded code. Another approach is called "literate programming". A literate program has the code embedded in the documentation; here, the code is extracted automatically from the documentation.

What should be documented? Again, remember your audience. It is safe to assume a certain base level of programming knowledge. Thus, comments like

    x = x + 1    /* add one to x */

supply as much useful information as

    x = x + 1    /* x is the 24th letter of the alphabet */

Comments of this sort should always be avoided. Instead, describe your code fro
46. m a high-level point of view; the details are in the code if needed. Having said that, be sure to document any tricky or non-obvious details too. The interface to your code should be documented as well. When in doubt, err on the side of documentation quality rather than quantity.

You should always give credit where it is due. If your code is based on, or blatantly stolen from, some other code, document the source. Failure to do this in the academic world would be plagiarism; in industry, it would be grounds for intellectual property lawsuits. Some code, while freely available, has licensing restrictions which require users to note its usage in any documentation; always check the fine print.

Problem Areas

It's important to document what your code does, but it's also important to document what it doesn't do. Depending on the sort of documentation you are producing, this information can go in either the user documentation or in code comments.

Bugs. It's unlikely that you'll know what all the bugs are in your code, but it is likely that you may know about several when writing the code. Even if you don't fix the bugs, you can at least leave warnings about them.

Limitations. Limitations are not bugs per se, and do not cause incorrect execution, but impose constraints of some form. A typical example of a limitation would be the use of a fixed-size input buffer as opposed to a dynamically sized one.

Tunabl
47. n evolving source code. In Proceedings of the 12th IEEE International Workshop on Program Comprehension, pages 236-240, 2004.

G. J. Myers. Software Reliability: Principles and Practices. Wiley, 1976.

P. W. Oman and C. R. Cook. Typographic style is more than cosmetic. Communications of the ACM, 33(5):506-520, May 1990.

E. Raymond, editor. Jargon File, version 4.4.7. 2003. http://www.catb.org/~esr/jargon/

B. Shneiderman. Exploratory experiments in programmer behavior. International Journal of Computer and Information Sciences, 5(2):123-143, 1976.

E. Soloway and S. Iyengar, editors. Empirical Studies of Programmers, 1986. Ablex Publishing Corporation.

M. Sutton, A. Greene, and P. Amini. Fuzzing: Brute Force Vulnerability Discovery. Addison-Wesley, 2007.

A. T. Turnbull and R. N. Baird. The Graphics of Communication. Holt, Rinehart, and Winston, third edition, 1975.

U. Vahalia. UNIX Internals: The New Frontier. Prentice Hall, 1996.

L. Wall, T. Christiansen, and R. L. Schwartz. Programming Perl. O'Reilly, second edition, 1996.

S. Wiedenbeck. Processes in computer program comprehension. In Soloway and Iyengar (1986), pages 48-53.

K. Wong. Rigi User's Manual, version 5.4.4. University of Victoria, 1998.

J. Wu and M.-A. D. Storey. A multi-perspective software visualization environment. In CASCON 2000 Proceedings, pages 41-50, 2000.

API, 5, 40
APL, 48
application programming interface, s
48. omputer whose monitor wasn't displaying anything. He spent a great deal of time searching for the problem: logging in to the computer remotely to make sure it was working, checking the cables, fiddling with the contrast and brightness knobs, to no avail. The problem was that the monitor had been turned off.
35 Gould (1975) theorizes that people debug programs by iteratively generating and testing hypotheses until a clue to the bug is discovered; this approach is used by both novices and experts (Gugerty and Olson, 1986).
36 Yet another approach would hypothesize how the program's state shouldn't be changing, or program invariants.
37 Pre/postconditions and invariants can be part of code testing and code design, too. The latter is referred to as "design by contract". See Meyer (1997), Chapter 11.
38 Depending on the language, other methods may be available. A return statement may be inserted prematurely to avoid executing certain code, or a preprocessor may be used (in C, with #if 0 ... #endif) to quickly block out chunks of code. A common mistake, especially for languages that have matching comment delimiters, is to forget to end a comment and disable much more code than you intended. Syntax highlighting (a.k.a. colorizing) editors help catch this mistake.
39 Some labs and help desks have resident stuffed animals, whom you have to explain your problem to first. Apparently a number of problems are solved this way.
49. on of this sort will increase your confidence that the change has been made properly.

3.4 After the Change

You're not done yet. Modifying the code may have opened up opportunities to restructure the code and, of course, thorough testing is required.

Restructuring

The final code should appear to be cohesive and well-structured, not a patchwork quilt of various code modifications. Once your modification is successfully made, you should examine the surrounding code to see if there is a better way to express it along with your changes. For example, if the original code looked for a special case, and your modification adds a check for a different special case, there may be a way to generalize both tests and end up with better code. Another example is where a modification duplicates code to the point where a subroutine is called for: a subroutine which can be called from both the original and the modified code. Code modifications which involve copying code and altering it slightly are prone to needing this type of restructuring.

When looking for opportunities to restructure, pretend that you're writing the code from scratch: is the code's current form the best way to express it?

Regression Test

Testing your modified code looks for bugs in the code you've added. You also need to make sure you haven't introduced any new bugs in the whole code, or re-introduced old bugs that had been fixed. If the code has a tes
50. ough testing yet.

Ask for Help

Programmers tend to see what they think the code is doing. This is a natural side effect of abstraction. Unfortunately, debugging requires that you see what the computer is actually doing.

How can you see this? The cause of stubborn bugs may be immediately apparent to another person, or may become apparent in the process of explaining the problem. Another approach is to simply take a break from the computer, or get a printout of the troublesome code and analyze it instead.

5.3 The Impossible

    "Eliminate all other factors, and the one which remains must be the truth."
    - Sherlock Holmes

Very rarely, bugs will have exotic causes. There are some things which you normally assume to be correct when debugging: the operating system, system libraries, output from the compiler, the hardware. It is possible, albeit very unlikely, for these assumptions to be wrong. You should consider this possibility only as a last resort, after all normal causes have been ruled out; even then, such a claim ("my code doesn't work because the compiler is broken") should be backed up with convincing evidence. The debugging task then becomes a search for a way to work around or fix this new problem.

There are also bugs, called "Heisenbugs", that disappear when you look for them. The mere act of adding output statements or running the code in a debugger changes the program just eno
51. ounding the problem. Good tools, like class browsers, can greatly assist with determining the structure of such code.

Aspects. Aspect-oriented programming allows an existing body of code to be extended without directly modifying the original code. A programmer defines "aspects", which are snippets of code that are automatically executed when the original code does certain specified things, like return from a call to subroutine foo, or when subroutines foo and bar are called in succession. To properly read aspect-oriented code, you need to be aware of both the original code and all the aspects.

This should not be construed as a general condemnation of these features, as each has advantages for solving certain types of programming problem. The tradeoff, however, is readability.

2.5 Complications

Code reading can be complicated because of peculiarities of the code design and implementation, and also because of what happens when the code executes.

Spaghetti

A base assumption to make when reading code is that the code has been designed and written in a rational, logical way. Code can be extremely hard to read if this assumption turns out to be false.

There are, unfortunately, some special cases where this occurs:

Machine-generated code. Some code is automatically generated by tools rather than being written by humans. Usually such tools operate from a high-level specification that was writ
52. out parts of the code:

    Before subroutine foo is called, p must not be NULL; after foo returns, p will be NULL and count will have incremented by one.

Both these conditions could then be verified with internal state information.

Divide and Conquer

The way that you look up a word in a dictionary or a name in a phone book (a binary search) is a very effective way to track down bugs. The idea is to disable approximately half of the suspect code, usually by commenting it out. Then you begin an iterative search process: if the bug is still present, disable another half of the code, and keep doing so until the bug vanishes. The last piece of the code to be disabled is likely responsible for the problem, at least in part.

A strict divide-and-conquer approach can reduce code's functionality to the point where it can no longer be executed. This problem can sometimes be ameliorated by replacing the code to be disabled with trivial stubs that fake values for debugging purposes.

Undo

The logical limit of divide and conquer is to disable the modified code completely. Remember that the base assumption was that the original code was working, and that your modifications somehow introduced a bug. If the bug doesn't appear to be the result of the modified code, then this assumption should be challenged. It could be the case that the original code was flawed to begin with, but the flaw hadn't been exposed thr
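The divide-and-conquer search described in this passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the suspect code is modelled as a list of named steps, exactly one step is at fault, and the program can still run with the other steps disabled (in practice this may need the stubs mentioned above). The names `find_culprit`, `bug_present`, and the step names are hypothetical.

```python
from typing import Callable, List, Sequence


def find_culprit(steps: Sequence[str],
                 bug_present: Callable[[List[str]], bool]) -> str:
    """Narrow a list of suspect steps down to the single one causing a bug.

    bug_present(enabled) must report whether the bug still shows up when
    only the steps in `enabled` are left switched on.
    Assumption: exactly one step is responsible.
    """
    suspects = list(steps)
    while len(suspects) > 1:
        half = len(suspects) // 2
        kept, disabled = suspects[:half], suspects[half:]
        if bug_present(kept):
            # Bug survives with half the code disabled: keep halving what is left.
            suspects = kept
        else:
            # Bug vanished: the piece just disabled contains the culprit.
            suspects = disabled
    return suspects[0]


# Hypothetical scenario: the "normalize" step is the buggy one.
steps = ["parse", "validate", "normalize", "render", "save"]
culprit = find_culprit(steps, lambda enabled: "normalize" in enabled)
print(culprit)
```

Each round halves the suspect list, so even a long list of candidate changes needs only a handful of runs. With more than one interacting cause, the single-culprit assumption breaks down and the result should be treated as a hint rather than a verdict.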
53. ries can be deleted, memory allotments can be set artificially low. To reach the ideal code coverage goal will take some creativity and persistence, though.

Code located in hard-to-reach areas may be easier to test in isolation. A separate test harness can be quickly constructed to exercise the modified code thoroughly, before incorporating it into the original code.

Boundary Conditions

A good place to test for problems is boundary conditions. Boundary conditions are places in the code where some kind of conditional test is made: execute this code or that code, run the loop again or not, is the buffer full? There are three possibilities to test:

Within the boundary. This is the "normal" case, where the code is running within acceptable limits.

At the boundary. Testing should be done both at, and close to, the boundary condition. Code can contain "off-by-one" errors which only manifest themselves close to the boundary.

Exceeding the boundary. Finally, look for ways to go beyond the boundary to test. This may not always be possible.

For large boundaries, like big buffer sizes, it may be easiest to temporarily lower the bound for testing. For example, a buffer size of 10 could be used instead of 10,000.

Some boundary conditions are not explicit in the code, but implicit in the semantics of the language, such as fixed-size integers quietly wrapping from their maximum positive value to their minimum negative v
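As a rough illustration of the three boundary cases, and of lowering a large bound for testing, here is a small Python sketch. The `BoundedBuffer` class and the `fill` helper are hypothetical stand-ins for whatever code is being modified; the capacity is a parameter so that a test can use 10 instead of 10,000.

```python
class BoundedBuffer:
    """Hypothetical fixed-size buffer whose capacity can be lowered for tests."""

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self.items = []

    def push(self, item):
        """Store item and return True, or return False once the buffer is full."""
        if len(self.items) >= self.capacity:
            return False
        self.items.append(item)
        return True


def fill(buf, n):
    """Try to push n items and return how many were accepted."""
    return sum(buf.push(i) for i in range(n))


SMALL = 10  # temporarily lowered bound, standing in for 10,000

within = fill(BoundedBuffer(SMALL), SMALL - 5)      # well inside the limit
just_below = fill(BoundedBuffer(SMALL), SMALL - 1)  # close to the boundary
at = fill(BoundedBuffer(SMALL), SMALL)              # exactly at the boundary
beyond = fill(BoundedBuffer(SMALL), SMALL + 1)      # one past the boundary

print(within, just_below, at, beyond)
```

An off-by-one mistake in `push` (for example `>` in place of `>=`) would leave the first two results unchanged and only show up in the last one, which is why the cases at and beyond the boundary are worth testing separately.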
54. rms; others will support simple wildcards; still others will look for patterns specified using regular expressions. For comparison:

    foo          Fixed term; finds foo only
    f?o          Simple wildcard; finds three-letter sequences starting with f and ending with o
    ^(foo|bar)   Regular expression; finds foo or bar when they appear at the start of a line

A multi-file search tool that is able to search files buried in subdirectories (a "recursive" directory traversal) is handy for code spread across multiple directories.

Tags. A common task when reading code is to go from the use of a name, like a subroutine, to the name's definition. Support for this task is given by "tags" utilities: a tool is run over a body of code which gathers up all definitions into a database. Tag-savvy editors are able to search this database, given a name used in the code, and instantly jump to the appropriate definition.

Design visualization tools. At the heavyweight end of the tool scale are design visualization tools. These tools analyze the code automatically, and may be used to display dependencies within the code, name definitions and corresponding uses, and other potentially useful information.

Finding the correct term to search for in code is often a mixture of educated guesses, intuition, and luck. Say, for example, you want to find the joystick initialization code in an application. You might try the following sequence:

1. S
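The difference between the three kinds of search term can be tried out with Python's standard `re` and `fnmatch` modules. This is a sketch rather than a replacement for a real search tool: the sample lines are invented, the patterns `f?o` and `^(foo|bar)` are assumed spellings of the wildcard and regular expression described above, and the wildcard is matched against whole words only to keep the example short.

```python
import fnmatch
import re

# Invented sample "source code" lines to search through.
lines = ["foo = 1", "bar = foo", "x = fao + fzo", "  call(foo)"]

# Fixed term: a plain substring search finds foo and nothing else.
fixed = [ln for ln in lines if "foo" in ln]

# Simple wildcard: "?" stands for any single character, so f?o matches
# any three-letter word starting with f and ending with o.
words = {w for ln in lines for w in re.findall(r"\w+", ln)}
wildcard = sorted(w for w in words if fnmatch.fnmatchcase(w, "f?o"))

# Regular expression: foo or bar, but only at the start of a line.
anchored = [ln for ln in lines if re.search(r"^(foo|bar)", ln)]

print(fixed)
print(wildcard)
print(anchored)
```

The more expressive the pattern language, the more precisely a search can be narrowed: the anchored regular expression skips the indented `call(foo)` line that the fixed-term search reports.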
55. sting modified code, however, is that there is a clear focus on the modified code. You want to make sure that your modified code works, and that you haven't accidentally broken anything. Obviously, existing test suites will help ascertain the latter, as mentioned before. The question is, how do you test the modified code?

4.1 Mindset

A good tester is malicious. Users will not necessarily be gentle with a program, and you should stress-test code beyond anything a normal user would do. Think evil thoughts, and ask yourself:

    What is the worst possible thing I can do to this code to make it crash?

4.2 Ways to Test

Black Box Testing

"Black box" testing is where the program is treated as a box whose code cannot be examined. Only the program's input can be manipulated, and its output can be checked to see if the program appears to be operating properly. This is of limited value when trying to test a very targeted part of code.

White Box Testing

Another approach to testing is called "white box" testing. Unlike black box testing, you can examine the code to find potential trouble spots to test.

Ideally, you want to achieve 100% code coverage: every single line in the code should be executed by at least one test. This is complicated by the fact that certain code is only run under extreme conditions, like error- and exception-handling code. Some failures can be induced: necessary files and database ent
56. t suite, especially one containing examples of old bugs, then you can perform a regression test to verify that you haven't inadvertently broken something. Regression tests should ideally be automated and easy to run.

Some tests that previously succeeded may now erroneously fail as a result of the modifications you made. When a test fails, you need to carefully examine it to determine if it should indeed be failing (a bug), or if the test suite is now in error in light of your modification. In the latter case, you need to update the test suite appropriately.

3.5 Practice

Code modification becomes easier with practice. It is possible, but not very interesting, to contrive exercises that develop this skill: give a menu item a blue background rather than grey, print "Hello, world" at a specific point. A good way to practice code modification is to find an application you use, for which you can get the source code, and modify it in one of two ways. First, you can fix some irritating behavior that the program has; this might be something as simple as a bad user interface. Second, you can add some functionality that you want. You may also want to consider sending any generally useful changes back to the original code author for incorporation into the project.

4. Testing Modified Code

The techniques for testing modified code are essentially the same as those for testing an entire program. The advantage when te
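An automated regression test of the kind recommended in this passage might look like the following Python sketch, built on the standard `unittest` module. The `average` function and the old empty-list bug are invented for illustration; the point is only that a once-fixed bug gets its own test, so that a later modification cannot quietly bring it back.

```python
import unittest


def average(values):
    """Hypothetical code under modification.

    Assumed history: an earlier version crashed on an empty list.
    """
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTests(unittest.TestCase):
    def test_typical_input(self):
        self.assertEqual(average([2, 4, 6]), 4.0)

    def test_old_bug_empty_list(self):
        # Pins the previously fixed bug so it cannot be re-introduced unnoticed.
        self.assertEqual(average([]), 0.0)


# Run the suite programmatically so the whole check is one easy step.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageRegressionTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If a modification deliberately changes behavior (say, an empty list should now raise an error), the second test will fail even though the code is right; that is the case where the test suite, not the code, needs updating.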
57. ten by humans; here, it is preferable to read the specification instead of the generated code.

Obfuscation. A code author may wish to release code, or an executable, in a form that is resistant to reverse engineering. This is particularly the case for scripting languages where the source code is executed directly, but there are other languages, like Java, which are especially susceptible to decompilation. Code may be transformed, or obfuscated, in such a way that it makes reverse engineering difficult, for example, changing all variables to look like X00123. Obfuscation is usually done automatically with tools.

Spaghetti code. Human-written code that jumps around from place to place in a seemingly arbitrary manner is referred to as "spaghetti code". While this might be done intentionally to try and obfuscate the code, it is also considered the hallmark of a bad programmer.

In the latter two cases, stubborn persistence is needed to read the code. Taking notes may even be necessary to keep track of what the code is doing. It is worth taking extra time in advance, if necessary, to home in on the spot to read.

Concurrency

Concurrent programs can be challenging to read and write because of the interaction between different threads of execution. A good strategy when reading concurrent code is to identify resources shared between the threads, such as files, variables, and data structures. Code that manipulates
58. ture of programming language semantics permits the meaning of a code idiom to be deciphered, even if the idiom itself is not recognized; idioms are thus a code reading shortcut for experts.

For example, some languages idiomatically iterate over arrays of size N from element 0 to element N-1. Recognizing this idiom immediately conveys the higher-level understanding "the code is iterating through the entire array". Conversely, a red flag is raised when the array is iterated through using non-idiomatic bounds, say from   to N J, indicating that something special is happening.

The Invisible

Some languages have magical side effects that happen when executing. These side effects are implicit, and not apparent from reading the code, so the only way to know about them is by being familiar with the language. For example, variables starting with I through N in Fortran are normally integers; in Perl, the statement s/foo/bar/ uses and sets the variable $_. C++ is notorious for quietly inserting default code into a program, which may or may not behave like the code author intended. It's sometimes helpful to write short test programs to see the effect of magical statements.

Also, certain programming language features result in programs which are hard to follow not because of implicit side effects, but because of subtleties that make it hard to determine what name is being referenced. For example:

Dynamic s
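The advice to write a short test program to see the effect of a "magical" statement applies to most languages. Python is not one of the languages named in the text, but it has a comparable implicit behavior that makes a compact illustration: a mutable default argument is created once, when the function is defined, and is then silently shared between calls. The function name `append_item` is hypothetical.

```python
def append_item(item, bucket=[]):
    """Append item to bucket and return it.

    The default list is built once, at definition time, and is reused by
    every call that omits `bucket`; nothing at the call site shows this.
    """
    bucket.append(item)
    return bucket


first = append_item("a")
second = append_item("b")

# A few lines of experiment reveal what reading the call sites does not.
print(first is second)
print(second)
```

Nothing in `append_item("b")` hints that the result will also contain "a"; the only ways to know are familiarity with the language or a quick experiment like this one.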
59. ugh to make the problem go away. This does not, however, mean that the bug has been fixed. Until the cause of a bug has been determined, debugging should continue.

6. Writing Readable Code

The coding style of an existing body of code should be adhered to when making changes. But suppose you're writing brand new code. How can you write it so that it's readable?

6.1 Remember Your Audience

A standard piece of advice for any communication (verbal, written, or otherwise) is to remember your audience. The same is doubly true for computer code. With code, you not only have to express yourself precisely to the computer, but you also must leave something understandable for humans.

There are almost always multiple ways to write a piece of code. Making your code readable for your human audience should help guide your coding choices. How much should you document? What should you document? How densely should the code be written? What obscure language idioms can you use?

Sometimes it's useful to use yourself as a reference point. Ask yourself: will I understand this code in a year's time? You are your own audience, too.

6.2 Design

Code design is something which is best taught by experience. Reading and modifying someone else's code is instructive, although the exact lesson depends on whether the code has a good or a bad design. Similarly, implementing and using your own code design is valuable, espe
60. you read through the code. Understanding the design then becomes a matter of looking for evidence that supports or refutes your hypotheses. For example, for a command-driven program, you might hypothesize that each command is handled in a separate piece of code; further, you might also hypothesize that there is a dispatch mechanism to direct each command to the appropriate handler code. To test these hypotheses, you might look for the presence of many small command-handling subroutines, and find out where they are invoked from.

2.3 Tools

The difficulty of reading code increases with the size of the program. A hundred lines of code usually presents no special challenge, but large bodies of code, millions of lines long, are not unusual. This can be overwhelming at first, but there is one key observation:

    You don't need to understand all the details of code that is designed and written in a rational, logical way.

Given this, the problem of reading code becomes a matter of discovering what you do need to pay attention to. Tools play an important role in this discovery process.

Improving Readability

Some code is formatted poorly (by any standard) and is hard to read in its original form. You are free to improve the readability of the code when reading your own private copy. Ideally, you will want to do this quickly, with little or no effort on your part. Tools called "code formatters" or "pretty printers"
    