
A General Dynamic Information Flow Tracking Framework


Contents

1. [14] Perlsec. http://perldoc.perl.org/perlsec.html
[15] F. Pottier and V. Simonet. Information flow inference for ML. In Proceedings of the 29th ACM Symposium on Principles of Programming Languages (POPL '02), pages 319-330, Portland, Oregon, Jan. 2002.
[16] V. Simonet. Flow Caml in a nutshell. In G. Hutton, editor, Proceedings of the first APPSEM-II workshop, pages 152-165, Nottingham, United Kingdom, March 2003.
[17] G. E. Suh, J. W. Lee, D. Zhang, and S. Devadas. Secure program execution via dynamic information flow tracking. In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 85-96, October 2004.
[18] Tiny Software. Tiny firewall 6. http://www.tinysoftware.com/home/tiny2?s=7807136686411155049A1&pg=content05&an=tf6_home
[19] W. Xu, S. Bhatkar, and R. Sekar. Taint-enhanced policy enforcement: A practical approach to defeat a wide range of attacks. In 15th USENIX Security Symposium, pages 121-136, August 2006.
[20] Y. Yu, F. Guo, S. Nanda, L. C. Lam, and T. Chiueh. A feather-weight virtual machine for Windows applications. In Proceedings of the 2nd ACM/USENIX Conference on Virtual Execution Environments (VEE '06), June 2006.
2. grained information flow tracking, it incurs a modest performance overhead. Measurements on the first Aussum prototype show that the CPU overhead is less than 170% for computation-intensive applications and less than 20% for non-computation-intensive applications. The elapsed time overhead is less than 35% even for the computation-intensive applications.

References
[1] A. Conry-Murray. Product focus: Behavior blocking stops unknown malicious code. http://www.networkmagazine.com/shared/article/showArticle.jhtml?articleId=8703363&classroom=, June 2002.
[2] J. R. Crandall and F. T. Chong. Minos: Control data attack prevention orthogonal to memory model. In 37th Annual International Symposium on Microarchitecture, pages 221-232, December 2004.
[3] P. Efstathopoulos, M. Krohn, S. VanDeBogart, C. Frey, D. Ziegler, E. Kohler, D. Mazières, F. Kaashoek, and R. Morris. Labels and event processes in the Asbestos operating system. In Proceedings of the 20th ACM Symposium on Operating Systems Principles, pages 17-30, October 2005.
[4] J. S. Fenton. Memoryless subsystems. Computer Journal, 17(2):143-147, May 1974.
[5] GrammaTech Inc. Codesurfer. http://www.grammatech.com/products/codesurfer
[6] R. W. M. Jones and P. H. J. Kelly. Backwards-compatible bounds checking for arrays and pointers
3. Aussum's Security Policy module provides the following three callback functions. The underlying sandboxing system uses them to decide when to treat a network connection as suspicious, when to mark a file tainted, and when to reject a sensitive system call invocation that uses tainted arguments:
• int aussum_mark_connection(sockaddr *addr): if this function returns true, the connection to addr should be marked as suspicious.
• int aussum_mark_file(int descriptor, char *file_name): this function marks the output file as tainted.
• int aussum_check_argument(char *function_name, char *argument): if the return value is 0, terminate the application; if it is 1, simply return without making the system call; if it is 2, make the system call.
Since these three functions are very simple, it takes minimal effort to add them to an existing sandboxing system.

4.5.1 Performance Analysis
4.6 Performance Evaluation
4.6.1 Methodology
Because Aussum is designed to provide selective sandboxing for applications running on desktop machines, we chose a set of network client applications, listed in Table 1, to evaluate the performance overhead of Aussum's taint attribute propagation mechanism. Because every assignment statement in a network application needs to be instrumented, we purposely chose CG, WGET, and LYNX, which are computation-intensive, to stress-test the efficiency of th
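The sketch below makes the three callbacks concrete as a minimal C Security Policy module. It rests on several assumptions:
• The intranet is 10.0.0.0/8, matching the "any node not on the intranet" policy used in the effectiveness test.
• Tainted file names are kept in a fixed-size table.
• Each system call name gets one hypothetical decision.
• The const qualifiers, the struct sockaddr parameter type, and the helper is_file_tainted are illustrative additions, not Aussum's actual code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Assumed intranet: 10.0.0.0/8. Any other peer is "suspicious". */
#define INTRANET_NET  0x0A000000u
#define INTRANET_MASK 0xFF000000u

/* Nonzero if the connection to addr should be marked as suspicious. */
int aussum_mark_connection(const struct sockaddr *addr)
{
    if (addr->sa_family != AF_INET)
        return 1;                          /* non-IPv4 peers: suspicious */
    const struct sockaddr_in *in = (const struct sockaddr_in *) addr;
    uint32_t ip = ntohl(in->sin_addr.s_addr);
    return (ip & INTRANET_MASK) != INTRANET_NET;
}

/* Remember files written with tainted data (fixed-size table, sketch only). */
#define MAX_TAINTED 64
static char tainted_files[MAX_TAINTED][256];
static int n_tainted;

int aussum_mark_file(int descriptor, const char *file_name)
{
    (void) descriptor;                     /* unused in this sketch */
    if (n_tainted == MAX_TAINTED)
        return 0;
    strncpy(tainted_files[n_tainted], file_name, 255);
    tainted_files[n_tainted][255] = '\0';
    n_tainted++;
    return 1;
}

/* Hypothetical helper: has this file been marked tainted? */
int is_file_tainted(const char *file_name)
{
    for (int i = 0; i < n_tainted; i++)
        if (strcmp(tainted_files[i], file_name) == 0)
            return 1;
    return 0;
}

/* Called for a sensitive call whose argument is already known to be tainted:
 * 0 = terminate the application, 1 = skip the call, 2 = make the call. */
int aussum_check_argument(const char *function_name, const char *argument)
{
    (void) argument;                       /* a real policy could inspect it */
    if (strcmp(function_name, "exec") == 0)
        return 0;
    if (strcmp(function_name, "unlink") == 0 || strcmp(function_name, "open") == 0)
        return 1;
    return 2;
}
```

The sketch assumes the System Call Monitor consults aussum_check_argument only after it has found a tainted argument. That is why the decision here depends on the call name alone.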
4. gives the sandboxing system an opportunity to examine the remote IP address and decide whether the connection should be marked tainted. In Aussum, a tag's value is either 0 or 1, where 1 means tainted and 0 means not tainted.

To initialize the tag for data read from a descriptor, the Input Monitor intercepts system calls and LIBC function calls such as read, fread, fgets, recv, and recvfrom using proxy functions. Each proxy first calls the original function and then checks, by looking up the descriptor table, whether the socket file descriptor is marked as tainted. If data is read from a tainted descriptor, Aussum locates the corresponding node in the splay tree and sets its tag to 1; otherwise the tag is set to 0.

4.3 Dynamic Taint Tracker
GIFT requires developers to implement their own gift_do_set_tag to propagate tag values across assignment statements. In Aussum, the tag of the left-hand-side memory block of an assignment statement is the bitwise OR of the tags of the memory blocks on its right-hand side. That is, if any memory block on the right-hand side is tainted, the left-hand-side memory block is also tainted. The following code shows Aussum's implementation of gift_do_set_tag:

    void gift_do_set_tag(void *lhs_tag, int num, void *rhs_tags)
    {
        int i, tmp = 0;
        int *rt = (int *) rhs_tags;
        /* OR together the tags of all right-hand-side blocks */
        for (i = 0; i < num; i++)
            tmp = tmp | rt[i];
        *(int *) lhs_tag = tmp;
    }

4.4 System Call Monitor
The System Call Moni
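The Input Monitor's proxy pattern is: call the original function, then set the tag from the descriptor table. The sketch below shows this pattern under two simplifying assumptions:
• The descriptor table is a plain array indexed by file descriptor.
• The buffer's tag comes back through an extra buf_tag parameter, rather than being stored in the buffer's splay tree node.
read_proxy and mark_descriptor are hypothetical names.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_FDS 1024

/* Hypothetical descriptor table: 1 if the descriptor was marked tainted. */
static int fd_tainted[MAX_FDS];

void mark_descriptor(int fd, int tainted)
{
    if (fd >= 0 && fd < MAX_FDS)
        fd_tainted[fd] = tainted;
}

/* Sketch of a read() proxy: perform the real read, then tag the buffer
 * 1 if the descriptor is tainted and 0 otherwise. */
ssize_t read_proxy(int fd, void *buf, size_t count, int *buf_tag)
{
    ssize_t n = read(fd, buf, count);      /* call the original function first */
    if (n > 0)
        *buf_tag = (fd >= 0 && fd < MAX_FDS) ? fd_tainted[fd] : 0;
    return n;
}
```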
5. s address space, such as read or write functions for the file system, the network, and shared memory regions. When these functions are called, the GIFT compiler redirects the calls to their corresponding proxy functions. These proxies perform tag initialization for the program variables allocated to receive external inputs.
• Output Channel: These correspond to functions that move data in an application's address space to the outside world, i.e., a file system, a remote node, or another process. Output channels are intercepted and redirected to their proxy functions in the same way as input channels. These proxies typically examine the tags of the data to be output and make decisions accordingly. For example, a user can intercept write system calls using a proxy function write_proxy(int fd, void *buf, int size). This proxy can examine the tag associated with the variable pointed to by buf and reject a write call that tries to write a password file to a socket descriptor.
• Assignment Statement: After every assignment statement in a program, the GIFT compiler inserts a call to a programmer-provided function, by default named gift_set_tag. This function combines the tags associated with the program variables on the right-hand side to form the tag associated with the variable on the left-hand side. It takes the addresses of the tags of all variables in the assignment statement as its arguments and performs application-sp
6. A General Dynamic Information Flow Tracking Framework for Security Applications
Lap Chung Lam, Tzi-cker Chiueh
Rether Networks Inc., 75 Health Sciences Drive, Suite 111, Stony Brook, NY 11790, USA
{lclam, chiueh}@rether.com

Abstract
Many software security solutions require accurate tracking of control/data dependencies among information objects in network applications. This paper presents a general dynamic information flow tracking framework for C programs called GIFT. GIFT allows an application developer to associate application-specific tags with input data. It instruments the application to propagate these tags to all other data that are control/data dependent on them, and invokes application-specific processing on output data according to their tag values. To use GIFT, an application developer only needs to implement input and output proxy functions, which tag input data and perform tag-dependent processing on output data, respectively. To demonstrate the usefulness of GIFT, we implement a complete GIFT application called Aussum, which allows selective sandboxing of network client applications based on whether their inputs are tainted or not. For a set of computation-intensive test applications, the measured elapsed time overhead of GIFT is less than 35%.

1 Introduction
Information flow tracking refers to the ability to track how the result of a program's execution is related, via either data or control d
7. ags
• int gift_add_locals_to_tree(int number, void *address, int size, void *tag_addr, ...): This function is inserted in each function's prologue. It creates splay tree nodes for all local data variables and input parameters whose addresses are assigned to pointers, as illustrated by the call at Line 8 of Figure 1(B). This function takes a variable number of arguments. The first argument, number, indicates how many triples of (void *address, int size, void *tag_addr) are passed to the function. In each triple, address is the base address of a data variable, size is the size of the data variable, and tag_addr is the address of the associated shadow variable. The return value of this function is used to remove the tree nodes when the function returns.
• void gift_remove_locals_from_tree(int fun_index): This function is inserted in each function's epilogue to remove all tree nodes allocated for the current function. The parameter fun_index is the transaction ID returned by gift_add_locals_to_tree.
• void gift_add_globals_to_tree(int number, void *address, int size, void *tag_addr, ...): This function allocates tree nodes for all global and static data variables in a source file. Its prototype is similar to that of gift_add_locals_to_tree, except that it does not return a transaction ID, because the tree nodes allocated to global and static data variables never need to be explicitly freed. GIFT creates a global constr
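The variadic calling convention and the transaction ID can be illustrated with a small sketch. It makes two assumptions:
• A fixed-size array stands in for GIFT's splay tree.
• The node count at entry serves as the transaction ID. This works because function activations nest, so removal is last-in, first-out.

demo_add_locals_to_tree, demo_remove_locals_from_tree, and demo_node_count are hypothetical names. Because size travels through "...", callers must pass it as an int (for example (int) sizeof x) and pass the pointers as void *.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical fixed-capacity node list standing in for GIFT's splay tree. */
struct tag_node { void *base; int size; void *tag_addr; };

#define MAX_NODES 256
static struct tag_node nodes[MAX_NODES];
static int n_nodes;

/* Register `number` (address, size, tag_addr) triples and return a
 * transaction ID (the node count before the call). */
int demo_add_locals_to_tree(int number, ...)
{
    int txn = n_nodes;
    va_list ap;
    va_start(ap, number);
    for (int i = 0; i < number && n_nodes < MAX_NODES; i++) {
        nodes[n_nodes].base     = va_arg(ap, void *);
        nodes[n_nodes].size     = va_arg(ap, int);
        nodes[n_nodes].tag_addr = va_arg(ap, void *);
        n_nodes++;
    }
    va_end(ap);
    return txn;
}

/* Epilogue: drop every node created since the matching add call. */
void demo_remove_locals_from_tree(int fun_index)
{
    if (fun_index >= 0 && fun_index <= n_nodes)
        n_nodes = fun_index;
}

int demo_node_count(void) { return n_nodes; }
```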
8. antees accurate information flow tracking because it follows the data and control dependencies that actually take place at run time. Compared with statistical correlation or machine learning approaches to information flow tracking [17], GIFT is completely automatic and requires no time-consuming, labor-intensive training process. Finally, GIFT makes it possible to instrument the same program differently when it is used in different applications or systems, in a way tailored to their security requirements. GIFT therefore cleanly decouples application logic from tag manipulation logic. In summary, GIFT is an enabling technology that could be immediately applied to a wide variety of information flow accounting applications, including information flow control, intrusion impact assessment, and execution trajectory analysis.

To demonstrate the usefulness of the GIFT framework, we built a tool based on GIFT called Aussum. Aussum can automatically instrument arbitrary network applications to track the provenance of information objects. It then enables selective sandboxing of application programs when they operate on information objects from suspicious sources. Aussum solves a long-standing problem of existing behavior blocking or sandboxing systems [1]: how to minimize disruption to legitimate applications while stopping all malicious attacks. By leveraging GIFT's accurate information flow tracking, Aussu
9. e GIFT compiler. CG is a newsgroup binary downloader, which needs to parse every mail in a group and download and decode all the attachments. WGET parses every downloaded HTML file to find new files to download. LYNX parses a web page and displays it on a terminal. We modified LYNX to exit immediately after the page is displayed so that we could measure its performance programmatically. We made similar modifications to interactive programs such as CG, so that they perform the required operations and then exit immediately.

The server machine used in this study is a 2.8GHz P4 machine with 256MB of RAM running Fedora Core 3.0. The client machine is a 2.8GHz P4 machine with 1.5GB of RAM running Red Hat 7.3.
• FTP client: we fetched a 6MB file.
• CG: we set up a newsgroup with a 40KB image and two 700KB binary programs, and used CG to read all the information from the newsgroup and download the image and the binaries.
• Gnut, a peer-to-peer application: we fetched a 500KB file and a 10KB file.
• WGET: we downloaded Apache's user manual files from an Apache server.
• LYNX: we downloaded and displayed the main page of the Apache manual.
The Aussum prototype includes a simple sandboxing system, which provides callback functions to mark a
(Table residue: column headers Application, Description, Lines of code, and Space Overhead in bytes for gcc, splay, shadow splay, and shadow splay slice.)
10. e effectiveness of GIFT's tag propagation mechanisms, we tested three versions of Aussum:
• splay: the pure splay tree scheme;
• shadow splay: the combined splay tree/shadow variable scheme;
• shadow splay slice: the combined splay tree/shadow variable scheme with the slicing optimization.
We measured Aussum's performance overhead using both elapsed time and CPU time. Because the elapsed time includes disk I/O time, the elapsed time overhead can be much smaller than the CPU time overhead. Because Aussum targets network client applications, the elapsed time overhead is the more appropriate measure, since it reflects the user's perception of the slowdown caused by GIFT's instrumentation.

Table 2 shows that, for the pure splay tree scheme of tag representation, the CPU time overhead of Aussum ranges from 24.14% to 1120%. As expected, the pure splay tree scheme incurs the highest performance penalty for computation-intensive programs. For example, the run-time overhead of CG is 1120%, i.e., CG runs about 12 times as slow. This high overhead is attributed to the decoding computation in CG, where taint attributes are propagated across every decoding step. The performance overhead for WGET and LYNX is also very high (534.76% and 400%, respectively) because both need to parse HTML files. In contrast, the performance overhead of the pure splay tree scheme for the FTP client program is r
11. e incurs a serious performance penalty for computation-intensive applications because of tree lookups. To reduce this penalty, we store the tag of every global/local memory block in a shadow variable instead of in the splay tree. The key idea is that if a memory block is accessed through its name, the GIFT compiler reads its tag directly from its shadow variable without looking up the splay tree.

If a memory block is accessed through a pointer, however, GIFT creates a splay tree node for it. This node contains a pointer, int *tag_ptr, that points to the shadow variable holding the memory block's tag. When a memory block is accessed through a pointer, GIFT looks up the splay tree to locate its corresponding shadow variable. If a memory block is returned by malloc, GIFT stores its tag in the int tag field of its corresponding splay tree node; in this case, the node's tag_ptr points to its own int tag field.

Figure 1 illustrates GIFT's instrumentation using a simple program and its transformed version. All variables whose names end with the _tag_info suffix in Figure 1(B) are shadow variables inserted by the GIFT compiler. Lines 12 and 14 in Figure 1(B) show that the tags of b and c (i.e., b_tag_info and c_tag_info) are accessed directly, because these two variables are accessed through their names. In addition, the GIFT compiler inserts the following functions into a program to manage t
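The node layout of the combined scheme can be sketched as below. Only tag and tag_ptr come from the description; the struct name, the other field names, and both init functions are assumptions. The point is that code reaching a block through a pointer always writes through tag_ptr. It therefore updates the same shadow variable that named accesses use, while heap blocks keep their tag inside the node.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sketch of a tree node in the combined splay tree/shadow variable scheme. */
struct sketch_node {
    uintptr_t base;     /* start address of the memory block        */
    size_t    size;     /* size of the block in bytes               */
    int       tag;      /* used only for malloc'ed blocks           */
    int      *tag_ptr;  /* where the block's tag actually lives     */
};

/* Named global/local variable: its tag lives in its shadow variable. */
void init_named_node(struct sketch_node *n, void *addr, size_t size, int *shadow)
{
    n->base = (uintptr_t) addr;
    n->size = size;
    n->tag = 0;
    n->tag_ptr = shadow;
}

/* Heap block from malloc: its tag is stored inside the node itself. */
void init_heap_node(struct sketch_node *n, void *addr, size_t size)
{
    n->base = (uintptr_t) addr;
    n->size = size;
    n->tag = 0;
    n->tag_ptr = &n->tag;
}
```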
12. e overhead. LYNX reads an HTML file from a network connection and displays it on the screen. Even though the tag propagation code inserted by GIFT can incur a high CPU time overhead, this overhead is not necessarily visible to the user, because network I/O time and screen output time largely dominate the CPU time.

GIFT applies program slicing in the hope of eliminating unnecessary instrumentation and reducing the performance overhead of tag propagation. This slicing optimization works for some programs but not for others, such as TNFTP and LYNX. The main reason is that Codesurfer's pointer analysis is not very effective on TNFTP and LYNX. When we applied the most accurate pointer analysis option, Codesurfer quickly ran out of the 1.5GB of memory, so we had no choice but to use the less accurate option. With that option, however, Codesurfer's slicing result is almost the same as the original program. Table 3 shows that program slicing slightly decreases the number of splay tree lookups for TNFTP and LYNX. Even so, their performance overhead is actually higher when the slicing optimization is turned on.

4.6.2 Effectiveness of Aussum
We wrote a simple sandbox tool to test the effectiveness of Aussum. The tool specifies a policy that applications cannot open an existing file whose name was obtained from a suspicious host, e.g., any node not on the intranet. We then invoke Aussum instr
13. each variable and the program counter (PC) with a security class. The value of a variable must be computed from variables with the same security class. In contrast, GIFT is a practical system that can enable any existing C program to perform dynamic information flow control or tracking.

Recently, dynamic information flow techniques have mainly been used to detect control hijacking attacks, as in [17], [2], and [13]. The dynamic information tracking system of Suh et al. [17] is a hardware implementation:
• Every memory byte and register has a one-bit hardware tag.
• All tags are initialized to zero, and the operating system sets the tag to one for data from a potentially malicious input channel.
• The instruction set is augmented to propagate the tags.
• The processor ensures that no tagged data can be used as the target of a control transfer.
Minos [2] is another hardware implementation similar to [17].

Newsome and Song [13] implemented a similar mechanism in software. Their TaintCheck system is built on Valgrind [12], an x86 emulator that can instrument a program as it runs. Each byte of memory, including the registers, stack, and heap, has four bytes of shadow memory that store a pointer to a Taint data structure. Input data from untrusted sources are marked as tainted, and TaintCheck instruments the program at run time to propagate the taint. All control t
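Here is a toy software model of the hardware scheme attributed to Suh et al. [17]. It has a 256-byte memory with a one-bit tag per byte (stored in a byte), a copy operation that moves tags along with the data, and a check that rejects a control transfer through a 4-byte word containing any tagged byte. The memory size, word size, and function names are assumptions, bounds checks are omitted, and this is an illustration rather than the actual processor design.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 256

/* Toy memory with one tag flag per data byte. */
static uint8_t mem[MEM_SIZE];
static uint8_t shadow[MEM_SIZE];

/* Data from an untrusted input channel: copy it and set its tags to 1. */
void untrusted_input(size_t addr, const void *src, size_t len)
{
    memcpy(&mem[addr], src, len);
    memset(&shadow[addr], 1, len);
}

/* Propagation for a copy instruction: the tags travel with the bytes. */
void copy_bytes(size_t dst, size_t src, size_t len)
{
    memmove(&mem[dst], &mem[src], len);
    memmove(&shadow[dst], &shadow[src], len);
}

/* Before jumping through the 4-byte word at addr, refuse if any byte of it
 * is tagged. Returns 1 if the transfer is allowed, 0 otherwise. */
int control_transfer_allowed(size_t addr)
{
    for (size_t i = 0; i < 4; i++)
        if (shadow[addr + i])
            return 0;
    return 1;
}
```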
14. ecific tag value propagation from the right-hand side to the left-hand side. In addition, GIFT provides the following library functions, which the programmer-provided functions can use to access the tags of a function call's arguments:
• void *gift_lookup_tag(void *address): If the input parameter is a pointer to a memory block, this function looks up the address of the tag associated with the memory block pointed to by address.
• void *gift_lookup_parameter_tag(int index): If the input parameter is not a pointer, this function returns the address of the tag associated with the index-th argument.
• void gift_save_return_tag(void *return_address, void *tag): If a proxy function needs to return a data value, it calls this function to save a copy of the value's tag onto the shadow stack.

GIFT is derived from GCC 3.3.3. To add dynamic information flow tracking to an application using GIFT, the user implements, in an object file, the proxy functions for input/output channels and gift_set_tag for assignment statements, and then links the original program with that object file. For example, to compile a file called myprogram.c, a developer should invoke GCC as

    gcc -fgift -fgproxy myproxy.pro myprogram.c myproxy.o

The -fgift option enables GIFT's instrumentation. The names of the intercepted functions and their corresponding proxy functions are specified in the file myproxy.pro. The file myproxy.o contains the developer-p
15. ection and marks it according to a configurable security policy. The Dynamic Taint Tracker propagates the mark throughout the program. Finally, the System Call Monitor marks files and system call arguments that are derived from network inputs.

4 Selective Application Sandboxing Using GIFT
4.1 Overview
Many end-user computers are infected by malicious programs because users, knowingly or unknowingly, download objects containing malicious programs from the network through email attachments, file transfer, web browsing, and peer-to-peer file sharing. Most existing anti-malware tools are based on signatures and therefore cannot protect end-user machines from zero-day attacks. Behavior blocking, or sandboxing [1], monitors and restricts a network application's execution according to a predefined security policy. It is considered a better defense against zero-day attacks because it focuses on benign program behaviors rather than malicious ones.

In practice, however, behavior blocking technology has two pitfalls. First, behavior blocking can disrupt the operation of legitimate applications, because for safety the sandboxing policy is typically set stricter than necessary. One way to solve this problem is to apply program analysis techniques to extract highly accurate sandboxing policies directly from an application program's source or binary code [7]. Second, existing behavior blocking systems do not su
16. ecution using that file is sandboxed. In addition, if a network application uses marked contents as input arguments to sensitive system calls such as exec, open, and unlink, these system call invocations are also sandboxed. Because of its fine-grained information flow tracking, Aussum lets a legitimate application such as Microsoft's IE or WORD run with full privilege when it operates on local files, but sandboxes it when it operates on objects downloaded from the network. Moreover, Aussum injects this selective sandboxing capability into network applications in a way that is completely transparent to the applications' developers and users.

Figure 2 shows the system architecture of Aussum, which adds three modules to a network application:
• The Input Monitor marks as tainted the input packets from network connections that are considered suspicious according to a security policy stored in the Security Policy Hooks module.
• The Dynamic Taint Tracker propagates the taint attribute of data items across a program's computation.
• The System Call Monitor checks the arguments of sensitive system calls such as unlink, open, and exec, and invokes the operating system's sandboxing mechanism when their arguments are tainted. In addition, when tainted data is written to a file, it marks the file as tainted according to the same security policy.
The security policy in the Security Policy Ho
17. elatively small, because it just fetches files and writes them to disk without performing any computation.

The combined shadow variable/splay tree tag management scheme is very effective: its CPU time overhead ranges from 15.45% to 180%. In particular, the CPU time overhead for CG drops from 1120% to 180% when the tag representation changes from the pure splay tree scheme to the combined scheme. Table 3 lists the number of splay tree lookups in each run. It shows that the combined scheme is so effective because it eliminates most of the splay tree lookups. For example, under the pure splay tree scheme CG looks up the splay tree 2,975,687 times, but under the combined splay tree/shadow variable scheme it does so only 100,178 times.

Table 3: Number of tree lookups of each configuration.
Application | Splay Tree | Shadow Splay | Shadow Splay Slice
tnftp | 23,032 | 4,879 | 4,751
yafc | 51,369 | 14,699 | 14,433
gnut | 72,136 | 24,265 | 20,796
cg | 2,975,687 | 100,178 | 83,138
wget | 55,537,361 | 22,285,332 | 22,180,508
lynx | 193,818 | 47,894 | 46,392

Compared with vanilla GCC, the elapsed time overhead of Aussum's taint attribute propagation ranges:
• from 6.41% to 308.97% for the splay scheme;
• from 0.70% to 30.00% for the shadow splay scheme;
• from 0.90% to 31.43% for the shadow splay slice scheme.
The worst case occurs with LYNX, which incurs a 30% elapsed time overhead and a 166.67% CPU tim
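The claim that the combined scheme eliminates most lookups can be quantified from Table 3 as 1 - after/before. For CG this is 1 - 100,178/2,975,687, about 0.966, so roughly 96.6% of lookups disappear. WGET, by contrast, keeps about 40% of its lookups (22,285,332 of 55,537,361). The small C program below uses only the Table 3 counts as data and prints this ratio for every application and configuration.

```c
#include <stdio.h>

/* Splay-tree lookup counts copied from Table 3. */
struct lookups { const char *app; double splay, shadow, slice; };

static const struct lookups table3[] = {
    {"tnftp",     23032.0,     4879.0,     4751.0},
    {"yafc",      51369.0,    14699.0,    14433.0},
    {"gnut",      72136.0,    24265.0,    20796.0},
    {"cg",      2975687.0,   100178.0,    83138.0},
    {"wget",   55537361.0, 22285332.0, 22180508.0},
    {"lynx",     193818.0,    47894.0,    46392.0},
};

/* Fraction of the pure-splay lookups removed by another configuration. */
double eliminated(double before, double after)
{
    return 1.0 - after / before;
}

void print_reductions(void)
{
    for (size_t i = 0; i < sizeof table3 / sizeof table3[0]; i++)
        printf("%-6s shadow: %5.1f%%  slice: %5.1f%%\n", table3[i].app,
               100.0 * eliminated(table3[i].splay, table3[i].shadow),
               100.0 * eliminated(table3[i].splay, table3[i].slice));
}
```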
18. ependencies to its inputs from the network, the file system, and any other external parameters such as environment variables and command-line arguments. To track information flow accurately, each piece of input should be assigned a tag, which could be a bit (e.g., a taint bit) or a pointer to an arbitrarily complex metadata structure. For each assignment operation, the tag of the left-hand side is then derived from the tags on its right-hand side according to certain tag combination rules. Different information flow tracking applications require different types of tags and use different tag combination rules. Moreover, modern network services consist of multiple programs communicating with each other through messages. To accommodate them, it is essential to track information flow not only within a single process but also across processes, machines, and even Internet sites.

Tracking information flow within a program requires following edges in the program's data flow graph and propagating tags from the dependees to the dependents. Given an application program and its application-specific tag initialization and propagation rules, a compiler should in theory be able to simulate tag propagation and compute the tag values of program variables through abstract interpretation and symbolic execution. In practice, it is not feasible to statically determine the tag value of every program variable becau
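Because tag types and combination rules are application-specific, the same propagation hook can carry very different logic. The sketch below contrasts two rules over GIFT's 4-byte tags:
• Bitwise OR for taint bits. This also acts as set union if each bit stands for a different input source.
• Maximum for an assumed, totally ordered set of security classes.
The function names and the class ordering are hypothetical.

```c
#include <stdint.h>

typedef uint32_t tag_t;   /* GIFT tags are 4 bytes wide */

/* Taint rule: the result is tainted if any right-hand-side operand is;
 * with one bit per input source this computes the union of sources. */
tag_t combine_taint(const tag_t *rhs, int n)
{
    tag_t t = 0;
    for (int i = 0; i < n; i++)
        t |= rhs[i];
    return t;
}

/* Security-class rule (assumed ordering 0 < 1 < 2 < ...): the result
 * takes the highest class among its operands. */
tag_t combine_class(const tag_t *rhs, int n)
{
    tag_t t = 0;
    for (int i = 0; i < n; i++)
        if (rhs[i] > t)
            t = rhs[i];
    return t;
}
```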
19. formation. Jif [10, 11] is a Java extension that lets programmers annotate variables with security labels, as in the declaration int{o1: r1, r2; o2: r2, r3} x. The security policy in this declaration specifies that x is owned by o1 and o2. Owner o1 allows r1 and r2 to read the data, and o2 allows r2 and r3 to read it. In this example, only r2 can read x, since both owners grant it permission. Jif's type-checking system verifies that no information can flow from one variable to another if the policies prohibit that flow. Unlike Flow Caml, which uses only static analysis, Jif allows security labels to be first-class values that can be assigned and checked at run time.

The weakness of Flow Caml and Jif is that they require programmers to lay out the information flow security policies at programming time. Although Jif lets users set the labels of some variables at run time, the labels of most variables are set at programming time. Therefore, unlike GIFT, Flow Caml and Jif require existing applications to be modified before they can use their information flow control mechanisms. Furthermore, GIFT can be used for purposes other than security. One example is logging an application's file system and database access requests and operations to enable fast intrusion recovery.

Fenton's Data Mark Machine [4] is an early abstract machine that implements dynamic information flow control. The Data Mark Machine associates
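In the Jif example, a principal may read x only if every owner's policy allows it. That rule amounts to intersecting the owners' reader sets, which the minimal sketch below computes with principals encoded as bits (an assumption of this example). The full decentralized label model also treats each owner as an implicit reader of its own policy. The sketch ignores that and follows the simplified description in the text.

```c
#include <stdint.h>

/* Hypothetical encoding: one bit per principal. */
enum { R1 = 1u << 0, R2 = 1u << 1, R3 = 1u << 2 };

/* One policy per owner: the set of readers that owner permits. */
struct policy { const char *owner; uint32_t readers; };

/* A principal may read only if every owner's policy lists it. */
uint32_t effective_readers(const struct policy *label, int n)
{
    uint32_t allowed = ~0u;                /* no owners: no restriction */
    for (int i = 0; i < n; i++)
        allowed &= label[i].readers;
    return allowed;
}
```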
20. formation flow control mechanisms, which control information flow at the level of application variables, the Asbestos operating system [3] implements information flow control at the process level. GIFT can also be used to implement information flow tracking between processes. For example, we can use GIFT to compile the Apache server so that it intercepts setenv, which Apache uses to send information to a CGI program. We also compile the CGI program so that it intercepts getenv, which it uses to receive information from Apache. We can then implement information flow tracking in the proxy functions of setenv and getenv.

6 Conclusion
This paper presents the design and implementation of GIFT, a general dynamic information flow tracking framework for C programs, and Aussum, a complete application of GIFT. GIFT provides an interface for developers to specify their own tag initialization, propagation, and processing functions. The GIFT compiler then automatically propagates these tags through an application program along the data and control dependencies that actually take place at run time. The flexibility of GIFT allows it to support a wide range of applications. Aussum is an example GIFT application designed to minimize the probability that sandboxing or behavior blocking systems disrupt legitimate applications, by turning on sandboxing only when absolutely necessary. Although Aussum minimizes the disruption of the sandboxing technology through fine
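The Apache/CGI example from the related-work discussion, intercepting setenv and getenv, can be sketched as a pair of proxies. This illustration rests on several assumptions:
• Each variable's tag is shipped in a companion variable named GIFT_TAG_<name>.
• The names setenv_proxy and getenv_proxy are hypothetical.
GIFT itself only supplies the interception points.

```c
#define _POSIX_C_SOURCE 200809L   /* for setenv() */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical convention: the tag of NAME travels in GIFT_TAG_NAME. */
static void tag_var_name(char *out, size_t len, const char *name)
{
    snprintf(out, len, "GIFT_TAG_%s", name);
}

/* Sender side (e.g. Apache): set the variable and publish its tag. */
int setenv_proxy(const char *name, const char *value, int overwrite, unsigned tag)
{
    char key[256], val[16];
    tag_var_name(key, sizeof key, name);
    snprintf(val, sizeof val, "%u", tag);
    if (setenv(name, value, overwrite) != 0)
        return -1;
    return setenv(key, val, overwrite);
}

/* Receiver side (e.g. a CGI program): read the variable and recover its
 * tag; the tag is 0 if the sender did not publish one. */
char *getenv_proxy(const char *name, unsigned *tag)
{
    char key[256];
    tag_var_name(key, sizeof key, name);
    const char *t = getenv(key);
    *tag = t ? (unsigned) strtoul(t, NULL, 10) : 0;
    return getenv(name);
}
```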
21. (Figure 1 code listing not recoverable from extraction; panels: (A) Original, (B) Transformed.)
Figure 1: This code segment illustrates how GIFT optimizes away splay tree lookups. If a memory block is accessed through its name, GIFT accesses the block's tag directly in its shadow variable. The statements in bold font are inserted by GIFT.

memory area returned by a malloc call. Because a data memory block can be accessed through pointers, a mechanism is needed to find a memory block's tag from a pointer to that block. The explicit lookup approach [6] uses a separate search data structure to associate a pointer with its metadata, and so requires no modification to pointer representation. For each data memory block, this scheme creates a node in a splay tree that stores the base address and size of the memory block together with the block's metadata. Given a pointer, the scheme first looks it up in the splay tree to find the corresponding metadata. A pointer matches a splay tree node if it falls within the node's extent, as defined by its base and size. We originally implemented GIFT using this splay tree scheme, with each tree node containing the base, extent, and tag of a data memory block. However, we found that the splay tree schem
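Extent matching does not depend on the search structure: a pointer p matches a node if base <= p < base + size. The sketch below uses binary search over a sorted array of non-overlapping blocks as a stand-in for the splay tree. It therefore shows the matching rule but not splay-tree rebalancing. struct block and lookup_block are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One tracked memory block: [base, base + size) plus its tag. */
struct block { uintptr_t base; size_t size; int tag; };

/* Find the block whose extent contains p. Blocks must be sorted by base
 * and non-overlapping; returns NULL if p lies outside every block. */
struct block *lookup_block(struct block *blocks, size_t n, const void *p)
{
    uintptr_t a = (uintptr_t) p;
    size_t lo = 0, hi = n;                 /* search in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a < blocks[mid].base)
            hi = mid;
        else if (a >= blocks[mid].base + blocks[mid].size)
            lo = mid + 1;
        else
            return &blocks[mid];           /* base <= a < base + size */
    }
    return NULL;
}
```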
22. ight-hand-side memory blocks. When an argument of a function call is not a pointer, GIFT needs to propagate its tag to the callee. Passing the tag through the standard stack could cause compatibility problems with legacy code. Instead, GIFT allocates a shadow stack (one per thread) to pass the tags of non-pointer function call arguments.

Before the call, the caller calls gift_save_argument_tag (e.g., Line 17 of Figure 1(B)) to place the callee's entry point and the tags of the non-pointer arguments on the shadow stack. The callee calls gift_init_parameter_tag to retrieve the tag for each parameter. The function gift_init_parameter_tag first compares the callee's entry point with the entry point stored on the shadow stack:
• If they match, the tags of the callee's parameters are initialized with the tag values passed through the shadow stack.
• If they do not match, the callee was called from a legacy function that does not push the tags of the actual arguments onto the shadow stack. In this case, gift_init_parameter_tag initializes each parameter tag to zero.

If a function returns a non-pointer value to its caller, GIFT also needs to propagate the tag of the return value to the caller. It uses the same shadow stack mechanism as for a function call, except that the values pushed onto the shadow stack are the return address and the address of the return value's tag, as shown b
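A per-thread shadow stack with the entry-point check can be sketched in C11 as follows. The names save_argument_tags and init_parameter_tags are hypothetical, as are the fixed limits; GIFT's own functions, gift_save_argument_tag and gift_init_parameter_tag, work per parameter. In this sketch a frame is consumed only when the entry points match. A production version would also need to discard stale frames left behind when an instrumented caller invokes legacy code.

```c
#include <stdarg.h>

/* Any function's entry point, compared by address only. */
typedef void (*entry_fn)(void);

#define MAX_ARGS  8
#define MAX_DEPTH 64

struct shadow_frame {
    entry_fn entry;          /* callee the tags were pushed for */
    int      ntags;
    int      tags[MAX_ARGS];
};

/* One shadow stack per thread. */
static _Thread_local struct shadow_frame shadow_stack[MAX_DEPTH];
static _Thread_local int shadow_top;

/* Caller side: push the callee's entry point and its argument tags. */
void save_argument_tags(entry_fn callee, int ntags, ...)
{
    if (shadow_top == MAX_DEPTH || ntags > MAX_ARGS)
        return;                            /* sketch: drop on overflow */
    struct shadow_frame *f = &shadow_stack[shadow_top++];
    va_list ap;
    va_start(ap, ntags);
    f->entry = callee;
    f->ntags = ntags;
    for (int i = 0; i < ntags; i++)
        f->tags[i] = va_arg(ap, int);
    va_end(ap);
}

/* Callee side: use the pushed tags if the frame belongs to this callee;
 * otherwise the caller is legacy code and every tag defaults to 0. */
int init_parameter_tags(entry_fn self, int n, int *out)
{
    int matched = shadow_top > 0 && shadow_stack[shadow_top - 1].entry == self;
    for (int i = 0; i < n; i++)
        out[i] = (matched && i < shadow_stack[shadow_top - 1].ntags)
                     ? shadow_stack[shadow_top - 1].tags[i] : 0;
    if (matched)
        shadow_top--;                      /* consume the frame */
    return matched;
}

/* Example instrumented callee: the result tag is the OR of the argument tags. */
int add_with_tag(int a, int b, int *result_tag)
{
    int t[2];
    init_parameter_tags((entry_fn) add_with_tag, 2, t);
    *result_tag = t[0] | t[1];
    return a + b;
}
```

When the caller pushed tags 1 and 0 before calling add_with_tag, the result tag is 1. When nothing was pushed (a legacy caller), both parameter tags default to 0.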
23. in C programs. In Proceedings of the Automated and Algorithmic Debugging Workshop, pages 13-26, 1997.
[7] L. C. Lam and T. Chiueh. Automatic extraction of highly accurate application-specific sandboxing policy. In Seventh International Symposium on Recent Advances in Intrusion Detection, Sophia Antipolis, French Riviera, France, September 15-17, 2004.
[8] L. C. Lam, Y. Yang, and T. Chiueh. Secure mobile code execution service. In Proceedings of the USENIX Large Installation Systems Administration (LISA) Conference, December 2006.
[9] Z. Liang, V. Venkatakrishnan, and R. Sekar. Isolated program execution: An application transparent approach for executing untrusted programs. In 19th Annual Computer Security Applications Conference, December 8-12, 2003.
[10] A. C. Myers. JFlow: Practical mostly-static information flow control. In Symposium on Principles of Programming Languages, pages 228-241, 1999.
[11] A. C. Myers and B. Liskov. A decentralized model for information flow control. In Symposium on Operating Systems Principles, pages 129-142, 1997.
[12] N. Nethercote and J. Seward. Valgrind: A program supervision framework. In Proceedings of the Third Workshop on Runtime Verification (RV'03), July 2003.
[13] J. Newsome and D. Song. Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software. In Proceedings of the 12th Annual Network and Distributed System Security Symposium (NDSS 2005), February 2005.
[14] Perl
24. lices. The resulting program segment is the set of statements that GIFT instruments, because they are affected by input data and their results may be used by output functions. While conceptually promising, the performance gain from this optimization depends on how the application uses pointers. If an application uses pointers and/or function pointers extensively, this slicing optimization does not help much, because Codesurfer's pointer analysis is not very effective. Even though Codesurfer performs only flow-insensitive pointer analysis, it requires an inordinate amount of memory and can run very slowly when its most accurate pointer analysis option is turned on. With the less accurate pointer analysis option, the slice that Codesurfer produces is not much different from the original program, and as a result slicing does not yield any noticeable performance improvement.
[Figure 2 diagram; recoverable labels: Application Source Code, GIFT Compiler, Application Binary, Aussum Subsystem, Input Monitor, Dynamic Taint Tracker, System Call Monitor, network connections, User-Level Security Policy Callback Function Library, Security Policy Hooks, User/Kernel, Sandbox Subsystem, Linux Kernel.]
Figure 2: The Aussum compiler is built from the GIFT framework. The Input Monitor intercepts data read from a network conn
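The slicing step described above reduces to intersecting two reachability sets over a dependence graph. The sketch below assumes a toy graph of at most 32 statements encoded as bitmasks; forward_slice, backward_slice, and statements_to_instrument are hypothetical helpers, and Codesurfer's real pointer-aware slicing is far more involved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: statements are nodes 0..n-1 (n <= 32) and bit j of
   succ[i] is set when data flows from statement i to statement j. */

/* Forward slice: every statement reachable from a statement in `from`. */
uint32_t forward_slice(const uint32_t *succ, int n, uint32_t from)
{
    assert(n >= 0 && n <= 32);
    uint32_t seen = from, frontier = from;
    while (frontier) {
        uint32_t next = 0;
        for (int i = 0; i < n; i++)
            if (frontier & (UINT32_C(1) << i))
                next |= succ[i];
        frontier = next & ~seen;       /* only newly discovered statements */
        seen |= next;
    }
    return seen;
}

/* Backward slice: every statement that can reach a statement in `to`. */
uint32_t backward_slice(const uint32_t *succ, int n, uint32_t to)
{
    assert(n >= 0 && n <= 32);
    uint32_t seen = to;
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; i++)
            if (!(seen & (UINT32_C(1) << i)) && (succ[i] & seen)) {
                seen |= UINT32_C(1) << i;
                changed = 1;
            }
    }
    return seen;
}

/* Statements worth instrumenting: influenced by input AND able to reach output. */
uint32_t statements_to_instrument(const uint32_t *succ, int n,
                                  uint32_t inputs, uint32_t outputs)
{
    return forward_slice(succ, n, inputs) & backward_slice(succ, n, outputs);
}
```

On a graph modeled loosely on Figure 1(A), with two constant assignments (0, 1), a read (2), a decode call combining them (3), a write (4), and an unrelated use of a constant (5), only statements 2, 3, and 4 are selected: the constant assignments reach the output but are not influenced by input.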
25. m allows the same application to be sandboxed in a different way when it operates on different input objects.
2 General Dynamic Information Flow Tracking Framework (GIFT)
GIFT associates each program variable in an application with a 4-byte tag, which could hold a piece of metadata that annotates the variable or a pointer to another data structure that annotates the variable. Because GIFT is an application-independent information flow tracking framework, the GIFT compiler does not interpret the tags and leaves their interpretation to the application programmers. Therefore, a tag can represent different metadata for different types of information flow tracking, for example packet ID, user ID, file name, network IP address, security class, etc. The main job of the GIFT compiler is to insert code that calls programmer-provided tag initialization and combining functions at the times specified by the programmers. The GIFT compiler is also responsible for passing tags throughout the entire program. To use the GIFT compiler, the application programmer needs to specify a set of interception points, each of which corresponds to a function in the original application, together with a proxy function that should be called instead at each interception point. Currently GIFT supports the following three types of interception points:
• Input Channel: These correspond to functions that bring external data into an application
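Because GIFT leaves tag interpretation to the application, one option is to treat the 4-byte tag as an index into an application-owned metadata table. The sketch below is an illustration under assumptions: the table, struct flow_meta, make_input_tag, and lookup_tag are hypothetical, and index 0 is reserved to mean "no metadata". An index is also convenient on 64-bit hosts, where a 4-byte tag cannot hold a raw pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: the application, not GIFT, decides what a tag means. */
typedef uint32_t tag_t;

struct flow_meta {
    char peer_ip[16];     /* e.g. the network address the data came from */
    uint32_t packet_id;   /* e.g. which packet carried it */
};

#define META_CAPACITY 64
static struct flow_meta meta_table[META_CAPACITY];
static tag_t meta_used = 1;               /* slot 0 means "no metadata" */

/* Would be called from an input-channel proxy: record metadata, return its tag. */
tag_t make_input_tag(const char *peer_ip, uint32_t packet_id)
{
    assert(peer_ip != NULL);
    if (meta_used >= META_CAPACITY)
        return 0;                         /* table full: fall back to "untagged" */
    struct flow_meta *m = &meta_table[meta_used];
    strncpy(m->peer_ip, peer_ip, sizeof m->peer_ip - 1);
    m->peer_ip[sizeof m->peer_ip - 1] = '\0';
    m->packet_id = packet_id;
    return meta_used++;
}

/* Interpret a tag; returns NULL for untagged or unknown values. */
const struct flow_meta *lookup_tag(tag_t t)
{
    return (t > 0 && t < meta_used) ? &meta_table[t] : NULL;
}
```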
26. [Table 1, continued; the header row is not recovered in this excerpt]
Application | Description | (header lost) | (header lost) | Size increase: Splay Tree | Shadow Splay | Shadow Splay Slice
tnftp | FTP client | 35,573 | 183,820 | 64.11 | 61.49 | 50.71
yafc | FTP client | 26,587 | 204,444 | 49.93 | 44.14 | 34.33
gnut | Gnutella client | 22,298 | 190,804 | 58.11 | 46.44 | 31.65
cg | newsgroup binary downloader | 10,974 | 54,260 | 79.28 | 76.81 | 61.21
wget | mirroring tool | 35,018 | 164,904 | 50.15 | 48.54 | 43.57
lynx | text web browser | 161,605 | 1,038,340 | 36.12 | 29.81 | 28.85
Table 1: Description of the test applications used in the performance evaluation study and the increase in binary size for each of the three versions of Aussum.
Application | Splay Tree: Elapsed Time Overhead (%) | Splay Tree: CPU Time Overhead (%) | Shadow Splay: Elapsed Time Overhead (%) | Shadow Splay: CPU Time Overhead (%) | Shadow Splay Slice: Elapsed Time Overhead (%) | Shadow Splay Slice: CPU Time Overhead (%)
tnftp | 6.52 | 24.14 | 0.70 | 19.54 | 0.90 | 16.54
yafc | 6.41 | 78.18 | 2.92 | 15.45 | 1.32 | 10.00
gnut | 89.12 | 257.14 | 5.44 | 28.57 | 1.16 | 20.00
cg | 308.97 | 1120 | 8.97 | 180 | 6.41 | 160.00
wget | 71.06 | 534.76 | 14.29 | 127.2 | 9.75 | 117.13
lynx | 152.86 | 400 | 30.00 | 166.67 | 31.43 | 166.67
Table 2: Performance overheads of the three Aussum versions when compared with vanilla GCC.
tainted data and files, and is organized as a loadable Linux module. When an application opens an existing file for writing by invoking fopen or open with tainted input arguments, the sandboxing system copies the file to a temporary directory and opens it there, in a way similar to FVM [20] and Alcatraz [9]. To evaluate th
27. oks module is represented in the form of callback functions, which determine whether a network connection should be considered suspicious and/or whether a file should be marked as tainted. When a tainted file is executed or used as input to a new process, Aussum automatically sandboxes the corresponding process. Aussum can work with any existing sandbox system as long as the sandbox system implements the security policy hook functions required by Aussum.
 1  struct au_descriptor_t {
 2      int suspicious;   /* tag to identify network descriptor */
 3      int type;         /* file or socket */
 4      char *name;       /* ip or file name */
 5  } au_descriptor[max_open_files];
 6
 7  int aussum_connect(int s, struct sockaddr *addr, socklen_t addrlen) {
 8      int ret;
 9
10      ret = connect(s, addr, addrlen);          /* call the original function */
11      if (ret != -1) {                          /* connection succeeds */
12          ...                                   /* create descriptor entry and set the ip */
13          if (mark_connection_ptr != NULL) {    /* callback implemented by the sandboxing system */
14              if (mark_connection_ptr(addr))    /* let the sandboxing system mark it */
15                  au_descriptor[s].suspicious = TRUE;
16              else
17                  au_descriptor[s].suspicious = FALSE;
18          }
19          else
20              au_descriptor[s].suspicious = TRUE;   /* default operation */
21      }
22      return ret;
23  }
Figure 3: Proxy function for connect.
4.2 Input Monitor
Because both network connections and files are accessed through descriptors, using functions such as read and fread, the Input Monitor needs to firs
28. pport selective sandboxing, which sandboxes the same application differently depending on the mode the application is currently in. To solve this problem, some existing sandboxing systems, such as Tiny Firewall [18], require the end user to specify both the applications that need to be sandboxed and the policies to be used. This approach is inconvenient and error-prone; eventually the user is likely to choose convenience over security and turn off sandboxing completely. Other systems, such as SEES [8], attempt to address this problem by transparently intercepting the file download path of web browsers and email clients and marking each downloaded file. These systems then sandbox the execution of a program if the program itself is marked or if the object it operates on is marked (e.g., when Microsoft Word opens a marked document). However, this approach is limited because the way these systems mark downloaded files is specific to individual applications and therefore cannot be generalized to arbitrary network applications. This section presents the design and implementation of Aussum, an application of the GIFT compiler framework that automatically instruments network applications so as to enable selective sandboxing. Aussum leverages GIFT's dynamic information flow tracking capability to automatically mark contents downloaded from the Internet. When marked contents are written to a file, Aussum marks the file in such a way that any subsequent ex
29. ransfer instructions are intercepted to make sure no tainted data can be used as the transfer destination. The disadvantage of this scheme is its performance overhead: programs can run more than 20 times slower. Furthermore, this scheme can use 4 times more memory in the worst case. Sekar et al. [19] also developed a comprehensive taint analysis system to detect and prevent a wide range of attacks, such as buffer overflow, format string, cross-site scripting, and SQL injection attacks. The way they instrument applications is similar to GIFT, except that they use one bit to tag each memory byte, whereas GIFT uses a four-byte shadow variable to tag each data variable. Their system focuses on detecting different attacks, while GIFT focuses on providing a set of APIs that allow users to propagate program information in their own way. Therefore, a 1-bit tag is not suitable for GIFT. The 4-byte tag of GIFT can be used as a pointer to a more complicated data structure, which can record more information such as user ID, IP address, and packet ID. Perl also implements a taint check [14] to lock out security bugs in Perl scripts. When taint checking is turned on, Perl marks all user input data as tainted. Tainted data may not be used in calls to functions such as eval and open. If the interpreter detects an operation that uses tainted data in a manner it considers unsafe, it stops execution with an error. Unlike the traditional in
30. ream);
10              if (!au_descriptor[fd].marked) {      /* we have not marked the file */
11                  if (is_suspicious(ptr)) {         /* is the memory location suspicious? */
12                      aussum_mark_file_ptr(fd, au_descriptor[fd].name);
13                      au_descriptor[fd].marked = 1;
14                  }
15              }
16          }
17      return ret;
    }
Figure 4: Proxy function for fwrite.
Line 8 of Figure 4 checks whether the underlying sandboxing system provides a file-marking function. Through the function pointer aussum_mark_file_ptr, the underlying sandboxing system can make its own decision on whether a file should be marked as tainted and, if so, how. For example, even if the data written to a file are marked as tainted, the underlying sandboxing system can choose to mark the resulting file as not tainted because the file type is .txt. Embedded scripts present a technical challenge to Aussum, because it cannot afford to sandbox at all times the application that downloads them, such as IE or Firefox. Ideally, one should sandbox the downloading application only when it is executing embedded scripts, and let it run at full privilege when it operates on local files. Aussum solves this problem by selectively turning sandboxing on and off for a network application based on whether the operands it is currently operating on are tainted. It makes the following three assumptions. First, a malicious script cannot cause damage if it cannot make system calls. Second, a malicious script can only make
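The decision made after a successful fwrite can be isolated into a small, testable function. This is a sketch under assumptions: the hook type, the per-descriptor arrays, after_fwrite, and the example_mark_file policy (declining to taint .txt files, the example given above) are all hypothetical, and the buffer's taint status is passed in rather than looked up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch; names and types are illustrative assumptions. */
typedef int (*mark_file_fn)(int fd, const char *name);   /* returns 1 if tainted */

#define SKETCH_MAX_FDS 64
static int file_handled[SKETCH_MAX_FDS];    /* like the "marked" flag: hook consulted */
static int file_tainted[SKETCH_MAX_FDS];    /* what the hook decided */
static mark_file_fn mark_file_ptr = NULL;   /* installed by the sandboxing system, if any */

/* Example policy a sandboxing system might install: never taint .txt files. */
int example_mark_file(int fd, const char *name)
{
    (void)fd;
    size_t len = strlen(name);
    if (len >= 4 && strcmp(name + len - 4, ".txt") == 0)
        return 0;
    return 1;   /* a real system would record the taint on the file here */
}

/* Called after a successful fwrite: consult the hook at most once per
   descriptor, and only when a hook exists and the written data are tainted. */
void after_fwrite(int fd, const char *name, int buffer_tainted)
{
    assert(fd >= 0 && fd < SKETCH_MAX_FDS);
    if (mark_file_ptr == NULL || file_handled[fd] || !buffer_tainted)
        return;
    file_tainted[fd] = mark_file_ptr(fd, name);
    file_handled[fd] = 1;
}
```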
31. rovided proxy functions and the tag propagation function for assignment statements.
3 Design and Implementation of the GIFT Framework
3.1 Tag Management
GIFT associates a tag with each data memory block, which could be a local variable, a global variable, or a
Figure 1(A):
 1  int buffer[10];
 2
 3  void work(void)
 4  {
 5      int a, b, c, d;
 6      b = 10;
 7      c = 20;
 8      read(socket_fd, &a, 4);
 9      d = decode(&a, b, c);
10      write(output_fd, &d, 4);
11  }
12
13  int decode(int *a, int b, int c)
14  {
15      int r;
16      r = *a + b + 30;
17      return r;
18  }
Figure 1(B):
 1  int buffer[10];
 2
 3  void work()
 4  {
 5      int a, b, c, d;
 6      int a_tag_info = 0, b_tag_info = 0, c_tag_info = 0, d_tag_info = 0;
 7      int fun_index;
 8      fun_index = gift_add_locals_to_tree(2,
 9          &a, 4, &a_tag_info, &d, 4, &d_tag_info);
10
11      b = 10;
12      b_tag_info = 0;
13      c = 20;
14      c_tag_info = 0;
15      gift_save_argument_tag(aussum_read, 1, &socket_fd_tag_info);
16      read_proxy(socket_fd, &a, 4);
17      gift_save_argument_tag(decode, 2, &b_tag_info, 1, &c_tag_info, 2);
18      d = decode(&a, b, c);
19      gift_copy_return_tag(decode_return_address, &d_tag_info, 0);
20      gift_save_argument_tag(aussum_write, 1, &output_fd_tag_info);
21      write_proxy(output_fd, &d, 4);
22      gift_remove_locals_from_tree(fun_index);
23  }
24  int decode(int *a, int b, int c)
25  {
26      int r;
27      int r_tag_info = 0, b_tag_info, c_tag_info;
28      gift_init_parameter_tag(decode, 2, &b_tag_info, 1, &c_tag_info, 2);
29
30      r = *a + b + 30;
31      gift_set_tag(&r_ta
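Registering address-taken blocks (as gift_add_locals_to_tree does for a and d above) exists so that a pointer such as a in decode can later be mapped back to the shadow tag of the block it points into. The sketch below shows that mapping under assumptions: the sketch_* names are hypothetical, tags are 32-bit, and a plain linear array stands in for GIFT's splay tree purely to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: map any address inside a registered block to that
   block's shadow tag. A linear array replaces GIFT's splay tree here. */
struct block {
    uintptr_t start;
    size_t size;
    uint32_t *tag;    /* address of the block's shadow variable */
};

#define SKETCH_MAX_BLOCKS 128
static struct block blocks[SKETCH_MAX_BLOCKS];
static int nblocks = 0;

void sketch_add_block(void *addr, size_t size, uint32_t *tag)
{
    assert(nblocks < SKETCH_MAX_BLOCKS);
    blocks[nblocks++] = (struct block){ (uintptr_t)addr, size, tag };
}

void sketch_remove_block(void *addr)
{
    for (int i = 0; i < nblocks; i++)
        if (blocks[i].start == (uintptr_t)addr) {
            blocks[i] = blocks[--nblocks];   /* order is irrelevant in this sketch */
            return;
        }
}

/* Returns the shadow tag of the block containing addr, or NULL if none. */
uint32_t *sketch_lookup_tag(const void *addr)
{
    uintptr_t p = (uintptr_t)addr;
    for (int i = 0; i < nblocks; i++)
        if (p >= blocks[i].start && p - blocks[i].start < blocks[i].size)
            return blocks[i].tag;
    return NULL;
}
```

A balanced or self-adjusting tree makes each lookup logarithmic instead of linear, which matters once many heap blocks are registered.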
32. ry functions in LIBC, such as memcpy and strcpy. The GIFT compiler redirects all calls to memory and string copying functions to their corresponding wrapper functions, which properly set the tag of the destination memory block using the tags associated with the source memory blocks.
3.3 Array, Union, and Structure
GIFT treats each array, union, and structure variable as a single memory block and allocates a single tag to it. This means that the tag of an array, union, or structure should contain the most important metadata associated with that variable. If fine-grained tagging is needed, for example one tag for each field of a structure variable, the developer can use the structure variable's shadow variable as a pointer to a more complex tag data structure, which can contain many sub-tags.
3.4 Slicing Optimization
Blindly intercepting all assignment statements and function calls/returns in a program may result in many unnecessary calls to GIFT library functions when only parts of the program involve data read from input channels. To focus only on those program statements that manipulate data related to input channels, GIFT uses the program slicing result from a commercial tool called Codesurfer [5]. Given a set of input and output functions that users want to intercept, GIFT computes a forward slice of the original program from the input functions and a backward slice from the output functions, and takes the intersection of the two s
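A tag-aware copy wrapper of the kind described above can be sketched as follows. The name tagged_memcpy, the explicit tag parameters, and the update rule are assumptions, not GIFT's wrapper: because an array, union, or structure carries a single tag, a copy that overwrites the whole block takes the source tag, while a partial copy ORs it in (treating tags as bit flags) so that taint on the untouched bytes is not lost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a tag-aware memcpy wrapper for single-tag blocks. */
void *tagged_memcpy(void *dst, uint32_t *dst_tag, size_t dst_block_size,
                    const void *src, const uint32_t *src_tag, size_t n)
{
    assert(dst_tag != NULL && src_tag != NULL && n <= dst_block_size);
    memcpy(dst, src, n);
    if (n == dst_block_size)
        *dst_tag = *src_tag;     /* whole block overwritten: take the source tag */
    else
        *dst_tag |= *src_tag;    /* partial copy: keep old bits, add the new ones */
    return dst;
}
```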
33. se of pointer aliasing and loops with a dynamic bound. Therefore, accurate information flow tracking has to be done dynamically. This requires instrumenting programs so that they initialize, propagate, and combine tags appropriately as the corresponding program variables are processed. While application programmers could take on this instrumentation task themselves, it would be ideal if a compiler could automate the entire program instrumentation process, making it more efficient and less error-prone. This paper describes the design and implementation of such a compiler, called GIFT (General dynamic Information Flow Tracking). GIFT is a compiler for programs written in C that takes programmer-specified, application-specific rules for tag initialization, propagation, and combination, and automatically instruments programs so as to execute these rules as part of program execution. GIFT not only significantly improves the accuracy of information flow tracking by applying dynamic data/control flow analysis, but also largely automates the process of implementing information flow tracking in individual applications. To the best of our knowledge, GIFT is the first application-independent implementation framework for information flow tracking that can be quickly customized through the incorporation of application-specific knowledge. Compared with static information flow tracking [10, 16], GIFT guar
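The aliasing argument above can be made concrete with a few lines of C. In this hypothetical sketch, which variable p points to depends on a run-time value, so a static analysis must assume the input may reach either x or y, whereas tracking tags at run time taints only the variable actually written. Bundling the value and its tag in one struct (struct tracked, store_through_alias) is a simplification for illustration; GIFT keeps tags in separate shadow variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: a value paired with its tag for illustration only. */
struct tracked {
    int value;
    uint32_t tag;
};

/* The alias target is chosen at run time; the tag follows the actual write. */
void store_through_alias(int selector, struct tracked *x, struct tracked *y,
                         int input_value, uint32_t input_tag)
{
    assert(x != NULL && y != NULL);
    struct tracked *p = (selector > 0) ? x : y;   /* resolved only at run time */
    p->value = input_value;
    p->tag = input_tag;                            /* dynamic tag propagation */
}
```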
34. system calls through the interpreter that interprets it. Finally, if a malicious script causes damage through system calls, some of the system call arguments must be tainted. Based on these assumptions, Aussum intercepts sensitive system calls such as open and exec to check whether any of the input arguments is tainted. By default, Aussum terminates any application that attempts to invoke a sensitive system call with tainted arguments, except when a file name read from the network is used to open or create a non-existing file. Aussum allows applications to open non-existing files because applications such as Firefox and IE need to save, to a cache or temp directory, files whose names are derived from an HTML document. Although this default policy may open the door to denial-of-service attacks, it at least prevents malicious scripts from modifying or deleting existing files. This policy is also consistent with the taint policy used in Perl. Unfortunately, it may break certain peer-to-peer applications that allow a remote application to request a local application to perform sensitive system calls. To address this issue, Aussum again provides a callback function, aussum_check_argument, that allows the underlying sandboxing system to override the default policy, for example by allowing read and write accesses to certain directories or execution of certain existing binaries.
4.5 Callback Functions as Security Policy
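The default policy for a sensitive call such as open can be sketched as a pure decision function. This is an illustration under assumptions: default_open_policy, the verdict enum, and check_argument_hook are hypothetical; the hook only stands in for aussum_check_argument, whose real signature is not given here; and the sketch returns a verdict instead of terminating the application as Aussum does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical sketch of the default check applied to open's path argument. */
enum verdict { ALLOW = 0, DENY = 1 };

typedef int (*override_fn)(const char *path);   /* nonzero: allow anyway */
static override_fn check_argument_hook = NULL;  /* stand-in for an override callback */

enum verdict default_open_policy(const char *path, int path_tainted)
{
    assert(path != NULL);
    if (!path_tainted)
        return ALLOW;                            /* untainted names are not policed */
    if (access(path, F_OK) != 0 && errno == ENOENT)
        return ALLOW;                            /* new file: cannot clobber existing data */
    if (check_argument_hook != NULL && check_argument_hook(path))
        return ALLOW;                            /* sandboxing system overrides the default */
    return DENY;                                 /* tainted name refers to an existing file */
}
```

Checking errno for ENOENT keeps an inaccessible but existing path from being mistaken for a non-existing one.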
35. t identify tainted descriptors, so that it can properly mark the results of read and fread calls as tainted. In each application, Aussum maintains a descriptor table that stores information about each open file or socket descriptor. To mark a descriptor, Aussum uses GIFT to redirect each call to accept, connect, dup, or dup2 to its corresponding proxy function, which creates a descriptor table entry for each new descriptor and marks it according to the security policy. More specifically, if a security policy callback function int mark_connection(sockaddr *addr) exists, these proxy functions call it to determine how to mark a new descriptor; otherwise, they simply mark all socket descriptors as tainted. For example, the proxy function for connect works as shown in Figure 3. Line 10 of the proxy function in Figure 3 calls the original connect function, and Lines 13-17 consult the underlying sandboxing system to determine whether a network connection is suspicious. Through this callback mechanism, the underlying sandboxing system can implement its own security policy; for example, all data from the local network may be considered safe. If the variable mark_connection_ptr, which points to the callback function int mark_connection(sockaddr *addr), is null, the underlying sandboxing system has no specific security policy, and Aussum simply marks every network connection descriptor as tainted. The parameter addr
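The descriptor-marking logic can be sketched apart from the connect call itself. This is a minimal sketch under assumptions: mark_new_descriptor, mark_connection_hook, the fixed-size table, and the example policy that trusts IPv4 peers in 192.168.0.0/16 as the "local network" are all hypothetical choices; the hook returns nonzero for a suspicious connection, matching the role of mark_connection_ptr.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical sketch of marking a new socket descriptor via an optional hook. */
#define SKETCH_MAX_FDS 64
static int fd_suspicious[SKETCH_MAX_FDS];
static int (*mark_connection_hook)(const struct sockaddr *addr) = NULL;

void mark_new_descriptor(int fd, const struct sockaddr *addr)
{
    assert(fd >= 0 && fd < SKETCH_MAX_FDS);
    if (mark_connection_hook != NULL)
        fd_suspicious[fd] = mark_connection_hook(addr) ? 1 : 0;
    else
        fd_suspicious[fd] = 1;          /* no policy installed: treat as tainted */
}

/* Example policy: IPv4 peers in 192.168.0.0/16 are trusted, all others suspicious. */
int example_mark_connection(const struct sockaddr *addr)
{
    if (addr->sa_family != AF_INET)
        return 1;
    const struct sockaddr_in *sin = (const struct sockaddr_in *)addr;
    uint32_t ip = ntohl(sin->sin_addr.s_addr);
    return (ip & 0xFFFF0000u) != 0xC0A80000u;
}
```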
36. tor marks as tainted the output files that contain tainted data. If an executable file is tainted, the sandboxing system sandboxes its execution when it is invoked. If a document is tainted, the sandboxing system sandboxes the application that opens it. This mechanism can effectively thwart MS Word VB macro virus attacks and the recent Windows WMF attacks. Ideally, the taint attribute of a file should be an inherent part of the file, so that when the file is copied, its taint attribute is copied automatically. Under a UNIX-like operating system, however, there is no unused file attribute that can be used for this purpose. One possible solution is to create a group called TAINTED: if a file is considered tainted, its group owner is set to TAINTED. However, Aussum leaves this decision to the underlying sandboxing system; that is, Aussum provides a callback function through which the underlying sandboxing system specifies how files should be marked as tainted. To support file marking, Aussum intercepts file output functions such as write and fwrite using proxy functions. For example, the proxy function for fwrite is shown in Figure 4.
 1  size_t aussum_fwrite(const void *ptr, size_t size, size_t nitems, FILE *stream)
 2  {
 3      size_t ret;
 4      int fd;
 5
 6      ret = fwrite(ptr, size, nitems, stream);    /* call the original function */
 7      if (ret == nitems)
 8          if (aussum_mark_file_ptr != NULL) {      /* hook function for marking files */
 9              fd = fileno(st
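The TAINTED-group idea mentioned above could be implemented by a sandboxing system roughly as follows. This is a sketch under assumptions: mark_file_with_group and file_has_group are hypothetical helpers, the group must already exist, and the calling process must be permitted to change the file's group. Per POSIX, passing (uid_t)-1 as the owner to chown leaves the owner unchanged.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <grp.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch: mark a file by making `group` its group owner. */
int mark_file_with_group(const char *path, const char *group)
{
    assert(path != NULL && group != NULL);
    struct group *gr = getgrnam(group);
    if (gr == NULL)
        return -1;                               /* group not configured on this host */
    return chown(path, (uid_t)-1, gr->gr_gid);   /* (uid_t)-1 keeps the current owner */
}

/* Returns 1 if the file's group owner is `group`, 0 otherwise. */
int file_has_group(const char *path, const char *group)
{
    struct group *gr = getgrnam(group);
    struct stat st;
    if (gr == NULL || stat(path, &st) != 0)
        return 0;
    return st.st_gid == gr->gr_gid;
}
```

Note that an ordinary copy does not necessarily preserve the group owner, which is one reason the text calls a truly inherent taint attribute the ideal.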
37. uctor function for each source file, as indicated by the function at Line 37 of Figure 1(B). The compiler always generates code to call global constructor functions before the main function is executed.
• void gift_add_heap_to_tree(void *address, int size) is called in the malloc proxy functions to create tree nodes for heap memory blocks allocated by malloc.
• void gift_remove_heap_from_tree(void *address) is called in the free proxy function to remove the tree nodes associated with heap memory blocks.
3.2 Dynamic Tag Tracking
For each assignment statement in the program, GIFT inserts a call to gift_set_tag, which sets the tag of the left-hand-side memory block based on those of the right-hand-side blocks, as shown in Line 31 of Figure 1(B), which corresponds to the assignment statement in Line 16 of Figure 1(A). In this case, the address of r's shadow variable, the address of the memory block pointed to by a, and the address of b's shadow variable are passed to gift_set_tag, which only needs to look up the splay tree for the shadow variable of the block pointed to by a. Because GIFT does not understand how a tag is used, after the addresses of all the shadow variables involved in an assignment statement are resolved, gift_set_tag calls the user-supplied function gift_do_set_tag(void *ltag, int number, void **tags) to actually propagate the tags. The argument tags of this function holds the addresses of the shadow variables of the r
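One possible user-supplied propagation rule is sketched below. The parameter list follows the prose as reconstructed (ltag, number, tags) and should be treated as an assumption; interpreting each 4-byte tag as a set of security-class bits and combining them with bitwise OR is likewise an illustrative choice, not a rule GIFT imposes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of a user-supplied combiner: the left-hand side receives the union
   (bitwise OR) of the tags of all right-hand-side shadow variables. */
void gift_do_set_tag(void *ltag, int number, void **tags)
{
    uint32_t combined = 0;               /* no operands (constants) means no taint */
    for (int i = 0; i < number; i++) {
        assert(tags[i] != NULL);
        combined |= *(const uint32_t *)tags[i];
    }
    *(uint32_t *)ltag = combined;
}
```

For r = *a + b + 30 in Figure 1(A), the resolved addresses of the shadow variables for *a and b would be passed in tags, with number equal to 2.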
38. umented versions of wget and lynx to download files from the Internet. Aussum successfully stopped their open system call when a file name was read from a non-intranet host and a local file with the same name existed. We also tested Aussum on several network applications used on end-user machines. In all cases, Aussum correctly marked as tainted the files downloaded from suspicious network addresses, as specified in the security policy. If a file is downloaded from a trusted host or is generated by the test applications themselves, Aussum marks the file as non-tainted. These experiments show that Aussum's taint attribute propagation mechanism, which is built on GIFT, can indeed correctly propagate tags from input data to output data.
5 Related Work
Flow Caml [15, 16] is an extension of the Objective Caml language with a type system that tracks information flow. Each variable in a Flow Caml program is annotated with a principal, which can be any entity, such as a user, a file, stdin, or stdout. Information owned by one principal cannot flow to another principal unless the programmer explicitly permits the operation using the keyword flow. For example, flow !bob < !alice means that the programmer explicitly allows Bob to send information to Alice. To correctly implement a program in Flow Caml, a programmer must first list all principals that appear in the program and carefully lay out who can read whose in
39. y the call to gift_save_return_tag at Line 33 of Figure 1(B). The first argument, return_address, is the return address of the callee, which is generated by the GIFT compiler. The caller compares the return address on the shadow stack with the call site and, if they match, propagates the tag of the return value by calling gift_copy_return_tag, as shown in Line 19 of Figure 1(B). The first argument of that call, decode_return_address, is the call-site return address of the call to decode at Line 18. gift_copy_return_tag compares this return address with the return address stored on the shadow stack. If the two return addresses do not match, the call returned from a legacy function, which does not place the return tag on the shadow stack; in this case gift_copy_return_tag treats the return tag as zero. When a GIFT function calls a legacy function, the tags that the GIFT function puts on the call shadow stack are ignored. When a GIFT function returns to a legacy caller, the information that the GIFT function puts on the return shadow stack is ignored by the legacy function. In either case, the program continues working without any disruption. Consequently, this tag propagation scheme solves the compatibility problem associated with the shadow variable approach. To avoid complete recompilation of the LIBC library, GIFT includes a set of wrapper functions for memory and string copying libra
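The return-tag check can be sketched the same way as the argument-tag check. This is a hypothetical illustration: sketch_save_return_tag and sketch_copy_return_tag are not GIFT's functions, a single per-thread slot replaces the return shadow stack, and small nonzero integer call-site ids stand in for the return addresses GIFT compares (0 is reserved to mean "empty").

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: one per-thread slot instead of a return shadow stack. */
static _Thread_local struct {
    uintptr_t call_site;   /* 0 means nothing has been published */
    uint32_t tag;
} ret_slot;

/* Callee side: publish the return value's tag together with the call site
   it is returning to. */
void sketch_save_return_tag(uintptr_t call_site, uint32_t tag)
{
    assert(call_site != 0);
    ret_slot.call_site = call_site;
    ret_slot.tag = tag;
}

/* Caller side: accept the tag only if it was published for this call site;
   otherwise the callee was legacy code and the return tag is treated as 0. */
uint32_t sketch_copy_return_tag(uintptr_t my_call_site)
{
    uint32_t tag = (ret_slot.call_site == my_call_site) ? ret_slot.tag : 0;
    ret_slot.call_site = 0;   /* consume so a later legacy return cannot reuse it */
    return tag;
}
```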
