
Structural Identification of Immunoglobulin Variable Domains

1. Chapter 3: Results and Discussion

3.2 Template verification

After the templates had been improved to increase their reliability in identifying and classifying immunoglobulin variable domains, tests were carried out to verify them. Each known domain was compared not only to the template it is supposed to match but also to the other templates, to verify the reliability of those templates. For example, known VL domains were compared to the VL template and the improved immunoglobulin variable domain template, and also to the VH, TCR α and TCR β variable domain templates. Similar tests were carried out for the other kinds of known variable domains.

The program was tested with the following 65 structures from the Protein Data Bank:

- Antibody structures with VL and VH domains
  Intact antibody: 1IGT, 1IGY
  Fab: 12E8, 15C8, 1ACY, 1AD0, 1AD9, 1A0Q, 1ADQ, 1AE6, 1AFV, 1AHW, I AIT, 1A14, 1A3L, 1A3R, 1A4J, 1A5F, 1A6T
  Fv: 1A6V, 1A6W, 1A7N, 1A7O, 1A7P, 1A7Q, 1A7R, 1A6U
  scFv: 1LMK (diabody), 1NQB, 2AP2, 1AP2, 1QOK, 1F3R, 1H8N, 113G, 1H80, 1HSS
  Anti-idiotope: 1CIC
- Bence Jones proteins with only VL domains: 1BWW, 1REI, 1AR2, 2RHE, 1BJM, 3BJL, 4BJL
- Camel antibodies with only VH domains: 1JTP, 1JTO, 1MEL, 1F2X, 1G6V, 1B2Q, LTT
- T cell receptors with α and β domains: 1BD2, 1QRN, 1FYT, 1QSE, 1QSF, 1AO7, 1TCR, 1K
2. (Listing fragment; the procedure starts on the preceding page.)

            count1 = count1 + 1;
        }                               /* Pattern[column][row] */
        if (count1 >= SumHbond[column][row] - Hbond_less1) {
            if (SumHbond[row][column] != 0) {
                count2 = 0;
                for (l = 0; l < SumHbond[row][column]; l++) {
                    /* Count the total number of hydrogen bonds at the
                       corresponding positions in Pattern[row][column] */
                    xpos = row_start + Tempposition[sum2 + l][0] - Blockstart[row];
                    ypos = column_start + Tempposition[sum2 + l][1] - Blockstart[column];
                    if (xpos < chain_size && ypos < chain_size
                            && Hbondno[xpos][ypos] != -1)
                        count2 = count2 + 1;
                }
            }
            if (count2 >= SumHbond[row][column] - Hbond_less2) {
                /* Here print Pattern[column][row] and Pattern[row][column] */
            }
        }

Figure 2.8 Procedure for searching for a pair of inter-block hydrogen bond patterns in one round

When we search for an intra-block hydrogen bond pattern or a pair of inter-block hydrogen bond patterns, we first search the pattern of the examined chain for exactly the same pattern. If no match is found, we relax the hydrogen bond pattern to find a similar one, permitting one more hydrogen bond to be missing at each step until a match is found. Finally, we count the total number of intra-block and inter-block hydrogen bond patterns found in the pattern of the examined chain. If all of them are found, the examined chain matches the template. For example, if we find all 12 patterns of the VL template in the pattern of the examined chain, we say that this examined chain matches the VL template.
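The relaxed comparison described in excerpt 2 can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the thesis code: the `BondOffset` type, the flat row-major matrix and both function names are hypothetical, and a nonzero matrix entry is assumed to mean that a hydrogen bond was assigned between the C=O group of residue x and the N-H group of residue y.

```c
#include <assert.h>
#include <stddef.h>

/* One template hydrogen bond, stored as offsets from the starts of the two
 * blocks it joins (hypothetical representation). */
typedef struct {
    int co_offset;   /* offset of the C=O residue within its block */
    int nh_offset;   /* offset of the N-H residue within its block */
} BondOffset;

/* Count how many template bonds are present in the examined chain when the
 * two blocks are placed at row_start and col_start.
 * hbond is a chain_size x chain_size row-major matrix. */
int count_hits(const unsigned char *hbond, int chain_size,
               const BondOffset *bonds, size_t n_bonds,
               int row_start, int col_start)
{
    assert(hbond != NULL && chain_size > 0);
    int hits = 0;
    for (size_t k = 0; k < n_bonds; k++) {
        int x = row_start + bonds[k].co_offset;
        int y = col_start + bonds[k].nh_offset;
        if (x >= 0 && y >= 0 && x < chain_size && y < chain_size &&
            hbond[(size_t)x * chain_size + y])
            hits++;
    }
    return hits;
}

/* A placement is accepted when at most allowed_missing template bonds are
 * absent; allowed_missing = 0 is the exact search tried first. */
int pattern_matches(const unsigned char *hbond, int chain_size,
                    const BondOffset *bonds, size_t n_bonds,
                    int row_start, int col_start, int allowed_missing)
{
    int hits = count_hits(hbond, chain_size, bonds, n_bonds,
                          row_start, col_start);
    return hits >= (int)n_bonds - allowed_missing;
}
```

Calling `pattern_matches` with `allowed_missing` raised one step at a time mirrors the "permit one more missing bond" relaxation; a caller would stop at the first value that yields a match.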
3. List of Figures (continued)

Figure B.3 Typescript of running the search program using command bin/search
Figure B.4 New screen shown after using command tcl/v_seq.tcl
Figure B.5 List of choices shown in the new screen
Figure B.6 One example of schematic representation of the template

Chapter 1 Introduction

Immunoglobulin variable domains are present in many proteins. These domains have a compact globular structure containing two β-sheets. Examples of immunoglobulin variable domains include antibody variable domains and TCR α and β variable domains. These variable domains have hypervariable loops, which connect strands of the β-sheet framework and are important for binding antigen. Immunoglobulins are of particular interest because of their high degree of specificity, which gives them a wide range of therapeutic and other applications. Experimental developments over the past few years have led to new techniques for constructing artificial molecules based on natural immunoglobulin variable domains.

Although a molecule's three-dimensional structure determines its biological function, many sequences do not have experimentally determined structures. Known variable domain structu
4. the reliability of this template.

[Figure 3.2: Improved plane of the β-sheet framework conserved in TCR β domain]

Table 3.1 Added hydrogen bonds in the new TCR β domain template

    C=O    N-H
     31     95
     95     31
     34     47
     47     34
    114     88
      9    113
    117     15

The main reason for adding more hydrogen bonds is that more hydrogen bonds in Pattern(i, j) increase the probability of getting only one match for this pattern. For example, suppose that one part of the hydrogen bond pattern between the main chain C=O and N-H groups of the examined chain is the same as the hydrogen bond pattern between Block 4 and Block 6 of the TCR β domain template shown in Figure 3.2; that is, there are four pairs of main chain hydrogen bonds at positions corresponding to those between Block 4 and Block 6 of the template. If we search the hydrogen bond pattern of the examined chain for only the lower three pairs of hydrogen bonds, we get two matches: either the upper three pairs or the lower three pairs. If we search for all four pairs, we get only one match. Thus adding more conserved main chain hydrogen bonds made this template more reliable for classifying TCR β variable domains.

3.1.2 New immunoglobulin variable domain template

Figure 3.3 shows the improved β-sheet fram
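The uniqueness argument in excerpt 4 can be checked with a small C sketch. It simplifies the two-dimensional pattern to a one-dimensional "ladder" of consecutive hydrogen bond pairs, which is an assumption made only for illustration; `count_ladder_matches` and the array layout are hypothetical names, not part of the thesis program.

```c
#include <assert.h>

/* Count the offsets at which a run of template_len consecutive hydrogen
 * bond pairs fits inside the observed ladder. rungs[k] is nonzero when the
 * k-th candidate pair is present in the examined chain. */
int count_ladder_matches(const int *rungs, int n_rungs, int template_len)
{
    assert(template_len > 0);
    int matches = 0;
    for (int start = 0; start + template_len <= n_rungs; start++) {
        int ok = 1;
        for (int k = 0; k < template_len; k++) {
            if (!rungs[start + k]) {
                ok = 0;
                break;
            }
        }
        matches += ok;
    }
    return matches;
}
```

With four pairs present in the examined chain, a three-pair template fits at two offsets while a four-pair template fits at exactly one, which is the ambiguity the added hydrogen bonds are meant to remove.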
5. 1024 Jan 12 12:36 src
drwxr-xr-x 2 mdlhuihu cthstud 1024 Jan 12 12:32 tcl

Figure A.1 Contents of root directory

total 276
-rw-r--r-- 1 mdlhuihu cthstud  5821 Jan  5 10:02 HbondtempTAlpha.c
-rw-r--r-- 1 mdlhuihu cthstud  5969 Jan  5 10:04 HbondtempTBeta.c
-rw-r--r-- 1 mdlhuihu cthstud  5841 Jan  5 10:06 HbondtempV.c
-rw-r--r-- 1 mdlhuihu cthstud  6173 Jan  5 10:10 HbondtempVH.c
-rw-r--r-- 1 mdlhuihu cthstud  5983 Jan  5 10:12 HbondtempVL.c
-rw-r--r-- 1 mdlhuihu cthstud  2261 Jan  5 10:14 Hcoordinate.c
-rw-r--r-- 1 mdlhuihu cthstud  2692 Jan  5 10:15 ReadHbond.c
-rw-r--r-- 1 mdlhuihu cthstud  4680 Jan  5 10:16 ReadPatterntemp.c
-rw-r--r-- 1 mdlhuihu cthstud  5935 Jan  5 10:20 getHbond.c
-rw-r--r-- 1 mdlhuihu cthstud   590 Jan  5 10:18 getHbondScope.c
-rw-r--r-- 1 mdlhuihu cthstud  1233 Jan  5 10:19 getSum.c
-rw-r--r-- 1 mdlhuihu cthstud 13270 Jan  5 09:59 main.c
-rw-r--r-- 1 mdlhuihu cthstud  1916 Dec 31 01:12 patternsearch.h
-rw-r--r-- 1 mdlhuihu cthstud 48057 Jan  5 10:33 printHbond.c
-rw-r--r-- 1 mdlhuihu cthstud  8929 Jan  5 10:35 searchDiagonalPattern.c
-rw-r--r-- 1 mdlhuihu cthstud 13559 Jan  5 10:40 searchPattern.c
-rw-r--r-- 1 mdlhuihu cthstud   720 Jan  5 10:40 size.c
-rw-r--r-- 1 mdlhuihu cthstud   445 Jan  5 10:40 sortHbond.c

Figure A.2 Contents of directory src

Appendix A: Maintenance Manual

> ls -l
total 304
-rwxr-xr-x 1 mdlhuihu cthstud 145324 Jan 12

Figure A.3 Contents of directory bin

> ls -l
total 2230
6. Chapter 2: Methods

Table 2.5 Recorded hydrogen bonds in VL template (file Hbondtemplate.txt)

    Pattern          C=O    N-H
    Pattern(1, 3)      5     24
    Pattern(2, 7)      9    103
                      11    105
                      13    107
    Pattern(3, 1)     24      5
    Pattern(3, 5)     19     75
                      21     73
                      23     71
    Pattern(4, 4)     33     50
                      35     47
                      35     48
                      37     45
                      39     42
                      45     37
                      47     55
                      48     35
                      49     53
                      53     49
    Pattern(4, 6)     34     89
                      36     87
                      38     85
    Pattern(5, 3)     71     23
                      73     21
                      75     19
    Pattern(5, 5)     61     76
                      63     74
                      65     72
                      67     70
                      70     67
                      72     65
                      74     63
    Pattern(6, 4)     85     38
                      87     36
                      89     34
    Pattern(6, 7)     84    104
                      86    102
                      88     99
                      90     97
    Pattern(7, 2)    103     11
                     105     13
    Pattern(7, 6)     97     90
                     102     86

To compare the whole hydrogen bond pattern of the examined chain with that of the template, we compare each non-empty Pattern(i, j) step by step instead of comparing the whole pattern at once. Finally, we count how many of the template's Pattern(i, j) were found in the hydrogen bond pattern of the examined chain to determine whether the examined chain matches the template domain.

When we analysed the hydrogen bond patterns Pattern(i, j) of each template, we found that intra-block hydrogen bond patterns usually have more hydrogen bonds than inter-block hydrogen bond patterns. For example, there are 10 and 7 hydrogen bonds in the two intra-block hydrogen bond patterns Pattern(4, 4)
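The final decision step described in excerpt 6 (count the non-empty template patterns that were found, and accept the chain only if all were found) might look like the sketch below. The array names echo `SumHbond` from the thesis listings, but the fixed `MAX_BLOCKS` bound, the `found` matrix and the function names are assumptions introduced for this example.

```c
#include <assert.h>

#define MAX_BLOCKS 8

/* sum_hbond[i][j] holds the number of template hydrogen bonds in
 * Pattern(i, j); zero marks an empty pattern. */
int count_nonempty(int sum_hbond[MAX_BLOCKS][MAX_BLOCKS], int sum_block)
{
    assert(sum_block >= 0 && sum_block <= MAX_BLOCKS);
    int n = 0;
    for (int i = 0; i < sum_block; i++)
        for (int j = 0; j < sum_block; j++)
            if (sum_hbond[i][j] != 0)
                n++;
    return n;
}

/* found[i][j] is nonzero when Pattern(i, j) was located in the examined
 * chain. Only non-empty template patterns are counted. */
int count_found(int sum_hbond[MAX_BLOCKS][MAX_BLOCKS],
                int found[MAX_BLOCKS][MAX_BLOCKS], int sum_block)
{
    assert(sum_block >= 0 && sum_block <= MAX_BLOCKS);
    int n = 0;
    for (int i = 0; i < sum_block; i++)
        for (int j = 0; j < sum_block; j++)
            if (sum_hbond[i][j] != 0 && found[i][j])
                n++;
    return n;
}

/* The chain matches the template when every non-empty pattern was found. */
int chain_matches_template(int sum_hbond[MAX_BLOCKS][MAX_BLOCKS],
                           int found[MAX_BLOCKS][MAX_BLOCKS], int sum_block)
{
    return count_found(sum_hbond, found, sum_block) ==
           count_nonempty(sum_hbond, sum_block);
}
```

For the VL template, which the thesis describes as having 12 patterns (2 intra-block and 10 inter-block), `count_nonempty` would return 12 and a chain matches only when `count_found` also reaches 12.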
7. List of Tables (continued)

Table 3.1 Added hydrogen bonds in the new TCR β domain template
Table 3.2 Different hydrogen bonds between the new immunoglobulin variable domain template and the old one
Table 3.3 Proposed new codes vs. the codes using the Chothia Numbering Scheme
Table 3.4 Summary of results of tests
Table 3.5 Different hydrogen bonds between VH and TCR α templates

List of Figures

Figure 1.1 Connectivity of strands in a VH domain (adapted from Kemp et al., 1994)
Figure 1.2 Simplified structure of a VH domain
Figure 1.3 Plane of the β-sheet framework conserved in VL (left) and VH (right) domains (adapted from Chothia and Lesk, 1987)
Figure 1.4 Plane of the β-sheet framework conserved in T cell receptor α (left) and β (right) variable domains (adapted from Chothia et al., 1988)
Figure 1.5 Plane of β-sheet framework conserved in VL, VH, T cell receptor α and β variable domains (adapted from Chothia et al., 1988)
Figure 2.1 Flow chart for identifying and classifying immunoglobulin variable domains
Figure 2.2 The polypeptide ...
Figure 2.3 The relative positions of main chain atoms Cα, N and H bonded to N of residue i, and main chain atom C of residue i-1
Figure 2.4
8. 5) and five pairs of inter-block hydrogen bond patterns: Pattern(1, 3) and Pattern(3, 1), Pattern(2, 7) and Pattern(7, 2), Pattern(3, 5) and Pattern(5, 3), Pattern(4, 6) and Pattern(6, 4), Pattern(6, 7) and Pattern(7, 6).

Next we record the hydrogen bonds of the template using the method shown in Figure 2.5. Here sum_block is the total number of blocks in the template. Blockstart[i] and Blocklength[i] are the starting point and the length of Block i respectively. The two-dimensional array Tempposition records the positions of hydrogen bonds in the template. Table 2.5 shows the format of the hydrogen bonds recorded for the VL template using the method shown in Figure 2.5.

    if ((out1 = fopen("Hbondtemplate.txt", "wb")) == NULL)
        printf("Hbondtemplate.txt cannot be opened\n");
    else {
        for (i = 0; i < sum_block; i++) {
            for (j = 0; j < sum_block; j++)
                for (xpos = Blockstart[i]; xpos < Blockstart[i] + Blocklength[i]; xpos++)
                    for (ypos = Blockstart[j]; ypos < Blockstart[j] + Blocklength[j]; ypos++)
                        if (Hbondtemplate[xpos][ypos] == 1) {
                            /* Record the template Hbond position for each
                               pair of blocks, and record the positions of
                               hydrogen bonds into file Hbondtemplate.txt */
                            Tempposition[count][0] = xpos;
                            Tempposition[count][1] = ypos;
                            fprintf(out1, "%d %d\n", xpos, ypos);
                            count = count + 1;
                        }
        }
        closeresult = fclose(out1);
        if (closeresult != 0)
            printf("Hbondtemplate.txt cannot be closed\n");
    }

Figure 2.5 Record hydrogen bonds of template
9. [Figure A.3, continued: the file listed in directory bin is search.]

[Figure A.4: Contents of directory Data. Files listed: Blocklength.txt, Blockstart.txt, CAcoordinate.txt, Ccoordinate.txt, Chainposition.txt, Chainresidue.txt, CorrespondingChain.txt, CorrespondingHbond.txt, Hbond.txt, HbondSize.txt, HbondTempSize.txt, Hbondno.txt, Hbondnotemp.txt, Hbondtemplate.txt, Ncoordinate.txt, Ocoordinate.txt, Size.txt, SumBlock.txt, SumHbond.txt, Sumpattern.txt, coordinates.txt, data.txt, getcoordinate.pl, start.txt.]

[Figure A.5: Contents of directory tcl. Files listed: TBCodesnew.txt, VCodesnew.txt, vhCodes.txt, vlCodes.txt, temp.ps, v_seq.tcl, and frameworkGridXY, frameworkPositions and hBonds text files for each of TA, TB, TBnew, V, VH, VL and Vnew.]

Appendix A: Maintenance Manual

A.2 Description of files

getcoordinate.pl
Perl version 5.6 is used. This perl file is in the directory Data. It reads the coordinates of the main chain atoms C, O, N and Cα from the corresponding PDB file. The PDB file is recorded into Data/data.txt. All these atom coordinates are recorded into Data/coordinates.txt. The C, O, N and Cα positions are recorded into Data/Ccoordinate.txt, Data/Oco
10. [Figures A.4 and A.5, continued: group, size and date columns of the long-format directory listings for Data and tcl.]
11. different domains is introduced in Section 1.3. Chapter 2 describes the main methods used in this project to compare the hydrogen bond pattern of the examined chain to those of the templates. Chapter 3 gives the improved templates and summarises the performance of the software developed in this project. Finally, short conclusions are drawn in Chapter 4 to evaluate this software. The maintenance manual and user manual of the programs developed in this project are given in the appendices.

1.3 Background

1.3.1 Immunoglobulin variable domain

In this project we are interested in structures that are classified in SCOP (Murzin et al., 1995) as belonging to the family "V set domains (antibody variable domain-like)", such as antibody variable domains and TCR α and β variable domains. These structures have a β-sandwich fold with two antiparallel β-sheets. Within each immunoglobulin variable domain, the protein chain threads back and forth from one end of the domain to the other. Adjacent strands in the same sheet are held together by a regular hydrogen bond pattern to provide a stable framework. The heavy chain variable domain (VH domain) is one of the immunoglobulin variable domains, and the connectivity of its strands is shown in Figure 1.1.

[Figure 1.1: Connectivity of strands in a VH domain (adapted from Kemp et al., 1994). Labels: N, C, H1, H2, H3.]

By cutting the loops towards the bottom of Figure 1.1 and folding the four
12. energy. Here we choose a generous cutoff and assign a hydrogen bond between the C=O group of residue i and the N-H group of residue j if E < -0.5 kcal/mol.

2.3 Identify framework blocks

As shown in Figure 1.1, each immunoglobulin variable domain consists of two β-sheets, and each sheet is composed of several strands. Within a strand there are no insertions or deletions between members of the same family. However, insertions or deletions may occur in the loops between strands. We therefore divide the whole sequence of the β-sheet framework into several blocks corresponding to the strands in the immunoglobulin variable domains. Figure 2.4 shows the blocks of residues in the conserved β-sheet framework of immunoglobulin variable domains. In this figure, blocks are numbered consecutively from 1 to 8, with block 1 closest to the N-terminal and block 8 closest to the C-terminal. Each block represents one strand. Since some kinds of immunoglobulin variable domains have no insertion or deletion between blocks 5 and 6, these two blocks are sometimes combined into one block. Therefore the total number of blocks may differ between kinds of immunoglobulin variable domains, but the domains are partitioned in the same way.

[Figure 2.4: Blocks of residues in the conserved β-sheet framework of variable domains. Labels: Block1 to Block8.]
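The hydrogen bond assignment at the start of excerpt 12 can be illustrated in C. The excerpt does not show the energy function itself, so this sketch assumes the widely used Kabsch and Sander electrostatic model, E = q1 q2 f (1/r_ON + 1/r_CH - 1/r_OH - 1/r_CN) with q1 = 0.42e, q2 = 0.20e and f = 332, which gives E in kcal/mol for distances in ångströms; that model also speaks of a "generous cutoff" of -0.5 kcal/mol. The function and macro names are hypothetical.

```c
#include <assert.h>

/* Constants of the Kabsch-Sander electrostatic model. Using this model is
 * an assumption; the thesis excerpt only states the cutoff. */
#define PARTIAL_CHARGE_CO 0.42   /* units of electron charge */
#define PARTIAL_CHARGE_NH 0.20   /* units of electron charge */
#define DIMENSION_FACTOR 332.0   /* converts to kcal/mol with distances in angstroms */
#define HBOND_CUTOFF (-0.5)      /* kcal/mol, as stated in the text */

/* Energy between the C=O group of residue i and the N-H group of residue j,
 * from the four interatomic distances in angstroms:
 * r_on = O(i)..N(j), r_ch = C(i)..H(j), r_oh = O(i)..H(j), r_cn = C(i)..N(j). */
double hbond_energy(double r_on, double r_ch, double r_oh, double r_cn)
{
    assert(r_on > 0.0 && r_ch > 0.0 && r_oh > 0.0 && r_cn > 0.0);
    return PARTIAL_CHARGE_CO * PARTIAL_CHARGE_NH * DIMENSION_FACTOR *
           (1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn);
}

/* A hydrogen bond is assigned when the energy is below the cutoff. */
int is_hbond(double energy)
{
    return energy < HBOND_CUTOFF;
}
```

For a collinear arrangement with N..O = 2.9 Å, N-H = 1.0 Å and C=O = 1.23 Å the energy comes out near -2.9 kcal/mol, well below the cutoff, while groups about 10 Å apart give an energy close to zero and no bond.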
13. left to the lower right, for example Pattern(2, 7) and Pattern(7, 2) in the VL, TCR α and TCR β templates, and Pattern(2, 8) and Pattern(8, 2) in the VH template. All these hydrogen bond patterns are shown in Tables 2.1, 2.2, 2.3 and 2.4.

Suppose we want to search for an intra-block hydrogen bond pattern Pattern(i, i), as shown in Figure 2.6. Suppose there are n hydrogen bonds in total in all of the patterns Pattern(c, r) with c < i, or with c = i and r < i. The numbers in parentheses in Figure 2.6 are the sequence numbers recorded with the corresponding hydrogen bonds. Using these sequence numbers we can easily find the corresponding hydrogen bond positions in the file Hbondtemplate.txt.

[Figure 2.6: One example of intra-block hydrogen bond pattern. A square of side Blocklength[i] with the hydrogen bonds labelled by their sequence numbers.]

Suppose the length of the examined chain is L. After calculating the energy between the main chain C=O groups and N-H groups of the examined chain, we obtain a hydrogen bond pattern in a square of side L. We divide this hydrogen bond pattern into small squares whose side length is the length of Block i. Since this kind of hydrogen bond pattern lies within one block of the template, we only need to check the squares on the main diagonal of the hydrogen bond pattern of the examined chain. We call the process of searching the observed hyd
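The diagonal search described in excerpt 13 can be sketched as follows: a square of side Blocklength[i] slides along the main diagonal of the L x L hydrogen bond matrix, and each placement is tested against the template bonds of Pattern(i, i). The `BondOffset` type, the flat matrix layout and the function name are assumptions for this example; offsets are assumed to lie in the range 0 to block_length - 1.

```c
#include <assert.h>
#include <stddef.h>

/* Template bond as offsets from the start of Block i (hypothetical type). */
typedef struct {
    int co_offset;   /* C=O residue, 0 <= offset < block_length */
    int nh_offset;   /* N-H residue, 0 <= offset < block_length */
} BondOffset;

/* Slide a block_length x block_length square along the main diagonal of
 * the chain_size x chain_size row-major matrix hbond. Returns the first
 * start position where every template bond is present, or -1 if there is
 * none; *n_matches receives the number of matching positions. */
int find_intra_block(const unsigned char *hbond, int chain_size,
                     const BondOffset *bonds, size_t n_bonds,
                     int block_length, int *n_matches)
{
    assert(hbond != NULL && block_length > 0 && n_matches != NULL);
    int first = -1;
    *n_matches = 0;
    for (int s = 0; s + block_length <= chain_size; s++) {
        int ok = 1;
        for (size_t k = 0; k < n_bonds && ok; k++) {
            int x = s + bonds[k].co_offset;
            int y = s + bonds[k].nh_offset;
            ok = hbond[(size_t)x * chain_size + y] != 0;
        }
        if (ok) {
            if (first < 0)
                first = s;
            (*n_matches)++;
        }
    }
    return first;
}
```

Only L - Blocklength[i] + 1 placements are examined, compared with roughly the square of that number when two different blocks must be placed independently, which is consistent with the thesis's remark that intra-block patterns are faster to search.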
14. (Listing fragment; the procedure starts on the preceding page.)

                Tempposition[sum + l][0], Tempposition[sum + l][1],
                position[xpos], sign[xpos][0], position[ypos],
                Hbondno[xpos][ypos]);
        else if (xpos < chain_size && ypos < chain_size
                && position[ypos] == position[ypos - 1])
            printf("%d %d %d %d%c %d\n",
                Tempposition[sum + l][0], Tempposition[sum + l][1],
                position[xpos], position[ypos], sign[ypos][0],
                Hbondno[xpos][ypos]);
        else if (xpos < chain_size && ypos < chain_size
                && position[xpos] == position[xpos - 1]
                && position[ypos] == position[ypos - 1])
            printf("%d %d %d%c %d%c %d\n",
                Tempposition[sum + l][0], Tempposition[sum + l][1],
                position[xpos], sign[xpos][0],
                position[ypos], sign[ypos][0],
                Hbondno[xpos][ypos]);
        else if (xpos < chain_size && ypos < chain_size
                && position[xpos] != position[xpos - 1]
                && position[ypos] != position[ypos - 1])
            printf("%d %d %d %d %d\n",
                Tempposition[sum + l][0], Tempposition[sum + l][1],
                position[xpos], position[ypos],
                Hbondno[xpos][ypos]);
        else
            printf("%d %d\n",
                Tempposition[sum + l][0], Tempposition[sum + l][1]);

        /* Print hydrogen bond positions in the matched square */
        Hbond[0] = start;
        Hbond[1] = end - 1;

Figure 2.7 Procedure for searching for intra-block hydrogen bond pattern in one round

After Pattern(i, i) is found, the searching areas for the other blocks are adjusted. For those blocks before Block i, the searching area should end before the starting point of Block i; for those blocks after Block i, the searching area should start after the ending p
15. with position number 0 from the N-terminal, and then partition and search the rest of the pattern of the examined chain. If no match is found in this round, we ignore one more residue from the N-terminal and again partition and search the rest of the pattern of the examined chain, step by step, until a match is found. If there is still no match when the total number of ignored residues of the examined chain reaches 23, we search for Pattern(4, 4) again from the beginning, permitting one hydrogen bond to be missed, and continue until we find a match or the total number of missed hydrogen bonds reaches 3.

Figure 2.7 shows the procedure for searching for Pattern(i, i) in one round. distance is the number of residues permitted to be missed in a specific round, and it can range from 0 to Blocklength[i] - 1. We use block distance to limit the maximum distance between the starting points of two blocks; it can be 50 residues for two consecutive blocks, because these cannot be too far apart and a reliable match can be found using this limit. Hbond[0] and Hbond[2] record the starting points of two consecutive blocks, or of two blocks involved in two consecutive intra-block hydrogen bond patterns, while Hbond[1] and Hbond[3] record the ending points of these two blocks. For example, when we search for Pattern(4, 4) of the VL template, Hbond[0] and Hbond[2] refer to the starting points of Block 4 and Block 3, and Hbond[1] and Hbond[3] refer to th
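The search schedule in excerpt 15 (first exact matching while ignoring 0, 1, 2, ... residues from the N-terminal, then the same sweep again with one more hydrogen bond allowed to be missing) can be sketched in C. The callback type, the result struct and the toy predicate are all hypothetical; the limits 23 and 3 are the ones quoted in the text and are passed in as parameters rather than hard-coded.

```c
#include <assert.h>
#include <stddef.h>

/* Callback that tries one placement: skipped residues are ignored from the
 * N-terminal and up to allowed_missing template bonds may be absent. */
typedef int (*TryMatch)(int skipped, int allowed_missing, void *ctx);

typedef struct {
    int found;     /* nonzero when a match was located */
    int skipped;   /* residues ignored from the N-terminal, or -1 */
    int missing;   /* hydrogen bonds allowed to be missing, or -1 */
} SearchResult;

/* Outer loop relaxes the pattern; inner loop restarts from the N-terminal. */
SearchResult search_with_relaxation(TryMatch try_match, void *ctx,
                                    int max_skipped, int max_missing)
{
    assert(try_match != NULL && max_skipped >= 0 && max_missing >= 0);
    SearchResult result = {0, -1, -1};
    for (int missing = 0; missing <= max_missing; missing++) {
        for (int skipped = 0; skipped <= max_skipped; skipped++) {
            if (try_match(skipped, missing, ctx)) {
                result.found = 1;
                result.skipped = skipped;
                result.missing = missing;
                return result;
            }
        }
    }
    return result;
}

/* Toy stand-in for the real comparison: ctx points to an int array whose
 * entry k is the number of template bonds absent when k residues are
 * skipped. */
int toy_try_match(int skipped, int allowed_missing, void *ctx)
{
    const int *absent = ctx;
    return absent[skipped] <= allowed_missing;
}
```

A call such as `search_with_relaxation(toy_try_match, absent, 23, 3)` returns the least relaxed match, because every offset is tried at a given relaxation level before the level is raised.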
16. 4 shows the new screen shown after using this command. Pressing the button Template displays a list of choices (Figure B.5).

[Figure B.4: New screen shown after using command tcl/v_seq.tcl. Buttons: Reset, Template, File, Protein, CDR1, CDR2, CDR3.]

[Figure B.5: List of choices shown in the new screen.]

To draw a template, we only need to press the corresponding choice in this list. Figure B.6 shows the schematic representation of the immunoglobulin variable domain template obtained by pressing button V. Pressing button File and choosing PostScript saves this figure to the file temp.ps. To draw another picture, Reset should be pressed before making another choice under Template.

[Figure B.6: One example of schematic representation of the template.]

B.2 Format of data files

All the data files produced by running the perl and C programs are in the directory Data.

Blocklength.txt
This file contains one row of numbers listing, in order, the length of each block in the template.

Blockstart.txt
This file contains one row of numbers listing the position of the starting residue of each bl
17. Appendix B: User Manual

[Figure B.2: Typescript of running install.sh. The script compiles each C source file in src with gcc -c (main.c, size.c, Hcoordinate.c, getHbond.c, ReadHbond.c, HbondtempVH.c, HbondtempVL.c, HbondtempTAlpha.c, HbondtempTBeta.c, HbondtempV.c, ReadPatterntemp.c, getSum.c, sortHbond.c, getHbondScope.c, searchDiagonalPattern.c, searchPattern.c, printHbond.c), links the resulting object files with the math library into the search program, and ends with an rm step.]

> bin/search
This program aims to check whether the examined chain belongs to VL or VH domain, T cell receptor alpha or beta domain.
Please choose:
1  compared to VL template
2  compared to VH template
3  compared to T cell receptor Alpha domain template
4  compared to T cell receptor Beta domain template
5  compared to Variable domain template
Please input your choice now: 1
Now compare the examined chain to template VL
Please input starting point now
NOTE: Please input starting point as 0 if examined chain is not scFv. If examin
18. B5, 1NFD, 2CKB, 1FOO, 1D9K, 1G6R

Table 3.4 Summary of results of tests

                           Identified as
    Known to be   VL   VH   TCR α   TCR β   Variable domain   Total
    VL            66    0       0       0                64      66
    VH             0   69      54       0                66      69
    TCR α          0   17      16       0                16      17
    TCR β          0    0       0      16                14      17
    Total                                               160     169

Table 3.4 summarises the results of these tests. In total, 169 known immunoglobulin variable domains were checked and 160 of them matched the improved immunoglobulin variable domain template, which means that the improved template works well for identifying immunoglobulin variable domains.

Among these, we checked 66 VL domains and 17 TCR β variable domains. The table shows that all of the known VL variable domains matched the VL template and most of the known TCR β variable domains (16 of 17) matched the improved TCR β variable domain template. Moreover, no false positives were found for these known VL and TCR β variable domains: none of the known VL domains matched the VH, TCR α or TCR β variable domain templates, and none of the known TCR β variable domains matched the VL, VH or TCR α templates. Therefore these two templates work well for classifying VL and TCR β variable domains.

We also checked 69 VH domains and 17 TCR α variable domains. Although all of the VH domains matched the VH domain template and most of the TCR α variable domains matched the TCR α variable domain template, mo
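The totals and cross-matches discussed in excerpt 18 can be recomputed from the table with a short C sketch. The numbers are those read from Table 3.4; the array layout and function names are assumptions made for this illustration.

```c
#include <assert.h>

enum { N_KINDS = 4, N_TEMPLATES = 5 };

/* Rows: known VL, VH, TCR alpha, TCR beta domains.
 * Columns: VL, VH, TCR alpha, TCR beta and variable domain templates.
 * Values as read from Table 3.4. */
static const int matched[N_KINDS][N_TEMPLATES] = {
    {66,  0,  0,  0, 64},
    { 0, 69, 54,  0, 66},
    { 0, 17, 16,  0, 16},
    { 0,  0,  0, 16, 14},
};
static const int checked[N_KINDS] = {66, 69, 17, 17};

/* Total number of known domains that were checked. */
int total_checked(void)
{
    int total = 0;
    for (int kind = 0; kind < N_KINDS; kind++)
        total += checked[kind];
    return total;
}

/* Number of known domains, of any kind, that matched one template. */
int total_matching(int template_col)
{
    assert(template_col >= 0 && template_col < N_TEMPLATES);
    int total = 0;
    for (int kind = 0; kind < N_KINDS; kind++)
        total += matched[kind][template_col];
    return total;
}

/* Matches of one kind of domain to the specific templates of other kinds
 * (the general variable domain template in the last column is excluded). */
int cross_matches(int kind)
{
    assert(kind >= 0 && kind < N_KINDS);
    int total = 0;
    for (int col = 0; col < N_KINDS; col++)
        if (col != kind)
            total += matched[kind][col];
    return total;
}
```

The sums reproduce the 169 domains checked and the 160 that matched the improved variable domain template. The cross-match counts are zero for VL and TCR β, while 54 VH domains also matched the TCR α template and all 17 TCR α domains also matched the VH template.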
19. Blocks of residues in the conserved β-sheet framework of variable domains
Figure 2.5 Record hydrogen bonds of template
Figure 2.6 One example of intra-block hydrogen bond pattern
Figure 2.7 Procedure for searching for intra-block hydrogen bond pattern in one round
Figure 2.8 Procedure for searching for a pair of inter-block hydrogen bond patterns in one round
Figure 3.1 The relationship of VL, VH, TCR α, TCR β and immunoglobulin variable domains
Figure 3.2 Improved plane of the β-sheet framework conserved in TCR β domain
Figure 3.3 Improved plane of the β-sheet framework conserved in VL, VH, TCR α and TCR β domains
Figure 3.4 Superposed template of VH and TCR α variable domain templates
Figure A.1 Contents of root directory
Figure A.2 Contents of directory src
Figure A.3 Contents of directory bin
Figure A.4 Contents of directory Data
Figure A.5 Contents of directory tcl
Figure B.1 Typescript of the getcoordinate.pl program using command runperl.sh
Figure B.2 Typescript of running install.sh
20. Chapter 3 Results and Discussion

3.1 Template calibration

The first set of tests performed in this project checked the VL, VH, TCR α, TCR β and immunoglobulin variable domain templates to see whether immunoglobulin variable domains can be identified reliably using these templates. Figure 3.1 shows the relationship of the VL, VH, TCR α, TCR β and immunoglobulin variable domains. From this figure we can see that the VL, VH, TCR α and TCR β variable domains are subsets of the family of immunoglobulin variable domains. Therefore the known VL, VH, TCR α and TCR β variable domains should be compared not only to their own templates but also to the immunoglobulin variable domain template to test the reliability of these templates.

[Figure 3.1: The relationship of VL, VH, TCR α, TCR β and immunoglobulin variable domains]

For example, we compared known VH domains to the VH domain template and the immunoglobulin variable domain template to test the reliability of the VH and immunoglobulin variable domain templates. Similarly, we compared the known VL domains, known TCR α domains and known TCR β domains to the templates they are supposed to match and to the immunoglobulin variable domain template to test their reliability.

The results show that the templates of the VL, VH and TCR α domains work well. This means that known VL domains can be classified correctly using the VL template. S
21. Structural Identification of Immunoglobulin Variable Domains

Li Huihua
Supervisor: Graham J. L. Kemp
Chalmers University of Technology

Acknowledgements

This report is submitted as a Master's thesis in the International Master's Program of Bioinformatics at Chalmers University of Technology. The project described in this report was carried out at Chalmers University of Technology under the supervision of Graham J. L. Kemp, to whom I would like to express my deep gratitude for his advice, guidance and encouragement. I benefited a lot from close collaboration with him. I would also like to thank the other teachers and students in our program for their help and encouragement.

Abstract

A database of antibody sequences and structures was built in an earlier project (Kemp et al., 1994). That database contained indexes that made it easy to find residues at structurally equivalent positions in different domains, thus making it convenient to explore structural hypotheses. However, constructing these indexes by hand was a slow and inconvenient process, so we now seek to automate this task as we extend the database to include additional immunoglobulin domains. Earlier structural studies of antibody VH and VL variable domains (Chothia and Lesk, 1987) and T cell receptor variable domains (Chothia et al., 1988) identified template β-sheet framework patterns based on conserved main chain hydrogen bond patterns. In th
22. [Figure A.4: Contents of directory Data. Figure A.5: Contents of directory tcl. Permission, owner and group columns of the long-format directory listings.]
23. [Figure 2.1: Flow chart for identifying and classifying immunoglobulin variable domains. Steps: get the positions of the main chain C=O and N-H groups and calculate the positions of the H atoms bonded to main chain N; find all main chain hydrogen bonds; identify framework blocks; compare the hydrogen bond pattern of the examined chain to that of the template. If the patterns are different, the examined chain does not match the template domain; otherwise the examined chain matches the template domain.]

2.1 Get positions of main chain C=O and N-H groups

To calculate the energy between the main chain C=O group of residue i and the N-H group of residue j, we need the positions of the main chain atoms C and N, the atom O bonded to main chain atom C, and the atom H bonded to main chain atom N. We first read the positions of the main chain atoms C, N and Cα and of the atom O bonded to main chain atom C, using a program we wrote, getcoordinate.pl. Since the positions of the atoms H bonded to main chain N are not given in PDB files, we use the positions of the main chain atoms C, N and Cα to compute the position of the hydrogen atom bonded to each main chain nitrogen atom.

A polypeptide chain with three amino acid residues is shown in Figure 2.2. In this figure, the peptide bonds joining consecutive residues are represented by thick black lines. R1, R2 and R3 are the side chains of the different residues. The six atoms in the dotted rectangle are on the same p
24. and Pattern(5, 5) of the VL template, many more than in the inter-block hydrogen bond patterns, which have at most 4 hydrogen bonds. Therefore it is more likely that we will find a unique match when searching for an intra-block pattern. Moreover, intra-block hydrogen bond patterns are formed between the main chain C=O groups and N-H groups of the same block, so we only need to search the hydrogen bond patterns along the main diagonal from the N-terminal to the C-terminal. This makes them faster to search for than inter-block hydrogen bond patterns between two different blocks. Therefore we search for the intra-block hydrogen bond patterns first. The searching areas for the other blocks are adjusted after these patterns are found. Then we search for the inter-block hydrogen bond patterns.

2.4.1 Searching for intra-block hydrogen bond patterns

Since hydrogen bonds in these templates usually occur between two antiparallel strands, most of the hydrogen bond patterns lie on a line from the lower left to the upper right, perpendicular to the main diagonal, like the one shown in Figure 2.6. Sometimes more than two antiparallel strands are involved in forming one hydrogen bond pattern. For example, three antiparallel strands are involved in Pattern(4, 4) in the VL template, which gives hydrogen bonds on two lines. In some cases hydrogen bonds form between two parallel strands, which also gives hydrogen bonds on one line, from the upper
25. black strands upwards, we can put all nine strands on the same plane, with the four black strands above the five white ones, to show the simplified structure of the VH domain (Figure 1.2). The connectivity of strands in the light chain variable domain (VL domain) and in the T cell receptor (TCR) α and β domains is similar.

[Figure 1.2: Simplified structure of a VH domain. Labels: N, H1, H2.]

1.3.2 Numbering Scheme

The numbering schemes adopted so far in the Protein Data Bank include the Kabat Numbering Scheme, the Chothia Numbering Scheme and the Consecutive Numbering Scheme. This variety of numbering schemes makes it difficult to find structurally equivalent positions in different domains.

The Kabat Numbering Scheme
The Kabat Numbering Scheme, developed from multiple sequence alignments, is the widely adopted standard for numbering the residues in an antibody in a consistent manner. However, the insertion positions in the complementarity determining regions (CDRs) may not match the structural insertion positions.

The Chothia Numbering Scheme
The Chothia Numbering Scheme is identical to the Kabat Numbering Scheme except that it places the insertions in the CDRs at the structurally correct positions. However, some confusion may arise because the Kabat Numbering Scheme is so widely used.

Consecutive Numbering Scheme
Sometimes the Consecutive Numbering Scheme is adopted in a PDB file to assi
26. cheme. The corresponding position codes in other variable domains can be found easily from Figure 3.3.

Figure 3.3 Improved plane of the β sheet framework conserved in VL, VH, TCR α and TCR β domains

Table 3.2 Different hydrogen bonds between the new immunoglobulin variable domain template and the old one
C=O   N-H
73    22
31    93
93    31

Although VL, VH, TCR α and β variable domains share the common core pattern shown in Figure 3.3, they have different position codes at structurally equivalent positions even under the same numbering scheme (e.g. the Chothia Numbering Scheme). In this project a new numbering scheme is proposed to help find structurally equivalent positions automatically. Table 3.3 lists the new codes alongside the corresponding Chothia positions in VL, VH, TCR α and β variable domains. In the new scheme, block i begins with position 100 × i at its N-terminal end. Of the residues between two consecutive blocks, the first half are numbered consecutively after the former block, while the second half are numbered consecutively before the later bloc
27. d Fothergill J E 1994 Combining Computation with Database Access in Biomolecular Computing In Litwin W and Risch T editors Applications of Databases Proceedings of the First International Conference pages 317 335 Springer Verlag Murzin A Brenner S Hubbard T and Chothia C 1995 SCOP A Structural Classification of Proteins Database for the Investigation of Sequences and Structures J Mol Biol 247 536 540 39 APPENDIX A MAINTENANCE MANUAL Appendix A Maintenance Manual A 1 Project files Figure A 1 lists the files and directories in the root directory All C source codes are stored in directory src shown is Figure A 2 compiled file search in directory bin shown in Figure A 3 the perl file and all the text files produced by C programs here are stored in directory Data shown in Figure A 4 There are also two shell files runperl sh and install sh in the root directory to help to run these perl and C programs The directory of tcl stores the tcl tk source code all the txt files needed in this program and the produced file temp ps shown in Figure A 5 gt 1s 1 total 18 drwxr xr x 2 mdlhuihu cthstud 1024 Jan 12 12 32 Data W I I 1 mdlhuihu cthstud 2212 Dec 31 01 14 Makefile drwxr xr x 2 mdlhuihu cthstud 512 Jan 12 12 32 bin rWXY Xr x 1 mdlhuihu cthstud 31 Dec 31 01 15 install sh rwWXr Xr x 1 mdlhuihu cthstud 27 Dec 31 01 15 runperl sh drwxr xr x 2 mdlhuihu cthstud
28. e ending points of Block 4 and Block 3 24 CHAPTER 2 METHODS R c float Hbondend Hbondstart distance 1 Blocklength Calculate the number of blocks divided in k int floor c l the examined chain at each round for i 0 i lt k i t if pattern 1 amp amp Hbond 0 Hbond 2 lt block_distance break count1 0 start Blocklength i distance Hbondstart end start Blocklength for x start x lt end x amp for y start y lt end y Record the total number of hydrogen bonds of within each individual block count1 count1 1 of the examined chain if x chain size amp amp yschain size amp amp Hbondno x y 71 if count1 gt SumHbond Hbond_less count2 0 for q 0 q lt SumHbond q xpos start Tempposition sum q 0 Blockstart ypos start Tempposition sum q 1 Blockstart E if xpos lt chain_size amp amp ypos lt chain_size amp amp Hbondno xpos ypos 1 corresponding hydrogen bonds in squares of the examined domain with Calculate the number of count2 count2 1 enough hydrogen bonds inside if count2 gt SumHbond Hbond_less Hbondstart start if Hbondstart Hbond 2 lt block_distance pattern 1 for I 0 l lt SumHbond xpos start Tempposition sum 1 0 Blockstart ypos start Tempposition sum 1 Blockstart if xpos lt chain_size amp amp ypos lt chain_size amp amp position xpos position xpos 1 printf d d Y d c 96d d n Tempposition
29. ed chain is scFv please check N terminal at first The is sta starting point to search the domain after the N terminal domain dtt the next residue after th nding point of the N terminal domain Starting point is 0 xamined chain templat Pattern 4 4 33 50 35 47 35 48 37 45 39 42 45 37 47 55 48 35 PRPRPRPRPRPRO Figure B 3 Typescript of running the search program using command bin search 47 APPENDIX B USER MANUAL 4 104 1 6 102 1 99 1 97 1 90 1 24 1 DY ed sum is 12 49 53 49 53 49 53 Pattern 5 5 61 76 61 63 74 63 65 72 65 67 70 67 70 67 70 72 65 72 74 63 74 Pattern 3 5 19 75 19 21 73 21 23 71 23 Pattern 5 3 71 23 71 73 21 73 75 19 75 Pattern 4 6 34 89 34 36 87 36 38 85 38 Pattern 6 4 85 38 85 87 36 87 89 34 89 Pattern 6 7 84 104 8 86 102 8 88 99 88 90 97 90 Pattern 7 6 97 90 97 102 86 1 Pattern 2 7 9 103 9 11 105 1 13 107 1 Pattern 7 2 103 11 1 105 13 1 Pattern 1 3 5 24 5 Pattern 3 1 24 5 24 count is 12 This is VL domain This VL domain terminates at 106th residue Figure B 3 Typescript of runnning the search program using command bin search continued All these variable domain templates can be drawn one by one using command tcl v seq tcl Figure B
30. er 2 we shall introduce how to find the hydrogen bond patterns of new structures and how to identify and classify these structures.

Chapter 2 Methods

In this chapter the methods used to identify and classify immunoglobulin variable domains are described. Figure 2.1 shows the whole procedure for identifying and classifying these domains.

The main depository of immunoglobulin structure data is the Protein Data Bank (Bernstein et al., 1977). In order to find all main chain hydrogen bonds we need to know the coordinates of all main chain N, H, C and O atoms. The ATOM records in PDB files contain the x, y and z coordinates of all the heavy atoms (N, C, O, S, etc.) but do not contain coordinates for hydrogen atoms. We therefore read the positions of the N, C and O atoms of the main chain C=O and N-H groups and the Cα positions from the PDB file, and calculate the position of the hydrogen bonded to each main chain N. Then we find all the possible hydrogen bonds between each pair of main chain C=O group and N-H group of the examined chain (Section 2.2). Then we identify framework blocks (Section 2.3) corresponding to the strands of each of the templates shown in Figures 1.3, 1.4 and 1.5. Finally we compare the hydrogen bond pattern of the examined chain to that of the template: if they are similar, we assert that the examined chain matches the template; otherwise it does not match.

Read N C O positions in main ch
31. ern txt total number of blocks of the template into Data SumBlock txt the starting point and the length of each block of the template into Data Blockstart txt and Data Blocklength txt respectively ReadPatterntemp c After running this file all the hydrogen bond positions of the corresponding template are recorded into file Data Hbondtemplate txt and the total number of the hydrogen bonds in the template is recorded into Data HbondTempSize txt ReadHbond c This file records the hydrogen bond pattern of the examined chain into file Data Hbondno txt getHbondScope c This file initializes the search scope for each block in the examined chain getSum c Running this file gives us the total number of hydrogen bonds of the template before Pattern 1i j These include all the Patterns between main chain C O group of Block k when k lt i and main chain N H group of either block and the Patterns between main chain C O group of Block i and main chain N H group of Block r when r lt j SearchDiagonalPattern c Those hydrogen bonds within one block can be searched after running this file At the same time the search scope of other blocks is adjusted sortHbond c As for each Pattern i j when i j hydrogen bonds exist in Pattern j i only when there are some hydrogen bonds in Pattern i j in order to increase the probability of getting one match for each Pattern 1 the total number of hyd
32. ework conserved in VL, VH, TCR α and TCR β variable domains. In this figure four numbers are shown at the ends of each framework region. The upper pair (M, M) are the VL and VH sequence position codes proposed in Chothia and Lesk (1987); the lower pair (N, N) are the TCR α and β domain sequence position codes proposed in Chothia et al. (1988).

The new immunoglobulin variable domain template was developed in two steps. First, we compared the four templates (the VL, VH, TCR α and β templates); when we superposed them we found a common core pattern based on conserved main chain hydrogen bonds. Second, based on our initial tests comparing known VL, VH, TCR α and β variable domains to the old template proposed in Chothia et al. (1988), we modified the pattern by removing one main chain hydrogen bond and adding two more. The hydrogen bond represented by the dotted line in Figure 3.3 was removed because most TCR α domains do not have it, and the results of searching for other kinds of variable domains are not affected by its removal. The two hydrogen bonds represented by dashed lines in Figure 3.3 were added to make the automatic identification of variable domain structures more reliable. Table 3.2 summarises the differences between the new template and the old one. The position codes in Table 3.2 are TCR α position codes using the Chothia Numbering S
33. gn consecutive numbers to the residues, starting with residue number one at the N terminal of the chain and numbering residues according to their actual positions within the protein chain.

Footnote: Adapted from descriptions of antibody residue numbering schemes by Andrew Martin, http://www.rubic.rdg.ac.uk/abs

1.3.3 Variable domain templates

Earlier structural studies of antibody VH and VL variable domains (Chothia and Lesk, 1987) and of T cell receptor α and β domains (Chothia et al., 1988) identified template β sheet framework patterns based on conserved main chain hydrogen bond patterns. Figures 1.3 and 1.4 show schematic representations of the β sheet frameworks of four kinds of variable domain (VL, VH, TCR α and β) based on those earlier studies. Figure 1.5 shows the common schematic representation for all these kinds of domains. In each figure, one number is shown at each end of a framework region; this is the position index in the Chothia Numbering Scheme. In these figures, circles represent amino acid residue positions. Circles with C or W inside indicate that the residue at that position is usually cysteine or tryptophan. Thick black lines join residues that are adjacent in the sequence, and thin blue lines represent main chain hydrogen bonds. In Figure 1.3 the red lines represent an antibody's complementarity determining regions. The consecutive residues in one line connected by thick b
34. hBonds_TA.txt, hBonds_TB.txt, hBonds_TBnew.txt, hBonds_V.txt, hBonds_VH.txt, hBonds_VL.txt, hBonds_Vnew.txt
Each pair of codes inside one pair of parentheses in these files refers to one hydrogen bond formed between the main chain N-H group of the first residue and the main chain C=O group of the second one. The position codes here also use the Chothia Numbering Scheme, except for those in files hBonds_V.txt and hBonds_Vnew.txt, where the position codes of strand i begin with 100 × i.

TBCodesnew.txt, VCodesnew.txt, VhCodes.txt, VlCodes.txt
These files contain both the position codes within the conserved framework and part or all of the position codes outside the conserved framework.
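The pair format described above can be read with a few lines of C. This is only a sketch under stated assumptions: the exact layout of the hBonds files is not shown in the manual, so the code assumes pairs written as "(first,second)" with optional spaces, and the names HBond and parse_hbond_pairs are hypothetical. The sample values in the usage note are illustrative and are not taken from any of the files; position codes carrying insertion letters are not handled.

```c
#include <stdio.h>
#include <string.h>

/* One template hydrogen bond: N-H residue code first, C=O residue code second,
   following the order described for the hBonds files. */
typedef struct { int nh; int co; } HBond;

/* Scan `text` for "(a,b)" pairs and store up to `max` of them in `out`.
   Returns the number of pairs stored. Malformed groups are skipped.
   ASSUMPTION: plain integer codes separated by a comma. */
size_t parse_hbond_pairs(const char *text, HBond *out, size_t max)
{
    size_t n = 0;
    const char *p = text;

    while (n < max && (p = strchr(p, '(')) != NULL) {
        int a, b, used = 0;
        /* %n records how many characters were consumed, so `used` is only
           non-zero when the closing parenthesis was matched as well. */
        if (sscanf(p, "( %d , %d )%n", &a, &b, &used) == 2 && used > 0) {
            out[n].nh = a;
            out[n].co = b;
            n++;
            p += used;
        } else {
            p++;   /* not a well-formed pair: move past this '(' */
        }
    }
    return n;
}
```

For example, calling parse_hbond_pairs on the string "(33,50) (35, 47)" with a buffer of eight entries would store two bonds; reading a whole file first into a character buffer (for instance with fread) keeps the parser independent of how the file is opened.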
35. imilarly, VH and TCR α domains can also be classified correctly using the VH and TCR α templates respectively. However, the TCR β domain and immunoglobulin variable domain templates did not work well at first. These two templates were therefore improved (Section 3.1.1 and Section 3.1.2) before being used to classify the corresponding domains.

3.1.1 New TCR β domain template

The new template of the conserved main chain hydrogen bond pattern in the TCR β domain is shown in Figure 3.2. This improved template was obtained from the results of our initial tests, in which known TCR β domains were compared to the TCR β domain template shown in Figure 1.4. During this step we found that Pattern 4 6 and Pattern 6 4 were usually matched falsely, because in most cases there is a pair of hydrogen bonds between residues 31 and 95, which shifts the found Block 4 two residues before, and the found Block 6 two residues after, the corresponding blocks. After adding this pair of hydrogen bonds between residues 31 and 95, the match is reliable. Compared to the old template, seven more hydrogen bonds were added in the new one; they are shown as the seven dashed lines in Figure 3.2 and are also listed in Table 3.1. Here the position codes follow the Chothia Numbering Scheme. The lengths of the blocks were also adjusted to include these added hydrogen bonds in order to improve
36. \[ b = -\frac{|\vec{N_i H_i}| \, \sin(180^\circ - \angle C^{\alpha}_i N_i H_i)}{|\vec{N_i C_{i-1}}| \, \sin(180^\circ - \angle C^{\alpha}_i N_i C_{i-1})} \qquad (2.4) \]

Therefore the position of the hydrogen atom bonded to each main chain nitrogen atom is

\[ H_x = N_x + a\,(C^{\alpha}_x - N_x) + b\,(C_x - N_x) \qquad (2.5) \]
\[ H_y = N_y + a\,(C^{\alpha}_y - N_y) + b\,(C_y - N_y) \qquad (2.6) \]
\[ H_z = N_z + a\,(C^{\alpha}_z - N_z) + b\,(C_z - N_z) \qquad (2.7) \]

where N and Cα belong to residue i, C belongs to residue i-1, ∠CαNH = 118°, ∠CαNC = 122°, ∠CNH = 120° and |NH| = 1.02 Å.

Footnotes: http www pharmacology2000 com physics chemistry_physics physics23 htm; http www cmbi kun nl gv service counting SET1 BNDLEH

2.2 Hydrogen bond pattern of examined chain

Since the structural analysis used to identify immunoglobulin variable domains is based on the conserved main chain hydrogen bonds, the second step is to find all main chain hydrogen bonds by calculating the electrostatic energy between the main chain C=O group of residue i and the main chain N-H group of residue j, using the method described in Kabsch and Sander (1983):

\[ E = q_1 q_2 \left( \frac{1}{r(ON)} + \frac{1}{r(CH)} - \frac{1}{r(OH)} - \frac{1}{r(CN)} \right) f \qquad (2.8) \]

with the charge on the main chain C=O group q_1 = 0.42e and the charge on the main chain N-H group q_2 = 0.20e, where e is the unit electron charge and r(AB) is the interatomic distance from A to B. In chemical units, r is in ångströms, the dimensional factor f = 332 and E is in kcal/mol. A good hydrogen bond has about -3 kcal/mol binding
37. ins. The hydrogen bonds between the C=O groups of Block i and the N-H groups of Block j are named Pattern i j. We call Pattern i j an intra-block hydrogen bond pattern if i is equal to j; otherwise we call it an inter-block hydrogen bond pattern. Tables 2.1, 2.2, 2.3 and 2.4 summarise the blocks and hydrogen bonds conserved in VL and VH domains (Chothia and Lesk, 1987), shown in Figure 1.3, and in TCR α and β domains (Chothia et al., 1988), shown in Figure 1.4. All the residue numbers in these tables follow the Chothia Numbering Scheme.

Table 2.1 Summary of blocks and hydrogen bonds conserved in VL domains
38. is project we have adapted and improved some of these templates so that these can also recognize some of the new structures that have been determined experimentally since the earlier templates were derived We have then used these templates to identify VH VL TCR V a and TCR V domains in Protein Data Bank files automatically by computing main chain hydrogen bonds within the structures and then matching these computed patterns against the template patterns The results show that the software developed here can help the process of finding structurally equivalent positions in different domains automatically II LIST OF CONTENTS List of Contents Acknowledeemehtsz rta tl I P uocare E II His of COBUCHES ses de ve dutem nei id ce m MEIN III ASCE el CI N PC uM M EE V LiSt oL BEASUEGS ook Seca zs e ine Nena atate ier Pa cede t rato tede a eee tbe e bas iun VI Chapter T Introduction aie A RON S a 1 I TAOBIGGH A ST 2 A OVOLvie We ose e Maa tags coc cel mee ated me ms 2 ABACO EE EER enda ceci e Cu te o ed ota eom dtu edu ed 2 1 3 1 Immunoglobulin variable domain id 2 13 2 Numb ring scheme A T br A a c dus e 4 1 3 3 Variable domain templates esc o RO a cod e E uae sed 4 Chapter 2 Methods En 9 2 1 Get positions of main chain C O and N H TOUDS iese esse es see see ee ee RA Ge Re ee 10 2 2 Hydrogen bond pattern of examined chain esse see see ee RA Ge Re Ge Re ee ee RA ee 13 2 3 Identify framework blocks DA az 14 2 4 Identify and classif
39. k Since VL and VH domains have one more block than TCR α and β variable domains, the corresponding positions are ignored in TCR α and β variable domains.

Table 3.3 Proposed new codes vs. the codes using the Chothia Numbering Scheme

New Code   VL    VH    TCR α   TCR β
100        4     4     3       4
101        5     5     4       5
102        6     6     5       6
200        10    9     9       10
201        11    10    10      11
202        12    11    11      12
203        13    12    12      13
300        19    18    18      19
301        20    19    19      20
302        21    20    20      21
303        22    21    21      22
304        23    22    22      23
305        24    23    23      24
306        25    24    24      25
400        33    34    32      32
401        34    35    33      33
402        35    36    34      34
403        36    37    35      35
404        37    38    36      36
405        38    39    37      37
406        39    40    38      38
407        40    41    39      39
408        41    42    40      40
409        42    43    41      41
410        43    44    42      42
411        44    45    43      43
412        45    46    44      44
413        46    47    45      45
414        47    48    46      46
415        48    49    47      47
416        49    50    48      48
417        50    51    49      49
500        53    58    -       -
501        54    59    -       -
502        55    60    -       -
600        63    68    63      66
601        64    69    64      67
602        65    70    65      68
603        66    71    66      69
604        67    72    67      70
605        68    73    68      71
700        69    76    71      73
701        70    77    72      74
702        71    78    73      75
703        72    79    74      76
704        73    80    75      77
705        74    81    76      78
706        75    82    77      79
800        84    88    86      88
801        85    89    87      89
802        86    90    88      90
803        87    91    89      91
804        88    92    90      92
805        89    93    91      93
806        90    94    92      94
900        97    102   105     107
901        98    103   106     108
902        99    104   107     109
903        100   105   108     110
904        101   106   109     111
905        102   107   110     112
906        103   108   111     113
907        104   109   112     114
908        105   110   113     115
909        106   111   114     116
910        107   112   115     117
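Within the framework blocks, the mapping in Table 3.3 is a fixed offset per block, so it can be sketched as a small lookup in C. This is an illustration only: the names Block, VL_BLOCKS, new_code_to_chothia and chothia_to_new_code are hypothetical, only the VL column is encoded (block start positions and lengths read off the table), and neither the residues between blocks nor position codes with insertion letters are handled.

```c
#include <stddef.h>

/* One framework block: first Chothia position and number of positions. */
typedef struct { int start; int length; } Block;

/* VL column of Table 3.3: blocks 1..9 (new codes 100.., 200.., ... 900..). */
static const Block VL_BLOCKS[9] = {
    { 4, 3}, {10, 4}, {19, 7}, {33, 18}, {53, 3},
    {63, 6}, {69, 7}, {84, 7}, {97, 11}
};

/* New code (100 * block + offset) -> Chothia position, or -1 if the code
   does not fall inside a framework block. */
int new_code_to_chothia(const Block *blocks, size_t nblocks, int code)
{
    if (code < 100)
        return -1;
    int block = code / 100;      /* 1-based block number        */
    int offset = code % 100;     /* position within the block   */
    if ((size_t)block > nblocks || offset >= blocks[block - 1].length)
        return -1;
    return blocks[block - 1].start + offset;
}

/* Chothia position -> new code, or -1 if the position lies between blocks. */
int chothia_to_new_code(const Block *blocks, size_t nblocks, int pos)
{
    for (size_t b = 0; b < nblocks; b++) {
        if (pos >= blocks[b].start && pos < blocks[b].start + blocks[b].length)
            return 100 * (int)(b + 1) + (pos - blocks[b].start);
    }
    return -1;
}
```

With the VL data, new code 403 maps to Chothia position 36 and back again; the other three columns of the table could be encoded the same way, with blocks missing in the TCR domains given length 0.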
40. lack lines correspond to one strand shown in Figure 1.2, except for the residues from position 9 to 13 in the VL domain and from position 8 to 12 in the VH domain. In Figure 1.5 four numbers are shown at the ends of each framework region. The upper pair (M, M) are the VL and VH sequence position codes proposed in Chothia and Lesk (1987); the lower pair (N, N) are the TCR α and β domain sequence position codes proposed in Chothia et al. (1988).

Figure 1.3 Plane of the β sheet framework conserved in VL (left) and VH (right) domains (adapted from Chothia and Lesk, 1987)

Figure 1.4 Plane of the β sheet framework conserved in T cell receptor α (left) and β (right) variable domains (adapted from Chothia et al., 1988)

Figure 1.5 Plane of the β sheet framework conserved in VL, VH, T cell receptor α and β variable domains (adapted from Chothia et al., 1988)

If we have a new structure file and want to identify the family to which it belongs, one possible approach is to look for the main chain hydrogen bond pattern in the new structure and then compare this pattern with the template hydrogen bond patterns shown in Figures 1.3, 1.4 and 1.5. In Chapt
41. lane, the peptide plane. The relative positions of the main chain atoms Cα, N and H (bonded to N) of residue i and the main chain atom C of residue i-1 are shown in Figure 2.3.

Figure 2.2 The polypeptide chain

Figure 2.3 The relative positions of the main chain atoms Cα, N and H (bonded to N) of residue i and the main chain atom C of residue i-1

As shown in Figure 2.3, for any two consecutive residues i-1 and i, the main chain atom C of residue i-1 and the main chain atoms Cα, N and H (bonded to N) of residue i lie in the same plane. If we treat the displacement from N to H of residue i, the displacement from N to Cα of residue i, and the displacement from N of residue i to the main chain atom C of residue i-1 as vectors, they are related by

\[ \vec{N_i H_i} = a \, \vec{N_i C^{\alpha}_i} + b \, \vec{N_i C_{i-1}} \qquad (2.1) \]

where a and b are coefficients. For the triangle shown in Figure 2.3, the sides and angles are related by

\[ \frac{|\vec{N_i H_i}|}{\sin(180^\circ - \angle C^{\alpha}_i N_i C_{i-1})} = \frac{|a| \, |\vec{N_i C^{\alpha}_i}|}{\sin(180^\circ - \angle C_{i-1} N_i H_i)} = \frac{|b| \, |\vec{N_i C_{i-1}}|}{\sin(180^\circ - \angle C^{\alpha}_i N_i H_i)} \qquad (2.2) \]

Since H lies on the opposite side of N from Cα and C, both coefficients are negative, and they can be found as follows:

\[ a = -\frac{|\vec{N_i H_i}| \, \sin(180^\circ - \angle C_{i-1} N_i H_i)}{|\vec{N_i C^{\alpha}_i}| \, \sin(180^\circ - \angle C^{\alpha}_i N_i C_{i-1})} \qquad (2.3) \]
42. n the directory bin. Before running search, getcoordinate.pl should first be run using the command runperl.sh to get all the coordinates needed.

> runperl.sh
Please input PDB ID
12e8
Please input chain name if chain name is NULL please press Enter
L
Examined chain is L

Figure B.1 Typescript of the getcoordinate.pl program using command runperl.sh

After running getcoordinate.pl, the coordinates of the main chain C, O, N and Cα atoms are recorded in Data/Ccoordinate.txt, Data/Ocoordinate.txt, Data/Ncoordinate.txt and Data/CAcoordinate.txt respectively. The calculated length of the examined chain is recorded in the file Data/Size.txt. Then, if any of the C files has changed, compile them using the command install.sh, which places the executable file search in the directory bin (Figure B.2). Finally, compare the hydrogen bond pattern of the examined chain to that of the template using the command bin/search. You first choose the template by entering 1, 2, 3, 4 or 5, and then enter the starting point of the search. Figure B.3 shows the typescript of a run of search. It lists the hydrogen bonds in the template and the corresponding hydrogen bonds of the examined chain; the 0 or 1 in the last column shows whether a hydrogen bond is assigned at the corresponding position in the examined chain.
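The 0 or 1 reported for each template hydrogen bond, and the decision whether a block has been found, can be illustrated with a short C sketch. It is a simplification under stated assumptions rather than the code behind bin/search: the names Offset, count_matches and find_block are hypothetical, the hydrogen bonds of the examined chain are assumed to be stored as an n × n matrix of 0/1 flags indexed as [C=O residue][N-H residue], and the candidate start is slid one residue at a time instead of partitioning the chain into squares round by round.

```c
#include <stddef.h>

/* A template hydrogen bond, given relative to the start of its block. */
typedef struct { size_t co; size_t nh; } Offset;

/* For a candidate block start, mark each template bond as present (1) or
   absent (0) in the examined chain and return how many are present.
   hb is an n*n row-major matrix; flags may be NULL if not needed. */
int count_matches(const unsigned char *hb, size_t n, size_t start,
                  const Offset *tpl, size_t ntpl, unsigned char *flags)
{
    int count = 0;
    for (size_t q = 0; q < ntpl; q++) {
        size_t x = start + tpl[q].co;
        size_t y = start + tpl[q].nh;
        unsigned char present = (x < n && y < n && hb[x * n + y]) ? 1 : 0;
        if (flags != NULL)
            flags[q] = present;
        count += present;
    }
    return count;
}

/* Look for an exact match first; if none exists, allow one more missing
   hydrogen bond at a time, up to max_missing. Returns the block start,
   or -1 if the pattern is not found. */
long find_block(const unsigned char *hb, size_t n,
                const Offset *tpl, size_t ntpl, size_t max_missing)
{
    for (size_t missing = 0; missing <= max_missing && missing <= ntpl; missing++) {
        for (size_t start = 0; start < n; start++) {
            size_t found = (size_t)count_matches(hb, n, start, tpl, ntpl, NULL);
            if (found + missing >= ntpl)
                return (long)start;
        }
    }
    return -1;
}
```

Trying the exact pattern everywhere before relaxing it mirrors the order described in the Methods chapter: a relaxed match is accepted only after the stricter search has failed over the whole searching area.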
43. ng an antibody structure database (Kemp et al., 1994) much easier and more convenient than before.

Generally, immunoglobulin variable domains can be identified and classified reliably on the basis of the conserved main chain hydrogen bonds, although false positives exist among VH and TCR α variable domains. Since the structure of the CDRs is highly specific, false positives may be avoided by combining the conserved main chain hydrogen bond pattern with structural information about the different CDRs. In this way immunoglobulin variable domains could be identified better in the future, and the existing antibody database could be updated automatically to include more immunoglobulin variable domains.

References

Bernstein, F., Koetzle, T., Williams, G., Meyer, E., Brice, M., Rodgers, J., Kennard, O., Shimanouchi, T. and Tasumi, M. (1977). The Protein Data Bank: a Computer-Based Archival File for Macromolecular Structures. J. Mol. Biol., 112:535-542.

Chothia, C., Boswell, D. R. and Lesk, A. M. (1988). The outline structure of the T-cell alpha beta receptor. EMBO J., 7:3745-3755.

Chothia, C. and Lesk, A. M. (1987). Canonical Structures for the Hypervariable Regions of Immunoglobulins. J. Mol. Biol., 196:901-917.

Kabsch, W. and Sander, C. (1983). Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers, 22:2577-2637.

Kemp, G. J. L., Jiao, Z., Gray, P. M. D. an
44. ock consecutively in the template Ccoordinate txt Ocoordinate txt Ncoordinate txt CAcoordinate txt These files records the positions of corresponding atoms with the X coordinates in the first column the Y coordinates in the second column and the Z coordinates in the third column Chainposition txt This file records the position index from the corresponding PDB file in one column Chainresidue txt 50 APPENDIX B USER MANUAL This file records the name of each residue from the corresponding PDB file in one column coordinates txt This file records all the positions of main chain C N O and C Each line begins with ATOM the second column is the series number the third column is the name of the atom the fourth column is the name of the residue it belongs to the fifth column is the name of the chain the sixth is the residue number followed by three dimensional coordinates in the seventh eighth and ninth columns CorrespondingChain txt The upper part of the file records the starting point of each block The lower part records the residue positions using different numbering schemes including the one used in the corresponding PDB file Chothia Numbering Scheme and the New code CorrespondingHbond txt This file records the positions of hydrogen bonds in the template in the first column the corresponding positions in the examined chain in the second column followed by the third column with lists whether the h
45. ocklength i j do search Pattern i 1 if Pattern 1 1 is found adjust the search scope for other Block k when k i count_pattern count_pattern 1 break do sort pairs of Pattern 1 j and Pattern j i when i j if all Pattern 1 1 are found for c 0 c lt count_pair c for Hbond_less1 0 Hbond_less1 lt n Hbond_less1 if Pattern i j and Pattern j i have been found break else for Hbond_less2 0 Hbond_less2 lt m Hbond_less2 for i 0 i lt Blocklength k i if Pattern 1 j and Pattern j i have been found break else for 0 lt Blocklength r j do search Pattern k r and Pattern r k if Pattern i j and Pattern j i have been found adjust the search scope of other blocks count pattern count pattern 2 break j j j j j j if all the Patterns are found The examined chain matches the template else The examined chain does not match the template 45 APPENDIX B USER MANUAL Appendix B User Manual B 1 Using the programs Before running this program it is better for the user to have the knowledge of the structure of immunoglobulin variable domains such as the structure of natural and artificial VL VH T cell receptor a and f variable domains It is also required to understand the format of PDB files in order to run this program This program can be used under Unix Linux The executable file search is i
46. Table 2.2 Summary of blocks and hydrogen bonds conserved in VH domains

Table 2.3 Summary of blocks and hydrogen bonds conserved in TCR α domains

Table 2.4 Summary of blocks and hydrogen bonds conserved in TCR β domains

2.4 Identify and classify immunoglobulin variable domains

To identify and classify immunoglobulin variable domains, we compare the hydrogen bond pattern of the examined chain to the templates, which are based on the conserved main chain hydrogen bonds. Each template has been partitioned into several blocks, and this gives a number of non-empty Pattern i j in the template (Tables 2.1, 2.2, 2.3 and 2.4). For example, there are 12 patterns in the VL domain template (Table 2.1), including two intra-block hydrogen bond patterns, Pattern 4 4 and Pattern 5
47. oint of Block i. After reducing the searching areas, we continue searching for other patterns.

2.4.2 Searching for inter-block hydrogen bond patterns

After all the non-empty Pattern i i of the template have been found, we begin to search for Pattern i j between different blocks. For each pair of Pattern i j and Pattern j i, there are hydrogen bonds in Pattern j i only when hydrogen bonds exist in Pattern i j. During this stage we search for Pattern i j and Pattern j i at the same time to increase the reliability of the results. We sort the pairs of Pattern i j and Pattern j i by the total number of hydrogen bonds they contain, in decreasing order, and search for them in that order. This makes it more likely that we will find unique matches when searching for inter-block patterns. Since the searching area of each block has already been minimised, to search for Pattern i j we only need to partition the area whose width is the searching area of Block i and whose height is the searching area of Block j into smaller rectangles of the same size as Pattern i j. Figure 2.8 shows the method used to search for Pattern i j and Pattern j i at the same time. Here column_distance and row_distance are the numbers of residues ignored in the searching areas of Block column and Block row respectively. Hbondstart column and Hbondstart r
48. onds in the TCR α variable domain template.

Figure 3.4 Superposed template of VH and TCR α variable domain templates

Table 3.5 Different hydrogen bonds between VH and TCR α templates
C=O   N-H
3     25
25    3
7     21
21    7
24    76
76    24
109   88
33    95
34    51
56    52
50    58
48    60
94    102

Chapter 4 Conclusions

In this project we have improved the TCR β and immunoglobulin variable domain templates, which are based on conserved main chain hydrogen bond patterns, to increase their reliability in identifying and classifying these variable domains. The C program developed here can successfully identify immunoglobulin variable domains and classify VL and TCR β variable domains, although there is some misclassification between VH and TCR α variable domains because of the similarity of their structures.

Although it is still difficult to discriminate between VH and TCR α variable domains by considering only the conserved main chain hydrogen bond patterns, the program developed in this project will help to find residues at structurally equivalent positions in different variable domains automatically. Therefore it will make the process of extendi
49. ordinate txt Data Ncoordinate txt and Data CAcoordinate txt respectively The position numbers and each residue are recorded into files Data Chainposition and Data Chainresidue respectively The calculated length of the examined chain is recorded into file Data Size txt patternsearch h This header file records all the functions included in this C program and defines all the constants used Size c Running this c file returns the length of the examined chain Hcoordinate c As no position of hydrogen bonded to main chain N is available from PDB file this file describes the method used to calculate the position of hydrogen bonded to main chain atom N getHbond c This file calculates all the positions of hydrogen bonded to main chain N at first then assign all the possible hydrogen bonds between main chain C O and N H groups and record these positions with assigned hydrogen bond into file Data Hbond txt and the total number of the hydrogen bonds into file Data HbondSize txt HbondtempTAlpha c HbondtempTBeta c HbondtempVL c HbondtempVH c HbondtempV c 42 APPENDIX A MAINTENANCE MANUAL Running these files records the hydrogen bond pattern of the corresponding template into file Data Hbondnotemp txt total number of hydrogen bonds between each pair of blocks of the template into Data SumHbond txt total number of patterns with hydrogen bonds inside into Data SumPatt
50. ow refer to the starting point of the searching area of Block column and Block row respectively 26 CHAPTER 2 METHODS m float Hbondend column Hbondstart column column_distance 1 Blocklength column Calculate the k int floor m 1 number of blocks n float Hbondend row Hbondstart row row_distance 1 Blocklength row to search in X and r int floor n 1 J Y directions for i 0 i lt k i Record the y n column start Blocklength column i column distance Hbondstart column searching area for column end column start Blocklength column J each possible for j 0 j lt r j Block column Record the row_start Blocklength row j row_distance Hbondstart row searching area for row_end row_start Blocklength row 3 each possible count 0 Block row for x column_start x lt column_end x N for y row_start y lt row_end y v Ps y s ye Count the total if x lt chain_size amp amp y lt chain_size amp amp Hbondno x y 1 number of gt hydrogen bonds in count count 1 each searched i rectangle u if count SumHbond column row Hbond less1 count1 0 for q 0 q lt SumHbond column row q Count the total xpos column start Tempposition sum1 q 0 Blockstart column number of hydrogen ypos row start Tempposition sum1 q 1 Blockstart row bonds at if xpos chain size amp amp ypos chain size amp amp Hbondno xpos ypos 71 d corresponding positions in
51. res can be used as the basis for modelling others, following a comparative modelling strategy. In an earlier project at the University of Aberdeen, a database of antibody sequences and structures was constructed to support systematic structural studies and comparative modelling (Kemp et al., 1994). However, it is not easy or convenient to maintain this database by adding more immunoglobulin variable domains, because different numbering schemes are adopted in different structure files in the Protein Data Bank (PDB; Bernstein et al., 1977), which is the main depository for protein structure data. Residues with the same position index in different PDB files are not guaranteed to lie at structurally equivalent positions, that is, positions that contribute similar structure. Because structurally equivalent positions carried different position indexes, they had to be identified and recorded by hand, which made maintenance of the database slow and awkward.

CHAPTER 1 INTRODUCTION

1.1 Objectives

This project aimed to automate the task of finding residues at structurally equivalent positions in different immunoglobulin variable domains. The main C program developed here can be used under Solaris 7 or any other Unix or Linux system.

1.2 Thesis overview

The background of this project, including immunoglobulin variable domains, the different numbering schemes used in the PDB, and the conserved main-chain hydrogen bond patterns of
52. rogen bond pattern of the examined chain, from the N-terminal to the C-terminal of the searching area, for a specific template pattern a round. We search the whole hydrogen bond pattern of the examined chain for Pattern[i][i] in the first round. If no match is found, we ignore one more residue at the N-terminal of the examined chain in each following round, so that each round searches a pattern that is one residue shorter, until Pattern[i][i] is found. If it has still not been found when the total number of ignored residues reaches the length of Block[i], we relax Pattern[i][i] by allowing one more hydrogen bond to be missing at each step, and repeat the searching process. If the pattern is found, we continue searching for the other patterns. Otherwise we give up and conclude that the examined chain does not match the template.

For example, to search the hydrogen bond pattern of the examined chain for Pattern[4][4] of the VL template, we partition the pattern of the examined chain into small squares with sides 23 residues long, the same as the length of Block[4] of the VL template, from residue 0 at the N-terminal to the C-terminal. We then compare Pattern[4][4] with the patterns on the diagonal of the pattern of the examined chain, starting from the N-terminal. Since we cannot find the corresponding pattern in that of the examined chain in this round, we ignore the first residue
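The round-by-round search with step-by-step relaxation described above can be sketched in C as follows. This is a sketch under stated assumptions, not the thesis implementation: the BondOffset type, the function names bonds_matched and find_intra_block, the flat row-major 0/1 matrix, and the choice to compare only the template's bond positions (rather than whole squares) are all hypothetical.

```c
#include <assert.h>

/* One hydrogen bond of a template pattern, stored as non-negative offsets
   from the start of its block (hypothetical representation). */
typedef struct { int dx, dy; } BondOffset;

/* Number of template bonds also present in the examined chain when the block
   is placed at residue `start` on the diagonal of the chain's 0/1 matrix. */
static int bonds_matched(const int *hbondno, int chain_size,
                         const BondOffset *bonds, int n_bonds, int start)
{
    int matched = 0;
    for (int q = 0; q < n_bonds; q++) {
        int x = start + bonds[q].dx;
        int y = start + bonds[q].dy;
        if (x < chain_size && y < chain_size &&
            hbondno[x * chain_size + y] == 1)
            matched++;
    }
    return matched;
}

/* Search for one intra-block pattern. Returns the start residue of the first
   match, or -1 if no match exists with at most max_less missing bonds.
   *used_less receives the number of missing bonds that had to be tolerated. */
int find_intra_block(const int *hbondno, int chain_size,
                     const BondOffset *bonds, int n_bonds,
                     int block_len, int max_less, int *used_less)
{
    assert(hbondno != 0 && bonds != 0 && used_less != 0);
    assert(block_len > 0 && n_bonds > 0);
    assert(max_less >= 0 && max_less < n_bonds);

    for (int less = 0; less <= max_less; less++)       /* relax step by step */
        for (int skip = 0; skip < block_len; skip++)   /* one round per ignored residue */
            for (int start = skip; start < chain_size; start += block_len)
                if (bonds_matched(hbondno, chain_size, bonds, n_bonds, start)
                        >= n_bonds - less) {
                    *used_less = less;
                    return start;
                }
    return -1;
}
```

Exact matches are tried at every offset before any relaxation is allowed, which mirrors the order given in the text: all rounds first, then one more missing hydrogen bond.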
53. rogen bonds in one pair of Patterns of the corresponding template are calculated and sorted using this program before searching these patterns.

SearchPattern.c
The hydrogen bond patterns between two different blocks are searched by running this file, after those within one block have been found.

printHbond.c
Running this file records the hydrogen bonds of the template and the corresponding hydrogen bonds in the examined chain in the file Data/CorrespondingHbond.txt, and records the examined chain with different position indexes in the file Data/CorrespondingChain.txt.

main.c
This is the main C file; all other files are called by running this file.

Makefile
This file aims to make the general configuration file. The executable file obtained is stored in the bin directory. The compiler used is GCC version 2.95.

v_seq.tcl
This program draws schematic representations of the different kinds of β-sheet frameworks, including the VL, VH, TCR α, TCR β and immunoglobulin variable domain templates. Each run draws one figure and records it in the file temp.ps.

A.3 Pseudo code for the critical algorithm (main.c)

count_pattern = 0;
for (i = 0; i < sum_block; i++)
    if (sumHbond[i][i] != 0)
        for (Hbond_less = 0; Hbond_less < n; Hbond_less++)
            if Pattern[i][i] is found
                break;
            else
                for (j = 0; j < Bl
54. s of each block in the examined chain.

SumBlock.txt
This number is the total number of blocks into which the template is divided.

SumHbond.txt
This matrix lists the total number of hydrogen bonds between each pair of blocks of the template.

APPENDIX B USER MANUAL

SumPattern.txt
This number is the total number of patterns in the template with hydrogen bonds inside.

All the .txt files needed to run the Tcl file v_seq.tcl are in the tcl directory.

frameworkGridXY_TA.txt, frameworkGridXY_TB.txt, frameworkGridXY_TBnew.txt, frameworkGridXY_V.txt, frameworkGridXY_VH.txt, frameworkGridXY_VL.txt, frameworkGridXY_Vnew.txt
These files record the two-dimensional coordinates of each residue in the conserved framework of the corresponding template, consecutively from the N-terminal to the C-terminal. The first column holds the X coordinates of the residues and the second column holds their Y coordinates.

frameworkPositions_TA.txt, frameworkPositions_TB.txt, frameworkPositions_TBnew.txt, frameworkPositions_V.txt, frameworkPositions_VH.txt, frameworkPositions_VL.txt, frameworkPositions_Vnew.txt
The codes in these files are the position codes of the conserved frameworks in the Chothia numbering scheme, except for those in frameworkPositions_V.txt and frameworkPositions_Vnew.txt, where the position codes of strand i begin with 100 1
55. st of the VH domains could also match the TCR α variable domain template, and all of the TCR α variable domains matched the VH domain template. These two kinds of variable domains are therefore misclassified as each other. The main reason is their similar structure. Figure 3.4 shows the superposed VH and TCR α variable domain templates. In this figure, blue lines represent hydrogen bonds in the VH template and purple lines represent hydrogen bonds in the TCR α variable domain template. The figure shows that the only differences between the two templates are 12 hydrogen bonds that exist in the VH domain template but not in the TCR α variable domain template, and one that exists in the TCR α variable domain template but not in the VH domain template. Table 3.5 lists the hydrogen bonds that differ between the VH domain template and the TCR α variable domain template. All the position codes in Table 3.5 are VH position codes in the Chothia numbering scheme. However, when we compared the structures of the VH and TCR α variable domain templates with the VL and TCR β domain templates (Figure 1.3 and Figure 1.4), we found different structures. Therefore, misclassification occurred between VH and TCR α variable domains, but not between them and the other variable domains.

Blue: Hydrogen bonds in the VH template. Purple: Hydrogen b
56. y immunoglobulin variable domains
2.4.1 Searching for intra-block hydrogen bond patterns
2.4.2 Searching for inter-block hydrogen bond patterns
Chapter 3 Results and Discussion
3.1 Template [illegible]
3.1.1 New TCR β domain template
3.1.2 New immunoglobulin variable domain template
3.2 Template verification
Chapter 4 Conclusions
[illegible]
Appendix A Maintenance Manual
A.1 [illegible]
A.2 Description of files
A.3 Pseudo code for the critical algorithm (main.c)
Appendix B User Manual
B.1 [illegible]
B.2 Format of data files

List of Tables
Table 2.1 Summary of blocks and hydrogen bonds conserved in VL domain
Table 2.2 Summary of blocks and hydrogen bonds conserved in VH domain
Table 2.3 Summary of blocks and hydrogen bonds conserved in TCR α domain
Table 2.4 Summary of blocks and hydrogen bonds conserved in TCR β domain
Table 2.5 Recorded hydrogen bonds in VL template file Hbondtemplate.txt
57. ydrogen bond is assigned or not in the corresponding positions by using 0 or 1.

Data.txt
This file is simply a copy of the corresponding PDB file.

Hbond.txt
In this file, the first column records the positions of the residues that provide main-chain C=O groups to form hydrogen bonds, and the second column records the positions of the residues that provide the main-chain N-H groups bonded to the C=O groups recorded in the first column.

Hbondnotemp.txt
Here 0 and 1 represent the hydrogen bond pattern of the template: 0 means that no hydrogen bond is assigned in the template, and 1 means that a hydrogen bond is assigned.

Hbondno.txt
Here 0 and 1 represent the hydrogen bond pattern of the examined chain.

HbondSize.txt
This number is the total number of hydrogen bonds assigned in the examined chain.

Hbondtemplate.txt
This file records the positions of each pair of residues between which a hydrogen bond is assigned in the template, in the same way as the file Hbond.txt. The only difference is that this file records the positions one pattern after another.

HbondTempSize.txt
This single number is the total number of hydrogen bonds in the template.

Size.txt
This single number is the length of the examined chain.

start.txt
This row of numbers lists the corresponding starting position
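The relation between the two-column pair list of Hbond.txt and the 0/1 pattern of Hbondno.txt can be sketched in C as follows. This is only an illustrative sketch: the HbondPair type and the function pairs_to_matrix are hypothetical, and it assumes that positions are zero-based sequential indices and that matrix rows correspond to the C=O residue and columns to the N-H residue, neither of which is stated in the manual.

```c
#include <assert.h>
#include <string.h>

/* One row of the pair list: the residue providing the main-chain C=O group
   and the residue providing the main-chain N-H group (hypothetical type). */
typedef struct { int co_pos, nh_pos; } HbondPair;

/* Fill an n x n row-major matrix (allocated by the caller) with 1 where a
   hydrogen bond is listed and 0 elsewhere. Pairs outside the chain are
   skipped. Returns the number of distinct bonds stored. */
int pairs_to_matrix(const HbondPair *pairs, int n_pairs, int *matrix, int n)
{
    assert(pairs != 0 && matrix != 0);
    assert(n > 0 && n_pairs >= 0);

    memset(matrix, 0, (size_t)n * (size_t)n * sizeof *matrix);

    int stored = 0;
    for (int k = 0; k < n_pairs; k++) {
        int c = pairs[k].co_pos;
        int h = pairs[k].nh_pos;
        if (c < 0 || c >= n || h < 0 || h >= n)
            continue;                   /* position lies outside the chain */
        if (matrix[c * n + h] == 0)
            stored++;                   /* count each bond only once */
        matrix[c * n + h] = 1;
    }
    return stored;
}
```

With this layout, the value returned for a complete chain would correspond to the single number kept in HbondSize.txt, provided the pair list has no out-of-range entries.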
