        "An Introduction to Modeling Structure from Sequence". In: Current
         Contents
1.  Figure 5.7.12 Comparison of the (A) perspective and (B) orthographic projection modes. For the color version of this figure go to http://www.currentprotocols.com.

    Figure 5.7.13 Stereo image of the ubiquitin protein. Shown here with Cue Mode = Linear, Cue Start = 1.5, and Cue End = 2.75. To view the stereo image, use the "wall-eyed" method: hold the page close to your eyes and shift the focus beyond the page until the two images overlap to form a three-dimensional object. If this is difficult, try scaling the figure down to a smaller size, which makes viewing easier. For the color version of this figure go to http://www.currentprotocols.com.

    Rendering

    By now, we have seen some techniques for producing nice views and representations of the molecule loaded in VMD. Now, we will explore the use of the VMD built-in snapshot feature and external rendering programs to produce high-quality images of your molecule. The "snapshot" renderer saves the on-screen image in the OpenGL window and is adequate for use in presentations, movies, and small figures. When one desires higher-quality images, renderers such as Tachyon and POV-Ray are better choices.

    19. Hide or delete all previous representations, and create the four new representations listed in Table 5.7.7.

    Current Protocols in Bioinformatics

    Table 5.7.7 Example Representations

    Selection                        Coloring method   Drawing style   Material
    protein and not resid 72 to 76   Structur
2.  Diagonal       ... number of residues;
    Upper triangle ... number of identical residues;
    Lower triangle ... % sequence identity, id/min(length).

              1b8pA @1  1bdmA @1  1civA @2  5mdhA @2  7mdhA @2  1smkA @2
    1b8pA @1       327       194       147       151       153        49
    1bdmA @1        61       318       152       167       155        56
    1civA @2        45        48       374       139       304        53
    5mdhA @2        46        53        42       333       139        57
    7mdhA @2        47        49        87        42       351        48
    1smkA @2        16        18        17        18        15       313

    Weighted pair-group average clustering based on a distance matrix:

    [Dendrogram not recoverable from the extracted text; its distance axis runs from 86.0600 down to 10.1900.]

    Figure 5.6.6 Excerpts from the log file compare.log.

        env = environ()
        aln = alignment(env)
        mdl = model(env, file='1bdm', model_segment=('FIRST:A', 'LAST:A'))
        aln.append_model(mdl, align_codes='1bdmA', atom_files='1bdm.pdb')
        aln.append(file='TvLDH.ali', align_codes='TvLDH')
        aln.align2d()
        aln.write(file='TvLDH-1bdmA.ali', alignment_format='PIR')
        aln.write(file='TvLDH-1bdmA.pap', alignment_format='PAP')

    Figure 5.6.7 The script file align2d.py, used to align the target sequence against the template structure.

    The MODELLER script shown in Figure 5.6.7 aligns the TvLDH sequence in file TvLDH.ali with the 1bdm:A structure in the PDB file 1bdm.pdb (file align2d.py). In the first line of the script, an empty alignment object aln, and a new model object md
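The lower triangle of the compare.log table can be checked by hand: each entry is the number of identical residues divided by the length of the shorter of the two sequences. The following is a minimal Python sketch of that arithmetic. The function name percent_identity is hypothetical, and rounding to the nearest integer is an assumption made here to match the printed values; MODELLER's own rounding rule is not stated in the excerpt.

```python
def percent_identity(n_identical: int, len_a: int, len_b: int) -> int:
    """Percent sequence identity normalized by the shorter sequence.

    Assumption: the result is rounded to the nearest integer, which
    reproduces the lower-triangle values shown in the table excerpt.
    """
    shorter = min(len_a, len_b)
    if shorter <= 0:
        raise ValueError("sequence lengths must be positive")
    return round(100.0 * n_identical / shorter)


if __name__ == "__main__":
    # 1b8pA (327 residues) vs 1bdmA (318 residues): 194 identical residues.
    print(percent_identity(194, 327, 318))
    # 1civA (374 residues) vs 7mdhA (351 residues): 304 identical residues.
    print(percent_identity(304, 374, 351))
```

With the diagonal and upper-triangle values from the table, the two calls give 61 and 87, the same numbers that appear in the lower triangle for those pairs.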
3.  Using VMD: An Introductory Tutorial

    Supplement 24

    Figure 5.7.18 Ubiquitin in the VDW representation, colored according to the hydrophobicity of its residues. For the color version of this figure go to http://www.currentprotocols.com.

    9. You will now change a physical property of the atoms to further illustrate the distribution of hydrophobic residues. In the Tk Console window type $crystal set radius 1.0 to make all the atoms smaller and easier to see through, and then $sel set radius 1.5 to make atoms in the hydrophobic residues larger. The radius field affects the way that some representations (e.g., VDW, CPK) are drawn.

    You have now created a visual state that clearly distinguishes which parts of the protein are hydrophobic and which are hydrophilic. If you have followed the instructions correctly, your protein should resemble Figure 5.7.18.

    Many times in studies of proteins, it is important to identify the locations of the hydrophobic residues, as they often have a functional implication. The method you have just learned is useful in this task. For example, you can easily see that in ubiquitin, the hydrophobic residues are almost exclusively contained in the inner core of the protein. This is a typical feature for small water-soluble proteins. As the protein folds, the hydrophilic residues will have a tendency to stay at the water interface, while the hydrophobic residues are pushed toget
4.  As stated in the above section, it is difficult to place an upper boundary on the RMSD threshold, stating unequivocally that above a certain limit the structures are no longer the same. One should keep in mind, however, that RMSD is obviously not a linear representation of similarity. In other words, when the RMSD between two structures is 1 Å instead of 2 Å, they have not become twice as similar. Empirically, the authors tend not to raise the RMSD threshold beyond 1.5 Å.

    COMMENTARY

    Background Information

    Structural studies have so far shown that membrane proteins fold into one of only two topologies: β-barrels or α-helical bundles. Since α-helical membrane proteins are far more abundant, as well as pharmaceutically more important, the following discussion will be restricted to this family.

    Predicting membrane protein structure is of significant importance because, despite the pharmaceutical importance that they possess, out of nearly 20,000 protein structures solved using crystallographic or NMR methods, only a few dozen are membrane proteins. This paucity of experimentally solved structures is striking, considering that according to a recent census of genomes, 20% to 30% of all genes are predicted to encode membrane proteins (Stevens and Arkin, 2000).

    Knowledge-based homology methods that rely on structural information are difficult to implement for membrane proteins, simply becau
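The RMSD threshold discussed at the start of this excerpt is applied to a single number computed from paired atom coordinates. The following is a minimal Python sketch of that number, assuming the two structures have already been superimposed and that their atoms are listed in the same order. The function name rmsd is hypothetical, and the sketch is not taken from the CHI software.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumption: the structures are already optimally superimposed, so no
    fitting step is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    shifted = [(x + 2.0, y, z) for x, y, z in reference]
    print(rmsd(reference, shifted))  # every atom displaced by 2 along x
```

Because the deviations are squared before averaging, a few badly placed atoms dominate the value, which is one reason the measure does not scale linearly with perceived similarity.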
5.  9. ... also be selected by highlighting them. You can align only the molecules of your choice by selecting Align Marked Sequences or Align Selected Sequences, depending on whether you have marked or highlighted your molecules. This option is available for both structural alignment and sequence alignment.

    The structure of spinach aquaporin is actually available (Törnroth-Horsefield et al., 2006), but now that you have learned how to import FASTA sequence data, you can compare the sequences of proteins even if their structures are not resolved yet experimentally.

    When you finish comparing the sequence of spinach aquaporin with other aquaporins, delete it by clicking on spinach_aqp and pressing Delete or Backspace on your keyboard.

    BASIC PROTOCOL 13

    Creating a Phylogenetic Tree with MultiSeq

    The Phylogenetic Tree feature in MultiSeq elucidates the structure-based and/or sequence-based relationships between different proteins. Structure-based phylogenetic trees can be constructed according to the RMSD or Q values between the molecules after alignment; sequence-based phylogenetic trees can be constructed according to the percent identity or ClustalW values (Thompson et al., 1994).

    1. Align the structures again by going to the MultiSeq window and selecting Tools → Stamp Structural
6.  ... Surf for drawing method, Coloring Method → Molecule for coloring method, and type protein in the Selected Atoms field. For this last representation, choose Transparent in the Material pull-down menu (Fig. 5.7.8C). This representation shows the protein's volumetric surface as a transparent layer.

    Note that you can select and modify different representations you have created by clicking on a representation to highlight it in yellow. Also, each representation can be switched on/off by double-clicking on it. To delete a representation, highlight it and then click on the Delete Rep button (Fig. 5.7.8B). At the end of this section, the Graphical Representations window should look like Figure 5.7.6.

    Sequence viewer extension

    When dealing with a protein for the first time, it is very useful to be able to find and display different amino acids quickly. The sequence viewer extension lets you view the protein sequence and easily pick and display one or more residues of interest.

    37. In the VMD Main window, choose the Extension → Analysis → Sequence Viewer menu item. A window (Fig. 5.7.9A) with a list of the amino acids (Fig. 5.7.9E) and their properties (Figs. 5.7.9B through 5.7.9C) will appear on the screen.

    38. With the mouse, try clicking on different residues in the list (Fig. 5.7.9E) and see how they are highlighted. In addition, the highlighted residue will appear in the OpenGL Display window in yellow and rendered in the bond drawing m
7.  The result of atomselect is a function. Thus, $crystal is now a function that performs actions on the contents of the "all" selection.

    Obtaining and changing molecule properties with text commands

    After you have defined an atom selection, you have many commands that you can use to operate on it. For example, you can use commands to learn about the properties of your atom selection (number of atoms, coordinates, total charge, etc.). You can also use commands to change its coordinates and other properties. See the VMD User's Guide (http://www.ks.uiuc.edu/Research/vmd/vmd-1.8.6/ug/) for an extensive list of commands.

    3. Type $crystal num in the Tk Console window.

    Passing num to an atom selection returns the number of atoms in that selection. Check that this number matches the number of atoms for your molecule displayed in the VMD Main window.

    4. We can also use commands to move our molecule on the screen. You can use these commands to change atom coordinates:

        $crystal moveby {10 0 0}
        $crystal move [transaxis x 40 degree]

    Editing properties of selected atoms

    5. Open the Graphical Representation window by selecting Graphics → Representations... in the VMD Main window. Type in protein as the atom selection, change its Coloring Method to Beta and its Drawing Method to VDW. Your molecule should now appear as a mostly red and blue assembly of spheres.

    The "B" field of a PDB file typically stores the "temperature factor" for a crystal
8.  ... {$i < $nf} {incr i} {

    Write out the frame number and update the selections to the current frame.

        puts "frame $i of $nf"
        $sel1 frame $i
        $sel2 frame $i

    Find the center of mass for each selection (com1 and com2 are position vectors).

        set com1 [measure center $sel1 weight mass]
        set com2 [measure center $sel2 weight mass]

    At each frame i, find the distance by subtracting one vector from the other (command vecsub) and computing the length of the resulting vector (command veclength), assign that value to an array element simdata($i.r), and print a frame-distance entry to a file.

        set simdata($i.r) [veclength [vecsub $com1 $com2]]
        puts $outfile "$i $simdata($i.r)"
        }

    Close the file.

        close $outfile

    i. The second part of the script is for obtaining the distance distribution. It starts by finding the maximum and minimum values of the distance.

        set r_min $simdata(0.r)
        set r_max $simdata(0.r)
        for {set i 0} {$i < $nf} {incr i} {
            set r_tmp $simdata($i.r)
            if {$r_tmp < $r_min} {set r_min $r_tmp}
            if {$r_tmp > $r_max} {set r_max $r_tmp}
        }

    j. The step over the range of distances is chosen based on the number of bins N_d defined in the beginning, and all values for the elements of the distribution array are set to zero.

        set dr [expr ($r_max - $r_min) / ($N_d - 1)]
        for {set k 0} {$k < $N_d} {incr k} {
            set distribution($k) 0
        }

    The distribution is obtained by adding 1 (incr) to an array element
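The binning logic of the second part of the script can be tried outside VMD. The following is a minimal Python sketch of the same steps: find the minimum and maximum, derive the step from the number of bins, zero the counters, then increment one counter per distance. The function name distance_distribution is hypothetical, and the bin-index formula int((r - r_min) / dr) is an assumption, since the excerpt is cut off before the Tcl line that computes it.

```python
def distance_distribution(distances, n_bins):
    """Histogram a list of distances into n_bins counters.

    Mirrors the Tcl excerpt: the step is (r_max - r_min) / (n_bins - 1).
    Assumption: each distance goes to bin int((r - r_min) / dr).
    """
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    if not distances:
        raise ValueError("distances must not be empty")
    r_min = min(distances)
    r_max = max(distances)
    dr = (r_max - r_min) / (n_bins - 1)
    counts = [0] * n_bins
    for r in distances:
        # If all distances are equal, dr is zero and everything lands in bin 0.
        k = int((r - r_min) / dr) if dr > 0 else 0
        counts[min(k, n_bins - 1)] += 1
    return r_min, dr, counts


if __name__ == "__main__":
    r_min, dr, counts = distance_distribution([1.0, 2.0, 3.0, 3.0], 3)
    for k, count in enumerate(counts):
        print(r_min + k * dr, count)
```

Dividing by N_d - 1 rather than N_d means the largest distance falls exactly on the lower edge of the last bin, so the last bin always receives at least the maximum value.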
9.  ... 2002) EVA (Koh et al., 2003) LIVEBENCH (Bujnicki et al., 2001)

    http://www-cryst.bioc.cam.ac.uk/fugue
    http://prodes.toulouse.inra.fr/multalin/
    http://www.drive5.com/muscle
    http://www.salilab.org/modeller
    http://ffas.ljcrf.edu/sea/
    http://www.ch.embnet.org/software/TCoffee.html
    http://www-hto.usc.edu/software/seqaln
    http://www.bmm.icnet.uk/servers/3djigsaw
    http://www.tripos.com
    http://www.congenomics.com
    http://www.molsoft.com
    http://trantor.bioc.columbia.edu/programs/jackal/
    http://www.accelrys.com
    http://www.salilab.org/modeller
    http://www.tripos.com
    http://dunbrack.fccc.edu/SCWRL3.php
    http://salilab.org/snpweb
    http://www.expasy.org/swissmod
    http://www.cmbi.kun.nl/whatif/
    http://protein.bio.puc.cl/cardex/servers
    http://urchin.bmrb.wisc.edu/~jurgen/aqua/
    http://biotech.embl-heidelberg.de:8400
    http://www.doe-mbi.ucla.edu/Services/ERRAT
    http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
    http://www.came.sbg.ac.at
    http://www.ucmb.ulb.ac.be/UCMB/PROVE
    http://www.ysbl.york.ac.uk/~oldfield/squid
    http://www.doe-mbi.ucla.edu/Services/Verify_3D
    http://www.cmbi.kun.nl/gv/whatcheck
    http://cafasp.bioinfo.pl
    http://predictioncenter.llnl.gov
    http://capb.dbi.udel.edu/casa
    http://cubic.bioc.columbia.edu/eva
    http://bioinfo.pl/LiveBench

    Supplement 15

    BASIC PROTOCOL

    Comparative Protein Structure Modeling Usi
10. Figure 5.7.7 Graphical Representations window and the (A) Selections tab, (B) list of Singlewords, (C) list of Keywords, and (D) Value box that displays possible choices for a given keyword.

    28. Change the current representation's Drawing Method to CPK and the Coloring Method to ResName in the Draw style tab. In the screen, the different lysines and glycines will be visible.

    29. In the Selected Atoms text entry, type water. Choose Coloring Method → Name. The 58 water molecules present in the system now appear (in fact, only their oxygen atoms).

    30. In order to see which water molecules are closer to the protein, use the command within. Type water and within 3 of protein for Selected Atoms in the text field.

    This selects all the water molecules that are within a distance of 3 Å of the protein.

    31. Finally, try typing in the Selected Atoms field the selections shown in the first column of Table 5.7.1. Each of these selections will show the protein or part of the protein as explained in the second column of Table 5.7.1.

    Table 5.7.1 Examples of Atom Selections

    Selection                         Action
    protein                           Shows the protein
    resid 1                           The first residue
    (resid 1 76) and (not water)      The first and last residues
    (resid 23 to 34) and (protein)    The α-helix

    Figure 5.7.8 Multiple Representations of ubiquitin. Representations can be either created
11. ... Kneller, D.G., Langridge, R., and Cohen, F.E. 1992. Taxonomy and conformational analysis of loops in proteins. J. Mol. Biol. 224:685-699.

    Ring, C.S., Sun, E., McKerrow, J.H., Lee, G.K., Rosenthal, P.J., Kuntz, I.D., and Cohen, F.E. 1993. Structure-based inhibitor design by using protein models for the development of antiparasitic agents. Proc. Natl. Acad. Sci. U.S.A. 90:3583-3587.

    Rost, B. 1999. Twilight zone of protein sequence alignments. Protein Eng. 12:85-94.

    Rost, B. and Liu, J. 2003. The PredictProtein server. Nucl. Acids Res. 31:3300-3304.

    Rufino, S.D., Donate, L.E., Canard, L.H., and Blundell, T.L. 1997. Predicting the conformational class of short and medium size loops connecting regular secondary structures: Application to comparative modelling. J. Mol. Biol. 267:352-367.

    Rychlewski, L. and Fischer, D. 2005. LiveBench-8: The large-scale, continuous assessment of automated protein structure prediction. Protein Sci. 14:240-245.

    Rychlewski, L., Zhang, B., and Godzik, A. 1998. Fold and function predictions for Mycoplasma genitalium proteins. Fold Des. 3:229-238.

    Sadreyev, R. and Grishin, N. 2003. COMPASS: A tool for comparison of multiple protein alignments with assessment of statistical significance. J. Mol. Biol. 326:317-336.

    Sali, A. and Blundell, T.L. 1993. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234:779-815.

    Sa
12. ... Stenger, B., and Gerstein, M. 2002. GeneCensus: Genome comparisons in terms of metabolic pathway activity and protein family sharing. Nucl. Acids Res. 30:4574-4582.

    Lindahl, E. and Elofsson, A. 2000. Identification of related proteins on family, superfamily and fold level. J. Mol. Biol. 295:613-625.

    Luthy, R., Bowie, J.U., and Eisenberg, D. 1992. Assessment of protein models with three-dimensional profiles. Nature 356:83-85.

    MacKerell, A.D. Jr., Bashford, D., Bellott, M., Dunbrack, R.L. Jr., Evanseck, J.D., Field, M.J., Fischer, S., Gao, J., Guo, H., Ha, S., Joseph-McCarthy, D., Kuchnir, L., Kuczera, K., Lau, F.T.K., Mattos, C., Michnick, S., Ngo, T., Nguyen, D.T., Prodhom, B., Reiher, W.E. III, Roux, B., Schlenkrich, M., Smith, J.C., Stote, R., Straub, J., Watanabe, M., Wiorkiewicz-Kuczera, J., Yin, D., and Karplus, M. 1998. All-atom empirical potential for molecular modeling and dynamics studies of proteins. J. Phys. Chem. B 102:3586-3616.

    Madhusudhan, M.S., Marti-Renom, M.A., Sanchez, R., and Sali, A. 2006. Variable gap penalty for protein sequence-structure alignment. Protein Eng. Des. Sel. 19:129-133.

    Mallick, P., Weiss, R., and Eisenberg, D. 2002. The directional atomic solvation energy: An atom-based potential for the assignment of protein sequences to known folds. Proc. Natl. Acad. Sci. U.S.A. 99:16041-16046.

    Marti-Renom, M.A., Stuart, A.C., Fiser, A., Sanchez, R., Melo, F
13. The script evaluate_model.py (Fig. 5.6.10) evaluates the model with the DOPE potential. In this script, the sequence is first transferred (using append_model()), and then the atomic coordinates of the PDB file are transferred (using transfer_xyz()) to a model object, mdl. This is necessary for MODELLER to correctly calculate the energy, and additionally allows for the possibility of the PDB file having atoms in a nonstandard order, or having different subsets of atoms (e.g., all atoms including hydrogens, while MODELLER uses only heavy atoms, or vice versa). The DOPE energy is then calculated using assess_dope(). An energy profile is additionally requested, smoothed over a 15-residue window, and normalized by the number of restraints acting on each residue. This profile is written to a file, TvLDH.profile, which can be used as input to a graphing program such as GNUPLOT.

    Similarly, evaluate_model.py calculates a profile for the template structure. A comparison of the two profiles is shown in Figure 5.6.11. The two DOPE score profiles differ clearly in the long active-site loop between residues 90 and 100 and in the long helices at the C-terminal end of the target sequence. This long loop interacts with region 220 to 250, which forms the other half of the active site. This latter region is well resolved in both the template and the target structure. However, probably due to the unfavorable nonbonded interaction
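The smoothing step mentioned above, a per-residue profile averaged over a 15-residue window, can be illustrated independently of MODELLER. The following is a minimal Python sketch that uses a centered moving average with the window truncated at the chain ends. Both choices are assumptions for illustration: the excerpt does not say how MODELLER weights residues inside the window or treats the termini, and the function name smooth_profile is hypothetical.

```python
def smooth_profile(values, window=15):
    """Centered moving average over an odd-sized window.

    Assumptions: uniform weights, and a window that simply shrinks near
    the ends of the sequence. MODELLER's own smoothing may differ.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


if __name__ == "__main__":
    # Made-up per-residue scores with a single unfavorable spike.
    raw = [-0.04] * 10 + [0.02] + [-0.04] * 10
    for residue, score in enumerate(smooth_profile(raw), start=1):
        print(residue, round(score, 4))
```

Smoothing spreads a single-residue spike over its neighbors, so a peak in the smoothed profile points to a region rather than to one residue, which matches how the loop and helix regions are discussed in the text.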
14. ... and Sali, A. 2000. Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct. 29:291-325.

    Marti-Renom, M.A., Ilyin, V.A., and Sali, A. 2001. DBAli: A database of protein structure alignments. Bioinformatics 17:746-747.

    Marti-Renom, M.A., Madhusudhan, M.S., Fiser, A., Rost, B., and Sali, A. 2002. Reliability of assessment of protein structure prediction methods. Structure (Camb.) 10:435-440.

    Marti-Renom, M.A., Madhusudhan, M.S., and Sali, A. 2004. Alignment of protein sequences by their profiles. Protein Sci. 13:1071-1087.

    Matsumoto, R., Sali, A., Ghildyal, N., Karplus, M., and Stevens, R.L. 1995. Packaging of proteases and proteoglycans in the granules of mast cells and other hematopoietic cells: A cluster of histidines on mouse mast cell protease 7 regulates its binding to heparin serglycin proteoglycans. J. Biol. Chem. 270:19524-19531.

    McGuffin, L.J. and Jones, D.T. 2003. Improvement of the GenTHREADER method for genomic fold recognition. Bioinformatics 19:874-881.

    McGuffin, L.J., Bryson, K., and Jones, D.T. 2000. The PSIPRED protein structure prediction server. Bioinformatics 16:404-405.

    Melo, F. and Feytmans, E. 1998. Assessing protein structures with a non-local atomic interaction energy. J. Mol. Biol. 277:1141-1152.

    Melo, F., Sanchez, R., and Sali, A. 2002. Statistical potentials for fold assessment. Protein Sci. 11:430-448.

    Mezei, M. 1998. Chameleon sequence
15. ... http://www.csb.yale.edu/userguides/datamanip/chi/html/chi.html). One can obtain the file simply by contacting the authors, and then edit it manually with any text editor. chi.param contains exhaustive comments, making the editing of the file self-explanatory.

    To create a new parameter file from scratch

    10a. In the CHI main menu on the left-hand side of the CHI home page (Fig. 5.3.2), click on "Create setup."

    11a. In the first "Create setup" screen that appears (Fig. 5.3.3), type the desired molecule name.

    For convenience, the name of the molecule should be identical to the subdirectory name (e.g., variantA).

    Figure 5.3.2 CHI main page.

    Figure 5.3.3 CHI "Create setup" first screen.

    12a. Type the number of helices and choose the proper option for homo-oligomer, false or true.

    13a. Click "Edit sequence." A new editing screen will appear (Fig. 5.3.4).

    14a. Type the first residue number, then enter the sequence in one-letter amino acid format (see APPENDIX 1A).

    Note that the residue number is only important for the proper indexing of the sequence, and does not mean that the input sequence will be considered from that position.

    15a. Choose the orientation of the helix.

    If "true" was chosen for homo-oligomer on the previous screen (step 12a), then one may choose either "up" or "down," as this option only describes the relative orientation between he
16. ... 1 implies that structures are identical. When Q has a low score (0.1 to 0.3), structures are not aligned well, i.e., only a small fraction of Cα atoms superimpose. Along with RMSD and Percent Identity, these numbers tell you that the 1fqy and 1rc2 structures are pretty well aligned. You can repeat the previous step to compare the alignment of other molecules. To unselect a highlighted molecule, Ctrl-click on it again (or command-click on a Mac).

    Figure 5.7.22 The four aquaporins aligned according to their structural similarity. For the color version of this figure go to http://www.currentprotocols.com.

    Figure 5.7.23 Result of a structural alignment of the four aquaporins, colored by Qres. For the color version of this figure go to http://www.currentprotocols.com.

    BASIC PROTOCOL 12

    Coloring molecules according to structural identity

    You can also color the molecules according to the value of Q per residue (Qres) obtained in the alignment. Qres is the contribution from each residue to the overall Q value of aligned structures.

    13. In the MultiSeq window, choose View → Coloring → Qres.

    Look at the OpenGL window to see the impact this selection has made on the coloring of the aligned molecules (Fig.
17. 19  9ldtA  X  1  331   85  301  93  304  207  26.  0.10E-05
    20  Trig   X  1  321   64  239  53  234  164  26.  0.20E-03
    21  1lldA  X  1  303   13  242   9  233  216  31.  0.31E-07
    22  5mdhA  X  1  333    2  332   1  331  328  44.  0.0
    23  7mdhA  X  1  351    6  334  14  339  325  34.  0.0
    24  1mldA  X  1  313    5  198   1  189  183  26.  0.13E-05
    25  1oc4A  X  1  315    5  191   4  186  174  28.  0.18E-04
    26  1ojuA  X  1  294   78  320  68  285  218  28.  0.43E-05
    27  1pzgA  X  1  327   74  191  71  190  114  30.  0.16E-06
    28  1smkA  X  1  313    7  202   4  198  188  34.  0.0
    29  1sovA  X  1  316  481  256  76  248  160  27.  0.93E-03
    30  1y63A  X  1  289  777  191  58  167  109  33.  0.32E-05

    Figure 5.6.4 An excerpt from the file build_profile.prf. The aligned sequences have been removed for convenience.

    ... the parameter max_aln_evalue is set to 0.01, indicating that only sequences with E-values smaller than or equal to 0.01 will be included in the output.

    Execute the script using the command mod8v2 build_profile.py. At the end of the execution, a log file is created (build_profile.log). MODELLER always produces a log file. Errors and warnings in log files can be found by searching for the _E> and _W> strings, respectively.

    Selecting a template

    An extract (omitting the aligned sequences) from the file build_profile.prf is shown in Figure 5.6.4. The first six commented lines indicate the input parameters used in MODELLER to create the alignments. Subsequent lines correspond to the similarities detected by profile.build(). The most important columns in the output are the
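Searching a log file for the _E> and _W> markers, as described above, is easy to automate. The following is a minimal Python sketch that splits log lines into errors and warnings by plain substring matching. The function name scan_log is hypothetical, and the sample lines in the usage section are invented for illustration; they are not real MODELLER output.

```python
def scan_log(lines):
    """Return (errors, warnings) found in an iterable of log lines.

    A line counts as an error if it contains "_E>" and as a warning if it
    contains "_W>", following the markers named in the text.
    """
    errors = []
    warnings = []
    for line in lines:
        text = line.rstrip("\n")
        if "_E>" in text:
            errors.append(text)
        if "_W>" in text:
            warnings.append(text)
    return errors, warnings


if __name__ == "__main__":
    # Hypothetical log content, used only to exercise the function.
    sample = [
        "example_____> reading sequence database\n",
        "example____W> hypothetical warning text\n",
        "example____E> hypothetical error text\n",
    ]
    errors, warnings = scan_log(sample)
    print(len(errors), "error(s),", len(warnings), "warning(s)")
```

To check a real run, the same function can be applied to an open file, for example with open("build_profile.log") as handle: errors, warnings = scan_log(handle).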
18. [Figure 5.5.9 screenshot; the column layout is not recoverable from the extracted text. Every listed hit shows the value 313 and the ADDA family number 1060. Hits named in the listing: ESTRADIOL RECEPTOR; STEROID HORMONE RECEPTOR ERR1; ESTROGEN RELATED RECEPTOR GAMMA; GLUCOCORTICOID RECEPTOR; ULTRASPIRACLE PROTEIN; ULTRASPIRACLE; RETINOIC ACID RECEPTOR GAMMA; RETINOID X RECEPTOR ALPHA; RETINOID X RECEPTOR; RETINOIC ACID RECEPTOR RXR ALPHA; PEROXISOME PROLIFERATOR ACTIVATED REC; RXR RETINOID X RECEPTOR; ANDROGEN RECEPTOR; OESTROGEN RECEPTOR BETA; OXYSTEROLS RECEPTOR LXR BETA.]

    Figure 5.5.9 A large number of nuclear receptors belonging to the same fold class as estradiol receptor. Where a sequence-structure domain mapping is available, they have all been classified into the same ADDA domain family (numbered 1060).

    ... sequence similarity in terms of percent identity. As can be seen from Figur
19. ... 81:2681-2692.

    Torres, J., Briggs, J.A., and Arkin, I.T. 2002a. Contribution of energy values to the analysis of global searching molecular dynamics simulations of transmembrane helical bundles. Biophys. J. 82:3063-3071.

    Torres, J., Briggs, J.A., and Arkin, I.T. 2002b. Convergence of experimental, computational and evolutionary approaches predicts the presence of a tetrameric form for CD3-zeta. J. Mol. Biol. 316:375-384.

    Torres, J., Briggs, J.A., and Arkin, I.T. 2002c. Multiple site-specific infrared dichroism of CD3-zeta, a transmembrane helix bundle. J. Mol. Biol. 316:365-374.

    Treutlein, H.R., Lemmon, M.A., Engelman, D.M., and Brünger, A.T. 1992. The glycophorin A transmembrane domain dimer: Sequence-specific propensity for a right-handed supercoil of helices. Biochemistry 31:12726-12732.

    Key References

    Arkin et al., 1994. See above.

    In this article, global searching molecular dynamics simulation is used to find a model for phospholamban.

    Adams et al., 1995. See above.

    Here, the theory of global searching molecular dynamics simulation is presented in detail.

    Briggs, J.A.G., Torres, J., Kukol, A., and Arkin, I.T. 2001. A new method to model membrane protein structure based on silent amino acid substitutions. Proteins Struct. Funct. Genet. 44:370-375.

    In this article, silent substitution modeling is introduced for the first time.

    Torres et al., 2002a. See abov
20. ... A., and Rost, B. 2003. EVA: Evaluation of protein structure prediction servers. Nucl. Acids Res. 31:3311-3315.

    Krogh, A., Brown, M., Mian, I.S., Sjolander, K., and Haussler, D. 1994. Hidden Markov models in computational biology: Applications to protein modeling. J. Mol. Biol. 235:1501-1531.

    Laskowski, R.A., MacArthur, M.W., Moss, D.S., and Thornton, J.M. 1993. PROCHECK: A program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 26:283-291.

    Laskowski, R.A., Rullmann, J.A., MacArthur, M.W., Kaptein, R., and Thornton, J.M. 1996. AQUA and PROCHECK-NMR: Programs for checking the quality of protein structures solved by NMR. J. Biomol. NMR 8:477-486.

    Laskowski, R.A., MacArthur, M.W., and Thornton, J.M. 1998. Validation of protein models derived from experiment. Curr. Opin. Struct. Biol. 8:631-639.

    Lessel, U. and Schomburg, D. 1994. Similarities between protein 3-D structures. Protein Eng. 7:1175-1187.

    Levitt, M. 1992. Accurate modeling of protein conformation by automatic segment matching. J. Mol. Biol. 226:507-533.

    Li, R., Chen, X., Gong, B., Selzer, P.M., Li, Z., Davidson, E., Kurzban, G., Miller, R.E., Nuzum, E.O., McKerrow, J.H., Fletterick, R.J., Gillmor, S.A., Craik, C.S., Kuntz, I.D., Cohen, F.E., and Kenyon, G.L. 1996. Structure-based design of parasitic protease inhibitors. Bioorg. Med. Chem. 4:1421-1427.

    Lin, J., Qian, J., Greenbaum, D., Bertone, P., Das, R., Echols, N., Senes, A
21. Acknowledgments

    This tutorial is largely based on the following VMD tutorials, case studies, and user's guides. We hence would like to thank these authors, who have provided this tutorial its starting form:

    Jordi Cohen, Marcos Sotomayor, and Elizabeth Villa, "VMD Molecular Graphics."

    Alek Aksimentiev, John Stone, David Wells, and Marcos Sotomayor, "VMD Images and Movies Tutorial."

    Fatemeh Khalili, Elizabeth Villa, Yi Wang, Emad Tajkhorshid, Brijeet Dhaliwal, Zan Luthey-Schulten, John Stone, Dan Wright, and John Eargle, "Aquaporins with the VMD MultiSeq Tool."

    VMD has been developed by the Theoretical and Computational Biophysics Group at the University of Illinois and the Beckman Institute, and is supported by funds from the National Institutes of Health and the National Science Foundation.

    Citing VMD

    The development of VMD is funded by the National Institutes of Health. Proper citation is a primary way in which we demonstrate the value of our software to the scientific community, and is essential to continued NIH funding for VMD. The authors request that all published work that utilizes VMD include the primary VMD citation at a minimum:

    Humphrey, W., Dalke, A., and Schulten, K., "VMD - Visual Molecular Dynamics," J. Molec. Graphics, 1996, vol. 14, pp. 33-38.

    Work that uses software or plugins incorporated into VMD should also add the proper citations for those tools. For example, work that uses MultiSeq as
22. ... Alexov, E., and Honig, B. 2003. Using multiple structure alignments, fast model building, and energetic analysis in fold recognition and homology modeling. Proteins 53:430-435.

    Pieper, U., Eswar, N., Braberg, H., Madhusudhan, M.S., Davis, F.P., Stuart, A.C., Mirkovic, N., Rossi, A., Marti-Renom, M.A., Fiser, A., Webb, B., Greenblatt, D., Huang, C.C., Ferrin, T.E., and Sali, A. 2004. MODBASE, a database of annotated comparative protein structure models, and associated resources. Nucl. Acids Res. 32:D217-D222.

    Pieper, U., Eswar, N., Davis, F.P., Braberg, H., Madhusudhan, M.S., Rossi, A., Marti-Renom, M., Karchin, R., Webb, B.M., Eramian, D., Shen, M.Y., Kelly, L., Melo, F., and Sali, A. 2006. MODBASE: A database of annotated comparative protein structure models and associated resources. Nucl. Acids Res. 34:D291-D295.

    Pietrokovski, S. 1996. Searching databases of conserved sequence regions by aligning protein multiple alignments. Nucl. Acids Res. 24:3836-3845.

    Pontius, J., Richelle, J., and Wodak, S.J. 1996. Deviations from standard atomic volumes as a quality measure for protein crystal structures. J. Mol. Biol. 264:121-136.

    Que, X., Brinen, L.S., Perkins, P., Herdman, S., Hirata, K., Torian, B.E., Rubin, H., McKerrow, J.H., and Reed, S.L. 2002. Cysteine proteinases from distinct cellular compartments are recruited to phagocytic vesicles by Entamoeba histolytica. Mol. Biochem. Parasitol. 119:23-32.

    Ring, C.S
23. www.currentprotocols.com

Current Protocols in Bioinformatics

Table 5.7.5 Example of a More Transparent Material

Setting      Value
Ambient      0.30
Diffuse      0.50
Specular     0.87
Shininess    0.85
Opacity      0.11

Table 5.7.6 Example of Representations Drawn with Different Materials

Selection    Coloring method    Drawing method    Material
protein      Structure          NewCartoon        Opaque
protein      ColorID 8 white    Surf              Material 12

14. Hide all of the current representations and create the two representations listed in Table 5.7.6.

Depth perception

Since the molecular systems are three-dimensional, VMD has multiple ways of representing the third dimension. This section discusses how to use VMD to enhance or hide depth perception.

15. The first thing to consider is the projection mode. In the VMD Main window, click the Display menu. Here, we can choose either Perspective or Orthographic in the drop-down menu. Try switching between the Perspective and Orthographic projection modes and see the difference (Fig. 5.7.12).

In perspective mode, things closer to the camera appear larger. Perspective projection provides strong size-based visual depth cues, but the displayed image will not preserve scale relationships or parallelism of lines, and objects very close to the camera may appear distorted. Orthographic projection preserves scale and parallelism relationships between objects in the displayed image, but greatly reduces depth perception. Hence, o
24. Molecule structure

name of the molecule: varianta
number of helices: 5
homooligomer: true / false

molecular structure information for helix 1
  sequence: PME GLY GLY VAL ALA ALA LEU ILE LEU ILE PHE VAL VAL SER THR TYR PHE GLY ALA ALA ILE LEU
  residue number at start of sequence: 11
  initial rotation offset around helix axis: 0.0
  direction of helix: up / down
  initial translational offset for helix along the z axis: 0.0

Search parameters

extent of the search (a full search will sample all pairwise interactions; a symmetric search will limit the search to symmetric pairs): full / symmetric
search left-handed crossing angles: true / false
search right-handed crossing angles: true / false
type of molecular dynamics to use: torsion / cartesian
number of trials per structure:

search parameters for helix 1
  rotation start (degrees): 0.0
  rotation finish (degrees): 360.0
  rotational step size (degrees): 10

Electrostatic effects

value of dielectric constant: 2

Initial rotation and tilt

distance between centres of neighbouring helices (in Angstroms): 10.4
left-hand crossing angle wrt diad axis (in degrees): 25.0
right-hand crossing angle wrt diad axis (in degrees): -25.0

Clustering parameters

cutoff for root mean square difference between two structures (Angstroms): 1.25
minimum number of structures which define a cluster: 9
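The rotation and crossing-angle fields in the form above define a grid of starting positions. The following is a minimal Python sketch of how such a grid could be enumerated for a symmetric search, in which every helix is rotated by the same angle. It is not CHI code: the function name `starting_positions`, the exclusive treatment of the finish angle (360 degrees is taken as equivalent to 0), and the sign convention for the right-handed crossing angle are all assumptions made for illustration.

```python
from itertools import product


def starting_positions(rot_start=0.0, rot_finish=360.0, rot_step=10.0,
                       crossing_angles=(25.0, -25.0)):
    """Enumerate (rotation, crossing_angle) pairs for a symmetric search.

    Hypothetical helper: assumes the finish angle is exclusive and that
    left- and right-handed bundles are both searched.
    """
    n_steps = int(round((rot_finish - rot_start) / rot_step))
    rotations = [rot_start + i * rot_step for i in range(n_steps)]
    # Each rotation is paired with each bundle handedness.
    return list(product(rotations, crossing_angles))


positions = starting_positions()
print(len(positions), "starting positions per trial")
```

With the values shown in the form, this gives 36 rotations for each of the two crossing angles. The number of simulations would then scale with the number of trials per structure.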
25. D., Madhusudhan, M.S., Fiser, A., Pazos, F., Valencia, A., Sali, A., and Rost, B. 2001. EVA: Continuous automatic evaluation of protein structure prediction servers. Bioinformatics 17:1242-1243.

Felsenstein, J. 1989. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5:164-166.

Felts, A.K., Gallicchio, E., Wallqvist, A., and Levy, R.M. 2002. Distinguishing native conformations of proteins from decoys with an effective free energy estimator based on the OPLS all-atom force field and the surface generalized Born solvent model. Proteins 48:404-422.

Fernandez-Fuentes, N., Oliva, B., and Fiser, A. 2006. A supersecondary structure library and search algorithm for modeling loops in protein structures. Nucl. Acids Res. 34:2085-2097.

Fidelis, K., Stern, P.S., Bacon, D., and Moult, J. 1994. Comparison of systematic search and database methods for constructing segments of protein structure. Protein Eng. 7:953-960.

Fine, R.M., Wang, H., Shenkin, P.S., Yarmush, D.L., and Levinthal, C. 1986. Predicting antibody hypervariable loop conformations. II: Minimization and molecular dynamics studies of MCPC603 from many randomly generated loop conformations. Proteins 1:342-362.

Fischer, D. 2006. Servers for protein structure prediction. Curr. Opin. Struct. Biol. 16:178-182.

Fischer, D., Elofsson, A., Rychlewski, L., Pazos, F., Valencia, A., Rost, B., Ortiz, A.R., and Dunbrack, R.L. Jr. 2001. CAFASP2: The second critical as
26. Fortunately, when beginning to explore the capabilities and possibilities of molecular graphics, there is a rich tradition to build upon. As with other artistic techniques, a good way to choose an approach for a particular application is by example. A number of reviews are available (Goodsell, 2003, 2005; Olson and Goodsell, 1992a,b; Richardson, 1992) to provide an overview of approaches and techniques. It is also highly instructive to browse through a few issues of Science, Nature, or Structure, and look for figures that are particularly effective. This is a good way to preview the capabilities of different programs before investing the necessary time to master them. But most important, have fun and explore the many possibilities while developing an individual graphical style.

Literature Cited

Goodsell, D.S. 2003. Looking at molecules: An essay on art and science. ChemBioChem 4:1293-1298.

Goodsell, D.S. 2005. Visual methods from atoms to cells. Structure 13:347-354.

Olson, A.J. and Goodsell, D.S. 1992a. Macromolecular graphics. Curr. Opin. Struct. Biol. 2:193-201.

Olson, A.J. and Goodsell, D.S. 1992b. Visualizing biological molecules. Sci. Am. 267:76-81.

Richardson, J.S. 1992. Looking at proteins: Representations, folding, packing, and design. Biophys. J. 63:1186-1209.

Internet Resources

http://www.rcsb.org/pdb/
Web site for the Protein Data Bank (PDB).

http://www.rc
27. R., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., Martin, M.J., Natale, D.A., O'Donovan, C., Redaschi, N., and Yeh, L.S. 2005. The Universal Protein Resource (UniProt). Nucl. Acids Res. 33:D154-D159.

Baker, D. and Sali, A. 2001. Protein structure prediction and structural genomics. Science 294:93-96.

Barton, G.J. and Sternberg, M.J. 1987. A strategy for the rapid multiple alignment of protein sequences: Confidence levels from tertiary structure comparisons. J. Mol. Biol. 198:327-337.

Bateman, A., Coin, L., Durbin, R., Finn, R.D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E.L., Studholme, D.J., Yeats, C., and Eddy, S.R. 2004. The Pfam protein families database. Nucl. Acids Res. 32:D138-D141.

Bates, P.A., Kelley, L.A., MacCallum, R.M., and Sternberg, M.J. 2001. Enhancement of protein modeling by human intervention in applying the automatic programs 3D-JIGSAW and 3D-PSSM. Proteins 5:39-46.

Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., and Wheeler, D.L. 2005. GenBank. Nucl. Acids Res. 33:D34-D38.

Blundell, T.L., Sibanda, B.L., Sternberg, M.J., and Thornton, J.M. 1987. Knowledge-based prediction of protein structures and the design of novel molecules. Nature 326:347-352.

Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M.C., Estreicher, A., Gasteiger, E., Martin, M.J., Michoud, K., O'Donovan, C., Phan, I., Pilbou
28. RasMol> wireframe 150

This represents the heme with a thick wireframe (values go from 1 [thin] to 500 [thick]).

RasMol> select iron

This selects the iron ion.

RasMol> cpk 150

This represents the iron as a sphere. The command cpk, which represents atoms as spheres, refers to the plastic Corey-Pauling-Koltun models used for building small organic molecules, which were the first models that used a spacefilling representation. The units used by RasMol are integers that correspond to 1/250th of an Angstrom (Å).

The display should look like Figure 5.4.2. The protein is displayed with a wireframe colored by the atom type, and thicker bonds are used to make the heme group more apparent.

Figure 5.4.2 Hemoglobin with the heme groups in thick bonds and the iron ions shown as small spheres.

c. Rotate the display and notice the following: (1) Individual amino acids may be identified from their shape and chemical composition. For instance, look for aromatic amino acids while rotating the structure. (2) The overall conformation of the backbone is difficult to comprehend. Wireframe images often look like a tangle of atoms, not a folded chain. (3) Zoom the molecule to higher magnification and notice that the wireframe works
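As a rough illustration of the unit convention described above, the following Python sketch converts between RasMol's integer units and Angstroms. The two function names are hypothetical helpers, not part of RasMol. The only fact taken from the text is that one RasMol unit corresponds to 1/250 Å.

```python
RASMOL_UNITS_PER_ANGSTROM = 250  # from the text: 1 unit = 1/250 Angstrom


def rasmol_units_to_angstroms(units):
    """Convert an integer RasMol size value to Angstroms."""
    return units / RASMOL_UNITS_PER_ANGSTROM


def angstroms_to_rasmol_units(angstroms):
    """Convert a size in Angstroms to the nearest integer RasMol value."""
    return round(angstroms * RASMOL_UNITS_PER_ANGSTROM)


# The value 150 used in the wireframe and cpk commands above:
size_in_angstroms = rasmol_units_to_angstroms(150)
print(f"150 RasMol units = {size_in_angstroms:.2f} Angstroms")
```

Under this convention, the value 150 used in the commands above corresponds to 0.6 Å, and the upper wireframe limit of 500 corresponds to 2 Å.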
29. Structure determination of turkey egg white lysozyme using Laue diffraction data. Acta Crystallogr. B 48:200-207.

Jacobson, M.P., Pincus, D.L., Rapp, C.S., Day, T.J., Honig, B., Shaw, D.E., and Friesner, R.A. 2004. A hierarchical approach to all-atom protein loop prediction. Proteins 55:351-367.

Jaroszewski, L., Rychlewski, L., Li, Z., Li, W., and Godzik, A. 2005. FFAS03: A server for profile-profile sequence alignments. Nucl. Acids Res. 33:W284-W288.

John, B. and Sali, A. 2003. Comparative protein structure modeling by iterative alignment, model building and model assessment. Nucl. Acids Res. 31:3982-3992.

Jones, D.T. 1999. GenTHREADER: An efficient and reliable protein fold recognition method for genomic sequences. J. Mol. Biol. 287:797-815.

Jones, D.T. 2001. Evaluating the potential of using fold-recognition models for molecular replacement. Acta Crystallogr. D Biol. Crystallogr. 57:1428-1434.

Jones, D.T., Taylor, W.R., and Thornton, J.M. 1992. A new approach to protein fold recognition. Nature 358:86-89.

Jones, T.A. and Thirup, S. 1986. Using known substructures in protein model building and crystallography. EMBO J. 5:819-822.

Kabsch, W. and Sander, C. 1984. On the use of sequence homologies to predict protein structure: Identical pentapeptides can have completely different conformations. Proc. Natl. Acad. Sci. U.S.A. 81:1075-1078.

Kahsay, R.Y., Wang, G., Dongre, N., Gao, G., and Dunbrack, R
30. The input script file for the command is shown in Figure 5.6.3.

The script, build_profile.py, does the following:

1. Initializes the "environment" for this modeling run by creating a new environ object (called env here). Almost all MODELLER scripts require this step, as the new object is needed to build most other useful objects.

2. Creates a new sequence_db object, calling it sdb, which is used to contain large databases of protein sequences.

    from modeller import *

    log.verbose()
    env = environ()

    # Read the sequence database in PIR format, keeping chains of 30-4000 residues
    sdb = sequence_db(env)
    sdb.read(seq_database_file='pdb_95.pir', seq_database_format='PIR',
             chains_list='ALL', minmax_db_seq_len=(30, 4000),
             clean_sequences=True)

    # Write the sequence database in binary form
    sdb.write(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
              chains_list='ALL')

    # Read the binary database back in
    sdb.read(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
             chains_list='ALL')

    # Read the target sequence
    aln = alignment(env)
    aln.append(file='TvLDH.ali', alignment_format='PIR', align_codes='ALL')

    # Convert the input sequence into profile format
    prf = aln.to_profile()

    # Scan the sequence database for homologous sequences
    prf.build(sdb, matrix_offset=-450, rr_file='${LIB}/blosum62.sim.mat',
              gap_penalties_1d=(-500, -50), n_prof_iterations=1,
              check_profile=False, max_aln_evalue=0.01)

    # Write out the profile
    prf.write(file='build_profile.prf', profile_format='TEXT')

    # Convert the profile back to alignment format and write it out
    aln = prf.to_alignment()
    aln.write(file='build_profile.ali', alignment_format='PIR')

Figure 5.6.3 File build_profile.py. Inpu
31. major protein family within a decade. This wealth of data needs to be organized and correlated using automated methods. Nearly all proteins have structural similarities to other proteins. General similarities arise from principles of physics and chemistry that limit the number of ways in which a polypeptide chain can fold into a compact globule. Evolutionary relationships result in surprising similarities, which are even stronger than similarity due to convergence caused by physical principles.

Because structure tends to diverge more conservatively than sequence during evolution, structure alignment is a more powerful method than pairwise sequence alignment for detecting homology and aligning the sequences of distantly related proteins. In favorable cases, comparing 3D structures may reveal biologically interesting similarities that are not detectable by comparing sequences and may help to infer functional properties of hypothetical proteins.

Automatic methods enable exhaustive all-against-all structure comparisons. As a result, each structure in the PDB can be represented as a node in a graph where similar structures are neighbors of each other and structurally unrelated proteins are not neighbors. Clustering the graph at different levels of granularity removes redundancy and aids navigation in protein space. At long range, the overall distribution of folds is dominated by secondary structure composition (e.g., all alpha
32. or color blue. These tend to be saturated colors, however, which rapidly become confusing in complex pictures. For instance, the pictures of hemoglobin shown in the figures illustrating the previous protocols use the default chain colors, which are all bright primary and secondary colors. Saturated colors compete with each other on the screen and often confuse the perception of the relative depth of different portions of the molecule. It is possible to use custom colors to design a picture that minimizes these artifacts and focuses more attention on the functional details. Pastel colors are often easier to read, and they do not compete with each other in the display. RasMol does not contain a graphical color browser, but it does allow the user to design custom colors.

1. Restart RasMol with the file 2hhb, and in the Command Line window type the following series of commands:

RasMol> select protein or ligand
RasMol> cpk
RasMol> select *A
This selects all atoms in chain A.
RasMol> color [100,100,255]
This colors the chain light blue.
RasMol> select *C
This selects chain C.
RasMol> color [100,150,255]
This colors the chain blue-green.
RasMol> select *B
This selects chain B.
RasMol> color [100,255,100]
This col
33. position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673-4680.

Törnroth-Horsefield, S., Wang, Y., Hedfalk, K., Johanson, U., Karlsson, M., Tajkhorshid, E., Neutze, R., and Kjellbom, P. 2006. Structural mechanism of plant aquaporin gating. Nature 439:688-694.

Vijay-Kumar, S., Bugg, C.E., and Cook, W.J. 1987. Structure of ubiquitin at 1.8 Å resolution. J. Mol. Biol. 194:531-544.

Wang, Y., Cohen, J., Boron, W.F., Schulten, K., and Tajkhorshid, E. 2007. Exploring gas permeability of cellular membranes and membrane channels with molecular dynamics. J. Struct. Biol. 157:534-544.

Yin, Y., Jensen, M.Ø., Tajkhorshid, E., and Schulten, K. 2006. Sugar binding and protein conformational changes in lactose permease. Biophys. J. 91:3972-3985.

Yu, J., Yool, A.J., Schulten, K., and Tajkhorshid, E. 2006. Mechanism of gating and ion conductivity of a possible tetrameric pore in Aquaporin-1. Structure 14:1411-1423.

Supplemental Files

Supplemental files can be downloaded from http://www.currentprotocols.com by clicking "Current Protocols" beneath the Bioinformatics head and following the Sample Datasets link.

1fqy.pdb: pdb coordinate file for human aquaporin (Murata et al., 2000)
1j4n.pdb: pdb coordinate file for bovine aquaporin (Sui et al., 2001)
1lda.pdb: pdb coordinate file for E. coli GlpF (Tajkhorshid et al., 2002)
1rc2.pdb: pdb coordinate file for E. coli aquaporin (Savage et al., 2003)
1ubq.p
34. the identification of a sequence as being an important contributor to, for example, a human disease, but there is no information from sequence comparisons about what the biochemical or biological function(s) of the gene product might be. The hope is that, since structure changes much more slowly than sequence, similarity to a structure of known function might provide a valuable clue.

I am not completely sanguine about this belief. On the one hand, there are some impressive examples of its success (Kim et al., 2004). On the other hand, it is clear that the coupling between overall fold and biochemical function is often quite loose, especially for some protein superfamilies (Hegyi and Gerstein, 2001). Nevertheless, comparing a protein's fold with those already known is an important and sometimes powerful method. Liisa Holm, whose program DALI (UNIT 5.5) is the most widely used tool for this purpose, describes in her unit in this chapter how that tool should be employed. As the pace of structure determination increases, DALI will be in the vanguard not only for comparison of structures but also for assembling the database of fold libraries and assessing fold divergence.

The growth of structure determination has turned most biochemists and biologists into consumers of structural information. Genomics is accelerating this trend. As the demand for such information continues to outstrip the supply, all aspects of structu
35. where the search term is a PDB identifier (e.g., 2kau or 2kauC).

[Screenshot: Dali Database multiple structure alignment page, showing the aligned sequences and secondary structure strings (H, helix; E, strand; L, loop) for chains 1qkuA, 1qknA, 1tfcA, 1l2jA, 1xb7A, 1s9qB, 1e3kA, and 1m22A.]
36. BASIC PROTOCOL

SELECTING A CORRECT PROTEIN STRUCTURE USING CHI

CHI is a series of user-friendly task files and modules written by Adams (1995) to be used in the general software suite CNS (Crystallography and NMR System; Brünger et al., 1998). CHI constructs multiple bundles of helices, each differing from the other by the rotation of the helices about their axes, as well as the bundle handedness. These are then used as starting positions for molecular dynamics simulations and energy minimization protocols. The output structures from these simulations are compared and grouped into clusters that contain similar structures. An average of the structures forming a cluster represents a model with characteristic interhelical interactions and helix tilt. The "Silent Amino Acid Substitution Protocol" performs the above simulations on close sequence variants that are likely to share the same structure, followed by a comparison of the clusters from the different variants, in an attempt to find a common cluster into which all these variants fold.

In the protocol it will be assumed that the user is using a generic Unix system employing the csh or tcsh shell. The commands are entered at a terminal with the > command prompt. Text files are edited using a text editor. Those who are unfamiliar with the Unix environment should refer to APPENDIX 1C & APPENDIX 1D. Neces
37. 1999). This task can be achieved by a genetic algorithm protocol that starts with a set of initial alignments and then iterates through realignment, model building, and model assessment to optimize a model assessment score (John and Sali, 2003). During this iterative process, (1) new alignments are constructed by the application of a number of genetic algorithm operators, such as alignment mutations and crossovers; (2) comparative models corresponding to these alignments are built by satisfaction of spatial restraints, as implemented in the program MODELLER; and (3) the models are assessed by a composite score, partly depending on an atomic statistical potential (Melo et al., 2002). When testing the procedure on a very difficult set of 19 modeling targets sharing only 4% to 27% sequence identity with their template structures, the average final alignment accuracy increased from 37% to 45% relative to the initial alignment (the alignment accuracy was measured as the percentage of positions in the tested alignment that were identical to the reference structure-based alignment). Correspondingly, the average model accuracy increased from 43% to 54% (the model accuracy was measured as the percentage of the Cα atoms of the model that were within 5 Å of the correspondi
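The two accuracy measures described above can be sketched in a few lines of Python. This is an illustrative reading of the definitions, not the code used in the cited study. It assumes that an alignment is represented as a set of (target position, template position) pairs, that the denominator for alignment accuracy is the number of aligned pairs in the tested alignment, and that the model and reference Cα coordinates are already superposed and listed in the same residue order. The function names and the example data are hypothetical.

```python
import math


def alignment_accuracy(tested_pairs, reference_pairs):
    """Percentage of aligned pairs in the tested alignment that also
    occur in the reference (structure-based) alignment."""
    tested = set(tested_pairs)
    shared = tested & set(reference_pairs)
    return 100.0 * len(shared) / len(tested)


def model_accuracy(model_ca, reference_ca, cutoff=5.0):
    """Percentage of model C-alpha atoms within `cutoff` Angstroms of the
    corresponding reference atoms (coordinates assumed superposed)."""
    close = sum(1 for m, r in zip(model_ca, reference_ca)
                if math.dist(m, r) <= cutoff)
    return 100.0 * close / len(model_ca)


# Hypothetical toy data: four aligned positions, four C-alpha atoms.
tested = [(1, 1), (2, 2), (3, 4), (4, 5)]
reference = [(1, 1), (2, 2), (3, 3), (4, 5)]
model = [(0, 0, 0), (0, 0, 0), (10, 0, 0), (0, 0, 0)]
native = [(1, 0, 0), (0, 6, 0), (10, 0, 3), (0, 0, 0)]

print(alignment_accuracy(tested, reference))
print(model_accuracy(model, native))
```

In the toy data, three of the four aligned pairs match the reference and three of the four Cα atoms lie within the cutoff, so both measures evaluate to 75%.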
38. [Screenshot: Dali database query results for "estradiol receptor", listing PDB chains 1qkuA, 1qktA, 1qkuB, and 1qkuC with their representative (1qkuA), Browse and Interact links, and the compound name ESTRADIOL RECEPTOR. Click on the Repres. links to browse the alignments and structural neighbours of the representative. Click on the Fold link to view all members of the fold class.]

Figure 5.5.8 The result of the query for "estradiol receptor" structures.

Browse the Dali database

1. Go to the Dali database at http://www.bioinfo.biocenter.helsinki.fi/dali/start. The home page is shown in Figure 5.5.7.

The set of representative structures is called PDB90, and it contains all polypeptide chains from the PDB with less than 90% sequence identity to each other. The representative structures are decomposed into 14,020 domains. Hierarchical clustering reveals 3,107 fold types. Fold types are defined as clusters of structural neighbors in fold space with average pairwise Dali Z-scores above 2. The threshold has been chosen empirically and groups together structures that have topological similarity. Higher Z-scores correspond to structures that agree more closely in architectu
39. 5.7.23). Blue areas indicate that the molecules are structurally conserved at those points; red areas indicate that there is no correspondence in structure at those points. As you can see, the α-helices that form the pore are well conserved structurally among the four aquaporins, while there are more structural differences in the less functionally relevant loops.

Sequence Alignment with MultiSeq

Besides revealing structural similarities, MultiSeq also allows comparison of proteins based on their sequences. Sequence alignment is often used to identify conserved residues among similar proteins, as such residues are likely of functional importance.

Aligning and coloring molecules by degree of conservation

1. In the MultiSeq window, select Tools → ClustalW Sequence Alignment.

2. In the ClustalW Alignment Options window, make sure the Align All Sequences option is checked, and go to the bottom of the window and select OK. Now the four aquaporins have been aligned according to their sequence using the ClustalW tool (Thompson et al., 1994).

3. Let us color the aligned molecules by their sequence similarity. In the MultiSeq window, choose View → Coloring → Sequence identity.

Now, each amino acid is colored according to the degree of conservation within the alignment: blue means highly conserved, red means low or no conservation. Your MultiSeq window and OpenGL window should resemble Figure 5.7.24.

You have now aligned the four aquaporins acco
40. A.R. 2003. Finding weak similarities between proteins by sequence profile comparison. Nucl. Acids Res. 31:683-689.

Park, J., Karplus, K., Barrett, C., Hughey, R., Haussler, D., Hubbard, T., and Chothia, C. 1998. Sequence comparisons using multiple sequences detect three times as many remote homologues as pairwise methods. J. Mol. Biol. 284:1201-1210.

Pawlowski, K., Bierzynski, A., and Godzik, A. 1996. Structural diversity in a family of homologous proteins. J. Mol. Biol. 258:349-366.

Pearl, F., Todd, A., Sillitoe, I., Dibley, M., Redfern, O., Lewis, T., Bennett, C., Marsden, R., Grant, A., Lee, D., Akpor, A., Maibaum, M., Harrison, A., Dallman, T., Reeves, G., Diboun, I., Addou, S., Lise, S., Johnston, C., Sillero, A., Thornton, J., and Orengo, C. 2005. The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucl. Acids Res. 33:D247-D251.

Pearson, W.R. 1994. Using the FASTA program to search protein and DNA sequence databases. Methods Mol. Biol. 24:307-331.

Pearson, W.R. 2000. Flexible sequence similarity searching with the FASTA3 program package. Methods Mol. Biol. 132:185-219.

Petrey, D. and Honig, B. 2005. Protein structure prediction: Inroads to biology. Mol. Cell 20:811-819.

Petrey, D., Xiang, Z., Tang, C.L., Xie, L., Gimpelev, M., Mitros, T., Soto, C.S., Goldsmith-Fischman, S., Kernytsky, A., Schlessinger, A., Koh, I.Y.
41. A template structure and the alignment in file TvLDH-1bdmA.ali (file model-single.py).

The first line (Fig. 5.6.9) loads the automodel class and prepares it for use. An automodel object is then created and called "a", and parameters are set to guide the model-building procedure. alnfile names the file that contains the target-template alignment in the PIR format. knowns defines the known template structure(s) in alnfile (TvLDH-1bdmA.ali), and sequence defines the code of the target sequence. starting_model and ending_model define the number of models that are calculated (their indices will run from 1 to 5). The last line in the file calls the make method that actually calculates the models. The most important output files are model-single.log, which reports warnings, errors, and other useful information, including the input restraints used for modeling that remain violated in the final model, and TvLDH.B9999000[1-5].pdb, which contain the coordinates of the five produced models in the PDB format. The models can be viewed by any program that reads the PDB format, such as Chimera (http://www.cgl.ucsf.edu/chimera/) or RasMol (http://www.rasmol.org).

    from modeller.automodel import *

    log.verbose()
    env = environ()
    env.libs.topology.read(file='$(LIB)/top_heav.lib')
    env.libs.parameters.read(file='$(LIB)/par.lib')
    env.io.atom_files_directory = './:../atom_files'

    mdl = model(env)
    mdl.read(file='TvL
42. 3. The program automatically generates a data file for each chain in the PDB entry. In the above examples, 3ubpA.dat, 3ubpB.dat, and 3ubpC.dat are created in the DAT subdirectory. The system uses the DSSP program by Kabsch and Sander (included in the DaliLite distribution package) to parse the information out of the PDB file.

DSSP requires that the complete backbone (N, Cα, C, O atoms) is present or it will skip the residue. The MaxSprout server (http://www.ebi.ac.uk/maxsprout) can be used to build full coordinates from a Cα trace.

4. The DAT file includes information about the Cα coordinates, primary structure, secondary structure elements (from DSSP; Kabsch and Sander, 1983), and putative folding pathway of the protein (from PUU; Holm and Sander, 1994). The first line of a properly formed DAT file is shown in Figure 5.5.12.

If reading of the coordinates fails for any reason, only zeros will appear on the first line of the DAT file.

Generate structural alignments

5. There are options for pairwise, one-against-many, and many-against-many comparisons. The structures are specified using the unique identifiers (introduced in step 2) when reading in PDB structures using the -readbrk option.

Pairwise alignments of two structures are generated using exhaustive search. Par
43. Alignment.

In the Stamp Structural Alignment window, select All Structures, and keep the default values for the rest of the parameters. Press the OK button to align the structures.

In the MultiSeq program window, choose Tools → Phylogenetic Tree. The Phylogenetic Tree window will open.

Select Structural tree using QH, and press the OK button.

A phylogenetic tree based on the QH values should be calculated and drawn as shown in Figure 5.7.26A. Here you can see the relationship between the four aquaporins, e.g., how the E. coli AqpZ (1rc2) is related to human AQP1 (1fqy).

You can also construct the phylogenetic tree of the four aquaporins based on their sequence information. Close the Tree Viewer window.

You need to perform the sequence alignment again for the four aquaporin proteins. In your MultiSeq window, choose Tools → ClustalW Sequence Alignment, make sure the Align All Sequences option is checked, and press OK.

In the MultiSeq program window, choose Tools → Phylogenetic Tree to open the Phylogenetic Tree window again.

Select Sequence tree using ClustalW, and press the OK button. A phylogenetic tree based on ClustalW will be calculated and drawn as shown in Figure 5.7.26B.

Quit VMD.

[Figure 5.7.26 panels: (A) Tree Viewer, QH Structure Tree; (B) Tree Viewer, CLUSTALW Sequence Tree.]
44. B.D., Bormann, B.J., Dempsey, C.E., and Engelman, D.M. 1992a. Glycophorin A dimerization is driven by specific interactions between transmembrane alpha-helices. J. Biol. Chem. 267:7683-7689.

Lemmon, M.A., Flanagan, J.M., Treutlein, H.R., Zhang, J., and Engelman, D.M. 1992b. Sequence specificity in the dimerization of transmembrane alpha-helices. Biochemistry 31:12719-12725.

Lemmon, M.A., Treutlein, H.R., Adams, P.D., Brünger, A.T., and Engelman, D.M. 1994. A dimerization motif for transmembrane alpha-helices. Nat. Struct. Biol. 1:157-163.

MacKenzie, K.R., Prestegard, J.H., and Engelman, D.M. 1997. A transmembrane helix dimer: Structure and implications. Science 276:131-133.

Rice, L.M. and Brünger, A.T. 1994. Torsion angle dynamics: Reduced variable conformational sampling enhances crystallographic structure refinement. Proteins 19:277-290.

Stevens, T.J. and Arkin, I.T. 2000. Do more complex organisms have a greater proportion of membrane proteins in their genomes? Proteins 39:417-420.

Torres, J., Adams, P.D., and Arkin, I.T. 2000. Use of a new label, 13C=18O, in the determination of a structural model of phospholamban in a lipid bilayer. Spatial restraints resolve the ambiguity arising from interpretations of mutagenesis data. J. Mol. Biol. 300:677-685.

Torres, J., Kukol, A., and Arkin, I.T. 2001. Mapping the energy surface of transmembrane helix-helix interactions. Biophys. J
45.  [Screenshot of the FAMSBASE search page: clicking the Search button invokes an AND search for ORFs satisfying the seven conditions listed on the page; results can be ordered by gene name (ascending or descending); species check boxes are grouped under ARCHAEA and BACTERIA.]

Figure 5.2.3 The upper part of the search page of FAMSBASE. 41 species whose genome ORFs have been determined are listed with check boxes on the left-hand side. More details of the 41 species are described in http://spock.genes.nig.ac.jp/~gtop/old/org.html.
46.  GMDS is that it is possible to exhaustively search the configuration space of a transmembrane helical bundle and come up with several candidate structures. One of these structures is presumed to be that which is found in nature.

The underlying premise of silent substitution modeling is that silent substitutions do not disrupt the native structure, but may destabilize non-native structures. Thus, it is possible to select the correct structure among several candidate structures using silent substitution modeling by looking for a model that is present in all of the homologs.

When will this procedure fail? There are several possible situations in which this may occur: (1) where no single structure is found to be in all of the homologs; (2) where more than one structure is found in all of the homologs; and (3) where the structure that is found in all of the sequences is not the native one. Below, the potential causes of these failures are analyzed and ways to avoid them are suggested.

No structure is found

There can be two simple reasons for the failure to find a structure that persists in all of the homologs.

GMDS was not able to identify the native structure in at least one homolog and perhaps in all of them. The authors have found from experience that this may happen when the tilt of the helices is relatively large, as is the case in the Influenza A M2 H+ channel (Kukol et al., 1999). This problem may be overcome by incr
47.  If the best Z-score lies between $cut1 and $cut2, then the search list is restricted to the second neighbor shells of all hits.

$nbest 1 This parameter controls the number of hits in output. All hits with a Z-score above 2, or at least $nbest hits, will be reported.

UNIT 5.6

Comparative Protein Structure Modeling Using Modeller

Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling often provides a useful 3-D model for a protein that is related to at least one known protein structure (Marti-Renom et al., 2000; Fiser, 2004; Misura and Baker, 2005; Petrey and Honig, 2005; Misura et al., 2006). Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates).

Comparative modeling consists of four main steps (Marti-Renom et al., 2000; Figure 5.6.1): (1) fold assignment, which identifies similarity between the target and at least one

Figure 5.6.1 flowchart labels: identify related structures; select templates; align target sequence with template structures; build a model for the target using info
48.  L. Jr. 2002. CASA: A server for the critical assessment of protein sequence alignment accuracy. Bioinformatics 18:496-497.

Karchin, R., Cline, M., Mandel-Gutfreund, Y., and Karplus, K. 2003. Hidden Markov models that use predicted local structure for fold recognition: Alphabets of backbone geometry. Proteins 51:504-514.

Karchin, R., Diekhans, M., Kelly, L., Thomas, D.J., Pieper, U., Eswar, N., Haussler, D., and Sali, A. 2005. LS-SNP: Large-scale annotation of coding non-synonymous SNPs based on multiple information sources. Bioinformatics 21:2814-2820.

Karplus, K., Barrett, C., and Hughey, R. 1998. Hidden Markov models for detecting remote protein homologies. Bioinformatics 14:846-856.

Karplus, K., Karchin, R., Draper, J., Casper, J., Mandel-Gutfreund, Y., Diekhans, M., and Hughey, R. 2003. Combining local-structure, fold-recognition, and new fold methods for protein structure prediction. Proteins 53:491-496.

Kelley, L.A., MacCallum, R.M., and Sternberg, M.J. 2000. Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol. 299:499-520.

Koehl, P. and Delarue, M. 1995. A self consistent mean field approach to simultaneous gap closure and side-chain positioning in homology modelling. Nat. Struct. Biol. 2:163-170.

Koh, I.Y.Y., Eyrich, V.A., Marti-Renom, M.A., Przybylski, D., Madhusudhan, M.S., Narayanan, E., Grana, O., Pazos, F., Valencia, A., Sali
49.  Query button (for Search for ORFs by Amino Acid Sequence).

In the Search for ORFs by Hetero Atom of Reference Protein text box, the Hetero Atom refers to the HETATM line in PDB format. The Search for ORFs by Amino Acid Sequence text box (Fig. 5.2.7) performs an amino acid sequence search using FASTA (UNIT 3.9). Users can search by several criteria at once, but the Amino Acid Sequence search is exclusive.

Select a model

3. Examine the model list that appears (Fig. 5.2.8), with annotations of ORFs, model lengths (number of amino acid residues), and identity percentages of amino acid sequence alignments (with experimentally known structure).

4. Select one line in the model list by clicking on a template ID (in the PSIBlast column in Fig. 5.2.8), which will then bring up the amino acid alignment view page (Fig. 5.2.9). Display the selected model structure by clicking on the View Target button.

Both the model and the template will be displayed simultaneously (Fig. 5.2.10) by clicking the Superimpose button when using an appropriate model viewer (e.g., RasMol; http://www.umass.edu/microbio/rasmol). The model file (not containing the template) can also be downloaded by clicking on the View Target button (Fig. 5.2.11).

GUIDELINES FOR UNDERSTANDING RESULTS

Once the required model has been obtained, whether from FAMSBASE or from FAMS, one may wonder about its accuracy. Generally, if the query sequence and the amino acid sequen
50.  The snapshots shown are, from left to right, for frames 0, 17, and 99. For the color version of this figure go to http://www.currentprotocols.com.

BASIC PROTOCOL 4

The Basics of Movie Making in VMD

The following protocol describes how to make a movie in VMD.

1. Start a new VMD session. Repeat steps 1 to 5 of Basic Protocol 3 to load the ubiquitin trajectory into VMD and display the protein in a secondary structure representation.

2. To make movies, we will use the VMD Movie Maker plugin. In the VMD Main window, go to menu item Extension → Visualization → Movie Maker. The VMD Movie Generator window will appear.

Making single-frame movies

3. Click on the Movie Settings menu in the VMD Movie Generator window and take a look at the options.

You can see that in addition to a trajectory movie, Movie Maker can also make a movie by rotating the view point of a single frame. In the Renderer menu, one can choose the type of renderer for making the movie. While renderers other than Snapshot (e.g., Tachyon) generally provide more visually appealing images, they also take longer to render. The rendering time is also affected by the size of the OpenGL window, since it takes more computing time to render a larger image. We will first make a movie of just one frame of the trajecto
51.  a section of the Fold Index. All members of the fold class can be seen here at a glance (Fig. 5.5.9).

Domains in the Fold Index are annotated by the sequence family to which they belong. Sequence families are defined in the ADDA database (Heger and Holm, 2003) based on shared sequence motifs. ADDA unifies many structural neighbors with little overall

[Screenshot: Dali fold query 1060. Columns: Fold Index, Adda, Browse, Interact, Compound. Rows shown carry Fold Index 1060 and Adda family 313. Compounds listed: RETINOIC ACID RECEPTOR RXR ALPHA (twice); VITAMIN D3 RECEPTOR; OXYSTEROLS RECEPTOR LXR BETA; RETINOIC ACID RECEPTOR RXR BETA; ECDYSONE RECEPTOR; ORPHAN NUCLEAR RECEPTOR PXR; BILE ACID RECEPTOR (twice); NUCLEAR RECEPTOR ROR ALPHA; NUCLEAR RECEPTOR ROR BETA; PEROXISOME PROLIFERATOR ACTIVATED REC; HORMONE RECEPTOR ALPHA 1, THRA1; THYROID HORMONE RECEPTOR BETA 1; ORPHAN NUCLEAR RECEPTOR NR4A1; NUCLEAR HORMONE RECEPTOR HR38; ORPHAN NUCLEAR RECEPTOR NURR1; ESTROGEN RECEPTOR BETA; ESTROGEN RECEPTOR BETA
52.  as a tube. This is probably the most popular drawing method to view the overall architecture of a protein.

19. In the Graphical Representations window, choose Drawing Method → NewCartoon. The helices, β sheets, and coils of the protein can now be easily identified.

Ubiquitin has three and one-half turns of α helix (residues 23 to 34, three of them hydrophobic), one short piece of 3-10 helix (residues 56 to 59), and a mixed β sheet with five strands (residues 1 to 7, 10 to 17, 40 to 45, 48 to 50, and 64 to 72), and seven reverse turns. VMD uses the program STRIDE (Frishman and Argos, 1995) to compute the secondary structure according to a heuristic algorithm.

Exploring different coloring methods

In this series of steps, different coloring methods are explored.

20. In the Graphical Representations window, the default coloring method is Coloring Method → Name. In this coloring method, if you choose a drawing method that shows individual atoms, each atom type has its own color, i.e., O is red, N is blue, C is cyan, and S is yellow.

21. Choose Coloring Method → ResType (Fig. 5.7.5C). This allows nonpolar residues (white) to be distinguished from basic residues (blue), acidic residues (red), and polar residues (green).

22. Select Coloring Method → Structure (Fig. 5.7.5C) and confirm that the NewCartoon representation displays colors consistent with secondary structure.

Displaying different selections

To display only parts of the molecule of
53.  by the programmer for a good reason: they are the best guess for what the user will most often need. In many cases, they will also define a representation that corresponds to what a viewer will expect to see. For instance, most programs provide a familiar atomic coloring scheme, using black/gray for carbon, red for oxygen, blue for nitrogen, and so on. Before changing this default coloring scheme, it is worth thinking about how the picture will be viewed. Many of the defaults provided by graphics programs are designed to create familiar images with chemical features that are recognizable at a glance. For instance, if the color of all of the oxygens is changed to yellow, most viewers will automatically assume that they are sulfurs, potentially causing confusion. The radii of spacefilling representations are another example of defaults that should be respected, since they are designed to show a particular physical characteristic of the molecule.

That being said, default parameters are only guidelines, and should be modified to suit the current goal. Color, in particular, is a powerful tool for directing attention to key features, and default parameters are rarely able to draw attention to exactly the feature that needs to be highlighted. The width of cylinders in backbone and bond diagrams provides another effective avenue for customizing a representation.

Suggestions for Further Analysis
54.  compact substructures at the weakest interface. A number of postprocessing rules were introduced to supplement numerical criteria. The whole procedure is fully described in the original publication (Holm and Sander, 1995).

Program Parameters

The following parameters are set at the top of the main Perl script. The default values, as used by the Dali server, are indicated. These parameters mainly affect the pruning of search space in the database search.

$MINLEN 30 Structures with fewer residues are excluded from comparison. Dali was designed to detect similarities at the level of globular domain folding patterns that involve several secondary structure elements. It is not designed to compare conformations of short peptides.

$MINSSE 2 The Wolf and Parsi methods reduce the complexity of the structural comparison by representing structures (partly) as secondary structure elements. If there are fewer than $MINSSE secondary structure elements in the protein, then the Soap method is used.

$cut0 20.0; $cut1 4.0; $cut2 2.0 The database search by the Dali server uses a set of rules to prune search space after a strong similarity has been found. If a similarity has been found that is above a Z-score equal to $cut0, then the search is stopped completely because the query is structurally almost identical to the best hit. If similarities have been found with Z-scores above $cut1, then the search list is restricted to the first neighbor shells of all hits.
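The pruning rules for the three cutoffs can be summarized in a few lines of Python. This is only an illustrative sketch of the decision logic as the text describes it, not the Dali server's Perl code: the function name `pruning_action` and the returned labels are hypothetical, and the behavior below the lowest cutoff (continuing the full search) is an assumption, since the text does not state it.

```python
def pruning_action(best_z, cut0=20.0, cut1=4.0, cut2=2.0):
    """Return a label for how the search list is pruned, given the best Z-score.

    Defaults mirror the values quoted in the text (20.0, 4.0, 2.0).
    The labels are hypothetical names used only for this sketch.
    """
    if best_z > cut0:
        # Query is structurally almost identical to the best hit.
        return "stop"
    if best_z > cut1:
        return "first_neighbor_shells"
    if best_z > cut2:
        # Best Z-score lies between cut1 and cut2.
        return "second_neighbor_shells"
    # Assumption: no pruning rule applies below cut2.
    return "continue_full_search"


for z in (25.0, 8.5, 3.0, 1.2):
    print(z, pruning_action(z))
```

Keeping the cutoffs as keyword arguments makes it easy to see how a stricter or looser threshold would change which tier a given hit falls into.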
55.  contains all the information needed to reproduce the same VMD session.

42. Go to the OpenGL Display window and use the mouse to find a nice view of the protein. We will save this viewpoint using the VMD ViewMaster.

43. In the VMD Main window (Fig. 5.7.2), select Extension → Visualization → ViewMaster. This will open the VMD ViewMaster window.

44. In the VMD ViewMaster window, click on the Create New button. The OpenGL Display viewpoint has now been saved.

Table 5.7.3 Secondary Structure Codes Used by STRIDE

Letter code    Secondary structure
T              Turn
E              Extended conformation (β sheets)
B              Isolated bridge
H              Alpha helix
G              3-10 helix
I              Pi helix
C              Coil

45. Go back to the OpenGL Display window and use the mouse to find another nice view. If desired, you can add/delete/modify a representation in the Graphical Representations window. When a good view has been found, save it by returning to the VMD ViewMaster window and clicking on the Create New button.

46. Create as many views as desired by repeating the previous step. All of the viewpoints are displayed as thumbnails in the VMD ViewMaster window. A previously saved viewpoint can be opened by clicking on its thumbnail.

47. To save the entire VMD session, in the VMD Main w
56.  display and note that this representation quickly makes it possible to see that: (1) hemoglobin is composed of four similar chains with lots of alpha helices, and (2) there are four hemes that are sandwiched between alpha helices.

Finding key residues

When looking for a particular amino acid, it is possible to examine a wireframe representation. This tends to be rather confusing, however, and it may be difficult to find the desired amino acid among the many surrounding ones. By using a simple combination of selection and representation commands, this process may be facilitated. The following example shows an easy way to find the histidine residues in hemoglobin that interact with the iron ions, without the need to go to the literature to find the residue number.

4. From the overview representation presented above, type, in the Command Line window:

    RasMol> select his

This selects all histidines.

    RasMol> cpk

This draws the histidines with spheres. The display should look like Figure 5.4.7.

Figure 5.4.7 All histidines in hemoglobin are shown with spacefilling spheres. For the color version of this figure go to http://www.currentprotocols.com.

5. At this point it is fairly simple to zoom in on one of the hemes, as in Figure 5.4.8, and click on one of the histidine atoms to find the residue number. This is a bit tricky with hemoglobin, since it has histidines on both sides of th
57.  every time the distance is within the respective bin:

    for {set i 0} {$i < $nf} {incr i} {
        set k [expr int(($simdata($i.r) - $r_min) / $dr)]
        incr distribution($k)
    }

Write out the file with the distribution:

    set outfile [open $f_d_out w]
    for {set k 0} {$k < $N_d} {incr k} {
        puts $outfile "[expr $r_min + $k * $dr] $distribution($k)"
    }
    close $outfile

6. Now, run the script by typing in the TkConsole window:

    distance "protein" "protein and resid 76" 10 res76_r.dat res76_d.dat

This will compute the distance between the center of the protein and the center of the terminal residue 76, and write the distance versus time and its distribution to files res76_r.dat and res76_d.dat.

7. Repeat the same for the protein's residue 10 by typing in the TkConsole window:

    distance "protein" "protein and resid 10" 10 res10_r.dat res10_d.dat

The data in files produced by the script distance.tcl are in two-column format. Compare the outputs for residue 76 and 10 using your favorite external plotting program (Fig. 5.7.30).

[Figure 5.7.30 panels: distance (Å) versus frame; distribution (arb. u.) versus distance (Å).]

Figure 5.7.30 Distance between a residue and the center of ubiquitin. The distances analyzed are those for residue 76 (black) and residue 10 (green). For the color version of this figure go to http://www.currentprotocols.com.
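For readers who want to check the binning step outside VMD, the same histogram can be sketched in plain Python. This is a minimal illustration, not part of distance.tcl: the function name `distance_distribution` is hypothetical, and the bin width `dr = (r_max - r_min) / (n_bins - 1)` is an assumption chosen so that the largest distance falls in the last bin, because the excerpt does not show how the script defines `dr`.

```python
def distance_distribution(distances, n_bins):
    """Histogram a list of distances into n_bins bins.

    Returns (bin_start, count) pairs, matching the two-column layout
    that the script writes to its distribution file.
    """
    r_min, r_max = min(distances), max(distances)
    if n_bins < 2 or r_max == r_min:
        raise ValueError("need at least 2 bins and a non-zero distance range")
    # Assumed bin width: the maximum distance lands in the last bin.
    dr = (r_max - r_min) / (n_bins - 1)
    counts = [0] * n_bins
    for r in distances:
        k = int((r - r_min) / dr)
        counts[min(k, n_bins - 1)] += 1  # guard against rounding at the top edge
    return [(r_min + k * dr, counts[k]) for k in range(n_bins)]


# Made-up distances in Angstroms, one per frame.
example = [10.0, 11.0, 12.0, 14.0]
for start, count in distance_distribution(example, 5):
    print(start, count)
```

The counts always sum to the number of frames, which is a quick sanity check to run against the distribution file produced by the Tcl script.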
58.  file format

It is first necessary to convert the target TvLDH sequence into a format that is readable by MODELLER (file TvLDH.ali; Fig. 5.6.2). MODELLER uses the PIR format to read and write sequences and alignments. The first line of the PIR-formatted sequence consists of >P1; followed by the identifier of the sequence. In this example, the sequence is identified by the code TvLDH. The second line, consisting of ten fields separated by colons, usually contains details about the structure, if any. In the case of sequences with no structural information, only two of these fields are used: the first field should be Sequence (indicating that the file contains a sequence without a known structure) and the second should contain the model file name (TvLDH in this case). The rest of the file contains the sequence of TvLDH, with an asterisk (*) marking its end. The standard uppercase single-letter amino acid codes are used to represent the sequence.

Searching for suitable template structures

A search for potentially related sequences of known structure can be performed using the profile.build() command of MODELLER (file build_profile.py). The command uses the local dynamic programming algorithm to identify related sequences (Smith and Waterman, 1981; Eswar, 2005). In the simplest case, the command takes as input the target sequence and a database of sequences of known structure (file pdb_95.pir) and returns a set of statistically significant alignments
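The PIR layout described above (a >P1; header, a ten-field colon-separated line, then the sequence ending in an asterisk) can be produced with a few lines of plain Python. This is a hedged sketch that does not use MODELLER itself: the helper name `pir_sequence_record` is hypothetical, the eight unused fields are simply left empty (an assumption), the first field is written in lowercase as `sequence`, and the example sequence is a short placeholder, not the real TvLDH sequence.

```python
def pir_sequence_record(code, sequence, width=75):
    """Format a sequence-only PIR record as a string."""
    header = ">P1;" + code
    # Ten colon-separated fields; only the first two are filled in for a
    # sequence without a known structure. Leaving the rest empty is an
    # assumption made for this sketch.
    fields = ["sequence", code] + [""] * 8
    body = sequence.upper() + "*"  # the asterisk marks the end of the sequence
    wrapped = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([header, ":".join(fields)] + wrapped) + "\n"


# Placeholder residues only; substitute the actual target sequence.
record = pir_sequence_record("TvLDH", "ACDEFGHIKLMNPQRSTVWY")
print(record)
```

Writing the returned string to a file such as TvLDH.ali gives a file in the layout the text describes; whether a particular MODELLER version accepts the empty fields should be checked against its manual.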
59.  files (ubiquitin.psf and equilibration.dcd).

2. Open the TkConsole window by selecting Extension → Tk Console in the VMD Main menu.

3. In the TkConsole window, load the script into VMD by typing: source distance.tcl (make sure that the file distance.tcl is in the current folder). This will load the procedure defined in distance.tcl into VMD.

4. One can now invoke the procedure by typing distance in the TkConsole window. In fact, the correct usage is:

    distance seltext1 seltext2 N_d f_r_out f_d_out

where seltext1 and seltext2 are the selection texts for the groups of atoms between which the distance is measured, N_d is the number of bins for the distribution, and f_r_out and f_d_out are the names of the files to which the output distance versus time and distance distribution will be written.

5. Open the script file distance.tcl with a text editor. You can see that the script does the following:

Choose atom selections:

    set sel1 [atomselect top "$seltext1"]
    set sel2 [atomselect top "$seltext2"]

Get the number of frames in the trajectory and assign this value to the variable nf:

    set nf [molinfo top get numframes]

Open the file specified by the variable f_r_out:

    set outfile [open $f_r_out w]

Loop over all frames:

    for {set i 0
60.  [Figure 5.6.13 panel labels, arranged along a sequence identity axis: in X-ray crystallography; designing chimeras, stable, crystallizable variants; supporting site-directed mutagenesis; refining NMR structures; fitting into low-resolution electron density; structure from sparse experimental restraints; functional relationships from structural similarity; identifying patches of conserved surface residues; finding functional sites by 3-D motif searching.]

Figure 5.6.13 Accuracy and application of protein structure models. The vertical axis indicates the different ranges of applicability of comparative protein structure modeling, the corresponding accuracy of protein structure models, and their sample applications. (A) The docosahexaenoic fatty acid ligand (violet) was docked into a high-accuracy comparative model of brain lipid-binding protein (right), modeled based on its 62% sequence identity to the crystallographic structure of adipocyte lipid-binding protein (PDB code 1adl). A number of fatty acids were ranked for their affinity to brain lipid-binding protein consistently with site-directed mutagenesis and affinity chromatography experiments (Xu et al., 1996), even though the ligand specificity profile of this protein is different from that of the template structure. Typical overall accuracy of a comparative model in this range of sequence similarity is indicated by a comparison of a model fo
61.  interest, one can specify their selection in the Graphical Representations window (Fig. 5.7.5F).

23. In the Graphical Representations window, there is a Selected Atoms text entry (Fig. 5.7.5F). Delete the word all, type helix, and press the Apply button or hit the Enter/Return key (remember to do this whenever a selection is changed). VMD will show just the helices present in the molecule.

24. In the Graphical Representations window, choose the Selections tab (Fig. 5.7.7A). In the section Singlewords (Fig. 5.7.7B), a list of possible selections that can be entered is provided.

Combinations of Boolean operators can also be used when writing a selection.

25. In order to see the molecule without helices and β sheets, type the following in the Selected Atoms field: (not helix) and (not betasheet). Remember to press the Apply button or hit the Enter/Return key.

26. In the section Keyword (Fig. 5.7.7C) of the Selections tab, the properties that can be used to select parts of a molecule are listed along with their possible values. Look at possible values of the keyword "resname" (Fig. 5.7.7D).

27. Display all the lysines and glycines present in the protein by typing (resname LYS) or (resname GLY) in the Selected Atoms field.

Lysines play a fundamental role in the configuration of polyubiquitin chains.
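The way Boolean operators combine in a selection can be illustrated with a small Python sketch. It does not call VMD and says nothing about how VMD evaluates selections internally; the residue records, the `structure` labels, and the `select` helper are all hypothetical toy data made up for this example.

```python
# Toy residue records (hypothetical values, for illustration only).
residues = [
    {"resid": 1, "resname": "MET", "structure": "sheet"},
    {"resid": 27, "resname": "LYS", "structure": "helix"},
    {"resid": 35, "resname": "GLY", "structure": "coil"},
    {"resid": 48, "resname": "LYS", "structure": "sheet"},
    {"resid": 76, "resname": "GLY", "structure": "coil"},
]


def select(records, predicate):
    """Return the resid of every record for which predicate is true."""
    return [r["resid"] for r in records if predicate(r)]


def is_helix(r):
    return r["structure"] == "helix"


def is_betasheet(r):
    return r["structure"] == "sheet"


# Analogue of: (not helix) and (not betasheet)
neither = select(residues, lambda r: (not is_helix(r)) and (not is_betasheet(r)))

# Analogue of: (resname LYS) or (resname GLY)
lys_or_gly = select(residues, lambda r: r["resname"] == "LYS" or r["resname"] == "GLY")

print(neither)      # [35, 76]
print(lys_or_gly)   # [27, 35, 48, 76]
```

The parentheses play the same role as in the selection text: each parenthesized test is evaluated per residue, and the operators and, or, and not combine the results.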
62.  interface in VMD. You will see that everything you can do in VMD interactively can also be done with Tcl commands and scripts. We will also demonstrate how the extensive list of Tcl text commands can help you investigate molecule properties and perform various types of analysis.

Necessary Resources

Hardware

Computer

Software

VMD, and a text editor

Files

1ubq.pdb and beta.tcl, which can be downloaded from http://www.currentprotocols.com

The Basics of Tcl Scripting

Tcl is a rich language that contains many features and commands, in addition to the typical conditional and looping expressions. Tk is an extension to Tcl that permits the writing of graphical user interfaces with windows and buttons, etc. More information and documentation about the Tcl/Tk language can be found at http://www.tcl.tk/doc. Let us start with the basic commands.

1. Start a new VMD session. In the VMD Main menu, select Extensions → Tk Console to open the VMD TkConsole window. You can now start entering Tcl/Tk commands here.

2. Try entering the following commands in the VMD TkConsole window. Remember to hit enter after each line and take a look at what you get after each input.

    set x 10
    puts "the value of x is: $x"
    set text "some text"
    puts "the value of text is: $text."

As you can see, the Tcl set and puts commands have the following syntax:

    set variable value    (sets the value of variable)
    p
63.  introduced in Basic Protocols 11 to 13 should cite:

Roberts, E., Eargle, J., Wright, D., and Luthey-Schulten, Z. "MultiSeq: Unifying sequence and structure data for evolutionary analysis". BMC Bioinformatics 2006, 7:382.

Please see http://www.ks.uiuc.edu/Research/vmd/allversions/cite.html for more information on how to cite VMD and its tools.

Literature Cited

Cruz-Chu, E.R., Aksimentiev, A., and Schulten, K. 2006. Water-silica force field for simulating nanodevices. J. Phys. Chem. B 110:21497-21508.

Eastwood, M.P., Hardin, C., Luthey-Schulten, Z., and Wolynes, P.G. 2001. Evaluating protein structure prediction schemes using energy landscape theory. IBM J. Res. Dev. 45:475-497.

Freddolino, P.L., Arkhipov, A.S., Larson, S.B., McPherson, A., and Schulten, K. 2006. Molecular dynamics simulations of the complete satellite tobacco mosaic virus. Structure 14:437-449.

Frishman, D. and Argos, P. 1995. Knowledge-based secondary structure assignment. Proteins 23:566-579.

Humphrey, W., Dalke, A., and Schulten, K. 1996. VMD: Visual Molecular Dynamics. J. Mol. Graph. 14:33-38.

Isralewitz, B., Gao, M., and Schulten, K. 2001. Steered molecular dynamics and mechanical functions of proteins. Curr. Opin. Struct. Biol. 11:224-230.

Murata, K., Mitsuoka, K., Hirai, T., Walz, T., Agre, P., Heymann, J.B., Engel, A., and Fujiyoshi, Y. 2000. Structural determinants of water permeation through 
64.  is plain text, as encoded messages (e.g., MIME or BinHex) are rejected by the server.

Complex comparison

Each chain is compared separately. For example, similarities to structural units made up of a dimer of two different chains (e.g., A and B) will not be detected. There is a way around this limitation, which requires manual editing of the PDB entry by the user: renumber the residues in a sequential order and give all chains the same chain identifier.

Multidomain proteins

It is advisable to break a multidomain query structure into its constituent domains, because the Dali server is designed to report all matches only to the first found structural neighborhood. That is, if the query protein has one common domain that is found by the fast filters, the search termination criteria are satisfied without a more unique domain in the same query being tested systematically.

Which Z-score threshold implies homology?

This varies for each protein family (Dietmann and Holm, 2001). The topology of the fold dendrogram (hierarchical clustering of domains based on structure similarity) represents evolutionary relationships fairly faithfully, so that homologous structures are found collected in one branch of the tree. However, the borders of the homologous families might be found at Z-scores around 4 (helix-turn-helix DNA-binding domains) or around 14 (TIM barrels).

Technical failures

The Dali server at the EBI is running automatic
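The manual edit suggested under Complex comparison (sequential residue numbers and a single chain identifier) can be scripted. The following Python sketch is one possible way to do it, under stated assumptions: it relies on the fixed-column PDB layout (chain identifier in column 22, residue number in columns 23 to 26, insertion code in column 27), it rewrites only ATOM and HETATM records and leaves TER and other records untouched, and both `merge_chains` and the `atom_line` helper that builds demonstration records are hypothetical names.

```python
def merge_chains(pdb_lines, chain_id="A"):
    """Renumber residues sequentially and assign one chain identifier."""
    out = []
    new_number = 0
    previous_key = None
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # 0-based slices for chain ID, residue number, insertion code.
            key = (line[21], line[22:26], line[26])
            if key != previous_key:
                new_number += 1
                previous_key = key
            # Insertion code is blanked because numbering is now sequential.
            line = line[:21] + chain_id + f"{new_number:4d}" + " " + line[27:]
        out.append(line)
    return out


def atom_line(serial, name, res, chain, resseq):
    """Build a minimal ATOM record with dummy coordinates (demo only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


demo = [
    atom_line(1, "N", "MET", "A", 1),
    atom_line(2, "CA", "MET", "A", 1),
    atom_line(3, "N", "GLN", "A", 2),
    atom_line(4, "N", "MET", "B", 1),
]
for line in merge_chains(demo):
    print(line)
```

In the demonstration, the single residue of chain B becomes residue 3 of chain A, so the two chains would be treated as one unit. Real entries should be inspected after conversion, since records that refer to residue numbers (for example, in the header section) are not updated by this sketch.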
65.  modeling by FAMS, from sequence to structure. Basic Protocol outlines the searching of FAMSBASE.

[Screenshot of the FAMSBASE login page: a form for ID and password (access is accepted only from non-profit sites), a Public login link for searching ORFs with 3D structure models, a note that the server uses Java, JavaScript, and cookies, browser requirements (Windows: Netscape 4.7 or later, Internet Explorer 5.0 or later; Macintosh: Internet Explorer 4.5 or later), and the FAMSBASE access policy.]

Figure 5.2.2 The login page of FAMSBASE. As stated on the page, one must first obtain an ID and password from an administrator of FAMSBASE. If time is a factor or one just wishes to check the contents of the database, click on the "Public login" link to go to the search page.

FAMS and 
66.  molecules or trajectory frames, except available memory; (7) molecular analysis commands; (8) rendering high-resolution, publication-quality molecule images; (9) movie-making capability; (10) building and preparing systems for molecular dynamics simulations; (11) interactive molecular dynamics simulations; (12) extensions to the Tcl/Python scripting languages; and (13) extensible source code written in C and C++.

This unit will serve as an introductory VMD tutorial. It is impossible to cover all of VMD's capabilities in one unit; instead, we will present several step-by-step examples of VMD's basic features. Topics covered in this tutorial include visualizing molecules in three dimensions with different drawing and coloring methods, rendering publication-quality figures, animating and analyzing the trajectory of a molecular dynamics simulation, scripting in the text-based Tcl/Tk interface, and analyzing both sequence and structure data for proteins.

Current Protocols in Bioinformatics 5.7.1-5.7.48, December 2008
Published online December 2008 in Wiley Interscience (www.interscience.wiley.com).
DOI: 10.1002/0471250953.bi0507s24
Copyright © 2008 John Wiley & Sons, Inc.

UNIT 5.7

Figure 5.7.1 Example renderings made with VMD (Cruz-Chu et al., 2006; Freddolino et al., 2006; Yin et al., 2006; Yu et al., 2006; Sotomayor et al., 2007; Wang et al., 2007). For the c
67.  mouse key. (B) The rotation axes when holding down the right mouse key. For the color version of this figure go to http://www.currentprotocols.com. Figure 5.7.4 Mouse modes and their characteristic cursors. Holding down the right mouse button and repeating the previous step will cause rotation around an axis perpendicular to the screen (Fig. 5.7.3B). For Mac users who have a single-button mouse or a trackpad, the right mouse button is equivalent to holding down the command key while pressing the mouse/trackpad button. In the VMD Main window, look at the Mouse menu (Fig. 5.7.4). Here, the user is able to switch the mouse mode from Rotation to Translation or Scale modes. Choose the Translation mode and go back to the OpenGL Display. It is now possible to move the molecule around when you hold the left mouse button down. Go back to the Mouse menu and choose the Scale mode this time. This will allow the user to zoom in or out by moving the mouse horizontally while holding down the left mouse button. It should be noted that these actions performed with the mouse only change the viewpoint and do not change the actual coordinates of the molecule's atoms. Also note that each mouse mode has its own characteristic cursor and its own shortcut key (r: Rotate, t: Translate, s: Scale). When the OpenGL Display window is the active window, these shortcut keys can be used instead of the Mouse menu to change
68.  necessary software resources. Test examples are included in the distribution package.
3a. To unpack the distribution package using Linux: Enter the following user input after the Linux prompt:
Linux prompt> tar -zxvf DaliLite_2.4.2.tar.gz
Linux prompt> cd ./DaliLite_2.4.2/Bin
3b. To unpack the distribution package using Cygwin: Enter the following user input after the Linux prompt:
Linux prompt> mv -f Makefile_cygwin Makefile
4. Use a text editor to set proper HOMEDIR and ESCAPED_HOMEDIR in Makefile, then compile and test by typing the following commands:
Linux prompt> make clean
Linux prompt> make install
Linux prompt> make test
Linux prompt> cd ..
Linux prompt> ./DaliLite -help
Note that the maximum acceptable length of the HOMEDIR path is 70 characters.
GUIDELINES FOR UNDERSTANDING RESULTS. As in sequence analysis, the goal of structural database searching is usually to identify homologous proteins that might provide clues to the function of the query protein. Homology means descent from a common ancestor. One can infer homology from sequence or structural similarities that are so strong they would not be expected to have arisen by chance. The structural neighbors reported by Dali (Basic Protocol 2) are ranked in order of decreasing structural similarity (Z-score). Basic Protocol 3 allows browsing a precomputed clustering of all structures into groups with similar folds. The clustering is hierarchic
69.  of what would seem to be the easiest situation: homology modeling of a protein structure from a sequence that displays significant identity to one adopting a known fold. This is the subject of UNIT 5.6 by Andrej Sali and colleagues, who have made some of the most important contributions to homology modeling. They discuss every aspect of the procedure, from fold assignment to alignment of the target with the template to model construction and validation. They emphasize that even very similar sequences may have regions of structure that diverge significantly, principally loops. [Contributed by Gregory A. Petsko. Current Protocols in Bioinformatics (2006) 5.1.1-5.1.3. Copyright © 2006 by John Wiley & Sons, Inc. UNIT 5.1.] They show how multiple sequence alignments and the use of a family of templates can improve the accuracy of such regions. They also explain how to decide what size grain of salt should be used in taking the results of a homology model as factual. Their program, MODELLER, is one of the most widely used tools for homology model construction, and they describe in detail how to use it. A different approach to model construction is discussed in the unit by Umeyama and Iwadate. Their program FAMS (UNIT 5.2) uses a simulated annealing algorithm to "refine" the model so as to impr
70.  Figure 5.7.26 (A) A structure-based phylogenetic tree generated by QH values. (B) A sequence-based phylogenetic tree generated by ClustalW. DATA ANALYSIS IN VMD. VMD is a powerful tool for analysis of structures and trajectories. Numerous tools for analysis are available under the VMD Main menu item Extensions → Analysis. In addition to these built-in tools, VMD users often use custom-written scripts to analyze desired properties of the simulated systems. VMD Tcl scripting capabilities are very extensive, and provide boundless opportunities for analysis. In this section, we will learn how to use built-in VMD features for standard analysis, as well as consider a simple example of scripting. Necessary Resources. Hardware: Computer. Software: VMD, a text editor, and a plotting application. Files: ubiquitin.psf, pulling.dcd, equilibration.dcd, and distance.tcl, which can be downloaded at http://www.currentprotocols.com. Adding Labels in VMD. Labels can be placed in VMD to get information on a particular selection, to be used during visualization and quantitative analysis. Labels are selected with the mouse and can be accessed in the Graphics → Labels menu. We will cover labels that can be placed on atoms and bonds, although angle and dihedral labelings are also available. In this context, labels for "bonds" or "angles" actually mean dist
71.  on residue substitution tables dependent on structural features such as solvent exposure, secondary structure type, and hydrogen-bonding properties (Shi et al., 2001; Karchin et al., 2003; McGuffin and Jones, 2003; Zhou and Zhou, 2005), or on statistical potentials for residue interactions implied by the alignment (Sippl, 1990; Bowie et al., 1991; Sippl, 1995; Skolnick and Kihara, 2001; Xu et al., 2003). The use of structural data does not have to be restricted to the structure side of the aligned sequence-structure pair. For example, SAM-T02 makes use of the predicted local structure for the target sequence to enhance homolog detection and alignment accuracy (Karplus et al., 2003). Commonly used threading programs are GenTHREADER (Jones, 1999; McGuffin and Jones, 2003), 3D-PSSM (Kelley et al., 2000), FUGUE (Shi et al., 2001), SP3 (Zhou and Zhou, 2005), and SAM-T02 multi-track HMM (Karchin et al., 2003; Karplus et al., 2003). Iterative sequence-structure alignment and model building. Yet another strategy is to optimize the alignment by iterating over the process of calculating alignments, building models, and evaluating models. Such a protocol can sample alignments that are not statistically significant and identify the alignment that yields the best model. Although this pr
72.  or deleted using the (A) Create Rep and (B) Delete Rep buttons. Screen also shows (C) the Material pull-down menu and (D) list of representations. Creating multiple representations. The button Create Rep (Fig. 5.7.8A) in the Graphical Representations window allows creation of multiple representations. Therefore, users can have a mixture of different selections with different styles and colors, all displayed at the same time. 32. For the current representation, in the Selected Atoms field type protein, set the Drawing Method to NewCartoon and the Coloring Method to Structure in the Draw style tab.
Table 5.7.2 Examples of Representations
Selection | Coloring method | Drawing method
water | Name | CPK
resid 1 76 and name CA | ColorID 1 | VDW
33. Press the Create Rep button (Fig. 5.7.8A). A new representation will be created. 34. Modify the new representation to get VDW as the Drawing Method, ResType as the Coloring Method, and resname LYS as the current selection. 35. Repeating the previous procedure, create the following two new representations in Table 5.7.2. These two representations show water molecules and the Cα atoms of the first and last residues of the protein. 36. Create the last representation by pressing the Create Rep button again. Select Drawing Method 
73.  or alternating alpha/beta). At intermediate range, clusters are related by shape similarity that does not necessarily reflect similarity of biological function (for example, globins and colicin A). At close range, clusters represent protein families related through strong functional constraints (for example, hemoglobin and myoglobin). Evolutionary relationships can be recovered by searching for continuous neighborhoods (Dietmann and Holm, 2001). In order to identify natural groupings of any set of objects, one needs a measure of distance or similarity. Structure comparison programs derive a structural alignment, which maximizes similarity or minimizes distance. The alignment defines a one-to-one correspondence of amino acid residues (sequence positions) in two proteins. This is analogous to sequence alignment except that the notion of similarity or dissimilarity is much more complex between three-dimensional objects than between linear strings. For example, the conformation of a point mutant usually differs from the wild-type protein only locally and only by a few tenths of an angstrom. Much larger deviations are commonly observed in pairs of homologous proteins, and with increasing sequence dissimilarity small shifts in the relative orientations of secondary structure elements accumulate and reach several angstroms and tens of degrees. At the largest evolutionary distances, only the 
74.  parameters should be set for each helix individually; otherwise they should only be set once. vi. Rotation start (default is 0°). vii. Rotation finish (default is 360°). viii. Rotational step size (increment step default is 45°; the authors suggest setting it to 10° for a symmetric search). 14b. Set other restraints (it is not necessary to use these parameters in the Silent Amino Acid Substitution Protocol). Electrostatic effects: Value of the dielectric constant (for a membrane matrix enter 2.0; for a vacuum matrix enter 1.0). Initial rotation and tilt: Distance between centers of neighboring helices (default is 10.4 Å). Left-hand crossing angle (default is 25°). Right-hand crossing angle (default is -25°). Clustering parameters: Cutoff for root mean square difference between two structures, which indicates the structural similarity of two structures (default is 1 Å; a larger number would result in finding more clusters that are not as well grouped). Minimum number of structures which define a cluster (default is 10). 15b. Click "Save updated file" at the bottom of the screen, which will download a new, updated chi_param file to the local computer. Save it to the correct directory (e.g., MyProtein/variantA). Run the GMDS search. 16. Change to the correct working directory (e.g., MyProtein/variantA) with the following command: cd MyProtein/variantA. All commands should be issued from this directory.
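The two clustering parameters listed above (an RMSD cutoff and a minimum number of structures per cluster) can be illustrated with a minimal Python sketch. This is not the clustering code used by the search software. The function names rmsd and neighbor_clusters, the seed-based grouping rule, and the assumption that all structures are already superimposed with atoms in the same order are illustrative choices made here.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square difference between two equal-length lists of (x, y, z).

    Assumption: the structures are already superimposed and their atoms are
    listed in the same order; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("structures must be non-empty and of equal length")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def neighbor_clusters(structures, cutoff=1.0, min_size=10):
    """Group structure indices around each seed structure.

    Every structure acts as a seed; its group holds all structures within
    `cutoff` of it. Groups smaller than `min_size` are discarded and
    duplicate groups are reported once. (Hypothetical grouping rule.)
    """
    found = []
    for seed in structures:
        members = frozenset(
            index
            for index, other in enumerate(structures)
            if rmsd(seed, other) <= cutoff
        )
        if len(members) >= min_size and members not in found:
            found.append(members)
    return found
```

In this sketch, raising the cutoff admits more distant structures into each group, and raising the minimum size discards sparsely populated groups, which is the trade-off the two defaults (1 Å and 10 structures) are meant to balance.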
75.  proteins, one needs to use a different method. A good tool is the VMD MultiSeq plugin, which we will discuss in the following section. COMPARING PROTEIN STRUCTURES AND SEQUENCES WITH THE MultiSeq PLUGIN. MultiSeq (Roberts et al., 2006) is a bioinformatics analysis environment developed in the Luthey-Schulten Group at the University of Illinois in Urbana-Champaign. MultiSeq allows users to organize, display, and analyze both sequence and structure data for proteins and nucleic acids, and has been incorporated in VMD as a plugin tool starting with VMD version 1.8.5 (MultiSeq homepage: http://www.scs.uiuc.edu/~schulten/multiseq/). In this section, you will learn how to compare protein structures and sequences with the VMD MultiSeq plugin. We will again use the water-transporting channel protein, aquaporin, as an example. Necessary Resources. Hardware: Computer. Software: VMD and a text editor. Files: 1fqy.pdb, 1rc2.pdb, 1lda.pdb, 1j4n.pdb, and spinach_aqp.fasta, which can be downloaded at http://www.currentprotocols.com. Structure Alignment with MultiSeq. Very often, comparing structures of different proteins reveals important information. For example, proteins with similar functions tend to exhibit similar structural features. MultiSeq structure alignment is useful for this reason. We will compare the structures of four aquaporin proteins listed in Table 5.7.8. Loading aquaporin structures. 1. S
76.  second, tenth, eleventh, and twelfth columns. The second column reports the code of the PDB sequence that was aligned to the target sequence. The eleventh column reports the percentage sequence identities between TvLDH and the PDB sequence, normalized by the length of the alignment (indicated in the tenth column). In general, a sequence identity value above ~25% indicates a potential template, unless the alignment is too short (i.e., <100 residues). A better measure of the significance of the alignment is given in the twelfth column by the E-value of the alignment (the lower the E-value, the better). In this example, six PDB sequences show very significant similarities to the query sequence, with E-values equal to 0. As expected, all the hits correspond to malate dehydrogenases (1bdm:A, 5mdh:A, 1b8p:A, 1civ:A, 7mdh:A, and 1smk:A). Figure 5.6.5 Script file compare.py. [Listing garbled in extraction; the recoverable calls, in order, are: environ(); alignment(env, file='$(LIB)/CHAINS_all.seq', align_codes=('1b8pA', '1bdmA', '1civA', '5mdhA', '7mdhA', '1smkA')); malign(); malign3d(); compare_structures(); id_table(); dendrogram(). Any further arguments are not recoverable.] To select the appropriate template for the target sequence, the alignment.compare_structures() command will first be used to assess the sequence and structure similarit
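The screening rule described above (sequence identity above roughly 25%, an alignment of at least 100 residues, and a low E-value) can be sketched in Python as follows. This is an illustration only and not part of MODELLER: the ProfileHit record, the candidate_templates function, and the E-value ceiling of 1e-3 are assumptions introduced here, and the hits in the usage example are made-up values rather than results from the unit.

```python
from dataclasses import dataclass


@dataclass
class ProfileHit:
    """One row of a profile search report (hypothetical container)."""
    code: str             # PDB code plus chain, from the second column
    aligned_length: int   # alignment length, from the tenth column
    identity_pct: float   # percent sequence identity, from the eleventh column
    e_value: float        # E-value, from the twelfth column


def candidate_templates(hits, min_identity=25.0, min_length=100, max_e_value=1e-3):
    """Keep hits that pass all three screens, most significant first.

    The identity and length thresholds follow the rule of thumb in the text;
    max_e_value is an assumed cutoff, since the text only says lower is better.
    """
    kept = [
        hit for hit in hits
        if hit.identity_pct > min_identity
        and hit.aligned_length >= min_length
        and hit.e_value <= max_e_value
    ]
    # Sort by E-value first, then by identity (highest first) to break ties.
    return sorted(kept, key=lambda hit: (hit.e_value, -hit.identity_pct))


if __name__ == "__main__":
    # Made-up hits for illustration only.
    hits = [
        ProfileHit("hitA", 300, 45.0, 0.0),
        ProfileHit("hitB", 80, 60.0, 0.0),      # too short
        ProfileHit("hitC", 250, 20.0, 1e-10),   # identity too low
        ProfileHit("hitD", 200, 30.0, 1e-5),
    ]
    for hit in candidate_templates(hits):
        print(hit.code, hit.identity_pct, hit.e_value)
```

A filter like this only narrows the list; the structural comparison of the surviving candidates is still needed to choose among them.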
77.  server for comparing two structures to each other and visualizing the structural superimposition. http://www.ebi.ac.uk/dali The Dali e-mail server for comparing a new structure against the database of known structures. http://www.bioinfo.biocenter.helsinki.fi/dali The Dali database for browsing structural and sequence neighbors of proteins. http://www.bioinfo.biocenter.helsinki.fi/sqgraph/pairsdb The ADDA classification assigns every residue of known protein sequences into a domain family and interactively visualizes the sequence neighbors of any query protein in a multiple alignment. http://srs.ebi.ac.uk http://www.ncbi.nlm.nih.gov SRS at EBI and Entrez at NCBI are comprehensive search engines that cross-reference the PDB identifier of a protein to many other databases. Contributed by Liisa Holm, Sakari Kääriäinen, and Chris Wilton, Institute of Biotechnology, University of Helsinki, Helsinki, Finland; Dariusz Plewczynski, Interdisciplinary Centre for Mathematical and Computational Modeling, University of Warsaw, Warsaw, Poland. APPENDIX. Objective Function. The objective function of the Dali algorithm and the normalization of structural similarity scores to obtain the Z-score are described below. Consider two proteins labeled A and B. The match of two substructures is evaluated using an additive similarity score S of the form: $S = \sum_{i=1}^{L} \sum_{j=1}^{L} \phi(i, j)$ Equatio
78.  structure and is read into VMD's "Beta" field. Since we are not currently interested in this information, we can use this field to store our own numerical values. VMD has a "Beta" coloring method, which colors atoms according to their B-factors. By replacing the Beta values for various atoms, you can control the color in which they are drawn. This is very useful when you want to show a property of the system that you have computed. 6. Return to the Tk Console window and type $crystal set beta 0. This resets the "beta" field (which is displayed) to zero for all atoms. As you do this, you should observe that the atoms in your OpenGL window will suddenly change to a uniform color (since they all have the same beta values now). You can obtain and set many atomic properties using atom selections, including segment, chain, residue, atom name, position (x, y, and z), charge, mass, occupancy, and radius, just to name a few. 7. In the Tk Console window, type set sel [atomselect top "hydrophobic"]. This creates a selection, sel, that contains all the atoms in the hydrophobic residues. 8. Let us label all hydrophobic atoms by setting their beta values to 1: type $sel set beta 1 in the Tk Console window. If the colors in the OpenGL Display do not get updated, go to the Graphical Representations window and click on the Apply button at the bottom.
79.  that are found in all of the homologs. If more than one structure is found, try to enforce a stricter threshold by reducing the number. MyProtein cns rmsd result: This file lists all pairwise RMSD results. MyProtein rmsd_calculation_list: This is an accessory file to be used by CNSsolve. MyProtein log: This is the CNSsolve log file. As stated above, view the file compare rmsd out in order to decide whether or not to repeat the previous step with a different threshold. 32. Repeat the above steps until a single cluster is identified that is found in all variants. GUIDELINES FOR UNDERSTANDING RESULTS. The procedure outlined in the Basic Protocol is a relatively simple one that involves two steps: (1) generate possible structures for each of the variants and (2) check if there is one structure that persists in all of the different variants. There are a few key points to which one should pay close attention, and these are outlined below. How Well Do the Individual Variants Cluster? The clustering parameters (i.e., the RMSD threshold and the minimal number of structures per cluster) are chosen arbitrarily. They will obviously change the outcome of the simulation in that they will change the number of possible structures from each variant. The authors have tended not to extend the RMSD threshold beyond 1.25 Å, and the number of structures per cluster is not lowered beyond 7. How Well Does a Single Structure Persist in All of the Variants 
80.  the Cartesian coordinates of ~10,000 atoms (3-D points) that form the modeled molecules. For a 10,000-atom system, there can be on the order of 200,000 restraints. The functional form of each term is simple; it includes a quadratic function, harmonic lower and upper bounds, cosine, a weighted sum of a few Gaussian functions, Coulomb law, Lennard-Jones potential, and cubic splines. The geometric features presently include a distance, an angle, a dihedral angle, a pair of dihedral angles between two, three, four, and eight atoms, respectively, the shortest distance in the set of distances, solvent accessibility, and atom density that is expressed as the number of atoms around the central atom. Some restraints can be used to restrain pseudo-atoms (e.g., the gravity center of several atoms). Optimization of the objective function. Finally, the model is obtained by optimizing the objective function in Cartesian space. The optimization is carried out by the use of the variable target function method (Braun and Go, 1985), employing methods of conjugate gradients and molecular dynamics with simulated annealing (Clore et al., 1986). Several slightly different models can be calculated by varying the initial structure, and the variability among these models can be used to estimate t
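The idea of an objective function built as a sum of simple restraint terms over geometric features can be sketched in a few lines of Python. This is a toy illustration under stated assumptions and not MODELLER's implementation or API: the function names, the particular harmonic form 0.5 * ((value - mean) / stdev)^2, and the restriction to distance features are all choices made here for brevity.

```python
import math


def harmonic(value, mean, stdev):
    """Quadratic (harmonic) penalty around a preferred value (assumed form)."""
    return 0.5 * ((value - mean) / stdev) ** 2


def lower_bound(value, bound, stdev):
    """Harmonic penalty applied only when the value falls below the bound."""
    return harmonic(value, bound, stdev) if value < bound else 0.0


def upper_bound(value, bound, stdev):
    """Harmonic penalty applied only when the value exceeds the bound."""
    return harmonic(value, bound, stdev) if value > bound else 0.0


def objective(coords, restraints):
    """Sum the restraint terms over all restrained atom pairs.

    coords: list of (x, y, z) tuples, one per atom.
    restraints: list of (i, j, term), where term maps the distance between
    atoms i and j to a penalty. Only distance features are modeled here.
    """
    total = 0.0
    for i, j, term in restraints:
        total += term(math.dist(coords[i], coords[j]))
    return total


if __name__ == "__main__":
    # Two atoms 5 units apart, with three hypothetical restraints on the pair.
    coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
    restraints = [
        (0, 1, lambda d: harmonic(d, 4.0, 1.0)),
        (0, 1, lambda d: upper_bound(d, 4.5, 0.5)),
        (0, 1, lambda d: lower_bound(d, 2.0, 0.5)),
    ]
    print(objective(coords, restraints))
```

An optimizer would then adjust the coordinates to lower this sum; the variable target function method, conjugate gradients, and simulated annealing named in the text are not reproduced in this sketch.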
81.  the mouse mode. Another useful option is the Mouse → Center menu item. It allows you to specify the point around which rotations are done. 9. Select the Center menu item and pick one atom at one of the ends of the protein (the cursor should display a cross). Figure 5.7.5 The Graphical Representations window: (A) List of representations; (B) the tabs for Draw Style, Selections, Trajectory, and Periodic; (C) Coloring Method pull-down menu; (D) Drawing Method pull-down menu; (E) user-adjustable parameters for different drawing methods; and (F) selection text entry box. 10. Now, press r, and rotate the molecule with the mouse and see how the molecule moves around the selected point. 11. In the VMD Main window, select the Display → Reset View menu item to return to the default view. You can also reset the view by pressing the "=" key when you are in the OpenGL Display window. Graphical representations. VMD can display molecules in various ways by setting the Graphical Representations window shown in Figure 5.7.5. Each representation is defined by four main parameters: the selection of atoms included in the representation, the drawing style, the coloring method, and the material. The selection determines which part of the molecule is drawn, the drawing method defines which graphical representation is used, the coloring method gives the color of each part of the repr
82.  they use (Pietrokovski, 1996; Rychlewski et al., 1998; Yona and Levitt, 2002; Panchenko, 2003; Sadreyev and Grishin, 2003; von Ohsen et al., 2003; Edgar and Sjolander, 2004; Marti-Renom et al., 2004; Zhou and Zhou, 2005). However, several analyses have shown that the overall performances of these methods are comparable (Edgar and Sjolander, 2004; Marti-Renom et al., 2004; Ohlson et al., 2004; Wang and Dunbrack, 2004). Some of the programs that can be used to detect suitable templates are FFAS (Jaroszewski et al., 2005), SP3 (Zhou and Zhou, 2005), SALIGN (Marti-Renom et al., 2004), and PPSCAN (Eswar et al., 2005). Sequence-structure threading methods. As the sequence identity drops below the threshold of the twilight zone, there is usually insufficient signal in the sequences or their profiles for the sequence-based methods discussed above to detect true relationships (Lindahl and Elofsson, 2000). Sequence-structure threading methods are most useful in this regime, as they can sometimes recognize common folds even in the absence of any statistically significant sequence similarity (Godzik, 2003). These methods achieve higher sensitivity by using structural information derived from the templates. The accuracy of a sequence-structure match is assessed by the score of a corresponding coarse model and not by sequence similarity, as in sequence comparison methods (Godzik, 2003). The scoring scheme used to evaluate the accuracy is either based
83.  to split this into two figures: an overview to show the context and a close-up to show the details. COMMENTARY. Background Information. A decade ago, molecular graphics was the domain of experts in computer graphics, but today a wide variety of molecular graphics programs are available, allowing researchers, students, and educators to create their own molecular illustrations. Since molecules are themselves smaller than the wavelength of light, a metaphor must be employed to create a model that captures some properties of the molecule in visual form. Several of these metaphors have had lasting success: bond diagrams to show the covalent geometry of the molecule, spacefilling diagrams to show the shape and form of the molecule, and backbone representations to show the topology and folding of a macromolecular chain. Most molecular graphics programs allow the user to create an image of a molecule using a combination of these representations, making it possible to tailor the image to one's own application. Critical Parameters and Troubleshooting. Most computer graphics programs contain hundreds of user-controlled parameters for selecting and displaying different portions of molecules. These programs also provide default values for these parameters, so that an initial image may be generated rapidly. These defaults should provide a guide, but not a limitation, to the creative process. Default parameters are chosen
84.  to install the MODELLER executables. The default choice will place it in the directory indicated, but any directory to which the user has write permissions may be specified. Full directory name for the installed MODELLER8v2 [<YOUR HOME DIRECTORY>/bin/modeller8v2]: c. For the prompt below, enter the MODELLER license key obtained in step 3. KEY_MODELLER8v2, obtained from our academic license server at http://salilab.org/modeller/registration.shtml: 8. The installer will now confirm the answers to the above prompts. Press Enter to begin the installation. The mod8v2 script installed in the chosen directory can now be used to invoke MODELLER. Other resources. 9. The MODELLER Web site provides links to several additional resources that can supplement the tutorial provided in this unit, as follows. a. News about the latest MODELLER releases can be found at http://salilab.org/modeller/news.html. b. There is a discussion forum, operated through a mailing list, devoted to providing tips, tricks, and practical help in using MODELLER. Users can subscribe to the mailing list at http://salilab.org/modeller/discussion_forum.html. Users can also browse through or search the archived messages of the mailing list. c. The documentation section of the web 
85. 1   into which the chain A of the 1bdm structure is read, are created. append_model() transfers the PDB sequence of this model to aln and assigns it the name of 1bdmA (align_codes). The TvLDH sequence, from file TvLDH.ali, is then added to aln using append(). The align2d() command aligns the two sequences, and the alignment is written out in two formats: PIR (TvLDH-1bdmA.ali) and PAP (TvLDH-1bdmA.pap). The PIR format is used by MODELLER in the subsequent model-building stage, while the PAP alignment format is easier to inspect visually. In the PAP format, all identical positions are marked with a "*" (file TvLDH-1bdmA.pap; Fig. 5.6.8). Due to the high target-template similarity, there are only a few gaps in the alignment. [Figure 5.6.8 alignment listing (rows _aln.pos, 1bdmA, TvLDH, _consrvd); the residue and conservation rows are garbled in extraction and are not reproduced here.]
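The "*" marking of identical positions described above for the PAP format can be mimicked with a short Python sketch. It is an illustration only and does not read or write MODELLER's PAP files: the function names, the "-" gap character, and the toy sequences in the usage example are assumptions introduced here.

```python
def conservation_line(seq_a, seq_b, gap="-"):
    """Return a marker row with '*' under identical, non-gap aligned positions.

    Both inputs must be rows of the same alignment, so they have equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return "".join(
        "*" if a == b and a != gap else " "
        for a, b in zip(seq_a, seq_b)
    )


def count_identical(seq_a, seq_b, gap="-"):
    """Number of aligned positions holding the same residue in both rows."""
    return conservation_line(seq_a, seq_b, gap).count("*")


if __name__ == "__main__":
    # Toy aligned fragments, not taken from the TvLDH/1bdmA alignment.
    template = "MKA-PV"
    target = "MSAAPV"
    print(template)
    print(target)
    print(conservation_line(template, target))
```

Printing the marker row beneath the two sequences gives a quick visual check of where the target and template agree and where gaps fall.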
86. 2001) PSIPRED (McGuffin et al., 2000); RAPTOR (Xu et al., 2003); SUPERFAMILY (Gough et al., 2001); SAM-T02 (Karplus et al., 2003); SP3 (Zhou and Zhou, 2005); SPARKS2 (Zhou and Zhou, 2004); THREADER (Jones et al., 1992); UCLA-DOE FOLD SERVER (Mallick et al., 2002). Target-template alignment: BCM SERVERF (Worley et al., 1998); BLOCK MAKERF (UNIT 2.2; Henikoff et al., 2000); CLUSTALW (UNIT 2.3; Thompson et al., 1994); COMPASS (Sadreyev and Grishin, 2003).
http://bips.u-strasbg.fr/en/Products/Databases/BAliBASE http://www.biochem.ucl.ac.uk/bsm/cath http://www.salilab.org/dbali http://www.ncbi.nlm.nih.gov/Genbank http://bioinfo.mbb.yale.edu/genome http://www.salilab.org/modbase/ http://www.rcsb.org/pdb/ http://www.sanger.ac.uk/Software/Pfam http://scop.mrc-lmb.cam.ac.uk/scop http://www.expasy.org http://www.uniprot.org http://123d.ncifcrf.gov http://www.sbg.bio.ic.ac.uk/~3dpssm http://www.ncbi.nlm.nih.gov/BLAST http://www2.ebi.ac.uk/dali http://www.ebi.ac.uk/fasta33 http://ffas.ljcrf.edu/ http://cubic.bioc.columbia.edu/predictprotein http://www.bioinformatics.buffalo.edu/new_buffalo/services/threading.html http://bioinf.cs.ucl.ac.uk/psipred http://genome.math.uwaterloo.ca/~raptor http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY http://www.soe.ucsc.edu/research/compbio/HMM-apps http://phyyz4.med.buffalo.edu http://phyyz4.med.buffalo.edu http://bioinf.cs.ucl.ac.uk/threader/threader.html http://fold.d
87. [Figure 5.6.8 alignment listing, continued (rows _aln.pos, 1bdmA, TvLDH, _consrvd); the residue and conservation rows are garbled in extraction and are not reproduced here.] Figure 5.6.8 The alignment between sequences TvLDH and 1bdmA, in the MODELLER PAP format. File TvLDH-1bdmA.pap.
from modeller import *             # provides environ()
from modeller.automodel import *   # provides the automodel class
env = environ()
a = automodel(env, alnfile='TvLDH-1bdmA.ali', knowns='1bdmA', sequence='TvLDH')
a.starting_model = 1   # index of the first model to build
a.ending_model = 5     # index of the last model to build
a.make()
Figure 5.6.9 Script file, model-single.py, that generates five models. Model building. Once a target-template alignment is constructed, MODELLER calculates a 3-D model of the target completely automatically, using its automodel class. The script in Figure 5.6.9 will generate five different models of TvLDH based on the 1bdm
88. both for refinement and interpretation of the models. Acknowledgments. The authors wish to express gratitude to all members of their research group. This review is partially based on the authors' previous reviews (Marti-Renom et al., 2000; Eswar et al., 2003; Fiser and Sali, 2003a). They wish to acknowledge funding from Sandler Family Supporting Foundation, NIH R01 GM54762, P01 GM71790, P01 AI35707, and U54 GM62529, as well as hardware gifts from IBM and Intel. Literature Cited.
Abagyan, R. and Totrov, M. 1994. Biased probability Monte Carlo conformational searches and electrostatic calculations for peptides and proteins. J. Mol. Biol. 235:983-1002.
Alexandrov, N.N., Nussinov, R., and Zimmer, R.M. 1996. Fast protein fold recognition via sequence to structure alignment and contact capacity potentials. Pac. Symp. Biocomput. 1996:53-72.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucl. Acids Res. 25:3389-3402.
Andreeva, A., Howorth, D., Brenner, S.E., Hubbard, T.J., Chothia, C., and Murzin, A.G. 2004. SCOP database in 2004: Refinements integrate structure and sequence family data. Nucl. Acids Res. 32:D226-D229.
Aszodi, A. and Taylor, W.R. 1994. Secondary structure formation in model polypeptide chains. Protein Eng. 7:633-644.
Bairoch, A., Apweiler
89. 81. Comparative model building of the mammalian serine proteases. J. Mol. Biol. 153:1027-1042.

Gribskov, M., McLachlan, A.D., and Eisenberg, D. 1987. Profile analysis: Detection of distantly related proteins. Proc. Natl. Acad. Sci. U.S.A. 84:4355-4358.

Havel, T.F. and Snow, M.E. 1991. A new method for building protein conformations from sequence alignments with homologues of known structure. J. Mol. Biol. 217:1-7.

Henikoff, J.G. and Henikoff, S. 1996. Using substitution probabilities to improve position-specific scoring matrices. Comput. Appl. Biosci. 12:135-143.

Henikoff, J.G., Pietrokovski, S., McCallum, C.M., and Henikoff, S. 2000. Blocks-based methods for detecting protein homology. Electrophoresis 21:1700-1706.

Henikoff, S. and Henikoff, J.G. 1994. Position-based sequence weights. J. Mol. Biol. 243:574-578.

Higo, J., Collura, V., and Garnier, J. 1992. Development of an extended simulated annealing method: Application to the modeling of complementary determining regions of immunoglobulins. Biopolymers 32:33-43.

Holm, L. and Sander, C. 1991. Database algorithm for generating protein backbone and side-chain co-ordinates from a C alpha trace application to model building and detection of co-ordinate errors. J. Mol. Biol. 218:183-194.

Hooft, R.W., Vriend, G., Sander, C., and Abola, E.E. 1996. Errors in protein structures. Nature 381:272.

Howell, P.L., Almo, S.C., Parsons, M.R., Hajdu, J., and Petsko, G.A. 1992
90. An Introduction to Modeling Structure from Sequence

There are literally millions of protein sequences in the various sequence databases, but there are only a few tens of thousands of protein structures in the Protein Data Bank. The rate of growth of new sequences is a steeply rising exponential curve; that of new structures, if exponential at all, is much shallower. There is no possibility that the number of structures will ever approach, much less equal, the number of sequences. So what is the point of initiatives such as Structural Genomics? What sense does it make to be the tortoise in a race in which the hare has already won?

The underlying premise behind all attempts to determine a large number of diverse structures is that the total number of protein domain folds is much smaller, by many orders of magnitude, than the total number of sequences; in other words, that many sequences adopt essentially the same fold. If the fold of a protein could be recognized from sequence information alone, then a complete database of all possible folds would allow the structure corresponding to any sequence to be modeled, to at least some level of accuracy.

How reasonable is this assumption? It depends first of all on the reality of the limited universe of domain folds. (For the purpose of this discussion, the term "domain" means any part of the structure of a protein that is sufficiently compact so as to give the impression that it could fold stably without t
91. Contributed by Liisa Holm, Sakari Kääriäinen, Dariusz Plewczynski, and Chris Wilton
Current Protocols in Bioinformatics (2006) 5.5.1-5.5.24
Copyright © 2006 by John Wiley & Sons, Inc.
Supplement 14

When it is necessary to query many structures, it may be convenient to download the DaliLite stand-alone program package. This package uses the same comparison algorithms as the Dali Web servers but can be run locally on Linux-based computers (see Alternate Protocol 1, Alternate Protocol 2, and the Support Protocol).

BASIC PROTOCOL 1: USING THE INTERACTIVE DaliLite SERVER FOR PAIRWISE COMPARISONS

This interactive Web server provides a quick, convenient means for checking the structural alignment of two known protein structures and for visualizing their structural superimposition. Only the PDB identifiers of the structures are required. It is also possible to upload user-specific structures. A fast server can be accessed at http://www.ebi.ac.uk/DaliLite.

Necessary Resources

Hardware

Computer connected to the Internet

Software

Internet browser, e.g., Internet Explorer (http://www.microsoft.com), Netscape (http://browser.netscape.com), or Firefox (http://www.mozilla.org/firefox)
RasMol (UNIT 5.4; downloadable from http://www.bernstein-plus-sons.com/software/rasmol/) or other PDB viewer

Files

User-specific PDB files (optional)

1. Go to http://www.ebi.ac.uk/DaliLite. The submission page for pairwise comparison of protei
92. DH.B99990001.pdb

    aln = alignment(env)
    code = 'TvLDH'
    aln.append_model(mdl, atom_files='TvLDH.B99990001.pdb', align_codes=code)
    aln.append_model(mdl, atom_files='TvLDH.B99990001.pdb', align_codes=code+'-ini')
    mdl.generate_topology(aln, sequence=code+'-ini')
    mdl.transfer_xyz(aln)
    mdl.assess_dope(output='ENERGY_PROFILE NO_REPORT', file='TvLDH.profile',
                    normalize_profile=True, smoothing_window=15)

Figure 5.6.10 File evaluate_model.py, used to generate a pseudo-energy profile for the model.

Evaluating a model

If several models are calculated for the same target, the best model can be selected by picking the model with the lowest value of the MODELLER objective function, which is reported in the second line of the model PDB file. In this example, the first model (TvLDH.B99990001.pdb) has the lowest objective function. The value of the objective function in MODELLER is not an absolute measure, in the sense that it can only be used to rank models calculated from the same alignment.

Once a final model is selected, there are many ways to assess it. In this example, the DOPE potential in MODELLER is used to evaluate the fold of the selected model. Links to other programs for model assessment can be found in Table 5.6.1. However, before any external evaluation of the model, one should check the log file from the modeling run for runtime errors (model-single.log) and restraint violations (see the MODELLER manual for details).
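The selection rule described above (lowest objective function wins) can be sketched in a few lines of Python. This is a minimal illustration and not part of MODELLER: the helper names read_objective and best_model are hypothetical, and the parser assumes that the second line of each model file carries the label "OBJECTIVE FUNCTION:" followed by the value, which should be checked against your own files. The header lines and numbers in the demonstration are invented.

```python
from typing import Dict, List

# Assumed label; check it against the second line of your own model files.
MARKER = "OBJECTIVE FUNCTION:"


def read_objective(pdb_lines: List[str]) -> float:
    """Return the objective function value stored on the second line."""
    if len(pdb_lines) < 2 or MARKER not in pdb_lines[1]:
        raise ValueError("no objective function found on the second line")
    tokens = pdb_lines[1].split(MARKER, 1)[1].split()
    if not tokens:
        raise ValueError("objective function label has no value")
    # The value is the first whitespace-separated token after the label.
    return float(tokens[0])


def best_model(models: Dict[str, List[str]]) -> str:
    """Return the name of the model with the lowest objective function."""
    return min(models, key=lambda name: read_objective(models[name]))


if __name__ == "__main__":
    # Header lines and values invented for illustration; not real output.
    demo = {
        "model_A.pdb": ["EXPDTA    THEORETICAL MODEL",
                        "REMARK   6 MODELLER OBJECTIVE FUNCTION:  1800.5"],
        "model_B.pdb": ["EXPDTA    THEORETICAL MODEL",
                        "REMARK   6 MODELLER OBJECTIVE FUNCTION:  1750.25"],
    }
    print(best_model(demo))
```

For real files, the lines of each model could be read with open(path).read().splitlines(). As the text notes, the ranking is only meaningful for models calculated from the same alignment.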
93. Figure 5.3.6 CHI "Edit file" screen with structure parameters.

Direction of helix (up/down).
Initial translational offset for helix along the z axis (default is 0.0 Å).

13b. Set search parameters (Fig. 5.3.6):

i. Extent of the search (full or symmetric).

In a symmetric search, all of the helices rotate about their axis concomitantly, due to the symmetry assumption in homo-oligomers (Arkin, 2002). In a full search all rotation combinations are examined. The default for a homo-oligomeric complex is a symmetric search, while a full search is the default when analyzing hetero-oligomers. A full search will obviously take much longer than a symmetric search due to the larger number of structures generated (see below).

ii. Search left-handed crossing angles (true/false; default is True).

iii. Search right-handed crossing angles (true/false; default is True).

iv. Type of molecular dynamics to use (torsion/cartesian; default is torsion).

The reader is referred to Rice and Brünger (1994) in order to evaluate which type of molecular dynamics to choose.

Modeling Membrane Proteins, 5.3.8, Supplement 4

v. Number of trials per structure, i.e., number of searches to perform using different initial random velocities for each structure (default is 4).

If one has chosen to simulate a hetero-oligomer, the next
94. HHB.pdb. Finally, go to the Window menu and select the Command Line window. This will open a new window that contains the Command Line interface. With the Mac OS X operating system one can also simply drag the icon for the desired PDB file on to the icon for RasMol and the program will automatically open, load the coordinates, and display the default wireframe representation.

At this point, the screen should look like Figure 5.4.1, with a graphics window that shows the hemoglobin structure and a Command Line window. The molecule may be rotated by holding down the mouse button in the graphics window and dragging the cursor in different directions. Other transformations, such as scaling the image to different sizes and translating the molecule to different locations in the screen, are accessible through different buttons (if using a three-button mouse) and through combinations of holding the mouse button and depressing the Shift or Control keys (if using a one-button mouse).

The pull-down menus in the graphics window make it possible to change the representations used to display the molecule, as well as to change some common parameters used to create the display.

a. In the Display menu, several common representations of the structure may be chosen, as described more fully in the Alternate Protocol.

[Screenshot text: a terminal session on host jezebel lists 2hhb.pdb and runs rasmol 2hhb.pdb, which prints the banner RasMol Molecular Renderer, Roger Sayle, August 1995.]
95. Figure 5.5.3 Structural alignment by the DaliLite server.

Figure 5.5.4 Superimposition of the two protein chains in RasMol (stereo view), obtained by clicking on the "Superimposed C-alpha traces" link on view shown in Figure 5.5.2. The query structure (mol1) is blue, and the second structure (mol2) is red. For the color version of this figure go to http://www.currentprotocols.com.

In the example in Figure 5.5.2 the link to the first structure file (unchanged) is called mol1 original.pdb. The second structure file, with all ATOM coordinates of the indicated chain rotated/translated to match the first structure, is called mol2 1.pdb.

Note that only the indicated chains are superimposed (e.g., mol1A with mol2B). However, since any other chains will still be contained in the structure files, it may be desirable to remove unwanted chains using a text editor before viewing the structures.

7. To view the files for Rotation/translation matrices for each alignment, Listing of structurally equivalent residue ranges, and View the log (indicating all the steps taken by the DaliLite application), click on the hyperlinks under the "Additional data" header.

These files are included for completeness but 
96. Hits further down the list have a much lower Z-score than the nuclear receptors and represent biologically noninteresting hits that match in a helical bundle motif.

Typically secondary structure assignments agree very well even though sequence identity may be low (see Fig. 5.5.10).

The Structure/Sequence Alignment button (shown in Figure 5.5.10) augments the structural alignment by additionally displaying related sequences, which are detected by PSI-Blast and stored in the ADDA database (Heger and Holm, 2003). This view is useful for checking sequence patterns that are conserved across distantly related protein families. Conserved functional sites are a strong hint at common evolutionary origins.

In the alignment, residues are colored if the frequency of the amino acid type in the column is above 50%.

7. Go back to the previous page and click on the 3D Superimposition button to view the superimposed Cα traces of the selected structures in 3D using RasMol or another PDB viewer. The 3D Superimposition button launches a RasMol script if the browser is appropriately configured. Use the "PDB format" button to download the Cα coordinates of selected neighbors superimposed onto the query structure.

Make external links to the Dali database

8. External sites may be linked directly to the query engine of the Dali database. To make a link from a PDB identifier to the database, use the call http://www.bioinfo.biocenter.helsinki.fi/daliquery search term
97. Figure 5.5.11 Multiple structure alignment of estradiol receptor and selected structural neighbors. Notation: three-state secondary structure definitions by DSSP (reduced to H, helix; E, sheet; L, coil) are shown below the amino acid sequences. For the color version of this figure go to http://www.currentprotocols.com.

Using Dali for Structural Comparison of Proteins, 5.5.12, Supplement 14

Download data from the Dali database

9. For noninteractive use, comprehensive computer-readable database dumps are provided for large-scale studies. These are accessed from the link to Downloads from the home page of the Dali database (http://www.bioinfo.biocenter.helsinki.fi/dali/start).

ALTERNATE PROTOCOL 1: COMPARING TWO STRUCTURES USING THE STAND-ALONE VERSION OF DaliLite

This simple protocol is the command-line version of that performed online by the DaliLite server for pairwise structure comparison (Basic Protocol 1). The inputs are two protein structures in PDB format. The output is a set of HTML files, which should be viewed from a browser. Rough timings are from a few seconds up to tens of seconds per pairwise comparison.

Necessary Resources

Hardware

Computer that oper
98. Literature Cited

Adams, P.D., Arkin, I.T., Engelman, D.M., and Brünger, A.T. 1995. Computational searching and mutagenesis suggest a structure for the pentameric transmembrane domain of phospholamban. Nat. Struct. Biol. 2:154-162.

Arkin, I.T. 2002. Structural aspects of oligomerization taking place between the transmembrane alpha-helices of bitopic membrane proteins. Biochim. Biophys. Acta 1565:347-363.

Arkin, I.T., Adams, P.D., MacKenzie, K.R., Lemmon, M.A., Brünger, A.T., and Engelman, D.M. 1994. Structural organization of the pentameric transmembrane alpha-helices of phospholamban, a cardiac ion channel. EMBO J. 13:4757-4764.

Brünger, A.T., Adams, P.D., Clore, G.M., DeLano, W.L., Gros, P., Grosse-Kunstleve, R.W., Jiang, J.S., Kuszewski, J., Nilges, M., Pannu, N.S., Read, R.J., Rice, L.M., Simonson, T., and Warren, G.L. 1998. Crystallography & NMR system: A new software suite for macromolecular structure determination. Acta Crystallogr. D Biol. Crystallogr. 54:905-921.

Kukol, A., Adams, P.D., Rice, L.M., Brünger, A.T., and Arkin, T.I. 1999. Experimentally based orientational refinement of membrane protein models: A structure for the Influenza A M2 H+ channel. J. Mol. Biol. 286:951-962.

Kukol, A., Torres, J., and Arkin, I.T. 2002. A structure for the trimeric MHC class II-associated invariant chain transmembrane domain. J. Mol. Biol. 320:1109-1117.

Lemmon, M.A., Flanagan, J.M., Hunt, J.F., Adair, 
99. R GAMMA
16  1ie9A   22.3  21  0.0  223  255  PDB  VITAMIN D3 RECEPTOR
17  1ndh    22.2  18  2.4  217  244  PDB  NUCLEAR RECEPTOR ROR BETA
18  1uhlJB  22.2  17  2.5  212  219  PDB  RETINOIC ACID RECEPTOR RXR BETA
19  1ovlE   22.0  23  2.7  215  250  PDB  ORPHAN NUCLEAR RECEPTOR NURR1 (MSE 414, 496)
20  1navA   21.9  16  2.5  215  253  PDB  HORMONE RECEPTOR ALPHA 1 (THRA1)
21  1ng2A   21.9  17  0.0  213  255  PDB  THYROID HORMONE RECEPTOR BETA 1
22  1s0xA   21.9  17  2.4  214  251  PDB  NUCLEAR RECEPTOR ROR ALPHA
23  1hg4A   21.7  26  2.5  203  240  PDB  ULTRASPIRACLE
24  1yjeA   21.6  25  2.9  208  226  PDB  ORPHAN NUCLEAR RECEPTOR NR4A1
25  1xlsF   21.5  19  2.8  221  242  PDB  RETINOIC ACID RECEPTOR RXR ALPHA
26  1g2n    21.3  26  3.0  204  246  PDB  ULTRASPIRACLE PROTEIN
27  JoshA   21.1  18  2.3  202  216  PDB  BILE ACID RECEPTOR
28  1ot7A   21.0  19  2.7  213  229  PDB  BILE ACID RECEPTOR
29  1dkfB   21.0  19  2.7  208  232  PDB  RETINOID X RECEPTOR ALPHA
30  1xv9B   20.5  20  2.7  220  246  PDB  RETINOIC ACID RECEPTOR RXR ALPHA
31  3qwxB   20.4  15  2.9  222  271  PDB  PEROXISOME PROLIFERATOR ACTIVATED RECEPTOR (PPAR)
32  1nr A   20.3  15  2.6  213  278  PDB  ORPHAN NUCLEAR RECEPTOR PXR
33  1lbd    19.0  29  2.9  194  238  PDB  RETINOID X RECEPTOR
34  1kkgA   18.1  15  2.9  214  269  PDB  PEROXISOME PROLIFERATOR ACTIVATED RECEPTOR

Figure 5.5.10 Clicking on the "interact" link in Figure 5.5.8 or 5.5.9 leads to the list of structural neighbors of estradiol receptor. Hits 1-34 are members of the same fold class comprising nuclear receptors.
100. RasMol> wireframe off

This turns off the wireframe on histidine 92.

RasMol> select HIS92:D and (sidechain or alpha)

This selects only the sidechain atoms and the alpha carbon atom in histidine 92.

RasMol> wireframe 100

This draws the histidine 92 sidechain with a thick wireframe.

BASIC PROTOCOL: VIEWING THE APPROPRIATE BIOLOGICAL UNIT

Coordinate files from the PDB are full of surprises. This is sometimes a source of delight, but often a source of frustration. A major challenge when examining a structure is determining whether it includes an appropriate biological unit. The biological unit is defined as the physiologically relevant state of the molecule, such as a complex of four chains in hemoglobin or an entire icosahedral structure in a viral capsid. Unfortunately, the coordinate sets obtained from the PDB, since they are subject to the methodology used in the structure determination, do not always include exactly one biological unit. The challenge is to generate a file that includes coordinates for the entire biological unit. This protocol describes how to view the appropriate biological unit using RasMol.

Necessary Resources

Hardware

RasMol runs on a variety of computer hardware, including personal computers.

Software

Operating system: RasMol runs under Microsoft Windows and Apple Macintosh OS 7.0 or higher (including Mac OS X). It may also be run on workstations under Unix, Linux, or VMS.

RasMol: Binary versi
101. Rychlewski, L. 2001. LiveBench-1: Continuous benchmarking of protein structure prediction servers. Protein Sci. 10:352-361.

Bystroff, C. and Baker, D. 1998. Prediction of local structure in proteins using a library of sequence-structure motifs. J. Mol. Biol. 281:565-577.

Canutescu, A.A., Shelenkov, A.A., and Dunbrack, R.L. Jr. 2003. A graph-theory algorithm for rapid protein side-chain prediction. Protein Sci. 12:2001-2014.

Chinea, G., Padron, G., Hooft, R.W., Sander, C., and Vriend, G. 1995. The use of position-specific rotamers in model building by homology. Proteins 23:415-421.

Chothia, C. and Lesk, A.M. 1987. Canonical structures for the hypervariable regions of immunoglobulins. J. Mol. Biol. 196:901-917.

Chothia, C., Lesk, A.M., Tramontano, A., Levitt, M., Smith-Gill, S.J., Air, G., Sheriff, S., Padlan, E.A., Davies, D., Tulip, W.R., Colman, P.M., Spinelli, S., Alzari, P.M., and Poljak, J. 1989. Conformations of immunoglobulin hypervariable regions. Nature 342:877-883.

Claessens, M., Van Cutsem, E., Lasters, I., and Wodak, S. 1989. Modelling the polypeptide backbone with "spare parts" from known protein structures. Protein Eng. 2:335-345.

Claude, J.B., Suhre, K., Notredame, C., Claverie, J.M., and Abergel, C. 2004. CaspR: A web server for automated molecular replacement using homology modelling. Nucl. Acids Res. 32:W606-W609.

Clore, G.M., Brunger, A.T., Karplus, M., an
102. T searches for DNA queries. Bioinformatics 14:890-891.

Wu, G., Fiser, A., ter Kuile, B., Sali, A., and Muller, M. 1999. Convergent evolution of Trichomonas vaginalis lactate dehydrogenase from malate dehydrogenase. Proc. Natl. Acad. Sci. U.S.A. 96:6285-6290.

Xiang, Z., Soto, C.S., and Honig, B. 2002. Evaluating conformational free energies: The colony energy and its application to the problem of loop prediction. Proc. Natl. Acad. Sci. U.S.A. 99:7432-7437.

Xu, J., Li, M., Kim, D., and Xu, Y. 2003. RAPTOR: Optimal protein threading by linear programming. J. Bioinform. Comput. Biol. 1:95-117.

Xu, L.Z., Sanchez, R., Sali, A., and Heintz, N. 1996. Ligand specificity of brain lipid-binding protein. J. Biol. Chem. 271:24711-24719.

Ye, Y., Jaroszewski, L., Li, W., and Godzik, A. 2003. A segment alignment approach to protein comparison. Bioinformatics 19:742-749.

Yona, G. and Levitt, M. 2002. Within the twilight zone: A sensitive profile-profile comparison tool based on information theory. J. Mol. Biol. 315:1257-1275.

Zheng, Q., Rosenfeld, R., Vajda, S., and DeLisi, C. 1993. Determining protein loop conformation using scaling-relaxation techniques. Protein Sci. 2:1242-1248.

Zhou, H. and Zhou, Y. 2002. Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction. Protein Sci. 11:2714-2726.

Zhou, H. and Zhou, Y. 2004. Single b
103. UPPORT PROTOCOL

The database search option (-search) uses the same shortcuts as the Dali server. Note that using this option is dependent on an up-to-date list of representative structures and the complete database of precomputed structural alignments. This database resides in the DCCP subdirectory. Updates of the database are available for download: click the Downloads link on the home page of the Dali database (http://www.bioinfo.biocenter.helsinki.fi/dali/start).

11. Convert the alignment file (files with the extension .dccp in DaliLite's internal format) to a readable format using the -format option. The arguments to the -format option are the identifier of the query structure, the alignment datafile, a listfile of valid identifiers, and the name of the output file, as illustrated in the following command:

Linux prompt> perl DaliLite -format 3ubpC 3ubpC.dccp representatives.list 3ubpC.html

Only comparisons to structures listed in the listfile will be output.

12. The output file is in HTML format. It contains the list of structural neighbors and links to the structural alignments, similar to Figure 5.5.2.

13. To construct a similarity matrix of a large set of proteins, extract the DCCP lines from the alignment data files (*.dccp). The similarity matrix can be used as input data for hierarchical clustering.

Note that several alternative alignment
104. abase searches and simulated annealing. J. Mol. Graph. Model. 18:258-272, 305-256.

Pearson, W.R. and Lipman, D.J. 1988. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. U.S.A. 85:2444-2448.

FAMS and FAMSBASE for Protein Structure, 5.2.4, Supplement 4

Yamaguchi, A., Iwadate, M., Suzuki, E.I., Yura, K., Kawakita, S., Umeyama, H., and Go, M. 2003. Enlarged FAMSBASE: Protein 3D structure models of genome sequences for 41 species. Nucleic Acids Res. 31:1-6.

Internet Resources

http://physchem.pharm.kitasato-u.ac.jp/FAMS/
FAMS Web site.

http://famsbase.bio.nagoya-u.ac.jp/famsbase/
FAMSBASE Web site.

http://spock.genes.nig.ac.jp/~genome/gtop.html
GTOP Web site.

Contributed by Hideaki Umeyama and Mitsuo Iwadate
Kitasato University
Tokyo, Japan

Figures 5.2.1-5.2.12 appear on the following pages.

[Flowchart boxes: protein sequence; check FAMSBASE at http://famsbase.bio.nagoya-u.ac.jp/famsbase/; 3-D structure found / 3-D structure not found; good structure? (yes/no); model a protein structure at http://physchem.pharm.kitasato-u.ac.jp/FAMS/; good structure? (yes/no); protein structure; report to developer of FAMS at fams@pharm.kitazato-u.ac.jp.]

Figure 5.2.1 Flowchart of
105. ackbone representation of the hemoglobin protein chains, with the hemes still shown as spacefilling spheres. For the color version of this figure go to http://www.currentprotocols.com.

a. In the Command Line window, type:

RasMol> select protein

This selects the protein.

RasMol> cpk off

This turns off the spheres (the spheres for the heme remain on).

RasMol> backbone 100

This draws a tube along the backbone. The display should look like Figure 5.4.4.

b. Rotate the display and notice the following: (1) Backbone representations show the folding of the protein chain, making it easy to recognize the many alpha helices in this globin fold. (2) Backbone representations typically under-represent the size of the protein, and ignore the dense packing of atoms in the structure. Explore this by flipping the spacefilling representation on and off by typing cpk and then cpk off in the Command Line window. (3) The position of each alpha carbon is retained in the diagram, so it is possible to identify the location of each amino acid.

c. Next, in the Command Line window, type the following commands:

RasMol> backbone off

This turns off the protein backbone.

Figure 5.4.5 Ribbon diagram (cartoon) of the hemoglobin protein chains, with the hemes as spacefilling spheres. For the color version of this figure go to http://www.currentprotocols.com.

RasMol> cartoon

This turns on the r
106. ain menu animation tools.

You can now play the movie of the loaded trajectory back and forth, using the animation tools in Figure 5.7.15.

7. By dragging the slider (Fig. 5.7.15), one navigates through the trajectory. The buttons to the left and to the right of the slider panel allow one to jump to the end of the trajectory or go back to the beginning.

8. For example, create another representation for water in the Graphical Representations window: click on the Create Rep button; in the Selected Atoms field, type water and hit Enter; for Drawing Method, choose Lines; for Coloring Method, select Name.

This representation of water shows the water droplet present in the simulation. Using the slider, observe the behavior of the water around the protein. The shape of the water droplet changes throughout the simulation, because water molecules follow the protein as it unfolds, driven by the interactions with the protein surface.

Using VMD: An Introductory Tutorial, 5.7.20, Supplement 24

When playing animations, you can choose between three looping styles: Once, Loop, and Rock. You can also jump to a frame in the trajectory by entering the frame number in the window on the left of the slider panel.

Smoothing trajectories

9. For clarity, turn off the water representation by double-clicking on it in the Graphical Representations w
107. al., 1998), in annotating single-nucleotide polymorphisms (Mirkovic et al., 2004; Karchin et al., 2005), in structural characterization of large complexes by docking to low-resolution cryo-electron density maps (Spahn et al., 2001; Gao et al., 2003), and in rationalizing known experimental observations.

Fortunately, a 3-D model does not have to be absolutely perfect to be helpful in biology, as demonstrated by the applications listed above. The type of question that can be addressed with a particular model does depend on its accuracy (Fig. 5.6.13).

At the low end of the accuracy spectrum, there are models that are based on less than 25% sequence identity and that sometimes have less than 50% of their Cα atoms within 3.5 Å of their correct positions. However, such models still have the correct fold, and even knowing only the fold of a protein may sometimes be sufficient to predict its approximate biochemical function. Models in this low range of accuracy, combined with model evaluation, can be used for confirming or rejecting a match between remotely related proteins (Sanchez and Sali, 1997a, 1998).

In the middle of the accuracy spectrum are the models based on approximately 35% sequence identity, corresponding to 85% of the Cα atoms modeled within 3.5 Å of their correct positions. Fortunately, the active and binding sites are frequently more conserved than the rest of the fold, and are thus modeled more accurately (Sanch
108. al, so that the most similar structures are found near the tips of the "fold tree", and more general similarities of fold types are found nearer the root. The organization of fold space is based on Z-scores.

The Z-score is the most important measure of quality of the structural alignment. Homologous proteins cluster at the top of the ranked list, but the boundary between homologous and unrelated proteins varies from one family to another. As a general rule, a Z-score above 20 means the two structures are definitely homologous, between 8 and 20 means the two are probably homologous, between 2 and 8 is a grey area, and a Z-score below 2 is not significant.

The size of the proteins influences Z-scores: small structures will tend to have small Z-scores, whereas a medium Z-score for very large structures need not imply a biologically interesting relationship. Fold type also has an effect: α/β proteins usually have higher Z-scores than all-α proteins. For example, TIM barrel proteins have about sixteen secondary structure elements in a similar (β/α)8 barrel topology and are unified at Z-scores above 10. In contrast, two small avian polypeptides (PDB codes 1ppt and 1bba) contain only one helix and a proline-rich loop and get a Z-score around 4. In view of the Z-score, it is much more improbable to observe sixteen helices and strands arranged in a similar fold than to find a similar arrangement of just a helix and a loop.
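The Z-score rule of thumb quoted above can be written as a small lookup. The following is a minimal sketch, not part of Dali or DaliLite: the function name interpret_z_score is hypothetical, and the treatment of the exact boundary values (2, 8, and 20) is an assumption, since the text only gives ranges.

```python
def interpret_z_score(z: float) -> str:
    """Map a Dali Z-score to the rule-of-thumb category from the text.

    Boundary handling is an assumption: exactly 20 and exactly 8 are
    treated as "probably homologous", exactly 2 as "grey area".
    """
    if z > 20:
        return "definitely homologous"
    if z >= 8:
        return "probably homologous"
    if z >= 2:
        return "grey area"
    return "not significant"


if __name__ == "__main__":
    # Example values: a top-ranked hit, a Z-score just above 10,
    # a Z-score around 4, and a very low score.
    for z in (22.3, 10.5, 4.0, 1.5):
        print(z, interpret_z_score(z))
```

Such a lookup only restates the general rule. As the text stresses, protein size and fold type shift where the boundary between homologous and unrelated proteins lies, so the categories should be read as a first filter rather than a verdict.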
109. al contains instructions for using a variety of other specialized commands once the basics are mastered.

Take some time to explore the options available in the pull-down menus and to become familiar with manipulating the molecule. When ready to move on to the next step, quit the program by typing, in the Command Line window:

RasMol> quit

The remainder of this protocol will discuss a few useful representations and provide a few tips to solve common problems. The Command Line window will be used to change representations and colors, allowing more control than that available through the pull-down menus.

Representations and their uses

Three basic types of representations are commonly used to display biological molecules. Each has its own strengths and weaknesses, and each is designed for a specific use.

6. Wireframe diagrams: The default representation in RasMol is a wireframe diagram. Each line represents a covalent bond between atoms. This representation is ideal for examination of the atomic details of the structure. However, wireframe representations tend to be very complicated. This is acceptable when examining the structure interactively, but wireframe representations are generally too crowded for printed images. The following describes how to create a wireframe diagram.

a. Restart RasMol using the 2hhb coordinates.

b. In the Command Line window, type the following series of commands:

RasMol> select HEM

This selects the heme group 
110. ally with minimal human administrative effort. The assumption that the fold space graph is complete is critical to exhaustive database searching but can sometimes be violated for the following reasons: unpredictable failure of the database update (blackouts, computer crashes, network failures, over-running disk space, etc.), failure to process the PDB entry (for example, chains longer than 1000 residues are not handled well), or program bugs. Please report unexpected behavior to dali-help@ebi.ac.uk.

Literature Cited

Chothia, C. and Lesk, A.M. 1986. The relation between the divergence of sequence and structure in proteins. EMBO J. 5:823-826.

Dietmann, S. and Holm, L. 2001. Identification of homology in protein structure classification. Nat. Struct. Biol. 8:953-957.

Falicov, A. and Cohen, F.E. 1996. A surface of minimum area metric for the structural comparison of proteins. J. Mol. Biol. 258:871-892.

Heger, A. and Holm, L. 2003. Exhaustive enumeration of protein domain families. J. Mol. Biol. 328:749-767.

Holm, L. and Sander, C. 1993. Protein structure comparison by alignment of distance matrices. J. Mol. Biol. 233:123-138.

Holm, L. and Sander, C. 1994. Parser for protein folding units. Proteins 19:256-268.

Holm, L. and Sander, C. 1995. 3-D lookup: Fast protein stru
111. ances between two atoms or angles between three atoms; the atoms do not have to be physically connected by bonds in the molecule.

1. Start a new VMD session. Load the ubiquitin trajectory into VMD (using the files ubiquitin.psf and pulling.dcd). For graphical representation, display protein only, using NewCartoon for drawing method and Structure for coloring method. If you need help, see Basic Protocol 3, steps 1 to 6.

2. Choose the Mouse → Label → Atoms menu item from the VMD Main menu. The mouse is now set to the mode for displaying atom labels. You can click on any atom on your molecule and a label will be placed for this atom. Clicking again on it will erase the label.

3. We will now try the same for bonds. Choose the Mouse → Label → Bonds menu item from the VMD Main menu. This selects the Display Label for Bond mode.

We will consider the distance between the α carbon of Lysine 48 and of the C terminus. In the pulling simulation, the former is kept fixed, and the latter is pulled at a constant force of 500 pN. In reality, polyubiquitin chains can be linked by a connection between the C terminus of one ubiquitin molecule and the Lysine 48 of the next. The simulation then mimics the effect of pulling on the C terminus with this kind of linkage.

4. Open the TkConsole window by selecting Extensions → Tk Console in the VMD Main menu. We will make a VDW representation for the α carbons of Lysine 48 and of the C terminus. To find out
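The distance and angle labels described in this excerpt reduce to simple geometry on atomic coordinates. The following Python sketch is an illustration only and is not VMD code: the function names and the coordinates are hypothetical, and VMD computes these values itself when atoms are picked with the mouse.

```python
import math

def distance(a, b):
    """Distance between two atoms given as (x, y, z) tuples, in the input units."""
    return math.dist(a, b)

def angle(a, b, c):
    """Angle in degrees at atom b formed by atoms a-b-c; no bonds are required."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(p * q for p, q in zip(v1, v2))
    cos_theta = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

# Hypothetical coordinates in angstroms, chosen only for illustration.
atom_a = (1.0, 2.0, 3.0)
atom_b = (4.0, 6.0, 3.0)
atom_c = (4.0, 6.0, 8.0)

print(distance(atom_a, atom_b))
print(angle(atom_a, atom_b, atom_c))
```

Neither function looks at connectivity, which mirrors the point made in the text that labeled atoms need not be bonded to one another.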
112. and Blundell, 1993; Sanchez and Sali, 1997a,b).

Errors in regions without a template (Fig. 5.6.12C). Segments of the target sequence that have no equivalent region in the template structure (i.e., insertions or loops) are the most difficult regions to model. If the insertion is relatively short (<9 residues long), some methods can correctly predict the conformation of the backbone (van Vlijmen and Karplus, 1997; Fiser et al., 2000; Jacobson et al., 2004). Conditions for successful prediction are the correct alignment and an accurately modeled environment surrounding the insertion.

Errors due to misalignments (Fig. 5.6.12D). The largest single source of errors in comparative modeling is misalignments, especially when the target-template sequence identity decreases below 30%. However, alignment errors can be minimized in two ways. First, it is usually possible to use a large number of sequences to construct a multiple alignment, even if most of these sequences do not have known structures. Multiple alignments are generally more reliable than pairwise alignments (Barton and Sternberg, 1987; Taylor et al., 1994). The second way of improving the alignment is to iteratively modify those regions in the alignment that correspond to predicted errors in the model (Sanchez and Sali, 1997a,b; John and Sali, 2003).

Incorrect templates (Fig. 5.6.12E). This is a potential problem when distantly related proteins are used as te
113. and changing it to E. coli aquaporin in the pop-up window.

Drawing different representations for different molecules

Before we continue exploring other features in the Molecule List Browser, take a look at your OpenGL Display window. You have two aquaporin structures, but since they are both shown in the same default representation, it is difficult to distinguish them. To tell them apart, you can assign them different representations.

6. Open the Graphical Representations window via Graphics → Representations... from the VMD Main menu. Make sure "0: human aquaporin" is selected in the Selected Molecule pull-down menu on top. Select NewCartoon for Drawing Method, and ColorID → 1 red for Coloring Method.

7. In the Graphical Representations window, select "1: E. coli aquaporin" in the Selected Molecule pull-down menu on top. Select NewCartoon for Drawing Method, and ColorID → 4 yellow for Coloring Method. Close the Graphical Representations window.

Now, your OpenGL Display window should show a human aquaporin colored in red and an E. coli aquaporin colored in yellow.

Molecule status flags

In your OpenGL Display window, try moving the aquaporins around with your mouse in different mouse modes (rotating, scaling, and translating). You can see that both aquaporins move together. You can fix any molecule by double-clicking the "F" (fixed) flag in the Molecule List Browser on the left of the molecule name.
114. apeptides in different proteins do not always have the same conformation (Kabsch and Sander, 1984; Mezei, 1998). Some additional restraints are provided by the core anchor regions that span the loop and by the structure of the rest of the protein that cradles the loop. Although many loop-modeling methods have been described, it is still challenging to correctly and confidently model loops longer than ~8 to 10 residues (Fiser et al., 2000; Jacobson et al., 2004).

There are two main classes of loop-modeling methods: (i) database search approaches that scan a database of all known protein structures to find segments fitting the anchor core regions (Jones and Thirup, 1986; Chothia and Lesk, 1987); (ii) conformational search approaches that rely on optimizing a scoring function (Moult and James, 1986; Bruccoleri and Karplus, 1987; Shenkin et al., 1987). There are also methods that combine these two approaches (van Vlijmen and Karplus, 1997; Deane and Blundell, 2001).

Loop modeling by database search. The database search approach to loop modeling is accurate and efficient when a database of specific loops is created to address the modeling of the same class of loops, such as β-hairpins (Sibanda et al., 1989), or loops on a specific fold, such as the hypervariable regions in the immunoglobulin fold (Chothia and Lesk, 1987; Chothia et al., 1989). There are attempts to classify loop conformations into more general categories, thus
115. aquaporin-1. Nature 407:599-605.
Phillips, J.C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, E., Villa, E., Chipot, C., Skeel, R.D., Kale, L., and Schulten, K. 2005. Scalable molecular dynamics with NAMD. J. Comput. Chem. 26:1781-1802.
Roberts, E., Eargle, J., Wright, D., and Luthey-Schulten, Z. 2006. MultiSeq: Unifying sequence and structure data for evolutionary analysis. BMC Bioinformatics 7:382.
Russell, R.B. and Barton, G.J. 1992. Multiple protein sequence alignment from tertiary structure comparison: Assignment of global and residue confidence levels. Proteins 14:309-323.
Savage, D.F., Egea, P.F., Robles-Colmenares, Y., O'Connell, J.D. III, and Stroud, R.M. 2003. Architecture and selectivity in aquaporins: 2.5 Å X-ray structure of aquaporin Z. PLoS Biol. 1:E72.
Sotomayor, M., Vasquez, V., Perozo, E., and Schulten, K. 2007. Ion conduction through MscS as determined by electrophysiology and simulation. Biophys. J. 92:886-902.
Sui, H., Han, B.-G., Lee, J.K., Walian, P., and Jap, B.K. 2001. Structural basis of water-specific
Tajkhorshid, E., Nollert, P., Jensen, M.Ø., Miercke, L.J.W., O'Connell, J., Stroud, R.M., and Schulten, K. 2002. Control of the selectivity of the aquaporin water channel family by global orientational tuning. Science 296:525-530.
Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting
116. are almost identical, both in terms of sequence and structure. However, 7mdh:A has a better crystallographic resolution than 1civ:A (2.4 Å versus 2.8 Å). From the second group of similar structures (5mdh:A, 1bdm:A, and 1b8p:A), 1bdm:A has the best resolution (1.8 Å). 1smk:A is most structurally divergent among the possible templates. However, it is also the one with the lowest sequence identity (34%) to the target sequence (build_profile.prf). 1bdm:A is finally picked over 7mdh:A as the final template because of its higher overall sequence identity to the target sequence (45%).

Aligning TvLDH with the template

One way to align the sequence of TvLDH with the structure of 1bdm:A is to use the align2d() command in MODELLER (Madhusudhan et al., 2006). Although align2d() is based on a dynamic programming algorithm (Needleman and Wunsch, 1970), it is different from standard sequence-sequence alignment methods because it takes into account structural information from the template when constructing an alignment. This task is achieved through a variable gap penalty function that tends to place gaps in solvent-exposed and curved regions, outside secondary structure segments, and between two positions that are close in space. In the current example, the target-template similarity is so high that almost any alignment method with reasonable parameters will result in the same alignment.

Sequence identity comparison (ID_TABLE)
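Template choice in this excerpt rests on percent sequence identity. As a rough sketch of how such a number can be obtained from a pairwise alignment, the Python below counts identical aligned residues and divides by the length of the shorter sequence. That normalization is an assumption about the convention used in the identity table, the function name is hypothetical, and the sequences are toy data rather than TvLDH or 1bdm:A.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two aligned sequences, normalized by the shorter one.

    Both strings must come from the same alignment, so they have equal length
    and use `gap` for gap positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Count columns where both sequences carry the same residue (not a gap).
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    # Ungapped lengths of each sequence.
    len_a = sum(1 for x in aligned_a if x != gap)
    len_b = sum(1 for y in aligned_b if y != gap)
    shorter = min(len_a, len_b)
    if shorter == 0:
        raise ValueError("at least one sequence is empty")
    return 100.0 * identical / shorter

# Toy alignment for illustration only.
print(round(percent_identity("ACDE-FG", "ACDQWFG"), 2))
```

Other normalizations (alignment length, or the longer sequence) give lower values for the same alignment, so identity figures are only comparable when computed the same way.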
117. are encouraged to take a look at that file using a text editor. Hopefully, by the end of this section, you'll understand many of those commands. In fact, you can execute the file in the Tk Console the same way as you execute other script files, i.e., by typing source myfirststate.vmd in the Tk Console window.

Many times when you write a script you might want to look up the command for an interactive VMD feature. You can either find it in the VMD User's Guide (http://www.ks.uiuc.edu/Research/vmd/vmd-1.8.6/ug/) or conveniently use the console command. Try typing logfile console in your Console window. This creates a logfile for all your actions in VMD and writes them in the Console window as command lines. If you execute those command lines, you can repeat the exact same actions you have performed interactively. To turn off logfile, type logfile off.

BASIC PROTOCOL 7

Using VMD: An Introductory Tutorial

BASIC PROTOCOL 8

Drawing Shapes Using VMD Text Commands

VMD offers a way to display user-defined objects built from graphics primitives such as points, lines, cylinders, cones, spheres, triangles, and text. The command that can realize those functions is graphics, the syntax of which is graphics molid command, where molid is a valid molecule ID and command is one of the commands shown below. Let us try draw
118. are not important to most users.

8. Check the data under the Inputs header at the bottom of the results page for a summary of the two inputs, including header information and a report of the chains found within each structure file.

If these data are not as expected, it is likely that the file upload, rather than the program itself, failed for one reason or another.

BASIC PROTOCOL 2

SEARCHING FOR STRUCTURAL NEIGHBORS USING THE Dali E-MAIL SERVER

The Dali server is an easy-to-use network service for comparing protein structures. It is routinely used by structural biologists to compare a newly solved structure against previously known structures. In favorable cases, comparing 3D structures may reveal biologically interesting similarities that are not detectable by comparing sequences. Submitting the coordinates of a query protein structure to Dali compares them to those in the Protein Data Bank, and a multiple alignment of structural neighbors is e-mailed back. Structural neighbors of a protein already in the Protein Data Bank can be found in the Dali database (Basic Protocol 3). The Dali server (http://www.ebi.ac.uk/dali) is hosted by the European Bioinformatics Institute (EBI). Structure submission can be made either interactively or by e-mail. E-mail sub
119. artistry (and the fun) of molecular graphics begins when displays are customized for specific applications, as described in this protocol.

To develop sophisticated displays, it is useful to use the scripting function of RasMol. This makes it possible to type all of the commands in a separate text file and then read them into RasMol. The command is RasMol> script file.txt, where file.txt is the name of the script file.

Necessary Resources

Hardware

RasMol runs on a variety of computer hardware, including personal computers.

Software

Operating system: RasMol runs under Microsoft Windows and Apple Macintosh OS 7.0 or higher (including Mac OS X). It may also be run on workstations under Unix, Linux, or VMS.
RasMol: Binary versions of RasMol are available on the WWW at http://www.bernstein-plus-sons.com/software/rasmol/. Downloading and installation instructions are given in Support Protocol 1.

Files

Coordinate files are read in a variety of formats, including PDB, Mol2, CHARMm, and mmCIF. The program deals gracefully with a number of variations of these files, including files containing coordinates for multiple conformers or multiple models. In this example, coordinates for hemoglobin (2HHB.pdb), obtained from the Protein Data Bank (PDB; UNIT 1.9), are used; instructions for downloading the PDB coordinate file are given in Support Protocol 2.

Color management

RasMol recognizes a number of common colors with commands such as color red
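A script file of the kind described in this excerpt is just a text file of RasMol commands, one per line. As a hedged illustration, the Python sketch below assembles such a file from a list of commands; the helper names are hypothetical, the file name file.txt follows the example in the text, and the commands (select, wireframe, cpk, color) are ordinary RasMol commands chosen to produce a spacefilling view colored by chain.

```python
from pathlib import Path

# RasMol commands for a spacefilling view colored by chain.
SPACEFILL_COMMANDS = [
    "select all",
    "wireframe off",
    "select protein or ligand",
    "cpk",
    "color chain",
]

def build_script(commands):
    """Join commands into script text, one command per line."""
    return "\n".join(commands) + "\n"

def write_script(path, commands):
    """Write the commands to a text file that RasMol can read with `script`."""
    Path(path).write_text(build_script(commands), encoding="ascii")
    return path
```

Calling write_script("file.txt", SPACEFILL_COMMANDS) and then typing script file.txt at the RasMol> prompt should replay the commands in order, which makes a customized display easy to reproduce and adjust.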
120. ates the Linux operating system (e.g., Sun, Alpha, Silicon Graphics, PC)
DaliLite program (see Support Protocol)
Perl interpreter: Perl v. 5.0 or higher (http://www.perl.org)
Internet browser, e.g., Internet Explorer (http://www.microsoft.com), Netscape (http://browser.netscape.com), or Firefox (http://www.mozilla.org/firefox)
Two protein structures in PDB format files

1. Download and install DaliLite as described in the Support Protocol.

2. The option to run DaliLite is DaliLite -pairwise <pdbfile1> <pdbfile2>, where the arguments <pdbfile1> <pdbfile2> should be replaced by the PDB file names, entered as user input after the Linux prompt as in the example below.

Linux prompt> perl DaliLite -pairwise pdb/1wsy.brk pdb/2kau.brk > log
Linux prompt> netscape index.html

3. The program computes the structural alignments for all chains in pdbfile1 against all chains in pdbfile2 and creates a set of HTML pages linked from the top page, "index.html". The first structure is called "mol1" and the second "mol2". All data are stored in the current work directory, overwriting any previous results generated using this option. The output is identical to that from Basic Protocol 1 (Figs. 5.5.2 through 5.5.4).

COMPARING LARGE SETS OF STRUCTURES USING THE STAND-ALONE VERSION OF DaliLite

This is a more advanced protocol that allows the systematic comparison of large sets of
121. aw the beginning of the trajectory in red, the middle in white, and the end in blue.

14. We can also use smoothing to make the large-scale motion of the protein more apparent. Go back to the Trajectory tab, and set the smoothing window to 20. The result should look like Figure 5.7.16.

Updating selections

Now, we will see how to make VMD update the selection each frame.

15. Hide the current representation showing all frames, and display only the water representation by double-clicking on it. Change the text in the Selected Atoms field from water to water and within 3 of protein and hit enter. This will show all water atoms within 3 Å of the protein.

16. Play the trajectory.

As you can see, although the displayed water atoms may be near the protein for a little while, they soon wander off, and are still shown despite no longer meeting the selection criteria. The Update Selection Every Frame option in the Trajectory tab of the Graphical Representations window remedies this. If the option box is checked, the selection is updated every frame. See Figure 5.7.17.

17. Quit VMD.

Figure 5.7.16 Image of every tenth frame shown at once, smoothed with a 20-frame window. For the color version of this figure go to http://www.currentprotocols.com.

Figure 5.7.17 Water within 3 Å of the protein, shown for a selection that is not updated (A) and for the one that is updated (B) each frame
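The difference between a selection evaluated once and one updated every frame can be sketched outside VMD. The Python below is a toy model and not VMD code: the function name, the two-frame trajectory, and the coordinates are hypothetical, and treating the cutoff as inclusive (distance <= cutoff) is an assumption.

```python
import math

def within(waters, protein, cutoff):
    """Indices of water atoms lying within `cutoff` of any protein atom."""
    return {
        i
        for i, w in enumerate(waters)
        if any(math.dist(w, p) <= cutoff for p in protein)
    }

# Toy system: one fixed protein atom and two water atoms over two frames.
protein = [(0.0, 0.0, 0.0)]
water_frames = [
    [(2.0, 0.0, 0.0), (10.0, 0.0, 0.0)],  # frame 0: water 0 is close
    [(8.0, 0.0, 0.0), (1.0, 0.0, 0.0)],   # frame 1: water 1 is close
]

# Selection computed once on the first frame and reused for every frame.
first_frame_selection = within(water_frames[0], protein, 3.0)
static_selection = [first_frame_selection for _ in water_frames]

# Selection recomputed for each frame.
updated_selection = [within(frame, protein, 3.0) for frame in water_frames]

print(static_selection)
print(updated_selection)
```

In the static case, water 0 stays selected in frame 1 even though it has drifted 8 Å away, which is the behavior the Update Selection Every Frame option is meant to correct.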
122. Figure 5.2.4 The lower part of the search page of FAMSBASE. Text boxes and radio buttons for searching the database are provided. [Screenshot: search options for ORFs by Gene ORF Name, by PDB ID of Reference Protein, by Motif Name, by Keywords in ORF Product, by FAMS Results (with or without 3D structures produced by FAMS, or both), and by Hetero Atom of Reference Protein, plus file upload or entry of a query sequence.]

FAMS and FAMSBASE for Protein Structure

[Screenshot: organism selection list including Chlamydia trachomatis, Chlamydia muridarum, Deinococcus radiodurans, Escherichia coli, Escherichia coli O157, Haemophilus influenzae, Helicobacter pylori, Lactococcus lactis subsp. lactis, Mycoplasma genitaliu
123. best on close-up pictures, which focus on a few details.

7. Spacefilling diagrams: Spacefilling representations show the size and shape of the entire molecule. Each atom is represented by a sphere that represents the optimal contact distance between nonbonded atoms. The following describes how to create a spacefilling diagram.
a. In the Command Line window, type the following series of commands:
RasMol> select all
This selects all atoms.
RasMol> wireframe off
This turns off the wireframe.
RasMol> select protein or ligand
This selects only the protein atoms and the ligand (heme) atoms.
RasMol> cpk
This displays atoms as spheres, using the default radius for the spheres.
RasMol> color chain
This colors each chain a different color.

Figure 5.4.3 Spacefilling (cpk) representation of hemoglobin with each chain colored differently. For the color version of this figure go to http://www.currentprotocols.com.

The display should look like Figure 5.4.3. Now, the entire protein is displayed as spacefilling spheres for each atom. The four individual polypeptide chains that make up the hemoglobin tetramer are each given a different color. The logical operation used in the selection command is a typical Boolean OR, so the command "select protein or ligand" will select all atoms in the protein and all atoms in the ligand. Similarly, the command "select p
124. ble settings. Click the Create New button. A new material, Material 12, will be created. Give it the settings listed in Table 5.7.4.

Go back to the Graphical Representations window. In the Material menu, Material 12 is now on the list. Try using Material 12 for a representation and see what it looks like. You can also rename the materials in the Material menu.

Now is a good time to try out the GLSL Render Mode, if your computer supports it. In the VMD Main window, choose Display → Rendermode → GLSL. This mode uses your 3D graphics card to render the scene with real-time ray tracing of spheres and alpha-blended transparency, and can improve the visualization of transparent materials. See Figure 5.7.11 for example renderings made in GLSL mode.

If your computer supports GLSL Render Mode, you can try to reproduce Figure 5.7.11. First, turn on the GLSL rendering mode by selecting Display → Rendermode → GLSL in the VMD Main window.

Modify Material 12 to be more transparent by entering the values listed in Table 5.7.5 in the Materials window.

Table 5.7.4 Example of a User-Defined Material

Setting      Value
Ambient      0.30
Diffuse      0.30
Specular     0.90
Shininess    0.50
Opacity      0.95

Figure 5.7.11 Examples of different material settings. (A) The default transparent material, rendered in GLSL mode. (B) A user-defined material with high transparency, also rendered in GLSL mode. For the color version of this figure go to http
125. candidate structures with a characteristic tilt and rotational pitch angle. Selection amongst the different candidate structures can be done using a variety of procedures, such as the fitting of each structure to some experimental data, e.g., mutagenesis (Lemmon et al., 1992b; Treutlein et al., 1992; Arkin et al., 1994).

In this unit, a different procedure is described for the selection of the correct structure from a list of plausible competing structures based on silent amino acid substitutions (see Basic Protocol). This procedure makes use of homology data in an objective manner to select the correct model and, in principle, can be applied as a screening procedure whenever more than one model exists.

Figure 5.3.1 In a bundle with n transmembrane α-helices (helices i and j in this case), 3n parameters can be used to describe the general structure (assuming rigid helices): (1) the inclination of the helices with respect to the bundle axis, β (related to the commonly used crossing angle Ω); (2) the rotational angle about the helix director, φ_j, which defines which side of helix i is facing towards the bundle core; and (3) the helix register, r_j, which defines the relative vertical position of the helix.

Contributed by Uzi Kochva, Hadas Leonov, Paul D. Adams, and Isaiah T. Arkin
Current Protocols in Bioinformatics (2003) 5.3.1-5.3.15
Copyright © 2003 by John Wiley & Sons, Inc.

UNIT 5.3
126. ce of the experimentally known structure share a high percent identity, this strongly supports the accuracy of the model structure. Quantitatively, if the percentage is ≥30%, the RMSD (root mean square distance) values are within 4 Å (over the Cα backbone) of the true structure. Note that in low-homology cases, regions of locally high homology exist that may contain important information in a model. In cases of low percent identity (<30%), statistically half of all models whose alignment E values are low enough (<10 [exponent lost]) will have a small enough RMSD (within 4 Å) to be considered accurate models. The E value guarantees the length of the model. In the case of alignments of low enough E value, the reliable region is sufficiently large in comparison to the entire ORF region. After a few years, the number of high identity percentage models will increase, and, at that time, the homology modeling method will produce more accurate protein structures.

COMMENTARY

Background Information

The authors of this unit developed a computer program, FAMS (Full Automatic Modeling System), to build model structures based on reference structures solved using X-ray diffraction, NMR, or other experimental methods, as well as amino acid sequence alignment between a target and its reference structure. FAMSBASE is a relational database of comparative protei
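The 4 Å criterion in this excerpt compares Cα positions of a model with those of the true structure. The Python below is a minimal sketch of that RMSD calculation and is not part of FAMS: it assumes the two structures are already superimposed and that the Cα atoms are listed in matching order, and the function name and coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square distance between two equally long lists of (x, y, z) points.

    No superposition is performed; the inputs are assumed to be pre-aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Hypothetical C-alpha coordinates in angstroms, for illustration only.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0), (7.6, 0.0, 2.0)]

value = rmsd(model, reference)
print(value, value <= 4.0)
```

In practice an optimal superposition (for example, by a least-squares fit) is applied before the RMSD is computed, so a raw value from unaligned coordinates would overstate the difference.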
127. computational cost by not calculating the RMSD between two clusters if their orientational parameters differ markedly.

Modeling Membrane Proteins

20. In order to search for clusters to which structures have converged, run the following command:

> perl /usr/local/chi/bin/ak_cluster.pl

This file is different from the clustering file in the CHI package, chi cluster, in that all structures that it places in a cluster are similar to one another. In chi cluster, all structures are similar to at least one structure, but not necessarily to all of them. This step is very fast (taking a few seconds).

21. Using any text editor, view the output file (MyProtein/variantA/results/cluster.out) to see how many clusters were obtained.

The authors recommend creating at least 10 to 15 clusters for each variant in order to find a "complete set" for all the variants (see below). This can be achieved by empirically changing the clustering parameters in the file (MyProtein/variantA/chi.param), either using a text editor or through the CHI Web interface (see above). There are two methods for increasing the number of clusters: (1) relaxing the RMSD threshold, i.e., increasing it, and (2) decreasing the required number of structures per cluster. Both methods should be tried.

22. In order to calculate an average (represen
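The contrast drawn in this excerpt, between clusters whose members are all similar to one another and clusters whose members are each similar to at least one other member, can be illustrated with a toy example. The Python below is a sketch under stated assumptions: it is not the algorithm of either clustering program, the function names are hypothetical, the greedy assignment order is arbitrary, and points on a line stand in for pairwise RMSD values.

```python
def link_to_any(dist, threshold):
    """Cluster so that each member is within `threshold` of at least one other member."""
    n = len(dist)
    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(i)
            # Pull in every unvisited item close to the current member.
            for j in range(n):
                if j not in seen and dist[i][j] <= threshold:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(members))
    return clusters

def link_to_all(dist, threshold):
    """Greedy clustering in which a new item must be within `threshold` of every member."""
    clusters = []
    for i in range(len(dist)):
        for cluster in clusters:
            if all(dist[i][j] <= threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Toy "structures": three points on a line; distances stand in for RMSD values.
points = [0.0, 1.0, 2.0]
dist = [[abs(a - b) for b in points] for a in points]

print(link_to_any(dist, 1.0))
print(link_to_all(dist, 1.0))
```

With a threshold of 1.0 the first scheme chains all three points into one cluster, although points 0 and 2 are 2.0 apart, while the second keeps point 2 separate because it is not close to every member.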
128. cs top text {40 0 20} "my drawing objects"

7. In your OpenGL window, there are a lot of objects now. To find the list of objects you've drawn, use the command graphics top list. You'll get a list of numbers, standing for the ID of each object.

8. The detailed information about each object can be obtained by typing graphics top info ID. For example, type graphics top info 0 to see the information on the point you drew.

9. You can also delete some of the unwanted objects using the command graphics top delete ID.

Using these basic shape-drawing commands, you can create geometrical objects, as well as text, to be displayed in your OpenGL window. When you render an image (as discussed in Basic Protocol 2, steps 19 to 23), these objects will be included in the resulting image file. You can hence use geometric objects and text to point at or label interesting features in your molecule; for example, an arrow (a combination of a cylinder and a cone) can be drawn this way to point at a region of interest of your molecule.

10. Quit VMD.

WORKING WITH MULTIPLE MOLECULES

In this section, you will learn to work with multiple molecules within one VMD session. We will use the water-transporting channel protein, aquaporin, as an example.

Necessary Resources

Hardware
Computer

Software
VMD

Files
1fqy.pdb and 1rc2.pdb, which can be downloaded at http://www.currentprotocols.com

Molecule List B
129. cture, using systematic homologous model building, dynamical simulated annealing, and restrained molecular dynamics. Biochemistry 31:2962-2970.
Taylor, W.R., Flores, T.P., and Orengo, C.A. 1994. Multiple protein structure alignment. Protein Sci. 3:1858-1870.

Comparative Protein Structure Modeling Using Modeller

Thompson, J.D., Higgins, D.G., and Gibson, T.J. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl. Acids Res. 22:4673-4680.
Thompson, J.D., Plewniak, F., and Poch, O. 1999. BAliBASE: A benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 15:87-88.
Topham, C.M., McLeod, A., Eisenmenger, F., Overington, J.P., Johnson, M.S., and Blundell, T.L. 1993. Fragment ranking in modelling of protein structure: Conformationally constrained environmental amino acid substitution tables. J. Mol. Biol. 229:194-220.
Topham, C.M., Srinivasan, N., Thorpe, C.J., Overington, J.P., and Kalsheker, N.A. 1994. Comparative modelling of major house dust mite allergen Der p I: Structure validation using an extended environmental amino acid propensity table. Protein Eng. 7:869-894.
Unger, R., Harel, D., Wherland, S., and Sussman, J.L. 1989. A 3D building blocks approach to analyz
130. cture database searches at 90% reliability. pp. 179-187. In Proceedings of the International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, Calif.
Holm, L. and Sander, C. 1996. Mapping the protein universe. Science 273:595-602.
Holm, L. and Sander, C. 1997. An evolutionary treasure: Unification of a broad set of amidohydrolases related to urease. Proteins 28:72-82.
Kabsch, W. and Sander, C. 1983. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22:2577-2637.
Kolodny, R. and Linial, N. 2004. Approximate protein structural alignment in polynomial time. Proc. Natl. Acad. Sci. U.S.A. 101:12201-12206.
Novotny, M., Madsen, D., and Kleywegt, G.J. 2004. Evaluation of protein fold comparison servers. Proteins 54:260-270.
Sierk, M.L. and Kleywegt, G.J. 2004. Deja vu all over again: Finding and analyzing protein structure similarities. Structure 12:2103-2111.

Key References

Holm and Sander, 1993. See above.
The original Dali reference.

Holm and Sander, 1996. See above.
Reviews structure comparison methodology, key results, and implications.

Holm, L. and Park, J. 2000. DaliLite workbench for protein structure comparison. Bioinformatics 16:566-567.
The main DaliLite reference, which should be cited in any publication using DaliLite results.

Internet Resources

http://www.ebi.ac.uk/DaliLite
The interactive DaliLite
131. d Gronenborn, A.M. 1986. Application of molecular dynamics with interproton distance restraints to three-dimensional protein structure determination: A model study of crambin. J. Mol. Biol. 191:523-551.
Cohen, F.E., Gregoret, L., Presnell, S.R., and Kuntz, I.D. 1989. Protein structure predictions: New theoretical approaches. Prog. Clin. Biol. Res. 289:75-85.
Collura, V., Higo, J., and Garnier, J. 1993. Modeling of protein loops by simulated annealing. Protein Sci. 2:1502-1510.
Colovos, C. and Yeates, T.O. 1993. Verification of protein structures: Patterns of nonbonded atomic interactions. Protein Sci. 2:1511-1519.
Corpet, F. 1988. Multiple sequence alignment with hierarchical clustering. Nucl. Acids Res. 16:10881-10890.
Deane, C.M. and Blundell, T.L. 2001. CODA: A combined algorithm for predicting the structurally variable regions of protein models. Protein Sci. 10:599-612.
de Bakker, P.I., DePristo, M.A., Burke, D.F., and Blundell, T.L. 2003. Ab initio construction of polypeptide fragments: Accuracy of loop decoy discrimination by an all-atom statistical potential and the AMBER force field with the Generalized Born solvation model. Proteins 51:21-40.
DePristo, M.A., de Bakker, P.I., Lovell, S.C., and Blundell, T.L. 2003. Ab initio construction of polypeptide fragments: Efficient generation of accurate, representative ensembles. Proteins 51:41-55.
Deshpande, N., Addess, K.J., Bluhm, W.F., Merino-Ott, J.C., Townse
132. d WHATCHECK (Hooft et al., 1996). Although errors in stereochemistry are rare and less informative than errors detected by statistical potentials, a cluster of stereochemical errors may indicate that there are larger errors (e.g., alignment errors) in that region.

Applications

Comparative modeling is often an efficient way to obtain useful information about the protein of interest. For example, comparative models can be helpful in designing mutants to test hypotheses about the protein's function (Wu et al., 1999; Vernal et al., 2002); in identifying active and binding sites (Sheng et al., 1996); in searching for, designing, and improving ligand binding strength for a given binding site (Ring et al., 1993; Li et al., 1996; Selzer et al., 1997; Enyedy et al., 2001; Que et al., 2002); in modeling substrate specificity (Xu et al., 1996); in predicting antigenic epitopes (Sali and Blundell, 1993); in simulating protein-protein docking (Vakser, 1995); in inferring function from calculated electrostatic potential around the protein (Matsumoto et al., 1995); in facilitating molecular replacement in X-ray structure determination (Howell et al., 1992); in refining models based on NMR constraints (Modi et al., 1996); in testing and improving a sequence-structure alignment (Wolf et al.
133. d analysis of simulation results, and animation of molecular dynamics trajectory. In addition, VMD can also work with volumetric data, and provides a platform for bioinformatics analysis such as protein sequence alignment. What we are able to present in this tutorial showcases only a small part of VMD's capability. But now that you have learned the basics of VMD, you are ready to explore the many other features most suitable for your research. For this purpose, there are many tutorials available that aim at offering more focused training, either on a specific tool or on a scientific topic. You can find much useful documentation, including the comprehensive VMD User's Guide, at the VMD homepage (http://www.ks.uiuc.edu/Research/vmd/).
Critical Parameters and Troubleshooting
Most parameters in VMD can be easily adjusted to suit individual users' needs. For example, when rendering molecules using a representation, as described in Basic Protocol 1, users can adjust the resolution of the representation in the graphical user interface, as well as many other parameters specific to the drawing method of the representation. New users of VMD might find that the default settings for most parameters are good starting points, but are also encouraged to change the parameters and test the difference. If you have any questions on using VMD, we encourage you to subscribe to the VMD mailing list (http://www.ks.uiuc.edu/Research/vmd/mailing_list/).
134. db: pdb coordinate file for ubiquitin (Vijay-Kumar et al., 1987)
beta.tcl: An example tcl script
distance.tcl: An example tcl script
equilibration.dcd: dcd molecular dynamics trajectory file of an equilibration simulation
pulling.dcd: dcd molecular dynamics trajectory file of a protein pulling simulation
spinach_aqp.fasta: An example fasta protein sequence file
ubiquitin.psf: psf structure file for ubiquitin that defines connectivity of atoms
transport through the AQP1 water channel. Nature 414:872-878.
135. different drawing and coloring methods, rendering publication-quality figures, animating and analyzing the trajectory of a molecular dynamics simulation, scripting in the text-based Tcl/Tk interface, and analyzing both sequence and structure data for proteins. Curr. Protoc. Bioinform. 24:5.7.1-5.7.48. © 2008 by John Wiley & Sons, Inc.
Keywords: molecular modeling • molecular dynamics visualization • interactive visualization • animation
INTRODUCTION
VMD (Visual Molecular Dynamics; Humphrey et al., 1996) is a molecular visualization and analysis program designed for biological systems such as proteins, nucleic acids, lipid bilayer assemblies, etc. It is developed by the Theoretical and Computational Biophysics Group at the University of Illinois at Urbana-Champaign. Among molecular graphics programs, VMD is unique in its ability to efficiently operate on multi-gigabyte molecular dynamics trajectories, its interoperability with a large number of molecular dynamics simulation packages, and its integration of structure and sequence information.
Key features of VMD include: (1) general 3D molecular visualization with extensive drawing and coloring methods (e.g., see Fig. 5.7.1); (2) extensive atom selection syntax for choosing subsets of atoms for display; (3) visualization of dynamic molecular data; (4) visualization of volumetric data; (5) support for most molecular data file formats; (6) no limits on the number of atoms
136. displaying them in RasMol (Basic Protocol 1). Next, the advantages and limitations of different representations will be discussed (Alternate Protocol). A common pitfall encountered in the display of atomic coordinates, obtaining the proper biological unit, will be presented (Basic Protocol 2). Finally, some ideas for customizing a molecular graphics session will be presented (Basic Protocol 3).
USING RasMol TO DISPLAY A PROTEIN STRUCTURE
In this protocol, the coordinates of hemoglobin will be downloaded from the Protein Data Bank and the structure displayed in RasMol using a few basic representations. RasMol is an open-source program designed for the display of biological molecules. The program reads molecular coordinates from a file and interactively displays the molecule in a variety of representations. RasMol is an excellent place to start when learning about molecular graphics, since the program has a number of useful options available in convenient pull-down menus. Then, as further functionality is needed for specific applications, the Command Line interface allows additional selection and representation options.
Necessary Resources
Hardware
RasMol runs on a variety of computer hardware, including personal computers.
Software
Operating system: RasMol runs under Microsoft Windows and Apple Macintosh OS 7.0 or higher (including Mac OS X). It may also be run on workstations under Unix, Linux, or VMS.
RasMol: Binary versions of RasMol a
137. dynamics search of configuration space, the protocol generates a set of candidate structures. The best one is selected from among these using the silent amino acid substitutions in the protein family as a stringent test for robustness. It seems likely that this procedure is just the tip of the proverbial iceberg for membrane protein prediction.
Homology modeling demands that the model be inspected not only by computer program but also by eye. For this and numerous other reasons, the ability to display and manipulate the three-dimensional structures of proteins has passed from the province of a select few into the routine toolkit of almost every biologist. Among the many public software packages available for this purpose, RasMol (UNIT 5.4) is one of the oldest, most versatile, and easiest to use. In UNIT 5.4, David Goodsell gives an overview of its capabilities and then describes a number of useful protocols that should not only familiarize readers with RasMol, but also enable them to carry out many of the most common procedures.
New units in this chapter address two other important issues in structure modeling. One of the most frequently asked questions about any "new" protein structure is: does it resemble any previously known fold? This is not just an academic matter. Increasingly, protein structures are being determined for gene products of unknown function, not only because of the structural genomics initiatives but also because genetics often leads to
138. e
In this paper, results of global searching molecular dynamics simulations are analyzed in terms of energy, thereby enabling the user to further select among candidate models.
Torres et al., 2002b. See above.
In this work, silent substitution modeling is employed to derive a structure of the TCR CD3ζ transmembrane helical bundle, shown to coincide with that obtained experimentally.
Contributed by Uzi Kochva, Hadas Leonov, and Isaiah T. Arkin, The Hebrew University, Jerusalem, Israel
Paul D. Adams, Lawrence Berkeley Laboratory, Berkeley, California
Representing Structural Information with RasMol
Thousands of atomic structures of proteins, nucleic acids, and other biomolecules are available for use in research and education. Many effective tools are available for the display of these structures. These tools run on popular computer hardware, and they provide a standard set of options for representation of the molecule. This unit will describe the use of a common program, RasMol, for the display of molecular structures. RasMol is simple to get started with and provides a wide range of options as one explores a molecular structure. Many of the principles of selection and display used in RasMol will then be directly applicable when moving to other molecular graphics programs for specific applications.
The unit will begin with the basics of obtaining coordinates and
139. e "spherical" atoms are not looking very spherical. In the Graphical Representations window, click on the representation you set up before for the protein to highlight it in yellow. Try adjusting the Sphere Resolution setting to something higher, and see what a difference it can make (Fig. 5.7.10).
Most of the drawing methods have a geometric resolution setting. Try a few different drawing methods and see how their resolutions can be easily increased. When producing images, the resolution can be raised until it stops making a visible difference.
Colors and materials
8. There is a Material menu in the Graphical Representations window, which by default is set to Opaque material. Choose the protein representation you made before, and experiment with the different materials in the Material menu.
Figure 5.7.10 The effect of the resolution setting. (A) Low resolution: Sphere Resolution set to 8. (B) High resolution: Sphere Resolution set to 28.
Besides the predefined materials in the Material menu, VMD also allows users to create their own materials. To make a new material, in the VMD Main window choose Graphics → Materials...
In the Materials window that appears, you will see a list of the materials you just tried out, and their adjusta
140. [Screenshot residue from the DaliLite results page. Recoverable items: PDB Files (mol2 is rotated/translated onto the mol1 position); Additional data (rotation-translation matrices for superimposition; listing of structurally equivalent residue ranges; view the log, which is only informative to experts); Inputs (here you can check that your PDB structures have been uploaded and parsed successfully). The alignment panel shows hit No. 1, Query mol1A versus Sbjct mol2A, Z-score 22.2, with DSSP, Query, ident, and Sbjct rows.]
141. e 5.5.9, the nuclear receptors are unified by ADDA into one family.
ADDA family indices are not stable; that is, they may change between releases of the ADDA database.
5. Go back to the previous page (Fig. 5.5.8) and click on the "interact" link to see details about the structural neighbors of each domain. The list of neighbors of estradiol receptor is shown in Figure 5.5.10.
The hits are ranked by Z-score, with the best hits at the top of the table. As a general rule, a Z-score above 20 means the two structures are definitely homologous, between 8 and 20 means the two are probably homologous, between 2 and 8 is a grey area, and below 2 is not significant. When structural similarity is due to homology, the proteins often have related biochemical functions; e.g., in Figure 5.5.10 the top hits are all nuclear receptors.
Other listed parameters in Figure 5.5.10 are as follows: ide, percentage amino acid identity in aligned positions; rmsd, root mean square deviation of Cα atoms in superimposition; lali, number of structurally equivalent positions; and lseq2, length of the structural neighbor.
6. To display structural alignments between estradiol receptor and its neighbors as one-dimensional alignments or in three-dimensional superimposition, select a few structures by clicking on the check boxes on the left. Then click on the Structure Alignment button, which results in a multiple structure alignment page (Fig. 5.5.11) similar to a sequence alignmen
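The Z-score rule of thumb above can be sketched as a small lookup. This is only an illustration: the function name, the hit names, and the handling of scores that fall exactly on 2, 8, or 20 are assumptions, since the text gives the ranges but not the treatment of the boundaries.

```python
def classify_dali_z(z):
    """Map a Dali Z-score to the rule-of-thumb category quoted in the text.

    Boundary handling (exactly 2, 8, or 20) is an assumption made here.
    """
    if z > 20:
        return "definitely homologous"
    if z >= 8:
        return "probably homologous"
    if z >= 2:
        return "grey area"
    return "not significant"


# Hypothetical hit list (names and scores invented for illustration).
hits = [("hitC", 3.1), ("hitA", 22.2), ("hitD", 1.4), ("hitB", 9.5)]

# Rank by Z-score, best hits first, as on the results page.
for name, z in sorted(hits, key=lambda h: h[1], reverse=True):
    print(f"{name}\t{z:5.1f}\t{classify_dali_z(z)}")
```

The categories are a guide for reading the neighbor table, not a statistical test; borderline hits still need inspection of the alignment itself.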
142. e    NewCartoon    Opaque
protein and helix and name CA    ColorID 8    Surf    Material 12
resname GLY and not resid 72 to 76    ColorID 7    VDW    Opaque
resname LYS    ColorID 18    Licorice    Opaque
20. Once you have the scene set the way you like it in the OpenGL window, simply choose File → Render... in the VMD Main window. The File Render Controls window will appear on the screen.
21. The File Render Controls window allows you to choose which renderer you want to use and the file name for your image. Select "snapshot" for the rendering method, type in a filename of your choice, and click Start Rendering.
22. If you are using a Mac or a Linux machine, an image-processing application might open automatically that shows you the molecule you have just rendered using snapshot. If this is not the case, use any image-processing application to take a look at the image file. Close the application when you are done to continue using VMD.
The snapshot renderer saves exactly what is showing in your OpenGL display window; in fact, if another window overlaps the display window, it may distort the overlapped region of the image.
23. Try to render again using different rendering methods, particularly TachyonInternal and POV3 (see Fig. 5.7.14 for an example POV3 rendering). Compare the quality of the images created by different renderers.
Figure 5.7.14 Example of a POV3 rendering. For the color version of this figure go to http://www.currentprotocol
143. e analyzed is ubiquitin.psf (the only one loaded). The selection for which RMSD will be computed is all of the protein atoms, excluding hydrogens (since the "noh" checkbox is on). The RMSD will be calculated for each frame with reference to frame 0. Make sure the Plot checkbox is selected.
3. Click the Align button.
This will align each frame of the trajectory with respect to the reference frame (in this case, frame 0) to minimize the RMSD by applying only rigid-body translations and rotations. This step is not necessary, but is desirable in most cases, because we are interested only in RMSD that arises from the fluctuations of the structure and not from the displacements and rotations of the molecule as a whole. The result of the alignment can be seen in the OpenGL display.
4. Click the RMSD button in the RMSD Trajectory Tool window. The protein RMSD (in Angstrom) versus frame number is displayed in a plot (Fig. 5.7.28).
Over several initial frames, RMSD ≈ 0 because positions of the protein atoms are fixed during that time in the simulation to allow water molecules around the protein to adjust to the protein surface. After that, the protein is released, and the RMSD grows quickly to around 1.5 Å. At that point, the RMSD levels off and remains at ~1.5 Å further on. This is typical behavior for molecular dynamics simulations. Leveling of the RMSD means
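To make the quantity being plotted concrete, here is a minimal sketch of a per-frame RMSD against frame 0. It is not the RMSD Trajectory Tool's implementation: the function names and toy coordinates are invented, and the rigid-body alignment of step 3 is deliberately left out, so a whole-molecule shift shows up in the result.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equally sized")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def rmsd_vs_reference(frames, ref_index=0):
    """RMSD of every frame relative to one reference frame (no superposition)."""
    reference = frames[ref_index]
    return [rmsd(frame, reference) for frame in frames]


# Toy two-atom "trajectory" in Angstrom (invented values).
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # frame 0, the reference
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],  # identical to the reference
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],  # same shape, shifted 1 A along z
]
print(rmsd_vs_reference(frames))
```

The third frame has exactly the same internal geometry as the reference, yet its unaligned RMSD is nonzero. That is the contribution from whole-molecule displacement that the Align step is meant to remove.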
144. e heme. However, by looking closely, it is possible to see that one histidine is coordinated directly to the iron. In this case, the view is centered on HIS92 in chain D.
6. Clean up the picture by typing the following series of commands in the Command Line window.
RasMol> cpk off
This turns off the spheres on all the histidines.
RasMol> select HIS92:D
This selects just histidine 92 in chain D.
RasMol> wireframe 100
This draws a thick wireframe on this histidine.
RasMol> color cpk
This colors the histidine by atom type.
This should give a display like the one in Figure 5.4.9.
Figure 5.4.8 Zooming in on one heme group, it is easy to locate histidines on either side of the iron ion. The one on the right is histidine 92, which coordinates with the iron ion. For the color version of this figure go to http://www.currentprotocols.com.
Figure 5.4.9 Histidine 92 is displayed with a thick wireframe representation, colored by atom type. For the color version of this figure go to http://www.currentprotocols.com.
7. Notice that this display is a bit messy because the backbone atoms are included for the histidine. This can be cleaned up with
145. e it from MultiSeq by pressing the delete or Backspace key on your keyboard. Do the same to remove the 1j4n X detergent molecule.
Table 5.7.8 The Four Aquaporins Used in this Section
PDB code    Description    Reference
1fqy    Human AQP1    Murata et al., 2000
1rc2    E. coli AqpZ    Savage et al., 2003
1lda    E. coli Glycerol Facilitator (GlpF)    Tajkhorshid et al., 2002
1j4n    Bovine AQP1    Sui et al., 2001
Figure 5.7.21 VMD Main menu after loading the four aquaporins.
MultiSeq uses the program STAMP (Russell and Barton, 1992) to align protein molecules. STAMP (Structural Alignment of Multiple Proteins) is a tool for aligning protein sequences based on three-dimensional structures. Its algorithm minimizes the Cα distance between aligned residues of each molecule by applying globally optimal rigid-body rotations and translations. Note that you can only perform alignments on molecules that are structurally similar; if you try to align proteins that have no common structures, STAMP will fail.
10. In the MultiSeq window, select Tools → Stamp Structural Alignment. This will open the Stamp Alignme
146. e program in the src directory of CHI (/usr/local/chi/src), type:
> make
3. Place the three Perl scripts, ak_cluster.pl, compare_rmsd.pl, and to_gly.pl, in /usr/local/chi/bin/.
4. Place the file cns.inp in /usr/local/chi/bin/.
5. In order for the system to recognize both CNS and CHI, which have been recently compiled, edit the .cshrc file (APPENDIX 1C) to include the following two lines:
source /usr/local/chi/chi_env
source /usr/local/cns_solve_1.1/cns_solve_env
Define the sequences for the GMDS
There are two considerations that one must take into account. The first is the identity of the transmembrane segments to be simulated. The transmembrane α helices must therefore be delineated from the rest of the protein. The second is: what are the homologous sequences to one's protein of interest?
6. Determine the transmembranal amino acid range, either by prior knowledge or by using programs predicting transmembranal domains, e.g., via the interactive programs TMHMM (http://www.cbs.dtu.dk/services/TMHMM/) or PSIPRED (http://bioinf.cs.ucl.ac.uk/psipred/).
7. Search protein databases (e.g., NCBI, PDB, or GenBank, all accessible from the NCBI home page, http://www.ncbi.nlm.nih.gov) for homologous sequences using the transmembrane segments determined above.
The minimal id
147. easing the crossing angle used for the right- and left-handed searches from 25° by editing the chi.param file. Other options that may be pursued are to increase the number of trials and to reduce the rotational increment (again, in chi.param). Obviously, both of these changes will be reflected in increased computational time.
Some of the mutations are not silent
In other words, some of the homologs do not adopt the same structure (Torres et al., 2002a). Here the authors suggest an increase in the similarity threshold of the sequences used in the simulation, i.e., sequences that are closer to the target protein have a better chance of adopting the same structure.
More than one structure is found
In this instance, it is possible that the "filtering capabilities" of the silent mutant were not sufficient. The recommendation is simply to use more sequences, by potentially lowering the identity threshold.
The structure found is incorrect
In all of the cases in which the authors have used the combination of GMDS and silent amino acid substitution modeling, it produced the correct structure as verified using other experimental methods (Kukol et al., 2002; Torres et al., 2002b; Torres et al., 2002c). However, this may not always be the case. Identifying such a situation is difficult and requires the applicat
148. eate FASTA files similarly in this format. When you create a FASTA file, remember to save it in plain text, and use .fasta as the file extension.
5. Close the text editor when you finish examining spinach_aqp.fasta.
6. In the MultiSeq window, select File → Import Data. Select From File in the Import Data window, and press the top Browse button to select the file spinach_aqp.fasta. Press OK at the bottom of the Import Data window.
You have now loaded the sequence of a spinach aquaporin into MultiSeq. You can now perform sequence alignment on the spinach aquaporin protein with other loaded aquaporin molecules. Let us try a sequence alignment between a spinach and a human aquaporin.
7. Click on the checkbox on the left of spinach_aqp, and click on the checkbox on the left of 1fqy.pdb.
8. Open the ClustalW Alignment Options window by selecting Tools → ClustalW Sequence Alignment. Under the Multiple Alignment options at the top, check Align Marked Sequences. Go to the bottom of the window and select OK.
The sequence of spinach aquaporin is now aligned with the sequence of human aquaporin, and you can check how good the alignment is by obtaining its Qy and Sequence Identity values. If you feel that the two molecules are listed too far apart in the MultiSeq window, you can move the molecules by dragging them with your mouse. Also, as you might have noticed, in MultiSeq molecules can be Marked by checking their checkboxes. They can
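For readers who want to check a FASTA file outside MultiSeq, the format described above (a ">" header line followed by plain-text sequence lines) can be read with a few lines of Python. This is a sketch under stated assumptions: the function name is invented, and the record shown is a made-up toy sequence, not the contents of spinach_aqp.fasta.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {record name: sequence}.

    The record name is taken as the first word after '>'; sequence lines
    belonging to one record are joined into a single string.
    """
    records = {}
    name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            name = header.split()[0] if header else f"unnamed_{len(records)}"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


# Toy record (sequence invented for illustration).
example = """>toy_aqp hypothetical example record
MSKEVSEEAQ
AHQHGKDYVD
"""
for record_name, sequence in parse_fasta(example).items():
    print(record_name, len(sequence), sequence)
```

Reading the file this way is a quick check that it was saved as plain text: a word-processor format would not start with a ">" header line.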
149. ecessary to select only one template. In fact, the use of several templates approximately equidistant from the target sequence generally increases the model accuracy (Srinivasan and Blundell, 1993; Sanchez and Sali, 1997b).
Model building
Modeling by assembly of rigid bodies
The first and still widely used approach in comparative modeling is to assemble a model from a small number of rigid bodies obtained from the aligned protein structures (Browne et al., 1969; Greer, 1981; Blundell et al., 1987). The approach is based on the natural dissection of the protein structures into conserved core regions, variable loops that connect them, and side chains that decorate the backbone. For example, the following semiautomated procedure is implemented in the computer program COMPOSER (Sutcliffe et al., 1987a). First, the template structures are selected and superposed. Second, the "framework" is calculated by averaging the coordinates of the Cα atoms of structurally conserved regions in the template structures. Third, the main-chain atoms of each core region in the target model are obtained by superposing the core segment, from the template whose sequence is closest to the target, on the framework. Fourth, the loops are generated by scanning a database of all known protein structures to identify the structurally variable regions that fit the anchor core regions and have a compatible sequence (Topham et al., 1993). Fifth, the side chains a
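The second step of the procedure described above, averaging Cα coordinates over already superposed templates, can be sketched as follows. This is a toy illustration and not COMPOSER's code: the function name and coordinates are invented, and it assumes the templates have been superposed and trimmed to the same structurally conserved positions beforehand.

```python
def average_framework(templates):
    """Average equivalent C-alpha positions across superposed templates.

    `templates` is a list of coordinate lists, one per template; position i
    in every list is assumed to be the same structurally conserved residue.
    """
    if not templates:
        raise ValueError("at least one template is required")
    length = len(templates[0])
    if any(len(t) != length for t in templates):
        raise ValueError("all templates must cover the same positions")
    count = len(templates)
    return [
        tuple(sum(t[i][axis] for t in templates) / count for axis in range(3))
        for i in range(length)
    ]


# Two toy templates with two conserved positions each (invented values).
template_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
template_b = [(2.0, 0.0, 0.0), (4.0, 2.0, 0.0)]
print(average_framework([template_a, template_b]))
```

An unweighted mean is used here for simplicity; whether and how the real procedure weights individual templates is not stated in the text.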
150. egions without a template. The Cα trace of the 112-117 loop is shown for the X-ray structure of human eosinophil neurotoxin (red), its model (green), and the template ribonuclease A structure (residues 111-117; blue). (D) Errors due to misalignments. The N-terminal region in the crystal structure of human eosinophil neurotoxin (red) is compared with its model (green). The corresponding region of the alignment with the template ribonuclease A is shown. The red lines show correct equivalences, that is, residues whose Cα atoms are within 5 Å of each other in the optimal least-squares superposition of the two X-ray structures. The "a" characters in the bottom line indicate helical residues and "b" characters, the residues in sheets. (E) Errors due to an incorrect template. The X-ray structure of α-trichosanthin (red) is compared with its model (green) that was calculated using indole-3-glycerophosphate synthase as the template. For the color version of this figure go to http://www.currentprotocols.com.
the template is locally different (>3 Å) from the target, resulting in errors in that region. The structural differences are sometimes not due to differences in sequence, but are a consequence of artifacts in structure determination or structure determination in different environments (e.g., packing of subunits in a crystal). The simultaneous use of several templates can minimize this kind of error (Srinivasan
151. enames in the lower row of input boxes in Figure 5.5.1.
Searches for the PDB entry codes of known structures for a query protein can be performed using Entrez at NCBI (http://www.ncbi.nlm.nih.gov), SRS (http://srs.ebi.ac.uk), and other similar database cross-linking resources.
For a structure file containing a number of different chains, a specific chain can be selected in the submission page. If no chain is specified, structural comparisons will be performed on every chain in the structure file, and the return of results will take much longer.
Size limits for the comparison are between 30 and 1000 amino acid residues per chain.
3. Click on the Run DaliLite button. The summary page for the results of a structure comparison appears; the top part of the page is shown in Figure 5.5.2.
The page (Fig. 5.5.2) includes the following information.
Z-Score: The Z-Score is a measure of quality of the alignment: the higher, the better. As a general rule, Z-scores above 8 yield very good structural superimpositions, Z-scores between 2 and 8 indicate topological similarities, and Z-scores below 2 are not significant.
Aligned Residues: The number of aligned residues is the number of structurally equivalent residue pairs.
RMSD: The root mean square deviation (RMSD) is a measure of the average deviation in distance between aligned alpha carbons in structural superimposition. Long alignments (e.g., over 100 aligned residues with RMSD below 3 Å) indicate sim
152. ence identity, while they identify more than 90% of the relationships when sequence identity is between 30% and 40% (Brenner et al., 1998). Another benchmark, based on 200 reference structural alignments with 0% to 40% sequence identity, indicated that BLAST is able to correctly align only 26% of the residue positions (Sauder et al., 2000).
Profile-sequence alignment methods
Sensitive searching and accurate alignment become progressively more difficult as the relationships move into the twilight zone (Saqi et al., 1998; Rost, 1999). A significant improvement in this area was the introduction of profile methods by Gribskov et al. (1987). The profile of a sequence is derived from a multiple sequence alignment and specifies residue-type occurrences for each alignment position. The information in a multiple sequence alignment is most often encoded as either a position-specific scoring matrix (PSSM; Henikoff and Henikoff, 1994, 1996; Altschul et al., 1997) or as a Hidden Markov Model (HMM; Krogh et al., 1994; Eddy, 1998). In order to identify suitable templates for comparative modeling, the profile of the target sequence is used to search against a database of template sequences. The profile-sequence methods are more sensitive in detecting related structures in the twilight zone than the pairwise sequence-based methods; they detect approximately twice the number of hom
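As a rough illustration of what "residue-type occurrences for each alignment position" means, the sketch below counts residue frequencies per column of a toy alignment. It is far simpler than a real PSSM or HMM: the function name and sequences are invented, and sequence weighting, pseudocounts, and log-odds scoring against background frequencies are all omitted.

```python
from collections import Counter


def column_profile(alignment):
    """Per-column residue frequencies for an aligned set of sequences.

    Gap characters ('-') are ignored when computing the frequencies.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    profile = []
    for column in zip(*alignment):
        residues = [res for res in column if res != "-"]
        counts = Counter(residues)
        total = len(residues)
        profile.append(
            {res: n / total for res, n in counts.items()} if total else {}
        )
    return profile


# Toy alignment of three short sequences (invented for illustration).
toy_alignment = ["MKV", "MRV", "MK-"]
for position, frequencies in enumerate(column_profile(toy_alignment), start=1):
    print(position, frequencies)
```

A column where one residue dominates marks a conserved position; a search that scores matches position by position with such information is what gives profile methods their extra sensitivity over a single-sequence comparison.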
153. Increasing geometric resolution
All VMD objects are drawn with an adjustable resolution, allowing users to balance fineness of detail with drawing speed.
5. Open the Graphical Representations window via Graphics → Representations... in the VMD Main menu. Modify the default representation to show just the protein, and display it using the VDW drawing method.
6. Zoom in on one or two of the atoms by using Mouse → Scale Mode (shortcut s).
You might notice that as you zoom closer and closer into an atom, the atom might be cut off by an invisible clipping plane, which makes it difficult to focus on just one atom. This is an OpenGL feature. You can move the clipping plane closer to you by doing the following: switch your mouse mode to the Translate mode, either by pressing the shortcut key t in the OpenGL window or by selecting Mouse → Translate Mode, and drag your mouse in the OpenGL window while holding down the right mouse button. You can now move the clipping plane closer to you or away from you. If this does not work, here is an alternative way: in the VMD Main window, choose Display → Display Settings...; in the Display Settings window that shows up, you can see that many OpenGL options are adjustable. Decrease the value for Near Clip, which will move the OpenGL clipping plane closer, allowing you to zoom in on individual atoms without clipping them off.
7. Notice that with the default resolution setting, th
154. entity between the sequences should be kept very high in order to ensure that all changes are indeed "silent." The authors typically use sequences that are at least 75% identical.
8. Perform multiple sequence alignment (MSA) of the desired homologous sequences using MSA programs, e.g., Clustal X, ClustalW (UNIT 2.3), or PileUp (UNIT 3.6) from the GCG Wisconsin package.
No gaps should be allowed, i.e., the length of all homologous sequences should be identical. The results of the MSA will make it possible to select the exact sequences from the homologous proteins that correspond to the transmembrane domains of the protein of interest.
Set up an appropriate directory structure
Since GMDS produces a large number of files, it is best to work in an orderly and organized fashion. The authors therefore recommend the following directory setup.
9. Create a directory that will contain all the subdirectories and files used in the GMDS; it will be assumed that this directory is directly under the home directory (e.g., ~/MyProtein/). Create a specific subdirectory in that directory for each variant (e.g., ~/MyProtein/variantA/, ~/MyProtein/variantB/, ..., ~/MyProtein/variantN/).
Prepare the instructions file chi.param
In order to run the GMDS using CHI, all that is needed is a single instructions file called chi.param, which, as its name suggests, contains the parameters needed for a CHI run. chi.param can be generated by a Web server in the CHI site (Fig. 5.3.2
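The selection and bookkeeping steps above (ungapped segments of identical length, at least 75% identity, one subdirectory per variant) can be sketched in Python. Everything named here is hypothetical: the helper functions, the toy segments, and the variant names are for illustration only, and the example writes into a temporary directory rather than the home directory.

```python
import tempfile
from pathlib import Path


def percent_identity(seq_a, seq_b):
    """Percentage of identical positions between two ungapped, equal-length segments."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("segments must be non-empty and of identical length")
    if "-" in seq_a or "-" in seq_b:
        raise ValueError("gaps are not allowed in the transmembrane segments")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def select_variants(target, homologs, min_identity=75.0):
    """Keep homologous segments that meet the identity threshold."""
    return {
        name: seq
        for name, seq in homologs.items()
        if percent_identity(target, seq) >= min_identity
    }


def make_variant_dirs(base_dir, variant_names):
    """Create one subdirectory per variant under base_dir; return their names."""
    base = Path(base_dir)
    for name in variant_names:
        (base / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in base.iterdir() if p.is_dir())


# Toy transmembrane segments (invented for illustration).
target_segment = "LIVGAALLLL"
homolog_segments = {"variantA": "LIVGAALLLV", "variantB": "LIVGSALLIV"}

kept = select_variants(target_segment, homolog_segments)
with tempfile.TemporaryDirectory() as tmp:
    created = make_variant_dirs(Path(tmp) / "MyProtein", kept)
    print(created)
```

In practice the base directory would be the project directory under the home directory, as step 9 describes; the threshold is kept as a parameter because the troubleshooting advice elsewhere in the unit involves raising or lowering it.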
155. er and to play a movie of the trajectory in various modes (Once, Loop, or Rock) and at an adjustable speed. You will be able to see the frames as they are loaded into the molecule in the OpenGL window. After the trajectory finishes loading, you will be looking at the last frame of your trajectory. To go to the beginning, use the animation tools at the lower part of the VMD Main menu (see Fig. 5.7.15). 5. Close the Molecule File Browser window. 6. For a convenient visualization of the protein, choose Graphics > Representations... in the VMD Main menu. In the Selected Atoms text field, type protein and hit Enter on your keyboard; in the Drawing Method, select NewCartoon; in the Coloring Method, select Structure. The trajectory you just loaded is a simulation of an AFM (Atomic Force Microscopy) experiment pulling on a single ubiquitin molecule, performed using the Steered Molecular Dynamics (SMD) method (Isralewitz et al., 2001). We are looking at the behavior of the protein as it unfolds while being pulled from one end, with the other end constrained to its original position. Each frame corresponds to 10 picoseconds of simulation time. Ubiquitin has many functions in the cell, and it is currently believed that some of these functions depend on the protein's elastic properties, which can be probed in AFM pulling experiments. Such elastic properties are usually due to hydrogen bonding between residues in β strands of the protein molecules. Using M
156. ered in the vmd console window. If you are using a Mac, your vmd console window is the terminal window that shows up when you open VMD. Working with specific parts of a molecule: the atomselect command. Many times, you might want to perform operations on only a specific part of a molecule. For this purpose, VMD's atomselect command is very useful. The atomselect command has the following syntax: atomselect molid "selection command" creates a new atom selection that includes all atoms described by "selection command". 2. Type set crystal [atomselect top "all"] in the Tk Console window. This command allows you to select a specific part of a molecule. The first argument to atomselect is the molecule ID (shown to the very left of the VMD Main window); the second argument is a textual atom selection like what you have been using to describe graphical representations in Basic Protocol 1. The selection returned by atomselect is itself a command you will learn to use. This step creates a selection, crystal, that contains all the atoms in the molecule and assigns it to the variable crystal. Instead of a molecule ID (which is a number), we have used the shortcut top to refer to the top molecule. A top molecule means that it is the target for scripting commands. This concept is particularly important when multiple molecules are loaded at the same time (see Basic Protocol 9 for dealing with multiple molecules in VMD
157. erforms homology modeling of protein structures by means of  an algorithm consisting of database searches and simulated annealing  FAMS pro   duces a model in which the torsion angles of the backbone and sidechains are highly  accurate     An overview of the processes for obtaining a protein model via FAMS is shown in  Figure 5 2 1  This unit describes a procedure for searching FAMSBASE  Yamaguchi  et al   2003   the database of structural models calculated by FAMS  see Basic  Protocol      CHECKING FAMSBASE FOR A PROTEIN MODEL    When a 3 D structural model is required for a particular protein  one should first check  whether or not the protein is already modeled  FAMSBASE is a relational database of  comparative protein structure models for the entire genomes of 41 species  as pre   sented in the GTOP  Genomes TO Protein structures and functions  database at  http   spock  genes nig ac jp  gtop old gtop html  The models in that database were  all calculated using FAMS  FAMSBASE provides versatile search and query func   tions  including searching by name of ORF  open reading frame   ORF annotation   Protein Data Bank  PDB  ID  and sequence similarity  FAMSBASE is available online  at http   famsbase bio nagoya u ac jp famsbase   The present percentage of ORFs  with 3 D protein models in FAMSBASE is 42   therefore  requested protein models  are currently available in approximately half of all cases     Necessary Resources    Hardware    Any computer with an Internet connecti
158. errors in three-dimensional structures of proteins. Proteins 17:355-362. Sippl, M.J. 1995. Knowledge-based potentials for proteins. Curr. Opin. Struct. Biol. 5:229-235. Skolnick, J. and Kihara, D. 2001. Defrosting the frozen approximation: PROSPECTOR, a new approach to threading. Proteins 42:319-331. Smith, T.F. and Waterman, M.S. 1981. Identification of common molecular subsequences. J. Mol. Biol. 147:195-197. Spahn, C.M., Beckmann, R., Eswar, N., Penczek, P.A., Sali, A., Blobel, G., and Frank, J. 2001. Structure of the 80S ribosome from Saccharomyces cerevisiae: tRNA-ribosome and subunit-subunit interactions. Cell 107:373-386. Srinivasan, N. and Blundell, T.L. 1993. An evaluation of the performance of an automated procedure for comparative modelling of protein tertiary structure. Protein Eng. 6:501-512. Sutcliffe, M.J., Haneef, I., Carney, D., and Blundell, T.L. 1987a. Knowledge based modelling of homologous proteins, Part I: Three-dimensional frameworks derived from the simultaneous superposition of multiple structures. Protein Eng. 1:377-384. Sutcliffe, M.J., Hayes, F.R., and Blundell, T.L. 1987b. Knowledge based modelling of homologous proteins, Part II: Rules for the conformations of substituted sidechains. Protein Eng. 1:385-392. Sutcliffe, M.J., Dobson, C.M., and Oswald, R.E. 1992. Solution structure of neuronal bungarotoxin determined by two-dimensional NMR spectroscopy: Calculation of tertiary stru
159. esentation, and the material determines the effects of lighting, shading, and transparency on the representation. Let us first explore different drawing styles. Modeling Structure from Sequence. Using VMD: An Introductory Tutorial (Supplement 24). Figure 5.7.6 (A) Licorice, (B) Tube, and (C) NewCartoon representations of ubiquitin. For the color version of this figure go to http://www.currentprotocols.com. Exploring different drawing styles. 12. In the VMD Main window, choose the Graphics > Representations... menu item. A window called Graphical Representations will appear and the current default representation will be highlighted in yellow (Fig. 5.7.5A). 13. In the Draw Style tab (Fig. 5.7.5B), change the style (Fig. 5.7.5D) and color (Fig. 5.7.5C) of the representation. Here, we will focus on the drawing style (the default is Lines). 14. Each Drawing Method has its own parameters. For instance, change the thickness of the lines by using the controls on the lower right-hand side corner (Fig. 5.7.5E) of the Graphical Representations window. 15. Click on the Drawing Method (Fig. 5.7.5D) to see a list of options. Choose VDW (van der Waals); each atom is now represented by a sphere scaled to its van der Waals radius, allowing the user to see the volumetric distribution of the protein. 16. When choosing VDW as the drawing method, two new contr
160. etermined by the precomputed all-against-all structural alignments between all representative structures. Based on this map, the database search by the Dali server tries shortcuts to quickly place the query structure in a "known" location of fold space. If a strong match is found to one database structure, then the search can be restricted to the precomputed neighborhood of this structure. Fast but approximate methods can quickly find obvious structural resemblances. Slower but more sensitive algorithms then need only be applied to a smaller set of candidates. DaliLite has the core algorithmic functionality of the Dali server. The DaliLite programs perform systematic pairwise comparisons without shortcuts and can therefore be run independently of database updates. Applications. The exponential growth in the number of newly solved protein structures makes correlating and classifying the data an important task. Dali is now used routinely by crystallographers worldwide to screen the database of known structures for similarity to newly determined structures. The application of Dali to newly released structures led to a string of discoveries of unexpected distant evolutionary relationships. For example, a remarkably diverse set of distant relatives of urease were [...] Figure 5.5.14 Distance matrix representations. Unaligned: Distance matrix representation of two different proteins, one in the upper a
161. ethod, so its location within the protein can be visualized easily. 39. Use the Zoom controls (Fig. 5.7.9F) to display the entire list of residues in the window. This is particularly useful for larger proteins. 40. Pick multiple residues by holding the shift key and clicking on the mouse button (Fig. 5.7.9E). 41. Look at the Graphical Representations window; a new representation with the residues that have been selected using the Sequence Viewer Extension should be shown. Modify, hide, or delete this representation similar to the steps described above. Information about residues is color coded (Fig. 5.7.9D) in columns and obtained from STRIDE. The B-value column (Fig. 5.7.9B) shows the B-value field (temperature factor) often provided in pdb files. The "struct" column shows secondary structure (Fig. 5.7.9D), where each letter corresponds to a secondary structure, listed in Table 5.7.3. Figure 5.7.9 (A) The VMD Sequence window displays properties of the protein sequence, including (B) the B-value and (C) the secondary structure, denoted by (D) the color codes. (E) The list of residues is displayed, with the selected residues highlighted in yellow. (F) Zoom controls are also shown in the window. For the color version of this figure go to http://www.currentprotocols.com. Saving your work. The viewpoints and representations created using VMD can be saved as a VMD state. This VMD state
162. extending the applicability of the database search approach (Ring et al., 1992; Oliva et al., 1997; Rufino et al., 1997; Fernandez-Fuentes et al., 2006). However, the database methods are limited because the number of possible conformations increases exponentially with the length of a loop. As a result, only loops up to 4 to 7 residues long have most of their conceivable conformations present in the database of known protein structures (Fidelis et al., 1994; Lessel and Schomburg, 1994). This limitation is made even worse by the requirement for an overlap of at least one residue between the database fragment and the anchor core regions, which means that modeling a 5-residue insertion requires at least a 7-residue fragment from the database (Claessens et al., 1989). Despite the rapid growth of the database of known structures, it does not seem possible to cover most of the conformations of a 9-residue segment in the foreseeable future. On the other hand, most of the insertions in a family of homologous proteins are shorter than 10 to 12 residues (Fiser et al., 2000). Loop modeling by conformational search. To overcome the limitations of the database search methods, conformational search methods were developed (Moult and James, 1986; Bruccoleri and Karplus, 1987). There are many such methods, exploiting different protein representations, objective functions, and optimization or enumeratio
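The fragment-length arithmetic in the database-search discussion (a 5-residue insertion needs at least a 7-residue fragment) follows from adding the required overlap at each of the two anchors. A minimal Python sketch of that arithmetic, with a hypothetical function name and assuming the same overlap on both sides:

```python
def min_fragment_length(insertion_length, overlap_per_anchor=1):
    """Loop residues plus the overlap required at each of the two anchor regions."""
    if insertion_length < 0 or overlap_per_anchor < 0:
        raise ValueError("lengths must be non-negative")
    return insertion_length + 2 * overlap_per_anchor


if __name__ == "__main__":
    for n in (4, 5, 7, 9):
        print(f"{n}-residue insertion: fragment of at least "
              f"{min_fragment_length(n)} residues")
```

With the minimum overlap of one residue per anchor, the fragment is always two residues longer than the insertion, which is why the practical length limit of the database approach is reached even sooner than the loop length alone suggests.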
163. ez and Sali, 1998). In general, medium-resolution models frequently allow a refinement of the functional prediction based on sequence alone, because ligand binding is most directly determined by the structure of the binding site rather than its sequence. It is frequently possible to correctly predict important features of the target protein that do not occur in the template structure. For example, the location of a binding site can be predicted from clusters of charged residues (Matsumoto et al., 1995), and the size of a ligand may be predicted from the volume of the binding site cleft (Xu et al., 1996). Medium-resolution models can also be used to construct site-directed mutants with altered or destroyed binding capacity, which in turn could test hypotheses about the sequence-structure-function relationships. Other problems that can be addressed with medium-resolution comparative models include designing proteins that have compact structures, without long tails, loops, and exposed hydrophobic residues, for better crystallization, or designing proteins with added disulfide bonds for extra stability. The high end of the accuracy spectrum corresponds to models based on 50% sequence identity or more. The average accuracy of these models approaches that of low-resolution X-ray structures (3 Å resolution) or medium-resolution NMR structures (10 distance restraints per residue; Sanchez and Sali, 1997b). The alignments on wh
164. from http://salilab.org/modeller/. Necessary Resources. Hardware: A computer running RedHat Linux (PC, Opteron, EM64T/Xeon64, or Itanium 2 systems) or other version of Linux/Unix (x86/x86_64/IA64 Linux, Sun, SGI, Alpha, AIX), Apple Mac OS X (PowerPC), or Microsoft Windows 98/2000/XP. Software: An up-to-date Internet browser, such as Internet Explorer (http://www.microsoft.com/ie), Netscape (http://browser.netscape.com), Firefox (http://www.mozilla.org/firefox), or Safari (http://www.apple.com/safari). Installation. The steps involved in installing MODELLER on a computer depend on its operating system. The following procedure describes the steps for installing MODELLER on a generic x86 PC running any Unix/Linux operating system. The procedures for other operating systems differ slightly. Detailed instructions for installing MODELLER on machines running other operating systems can be found at http://salilab.org/modeller/release.html. 1. Point browser to http://salilab.org/modeller/download_installation.html. 2. On the page that appears, download the distribution by clicking on the link entitled "Other Linux/Unix" under "Available downloads...". 3. A valid license key, distributed free of cost to academic users, is required to use MODELLER. To obtain a key, go to the URL http://salilab.org/modeller/registration.html, fill in the simple form at the bottom of the page, and read and accept the lice
165. g options. TWO USEFUL VIEWS IN RasMol. This protocol includes two quick methods for creating RasMol images that fill specific needs. The first method provides a fast overview of the structure, making it possible to see the major structural features when exploring a new protein. The second method makes it possible to pinpoint key amino acids within a complex protein structure. Necessary Resources. Hardware: RasMol runs on a variety of computer hardware, including personal computers. Software: Operating system: RasMol runs under Microsoft Windows and Apple Macintosh OS 7.0 or higher (including Mac OS X). It may also be run on workstations under Unix, Linux, or VMS. RasMol: Binary versions of RasMol are available on the WWW at http://www.bernstein-plus-sons.com/software/rasmol/. Downloading and installation instructions are given in Support Protocol 1. Files: Coordinate files are read in a variety of formats, including PDB, Mol2, CHARMm, and mmCIF. The program deals gracefully with a number of variations of these files, including files containing coordinates for multiple conformers or multiple models. In this example, coordinates for hemoglobin (2HHB.pdb) obtained from the Protein Data Bank (PDB; UNIT 1.9) are used; instructions for downloading the PDB coordinate file are given in Support Protocol 2. An overview representation. This representation is useful for the first look at a protein, to pr
166. h the other trajectory, in which the ubiquitin is pulled apart. Load this trajectory into VMD using the files ubiquitin.psf and pulling.dcd. Make sure you load ubiquitin.psf as a new molecule. You can change the names of the molecules by double-clicking on them in the VMD Main menu (see Basic Protocol 9, steps 4 and 5). 6. In the RMSD Trajectory Tool window, hit the button Add all to update the list of molecules. 7. Click the Align button and then click RMSD button. The new graph (Fig. 5.7.29) displays two RMSD plots versus time, one for the equilibration trajectory, and the other for the pulling trajectory. The RMSD for the pulling trajectory does not level off and is much higher than that in the equilibration trajectory, since the protein is stretched in the simulation. 8. Quit VMD. Example of an Analysis Script. In many cases, one requires special types of trajectory analyses that are tailored for certain needs. The Tcl scripting in VMD provides opportunities for such custom tasks. Users commonly write their own scripts to analyze the features of interest. A very extensive library of VMD scripts, contributed by many users, is available online (http://www.ks.uiuc.edu/Research/vmd/script_library/). Here, we will explore a very simple exemplary script, distance.tcl, which computes the distance between two atom selections vs. time and the distribution of the distances. 1. Start a new VMD session. Load the ubiquitin equilibration trajectory 
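To make the idea behind such a script concrete, the following Python sketch computes a distance-versus-time series and its distribution from plain coordinate lists. It is not the distance.tcl script itself: it assumes the distance is taken between the geometric centers of the two selections, the function names are hypothetical, and the coordinates in the example are invented placeholders rather than simulation data.

```python
import math


def center(points):
    """Geometric center of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def distances_over_time(frames_a, frames_b):
    """Distance between the centers of two selections, one value per frame."""
    return [math.dist(center(a), center(b)) for a, b in zip(frames_a, frames_b)]


def histogram(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width if all values are equal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    # Three placeholder frames, each selection holding a single atom.
    sel_a = [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]]
    sel_b = [[(3.0, 4.0, 0.0)], [(6.0, 8.0, 0.0)], [(0.0, 0.0, 1.0)]]
    d = distances_over_time(sel_a, sel_b)
    print(d)
    print(histogram(d, 3))
```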
167. he Graph tab. Select the bond you labeled between atoms 770 and 1242. Click on the Graph button. This will create a plot of the distance between these two atoms over time (Fig. 5.7.27). You can also save this data to a file by clicking on the Save button, and then use an external plotting program to visualize the data. 13. Quit VMD. Example of a Built-In Analysis Tool: The RMSD Trajectory Tool. The built-in analysis tools in VMD are available under the menu item Extension > Analysis. These tools each feature a GUI window that allows one to enter parameters and customize the quantities analyzed. In addition, all tools can be invoked in a scripting mode, using the TkConsole window. We will learn how to work with one of the most frequently used tools, the RMSD Trajectory Tool. In this example, we will analyze RMSD for two trajectories for the same system, ubiquitin.psf. One of them is the already familiar pulling trajectory, pulling.dcd, and the other is the trajectory of a simulation in which no force was applied to the protein, equilibration.dcd. 1. Start a new VMD session. Load the ubiquitin equilibration trajectory into VMD (using the files ubiquitin.psf and equilibration.dcd). 2. Choose Extension > Analysis > RMSD Trajectory Tool in the VMD Main window (Fig. 5.7.28). The RMSD Trajectory Tool window will show up. In the RMSD Trajectory Tool window, you can see many customization options. For the default values, the molecule to b
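For readers who want to see what quantity is being plotted, here is a minimal Python sketch of a per-frame RMSD calculation against a reference structure. It is an illustration only, not the tool's implementation: the function names are hypothetical, and the sketch assumes the frames have already been superimposed on the reference, so no alignment step is performed.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def rmsd_per_frame(reference, frames):
    """RMSD of every frame relative to the reference coordinates (no fitting)."""
    return [rmsd(reference, frame) for frame in frames]


if __name__ == "__main__":
    # Placeholder coordinates: two atoms, and a second frame shifted by 2 along z.
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    shifted = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
    print(rmsd_per_frame(reference, [reference, shifted]))
```

A trajectory in which the protein is stretched would give values that keep growing from frame to frame, whereas an equilibrated system would give values that level off.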
168. he lower bound on the errors in the corresponding regions of the fold. Restraints derived from experimental data. Because the modeling by satisfaction of spatial restraints can use many different types of information about the target sequence, it is perhaps the most promising of all comparative modeling techniques. One of the strengths of modeling by satisfaction of spatial restraints is that restraints derived from a number of different sources can easily be added to the homology-derived restraints. For example, restraints could be provided by rules for secondary structure packing (Cohen et al., 1989), analyses of hydrophobicity (Aszodi and Taylor, 1994) and correlated mutations (Taylor et al., 1994), empirical potentials of mean force (Sippl, 1990), nuclear magnetic resonance (NMR) experiments (Sutcliffe et al., 1992), cross-linking experiments, fluorescence spectroscopy, image reconstruction in electron microscopy, site-directed mutagenesis (Boissel et al., 1993), and intuition, among other sources. Especially in difficult cases, a comparative model could be improved by making it consistent with available experimental data and/or with more general knowledge about protein structure. Relative accuracy, flexibility, and automation. Accuracies of the various model-building methods are relatively similar when used optimally (Marti-Renom et al., 2002). Other factors such as template selection and alignment accuracy u
169. he rest of the protein. Although there are various mathematical/topological definitions of a domain, most domains are like Supreme Court Justice Potter Stewart's 1964 explanation of pornography (we may not know how to define it, but we usually know it when we see it). The best evidence that this universe is indeed limited is the diminishing number of new folds found every year despite the sharp increase in new structures (Hou et al., 2005). Simple application of Fisher statistics to this frequency distribution gives a crude estimate of the total number of folds. A recent attempt at cataloging estimates this number to be around 4000, of which nearly half (1700) are already known (Sadreyev and Grishin, 2006). Therefore, there is reason to assume that the total number of folds will be known eventually and that it will indeed be many orders of magnitude less than the number of sequences. The problem of assigning a fold for every sequence now reduces to two steps: identifying the fold that corresponds to a given sequence, and deriving the best possible atomic model for the structure of that sequence given knowledge of its domain fold(s). That doesn't sound so difficult, but in practice it has proven to be a formidable challenge. Both steps are far from straightforward in all but the simplest cases, and both represent very active areas of investigation. It is these steps that are the subjects of the protocols in this chapter. We begin with a discussion
170. her. This helps the protein achieve proper folding and increases its stability. The get command. Atom selections are useful not only for setting atomic data, but also for getting atomic information. For example, if you wish to communicate which residues are hydrophobic, all you need to do is to create a hydrophobic selection and use the get command. 10. Try to use the get command with your sel atom selection to obtain the names of hydrophobic residues: $sel get resname. But there is a problem: each residue contains many atoms, resulting in multiple repeated entries. One way to circumvent this is to pick only the α carbons in the selection. 11. Type the following in the Tk Console window (note: name CA = α carbons): set sel [atomselect top "hydrophobic and name CA"]; $sel get resname. This should give you the list of hydrophobic residues. 12. You can also get multiple properties simultaneously. Try the following: $sel get resid; $sel get {resname resid}; $sel get {x y z}. If you want to obtain some of the structural properties, e.g., the geometric center or the size of a selection, the command measure can do the job easily. 13. Let us try using measure with the sel selection: measure center $sel; measure minmax $sel. The first command above returns the geometric center of atoms in sel. And the second command returns two vectors, the first containing the minimum x, y, and z coordinates of all a
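The two quantities described for the measure command, the geometric center and the coordinate extremes of a selection, are simple to compute by hand. The Python sketch below illustrates the arithmetic on a plain list of coordinates; it does not call VMD, the function names are hypothetical, and the coordinates are invented placeholders.

```python
def geometric_center(coords):
    """Unweighted average position of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def min_max(coords):
    """Return two vectors: the minimum and the maximum x, y, and z over all atoms."""
    mins = tuple(min(c[i] for c in coords) for i in range(3))
    maxs = tuple(max(c[i] for c in coords) for i in range(3))
    return mins, maxs


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]  # placeholder coordinates
    print(geometric_center(atoms))
    print(min_max(atoms))
```

Subtracting the first vector returned by min_max from the second gives the extent of the selection along each axis, which is one simple measure of its size.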
171. ical potentials (Sippl, 1990; Luthy et al., 1992; Melo et al., 2002) to assess the compatibility between the sequence and modeled structure by evaluating the environment of each residue in a model with respect to the expected environment as found in native high-resolution experimental structures. These methods can be used to assess whether or not the correct template was used for the modeling. They include VERIFY3D (Luthy et al., 1992), PROSAII (Sippl, 1993), HARMONY (Topham et al., 1994), ANOLEA (Melo and Feytmans, 1998), and DFIRE (Zhou and Zhou, 2002). Even when the model is based on alignments that have >30% sequence identity, other factors, including the environment, can strongly influence the accuracy of a model. For instance, some calcium-binding proteins undergo large conformational changes when bound to calcium. If a calcium-free template is used to model the calcium-bound state of the target, it is likely that the model will be incorrect irrespective of the target-template similarity or accuracy of the template structure (Pawlowski et al., 1996). Evaluations of self-consistency. The model should also be subjected to evaluations of self-consistency to ensure that it satisfies the restraints used to calculate it. Additionally, the stereochemistry of the model (e.g., bond lengths, bond angles, backbone torsion angles, and nonbonded contacts) may be evaluated using programs such as PROCHECK (Laskowski et al., 1993) an
172. iar with the basics of VMD may selectively pursue sections of their interest. Several files have been prepared to accompany this tutorial. You need to download these files at http://www.currentprotocols.com. WORKING WITH A SINGLE MOLECULE. In this section, the basic functions of VMD will be introduced, starting with loading a molecule, displaying the molecule, and rendering publication-quality molecule images. This section uses the protein ubiquitin as an example molecule. Ubiquitin is a small protein responsible for labeling proteins for degradation, and is found in all eukaryotes with nearly identical sequences and structures. Necessary Resources. Hardware: Computer. Software: VMD, and an image-displaying program. Files: 1ubq.pdb, which can be downloaded at http://www.currentprotocols.com. Loading and Displaying the Molecule. A VMD session usually starts with loading structural information of a molecule into VMD. When VMD loads a molecule, it accesses the information about the names and coordinates of the atoms. Then, one can explore various VMD visualization features to get a nice view of the loaded molecule. Loading a molecule. The first step is to load the molecule. The pdb file, 1ubq.pdb (Vijay-Kumar et al., 1987), that contains the atomic coordinates of ubiquitin will be loaded. 1. Start a VMD session. In the VMD Main window, choose File > New Molecule... (Fig. 5.7.2A). The Molecule Fi
173. ibbon diagram. The display should look like Figure 5.4.5. d. Rotate the display and notice the following: (1) Ribbon diagrams make it easy to identify secondary structural elements, such as the alpha helices in hemoglobin. (2) Visual cues to amino acid positions are lost in the smooth ribbon, unless the ribbon is colored to show the types of amino acids. 9. When finished, type the following in the Command Line window to exit the program: RasMol> quit. SUPPORT PROTOCOL 1: DOWNLOADING AND INSTALLING RasMol ON A LOCAL COMPUTER. This protocol describes how to download and install RasMol on a local computer. Executable versions of RasMol are available on the WWW, so this is relatively straightforward. Necessary Resources. Hardware: RasMol runs on a variety of computer hardware, including personal computers. Representing Structural Information with RasMol (Supplement 11). Software: Operating system: RasMol runs under Microsoft Windows and Apple Macintosh OS 7.0 or higher (including Mac OS X). It may also be run on workstations under Unix, Linux, or VMS. Browser: An Internet browser is required. 1. Point the browser to http://www.bernstein-plus-sons.com/software/rasmol/. 2. Click on the appropriate version at the top of the page to download the executable file. O
174. ich these models are based generally contain almost no errors. Models with such high accuracy have been shown to be useful even for refining crystallographic structures by the method of molecular replacement (Howell et al., 1992; Baker and Sali, 2001; Jones, 2001; Claude et al., 2004; Schwarzenbacher et al., 2004). Conclusion. Over the past few years, there has been a gradual increase in both the accuracy of comparative models and the fraction of protein sequences that can be modeled with useful accuracy (Marti-Renom et al., 2000; Baker and Sali, 2001; Pieper et al., 2006). The magnitude of errors in fold assignment, alignment, and the modeling of side chains and loops has decreased considerably. These improvements are a consequence both of better techniques and a larger number of known protein sequences and structures. Nevertheless, all the errors remain significant and demand future methodological improvements. In addition, there is a great need for more accurate modeling of distortions and rigid-body shifts, as well as detection of errors in a given protein structure model. Error detection is useful [...] Figure panel text (APPLICATIONS): studying catalytic mechanism; designing and improving ligands; docking of macromolecules, prediction of protein partners; virtual screenings and docking of small ligands; defining antibody epitopes; molecular replacement
175. ilar folds. Sequence identity: It is generally assumed that if sequences of two chains share over 40% identity, then they are unambiguously homologous and structurally very similar. However, distantly related proteins may share very low sequence identity but still be structurally similar. For each chain in the query structure, a table is presented showing significant hits against each chain of the subject structure. Note that the first structure is named "mol1", the second structure is named "mol2", chain A of the first structure is "mol1A", and so on. Suboptimal alignments are reported; the highest-scoring alignment per any pair of chains is highlighted by light blue background. 4. To access information in the table for Results of Structure Comparison about structural alignments (including secondary structure information) between the indicated chains, click the "click here" link under the Structural Alignment category to generate the alignment shown in Figure 5.5.3. 5. To generate a coordinates file of the superimposed Cα traces for the indicated chains, viewable in RasMol (UNIT 5.4) or other PDB structure viewers, click the CA_1.pdb link under Superimposed C-alpha Traces. In the example, the Cα trace shown in Figure 5.5.4 is generated. Only the Cα coordinates are transmitted; therefore use the backbone display in RasMol. Note that in the coordinates sent to RasMol the first structure chain (from mol1) is renamed Q and the seco
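Because the sequence identity figure is used as a rule of thumb for homology, it can help to see how a percentage might be computed from an alignment. The Python sketch below is one possible convention, not necessarily the one used by the server: it counts identical aligned positions and divides by the length of the shorter ungapped sequence. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical aligned residues, relative to the shorter ungapped sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    len_a = sum(1 for c in aligned_a if c != gap)
    len_b = sum(1 for c in aligned_b if c != gap)
    shorter = min(len_a, len_b)
    if shorter == 0:
        raise ValueError("sequences contain no residues")
    return 100.0 * identical / shorter


if __name__ == "__main__":
    # Invented six-residue alignment with one mismatch.
    print(round(percent_identity("ACDEFG", "ACDQFG"), 1))
```

Other normalizations (alignment length, or the longer sequence) give lower values for the same alignment, so any threshold should be read together with the convention used.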
176. [Screenshot text: file format and compression options (none, Unix compressed, GNU zipped, ZIPped) for PDB and mmCIF files; Download the Biological Unit File.] Figure 5.4.11 The Download/Display File page for oxyhemoglobin at the PDB. The link at the bottom of the page allows access to coordinates of the biological unit. 3. The opposite problem also occurs in other structure files. In these cases, there are multiple biological units in the coordinate file, again due to the details of symmetry and packing of molecules in the crystal. For instance, PDB entry 1hbs includes eight chains, forming two complete hemoglobin tetramers, as shown in Figure 5.4.12. In this case, however, the multiple structure is interesting, since it shows the presumed stacking of this sickle-cell hemoglobin. To show only the biological unit (i.e., the tetramer), the chain identifiers can be used to blank out one of the hemoglobin tetramers. Alternatively, it is often easiest to edit the coordinates directly, using a text editor to remove the unwanted chains. 4. Another problem occurs when looking for proteins that are large or flexible. In these cases, the researchers may have trimmed off flexible portions or cut the protein into pieces for individual study. The example shown in Figure 5.4.13 is ATP synthase, which has been solved in different parts. These two pieces were taken from PDB entries 1c17 and 1e79. There is no quick so
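Removing unwanted chains by hand in a text editor can also be done with a short script. The Python sketch below keeps only the coordinate records whose chain identifier is in a chosen set; it relies on the chain identifier occupying column 22 of ATOM, HETATM, ANISOU, and TER records in the PDB format. The function names and the two-atom example are hypothetical, and which chains make up one tetramer in a given entry must be checked in the file itself.

```python
COORD_RECORDS = ("ATOM", "HETATM", "ANISOU", "TER")


def keep_chains(pdb_text, chains):
    """Drop coordinate records whose chain ID (column 22) is not in `chains`."""
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(COORD_RECORDS):
            # Column 22 of the record is index 21 in a 0-based Python string.
            if len(line) > 21 and line[21] in chains:
                kept.append(line)
        else:
            kept.append(line)  # headers, remarks, END, etc. pass through unchanged
    return "\n".join(kept)


def atom_line(serial, name, res, chain, resseq):
    """Build a minimal ATOM record (no coordinates) for the demonstration only."""
    return f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}"


if __name__ == "__main__":
    text = "\n".join([
        atom_line(1, "N", "VAL", "A", 1),
        atom_line(2, "N", "VAL", "E", 1),
        "END",
    ])
    print(keep_chains(text, {"A"}))
```

Reading the original file with open(...).read(), filtering it, and writing the result to a new file name leaves the downloaded coordinates untouched.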
177. ill  contain the names of all the variants subdirectories     Each line should contain only a single variant  An example of the content of such a file with  three variants is listed below     variantA  variantB  variantC    Save the list file to the upper directory  i e     MyProtein list    29  Copy and paste the following file     gt  cp  usr local chi bin cns inp   MyProtein     30  Check that the parent directory    MyProtein  contains the appropriate files by  issuing the following command      5 ls   MyProtein     A typical directory listing with three homologs should be     continued    Current Protocols in Bioinformatics       Modeling  Structure from  Sequence    5 3 11       Supplement 4    Modeling  Membrane  Proteins    5 3 12       Supplement 4       variantA   variantB   variantC   GLY   GLY pdb  GLY psf  cns inp  list    3   Compare all the cluster averages from each homolog  obtained by GMDS by their  Ca RMSD  Look for the cluster that exists in all variants with a minimal RMSD  between every pair of variants  Issue all of the following commands from the parent  directory    MyProtein   To be sure that one is in the right directory  issue the  following command         cd   MyProtein     Run the following command to compare the different homologs        perl  usr local chi bin compare rmsd pl N   where N is a number that signifies the RMSD threshold in      There are several output files       MyProtein compare rmsd out  This file contains the list of clusters
178. in which Escherichia coli is selected     More details on the 41 species are described in the GTOP homepage   http   spock  genes nig ac jp  gtop old org html  which contains the results not only of  PSI BLAST but also of FASTA and normal BLAST  among others  Pearson and Lipman   1988  Altschul et al   1990      b  The lower part of the search page provides the following text boxes and radio  buttons for searching   2  Search for ORFs by Gene  ORF  Name   3  Search for  ORFs by PDB ID of Reference Protein   4  Search for ORFs by Motif Name  and   5  Search for ORFs by FAMS Results     The gene name used in the Search for ORFs by Gene  ORF  Name text box is based on the  gene names used in the GTOP Web site mentioned above  The motif name used in the Search  for ORFs by Motif Name text box is based on the PROSITE motifs  http   us ex   pasy org prosite   The FAMS results used in Search for ORFs by FAMS Results means  whether or not the model exists in the database  As an example  Figure 5 2 6 shows a query  for Gene Name   abc      Once the search criteria have been entered  click the Search button at the top of the  Web page     c  Alternatively  there are two additional text boxes in the lower part of the search  page  Search for ORFs by Hetero Atom of Reference Protein and Search for ORFs  by Amino Acid Sequence  After entering the corresponding information in the text  box es   click the Search button  for Search for ORFs by Hetero Atom of  Reference Protein  or the Submit
179. indow.

As you might have noticed when playing the animation, the protein movements are not very smooth because of thermal fluctuations (the simulation is performed under conditions that mimic a thermal bath). VMD can smooth the animation by averaging over a given number of frames.

10. In the Graphical Representations window, select your protein representation and click on the Trajectory tab. At the bottom, you should see the Trajectory Smoothing Window Size set to zero. As your animation is playing, increase this setting. Notice that the motion gets smoother and smoother as the size of the smoothing window is increased. Commonly used values for this setting are 1 to 5, depending on how smooth you want your trajectory to be.

Displaying multiple frames
We will now learn how to display many frames of the same trajectory at once.

11. In the Graphical Representations window, highlight your protein representation by clicking on it and press the Create Rep button. This creates an identical representation, but note that smoothing is set to zero. Hide the old protein representation.

12. Highlight the new protein representation and click the Trajectory tab. Above the smoothing control, notice the Draw Multiple Frames control. It is set to "now" by default, which is simply the current frame. Enter 0:10:99, which selects every tenth frame from the range 0 to 99.

13. Go back to the Draw style tab, and change the Coloring Method to Timestep. This will dr
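The two trajectory controls described in these steps can be illustrated outside VMD. The following sketch is an assumption-laden illustration rather than VMD's own implementation: it assumes that a smoothing window of size n averages each frame with up to n neighbors on either side, and that a frame specification has the form first:step:last. Both function names are hypothetical.

```python
def smooth_frames(frames, window):
    """Average each frame with up to `window` neighbouring frames on each side.

    `frames` is a list of frames; each frame is a flat list of coordinates.
    A window of 0 leaves the trajectory unchanged. The window is truncated
    at the ends of the trajectory (an assumption made for this sketch).
    """
    n_frames = len(frames)
    smoothed = []
    for i in range(n_frames):
        lo = max(0, i - window)
        hi = min(n_frames - 1, i + window)
        span = frames[lo:hi + 1]
        # Average coordinate by coordinate over the frames in the window.
        smoothed.append([sum(values) / len(span) for values in zip(*span)])
    return smoothed


def parse_frame_spec(spec):
    """Expand a numeric 'first:step:last' or 'first:last' specification."""
    parts = [int(p) for p in spec.split(":")]
    if len(parts) == 3:
        first, step, last = parts
    elif len(parts) == 2:
        first, last = parts
        step = 1
    else:
        first = last = parts[0]
        step = 1
    return list(range(first, last + 1, step))


if __name__ == "__main__":
    print(parse_frame_spec("0:10:99"))
    print(smooth_frames([[0.0], [3.0], [6.0]], window=1))
```

The keyword "now" used as the default in the Draw Multiple Frames control is not handled by this numeric parser.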
180. indow, choose the File → Save State menu item. Type an appropriate name (e.g., myfirststate.vmd) and save it.

The VMD state file myfirststate.vmd contains all the information needed to restore a VMD session, including the viewpoints and the representations.

To load a saved VMD state, start a new VMD session and in the VMD Main window choose File → Load State.

Quit VMD.

The Basics of VMD Figure Rendering

One of VMD's many strengths is its ability to render high-resolution, publication-quality molecule images. In this section, we will introduce some basic concepts of figure rendering in VMD.

Setting the display background
Before rendering a figure, make sure that the OpenGL Display background is set up the way you want. Nearly all aspects of the OpenGL Display are user adjustable, including the background color.

1. Start a new VMD session (Basic Protocol 1) and load the 1ubq.pdb file.

2. In the VMD Main window, choose Graphics → Colors. The Color Controls window should show up. Look through the Categories list. All display colors (for example, the colors of different atoms when colored by name) are set here.

3. In Categories, select Display. In Names, select Background. Finally, choose "8 white" in Colors. The OpenGL Display should now have a white background.

4. When making a figure, we often do not want to include the axes. To turn off the axes, select Display → Axes → Off in the VMD Main window.

Curr
181. ing and predicting structure of proteins   Proteins 5 355 373     Vakser  LA  1995  Protein docking for low   resolution structures  Protein Eng  8 371 377     van Gelder  C W   Leusen  F J   Leunissen  J A   and  Noordik  J H  1994  A molecular dynamics ap   proach for the generation of complete protein  structures from limited coordinate data  Proteins  18 174 185     van Vlijmen  H W  and Karplus  M  1997  PDB   based protein loop prediction  Parameters for  selection and methods for optimization  J  Mol   Biol  267 975 1001     Vernal  J   Fiser  A   Sali  A   Muller  M   Cazzulo   J J   and Nowicki  C  2002  Probing the speci   ficity of a trypanosomal aromatic alpha hydroxy  acid dehydrogenase by site directed mutagene   sis  Biochem  Biophys  Res  Commun  293 633   639     von Ohsen  N   Sommer  I   and Zimmer  R  2003   Profile profile alignment  A powerful tool for  protein structure prediction  Pac  Symp  Biocom   put  2003 252 263     Vriend  G  1990  WHAT IF  A molecular modeling  and drug design program  J  Mol  Graph 8 52   56  29     Wang  G  and Dunbrack  R L  Jr  2004  Scoring  profile to profile sequence alignments  Protein  Sci  13 1612 1626     Wolf    E   Vassilev  A   Makino  Y  Sali   A   Nakatani  Y   and Burley  S K  1998   Crystal structure of a GCN5 related N   acetyltransferase  Serratia marcescens amino   glycoside 3 N acetyltransferase  Cell 94 439   449     Worley  K C   Culpepper  P   Wiese  B A   and  Smith  R F  1998  BEAUTY X  Enhanced  BLAS
182. ing some shapes with the following examples.

1. Hide all representations in the Graphical Representations window.

2. Let us draw a point. Type the following command in your Tk Console window:

graphics top point {0 0 10}

Somewhere in your OpenGL window, there should be a small dot.

3. Let us draw a line. Type the following command in your Tk Console window (note: the "\" in the command line means the next line is a continuation of the previous line; hence, do not actually type "\" when you enter the following command, and do not start a new line):

graphics top line {-10 0 0} {0 0 0} width 5 style \
solid

This will give you a solid line.

4. You can also draw a dashed line:

graphics top line {10 0 0} {0 0 0} width 5 style \
dashed

The objects so far are all drawn in blue. You can change the color of the next graphics object by using the command graphics top color colorid. The colorid for each color can be found in the Graphics → Colors menu in the VMD Main window. For example, the colorid for orange is "3".

5. Type graphics top color 3 in the Tk Console window and the next object you draw will appear in orange.

6. Try the following commands to draw more shapes:

graphics top cylinder {5 0 0} {15 0 10} radius 10 \
resolution 60 filled no
graphics top cylinder {0 0 0} {-5 0 10} radius 5 \
resolution 60 filled yes
graphics top cone {40 0 0} {40 0 10} radius 10 \
resolution 60
graphics top triangle {80 0 0} {85 0 10} {90 0 0}
graphi
183. ion of potentially time-consuming experimental methodologies (see Suggestions for Further Analysis).

Suggestions for Further Analysis
It is obvious that the best way to analyze the results of any modeling exercise is by experimentation. There are several methods that can be applied; however, most experiments, short of directly solving the structure, are better suited to refuting models than to confirming them. The reason is that, typically, more than one model can be consistent with the experimental results.

Mutagenesis
Mutagenesis has been used in several instances to determine which residues are essential for oligomerization of particular transmembrane helices. This is possible only when an oligomerization assay exists, as with glycophorin A, which remains dimeric in SDS-PAGE (Lemmon et al., 1992a). In that series of experiments, several residues were identified that were shown to line one side of a helix projection (Lemmon et al., 1992b; Lemmon et al., 1994). A solution NMR study in detergent micelles has shown those residues to be intimately involved in the helix-helix interface (MacKenzie et al., 1997). Mutagenesis has also been performed for phospholamban, which also remains a pentamer in SDS-PAGE (Arkin et al., 1994). In this instance, however, more than one model was consistent with the mutagenesis results, and only a direct structural method was able to resolve this ambiguity (Torres et al., 2000).
184. ion of suitable templates is achieved by scanning structure databases, such as PDB (Deshpande et al., 2005), SCOP (Andreeva et al., 2004), DALI (UNIT 5.5; Dietmann et al., 2001), and CATH (Pearl et al., 2005), with the target sequence as the query. The detected similarity is usually quantified in terms of sequence identity or statistical measures such as E-value or z-score, depending on the method used.

Three regimes of the sequence-structure relationship
The sequence-structure relationship can be subdivided into three different regimes in the sequence similarity spectrum: (i) the easily detected relationships, characterized by >30% sequence identity; (ii) the "twilight zone" (Rost, 1999), corresponding to relationships with statistically significant sequence similarity, with identities in the 10% to 30% range; and (iii) the "midnight zone" (Rost, 1999), corresponding to statistically insignificant sequence similarity.

Pairwise sequence alignment methods
For closely related protein sequences with identities higher than 30% to 40%, the alignments produced by all methods are almost always largely correct. The quickest way to search for suitable templates in this regime is to use simple pairwise sequence alignment methods such as SSEARCH (Pearson, 1994), BLAST (Altschul et al., 1997), and FASTA (Pearson, 1994). Brenner et al. (1998) showed that these methods detect only ~18% of the homologous pairs at less than 40% sequ
185. is a computer program for comparative protein structure modeling (Sali and Blundell, 1993; Fiser et al., 2000). In the simplest case, the input is an alignment of a sequence to be modeled with the template structures, the atomic coordinates of the templates, and a simple script file. MODELLER then automatically calculates a model containing all non-hydrogen atoms, within minutes on a Pentium processor and with no user intervention. Apart from model building, MODELLER can perform additional auxiliary tasks, including fold assignment (Eswar, 2005), alignment of two protein sequences or their profiles (Marti-Renom et al., 2004), multiple alignment of protein sequences and/or structures (Madhusudhan et al., 2006), calculation of phylogenetic trees, and de novo modeling of loops in protein structures (Fiser et al., 2000).

NOTE: Further help for all the described commands and parameters may be obtained from the MODELLER Web site (see Internet Resources).

Necessary Resources

Hardware
A computer running RedHat Linux (PC, Opteron, EM64T/Xeon64, or Itanium 2 systems) or other version of Linux/Unix (x86/x86_64/IA64 Linux, Sun, SGI, Alpha, AIX), Apple Mac OS X (PowerPC), or Microsoft Windows 98/2000/XP

Software
The MODELLER 8v2 program, downloaded and installed from http://salilab.org/modeller/download_installation.html (see Support Protocol)

Files
All files required to complete this protocol can be downloaded from http://salilab.org/modeller/tutorial/basic e
186. istidine by atom type.

6. Rotate and scale the display to find a satisfactory view of the interaction, like that in Figure 5.4.15.

7. Type the following command:

RasMol> cpk

This will use a spacefilling representation for the histidine, as in Figure 5.4.16. Notice that the picture is more confusing now, and it is difficult to tell if the histidine is part of the protein or part of the heme. By mixing different representations, one always runs the risk of creating this type of confusion.

GUIDELINES FOR UNDERSTANDING THE RESULTS

Creating effective molecular graphics requires a combination of scientific background and aesthetic judgement. When approaching a new project, it is first necessary to define what needs to be shown, and then develop a representation that clearly shows it. Two guidelines will assist in this process.

Define the Medium and the Audience

Before sitting down at the computer, it is important to understand the goals of the graphics session. For instance, at the beginning of a project the goal may be to display an entirely new structure and do some exploration. Alternatively, the goal may be to create a figure for journal publication that shows the specifics of binding of a ligand within an enzyme active site. These two goals will each 
187. l command measure fit to align two molecules.

Open the VMD TkConsole window by choosing Extensions → Tk Console from the VMD Main menu, and input the following commands:

set sel0 [atomselect 0 all]
set sel1 [atomselect 1 all]
set M [measure fit $sel0 $sel1]
$sel0 move $M

The command measure fit selection1 selection2 computes the transformation matrix that best aligns the coordinates of selection1 with the coordinates of selection2.

Figure 5.7.20 Result of the alignment between the two aquaporins using the measure fit command. For the color version of this figure go to http://www.currentprotocols.com.

As soon as you enter the last command line, you can see that the two aquaporins are now overlapping (Fig. 5.7.20). The α-helical regions of the aquaporins agree very well, with bigger deviations in the loop regions. Note that the measure fit command can only work if the two molecules have the same number of atoms. In this case, it is a pure coincidence that the human aquaporin and E. coli aquaporin PDB files have the same number of atoms. The measure fit command is hence most useful in aligning the same protein in different conformations or different frames of a molecular dynamics simulation trajectory. Generally, to compare the structures of different
188. l intramolecular distances between the Cα atoms. Such a distance matrix is independent of coordinate frame but contains more than enough information to reconstruct the 3D coordinates, except for overall chirality, by distance geometry methods. Imagine sliding a (transparent) distance matrix on top of another one. Depending on the register of the two matrices, similar substructures will stand out as submatrices with similar patterns. Structurally equivalent regions can be filtered out with a fixed cutoff on acceptable differences of intramolecular distances or, as the authors prefer, with a continuous function defined in terms of relative distance deviations. The common structure is revealed when two distance matrices are brought into register by keeping only rows or columns corresponding to the structurally equivalent residues (Fig. 5.5.14).

The Dali program has a modular architecture, where the structure alignment/database searching problem is approached by a cascade of algorithms. The Dali package consists of many Fortran programs and Perl5 scripts. The program flow is controlled by a Perl wrapper script that calls other programs as needed. Each program implements pairwise structure comparisons using different algorithms. References for these programs are given in Table 5.5.2. The goal of a database search is to find all structures that are significantly similar to the query. A conceptual map of fold space is d
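The claim that a Cα distance matrix does not depend on the coordinate frame can be checked with a few lines of code. This is a minimal sketch and not the Dali implementation: the coordinates are made up, the helper names are hypothetical, and the relative deviation shown (absolute difference divided by the mean distance) is an assumed form of the "relative distance deviation" mentioned in the text.

```python
import math


def distance_matrix(coords):
    """All pairwise distances between points (e.g., C-alpha atoms)."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def rotate_z_90_and_shift(coords, shift):
    """Rotate 90 degrees about the z axis, then translate by `shift`."""
    sx, sy, sz = shift
    return [(-y + sx, x + sy, z + sz) for x, y, z in coords]


def relative_deviation(d_a, d_b):
    """Assumed form: |d_a - d_b| divided by the mean of the two distances."""
    mean = 0.5 * (d_a + d_b)
    return abs(d_a - d_b) / mean if mean > 0 else 0.0


if __name__ == "__main__":
    # Made-up coordinates, three points 3.8 units apart along a bent chain.
    trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
    moved = rotate_z_90_and_shift(trace, shift=(10.0, -5.0, 2.0))
    original = distance_matrix(trace)
    transformed = distance_matrix(moved)
    same = all(
        math.isclose(a, b, abs_tol=1e-9)
        for row_a, row_b in zip(original, transformed)
        for a, b in zip(row_a, row_b)
    )
    print("distance matrices agree after rotation and translation:", same)
```

Because only internal distances enter the comparison, two structures never need to be superimposed before their matrices are compared.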
189. ld be  less than 2596  Proteins with higher sequence identity usually have very similar folds  A  typical summary of structural neighbors is shown in Figure 5 5 10  See Basic Protocol 5  for a description of this     3  Use the DaliLite server for pairwise comparison  Basic Protocol 1  to visualize  interesting pairs of structures     Current Protocols in Bioinformatics       Z  Dali   Microsoft Internet Explorer    Ele Edt View Favortes Tools Help       Hek   gt    Q  3   Quem rvs A D SO   S       http   www  ebi  ac ukjdali Interactive html         Dali Index     Anonymous FTP    Dali Help     Email Request    MaxSprout     Web Access       uropean Bioinformatics Institute    Compose Dali request    Database search   3D coordinates x PDB database  Pairwise comparison  3D coordinates x 3D coordinates    SubmitQuery   Reset                Figure 5 5 5 Interactive submission menu of the Dali server        Dali   Microsoft Internet Explorer    Ele Edt wew Favorites Joos Help       Hek   gt    OL    Qseach Girone G  D GO   Sl       Address      http j vwew ebi ac uk dal Interactive htrnl    uropean Bioinformatics Institute      Dali Index     Anonymous FTP    Dali Help     Email Request    MaxSprout     Web Access       Database search form    Your e mail address    The results of the search will be retumed by email  Please type carefully     For commercial users only  password      Your structure  upload file         gBrewsex    Note  File uploading is not supported by older br
190. le Browser window (Fig. 5.7.2B) will appear on the screen.

2. Use the Browse button (Fig. 5.7.2C) to find the file 1ubq.pdb. When the file is selected, you will be back in the Molecule File Browser window. In order to actually load the file, press Load (Fig. 5.7.2D).

3. Now, ubiquitin is shown in the OpenGL Display window. Close the Molecule File Browser window at any time.

VMD can download a pdb file from the Protein Data Bank (http://www.pdb.org) if a network connection is available. Just type the four-letter code of the protein in the File Name text entry of the Molecule File Browser window and press the Load button. VMD will download it automatically.

Displaying the molecule

In order to see the 3D structure of our protein, the mouse will be used in multiple modes to change the viewpoint. VMD allows users to rotate, scale, and translate the viewpoint of the molecule.

4. In the OpenGL Display, press the left mouse button down and move the mouse. Explore what happens. This is the rotation mode of the mouse and allows for rotation of the molecule around an axis parallel to the screen (Fig. 5.7.3A).

Figure 5.7.2 Loading a molecule.

Figure 5.7.3 Rotational modes. (A) Rotation axes when holding down the left
191. li, 2003b). The main reasons for choosing this implementation are the generality and conceptual simplicity of scoring function minimization, as well as the limitations on the database approach that are imposed by a relatively small number of known protein structures (Fidelis et al., 1994). Loop prediction by optimization is applicable to simultaneous modeling of several loops and loops interacting with ligands, which is not straightforward with the database search approaches. Loop optimization in MODELLER relies on conjugate gradients and molecular dynamics with simulated annealing. The pseudo-energy function is a sum of many terms, including some terms from the CHARMM22 molecular mechanics force field (MacKerell et al., 1998) and spatial restraints based on distributions of distances (Sippl, 1990; Melo et al., 2002) and dihedral angles in known protein structures. The method was tested on a large number of loops of known structure, both in the native and near-native environments (Fiser et al., 2000).

Comparative model building by iterative alignment, model building, and model assessment

Comparative or homology protein structure modeling is severely limited by errors in the alignment of a modeled sequence with related proteins of known three-dimensional structure. To ameliorate this problem, one can use an iterative method that optimizes both the alignment and the model implied by it (Sanchez and Sali, 1997a; Miwa et al.
192. li  A  and Overington  J P  1994  Derivation of  rules for comparative protein modeling from a  database of protein structure alignments  Protein  Sci  3 1582 1596     Samudrala  R  and Moult  J  1998  A graph   theoretic algorithm for comparative modeling  of protein structure  J  Mol  Biol  279 287 302     Sanchez  R  and Sali  A  1997a  Advances in  comparative protein structure modelling  Curr   Opin  Struct  Biol  7 206 214     Sanchez  R  and Sali  A  1997b  Evaluation of  comparative protein structure modeling by  MODELLER 3  Proteins 1 50 58     Sanchez  R  and Sali  A  1998  Large scale pro   tein structure modeling of the Saccharomyces  cerevisiae genome  Proc  Natl  Acad  Sci  U S A   95 13597 13602     Saqi  M A   Russell  R B   and Sternberg  M J  1998   Misleading local sequence alignments  Implica   tions for comparative protein modelling  Protein  Eng  11 627 630     Sauder  J M   Arthur  J W   and Dunbrack  R L   Jr  2000  Large scale comparison of protein  sequence alignment algorithms with structure  alignments  Proteins 40 6 22     Schwarzenbacher  R   Godzik  A   Grzechnik  S K    and Jaroszewski  L  2004  The importance of  alignment accuracy for molecular replacement   Acta Crystallogr  D Biol  Crystallogr  60 1229   1236     Schwede  T   Kopp  J   Guex  N   and Peitsch  M C   2003  SWISS MODEL  An automated protein  homology modeling server  Nucl  Acids Res   31 3381 3385     Selzer  P M   Chen  X   Chan  VJ   Cheng  M    Kenyon  G L   Kuntz  LD   Saka
193. li Database    Dali Fold Classification    LAST UPDATE  March 2005    The Dali database is based on exhaustive all against all 3D structure comparison of protein structures currently  in the Protein Data Bank  PDB   The classification and alignments are automatically maintained and regularly  updated using the Dali search engine    Fold Classification        FOLD INDEX  the complete list of structural domains in PDB90 ordered by similarity  From here  you can  browse the list of structural neighbours and alignments for each representative        FOLD TREE a tree of the structural domains in PDB90 in postscript format    Search for PDB Identifier or Protein  PLEASE NOTE  PDB structures released after the    Enter PDB identifier  protein name or k rd  last Dali DB update will not be in the database  If   estradiol recepton you wish to find structural neighbours of such a    protein  you are advised to submit the structure to  the Dali Server at the EBI instead  Ei ME           DALI DOWNLOADS  for sequence files  mysql dumpfiles  and the DaliLite standalone application       DALI HELP  using the Dali Database  explanation of terms  all references    4 SEES SS qs ci m     ND eoe E             Figure 5 5 7 Home page of the Dali database  The user has typed estradiol receptor in  the query box     Current Protocols in Bioinformatics       2  Dali Database   Microsoft Internet Explorer    Ele Edit View Favorites Tools Hep      v Bak v      Q  3   Gseach  aiFavorites Meda    Ev GR v 
194. lices in a hetero-oligomer.

Clicking "View file" will allow one to view the chi_param file that was created.

Choosing "Edit file" (see following) will allow one to edit all of the parameters in the chi_param file.

Figure 5.3.4 CHI: Create setup, Edit Sequence screen.

Figure 5.3.5 CHI: Edit setup, first screen for editing an existing parameters file.

To edit a parameter file that already exists:

10b. In the CHI main menu on the left-hand side of the CHI home page (Fig. 5.3.2), click on "Edit setup". In the first "Edit setup" screen that appears (Fig. 5.3.5), enter the full path and the name of the chi_param file, or click the Browse button, navigate to its location, and select it.

11b. Click on "Edit file". Note the molecule structure parameters on the new screen that appears (Fig. 5.3.6):

Name of molecule
Number of helices
homo-oligomer (true/false)

12b. If one has chosen to simulate a hetero-oligomer, set the next parameters for each helix individually (otherwise they should only be set once):

Sequence
Residue number at start of sequence
Initial rotation offset around helix axis (the starting rotation angle about the helix axis relative to some arbitrary starting position; see the angle shown in Figure 5.3.1; default is 0.0)
195. lustalW  or  Pileup from the GCG Wisconsin package     Current Protocols in Bioinformatics    Install software and set up environment  1  Install CNSsolve as follows  more detailed installation instructions can be found on  the CNSsolve Web page  http   cns csb yale edu      a  Uncompress and extract the CNSsolve tar archive in  usr local       2   tar  xzf cns solve 1 1 basic inputs tar gz       b  Assuming that the above file was uncompressed in  usr local  there is now  a new directory      usr local cns solve 1 1   c  Using any text editor  edit the file    usr local cns solve 1 1 cns solve env    by changing only one line  as follows  assuming that CNSsolve is located in   usr local cns solve 1 1      setenv CNS SOLVE  usr local cns solve 1 1    d  In order to compile the program  in the CNSsolve directory that was created in  substep 1b   usr l1ocal cns solve 1 1   type       gt  make install    This process may take several minutes depending on the computer platform  at the end of  which there is a new executable program called cns     2  Install CHI as follows   a  Uncompress and extract the CHI tar archive       gt  tar  xzf chi tar gz    b  Assuming that the above file was uncompressed in  usr local   there is now  a new directory      usr local chi   c  Using a text editor edit the file    usr local chi chi env    by changing only one line  as follows  assuming that CHI is located in  usr   l  cal eni      setenv CHI ROOT  usr local chi    d  In order to compile th
196. lution to this problem, unfortunately. Careful study of the published reports is necessary to ensure that the functionally relevant portion of the molecule is being displayed.

Figure 5.4.12 Overview representation of sickle cell hemoglobin from PDB entry 1hbs. For the color version of this figure go to http://www.currentprotocols.com.

Figure 5.4.13 ATP synthase in a spacefilling representation. For the color version of this figure go to http://www.currentprotocols.com.

CUSTOMIZING A RasMol SESSION

When beginning to use a new molecular graphics program, it is common practice to use the default parameters during the learning process. However, these default settings are only guidelines, and many simple modifications can improve the utility of the program for different applications. The important thing is to understand the goal of the representation when beginning. For instance, one type of display is needed to understand the effect of a point mutation in hemoglobin, and a different display is needed to show the allosteric changes between oxy and deoxy forms. Much of the 
197. lutionary relationships down to a sequence identity of about 25%. Below this level of sequence identity starts the "twilight zone" of similarity. Comparing structures can help to extend the validity of an evolutionary relationship between proteins through this zone. This is because the structure of proteins is much better preserved during evolution than the sequence (Chothia and Lesk, 1986). By searching structural databases, molecular biologists can gain a considerable amount of information about connections between protein families that are unseen using sequence alone. The prediction of protein function based on structure aims at the unification of protein families into larger sets (superfamilies). Functionally divergent families classified into the same superfamily typically exploit a conserved mechanical or biochemical mechanism that has been adapted to different cellular processes and substrates (Holm and Sander, 1996). Inferring complex conserved properties is the basic reason for providing the systematic structure-structure comparison and classification of available proteins.

Improved methods of protein engineering, crystallography, and NMR spectroscopy have led to a surge of new protein structures deposited in the Protein Data Bank (PDB). At the end of 2004, the PDB contained over 28,000 protein structures, and the structural genomics initiative aims to provide a structure for each
198. ly report coordinates for half  Fortunately  the  need for appropriate biological units has become clear  and the PDB has a facility  for downloading coordinate sets with the presumed biological unit  These may be  found at the bottom of the Download Display File page for the structure  as shown  in Figure 5 4 11           File Display Colours Options Settings Export                      Figure 5 4 10 Overview representation of the coordinate file for oxyhemoglobin in PDB entry  1hho  For the color version of this figure go to htip  Awww currentprotocols com     Representing  Structural  Information with  RasMol    5 4 14          Supplement 11 Current Protocols in Bioinformatics             v  Structure Explorer   1HHO   Mozilla eH     Ele Edit View Go Bookmarks Tools Window Help       3  MEM n uf n B tp  jw rcsb org pdbjcglexplore cgi7job donnload amp gdbid   v   Search      ININ    7  bHome   ufBookmarks 4f The RCSB Protein Dat             PIDE Structure Explorer   1HHO    PROTEIN DATA BANK    Tite Structure of human oxyhaemoglobin at 2 1 A resolution   Classification  Oxygen Transport   Compound Hemoglobin A  Oxy    Exp  Method X ray Diffraction    i Try the Structure Explorer page for 1 HHO from the new  reengineered RCSB PDB Web site        Display the Structure File   Choose from the following data representation formats      header    only      no HIML TEXT     coordinates  L  Download the Structure File     Choose from the following file and compression formats     f
199. Figure 5.7.28 RMSD Trajectory Tool. The RMSD is plotted for the equilibration of ubiquitin. Figure 5.7.29 RMSD versus time for the equilibration (blue) and pulling (red) trajectories of ubiquitin. For the color version of this figure go to http://www.currentprotocols.com. Using VMD: An Introductory Tutorial. Supplement 24. that the protein has relaxed from its initial crystal structure (which is affected by crystal packing and usually misses some atoms, e.g., hydrogens) to a more stable one. Production molecular dynamics simulations are usually preceded by such equilibration runs, where the protein is allowed to relax; the process is monitored by checking RMSD versus time, and equilibration is assumed to be sufficient when RMSD levels off. The RMSD of 1.5 Å is an acceptable value for most protein simulations. Usually, the deviations from the crystal structure in a simulation are due to thermal motion and to the relaxation process mentioned; imperfections of the simulation force fields contribute as well. 5. We will now work wit
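The idea of monitoring RMSD versus time until it levels off can be illustrated outside VMD. The sketch below is an assumption-laden illustration, not the RMSD Trajectory Tool's implementation: it assumes each frame is a list of (x, y, z) tuples already superimposed on the reference, and the "leveled off" test (spread of the last few values below a tolerance) is a hypothetical criterion chosen for simplicity.

```python
import math


def rmsd(reference, frame):
    """Root-mean-square deviation between two equally sized coordinate sets.

    Assumes the frame has already been aligned to the reference; no
    superposition is performed here.
    """
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate sets must be non-empty and equal in size")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(reference, frame):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(reference))


def has_leveled_off(rmsd_series, window=3, tolerance=0.1):
    """Hypothetical plateau check: the last `window` RMSD values
    (in Angstroms) differ by less than `tolerance`."""
    if len(rmsd_series) < window:
        return False
    tail = rmsd_series[-window:]
    return max(tail) - min(tail) < tolerance
```

In practice one would compute `rmsd` for every trajectory frame against the first frame and inspect the resulting series, much as the plot in Figure 5.7.29 is inspected by eye.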
200. m (meen), Mycoplasma pneumoniae (pneu), Mycobacterium tuberculosis (tub), Neisseria meningitidis (men), Pseudomonas aeruginosa, Pasteurella multocida, Rickettsia prowazekii, Synechocystis PCC6803, Thermotoga maritima (mar), Treponema pallidum, Ureaplasma urealyticum, Vibrio cholerae (cho). Figure 5.2.5 If a particular species is of interest, one may click the check boxes to the left of the species names. In this figure, Escherichia coli is selected. Supplement 4. Screenshot of the FAMSBASE search form, with fields for Gene Name, PDB ID, Keywords, Hetero Atom Code, Hetero Atom List, and File Upload, and options to search for ORFs with 3D structures produced by FAMS, for ORFs without 3D structures produced by FAMS, or both. Figure 5.2.6 To search the database using an ORF or protein name, input the name directly into the text box. As an example, an ORF named "abc" has been input. FAMS and FAMSBASE for Protein Structure. Search for ORFs with 3D structures produced by FAMS. Search for ORFs without 3D structures produced by FAMS. Both. Hetero Atom Code. Hetero A
201. milar protein structures. The analysis relied on a database of 105 family alignments that included 416 proteins of known 3-D structure (Sali and Overington, 1994). By scanning the database of alignments, tables quantifying various correlations were obtained, such as the correlations between two equivalent Cα-Cα distances, or between equivalent main-chain dihedral angles from two related proteins (Sali and Blundell, 1993). These relationships are expressed as conditional probability density functions (pdf's), and can be used directly as spatial restraints. For example, probabilities for different values of the main-chain dihedral angles are calculated from the type of residue considered, from main-chain conformation of an equivalent residue, and from sequence similarity between the two proteins. Another example is the pdf for a certain Cα-Cα distance given equivalent distances in two related protein structures. An important feature of the method is that the form of spatial restraints was obtained empirically, from a database of protein structure alignments. Stereochemical restraints: In the second step, the spatial restraints and the CHARMM22 force field terms enforcing proper stereochemistry (MacKerell et al., 1998) are combined into an objective function. The general form of the objective function is similar to that in molecular dynamics programs, such as CHARMM22 (MacKerell et al., 1998). The objective function depends on
202. mission may be more convenient for larger sets of queries. Necessary Resources. Hardware: Computer connected to the Internet. Software: Internet browser, e.g., Internet Explorer (http://www.microsoft.com), Netscape (http://browser.netscape.com), or Firefox (http://www.mozilla.org/firefox); E-mail account. Files: Atomic coordinates of protein structure in PDB format. To submit coordinates interactively: 1a. Go to http://www.ebi.ac.uk/Interactive.html. The submission page is shown in Figure 5.5.5. 2a. Click on the "3D structure x PDB database" link below "Database search" to access the "Database search" form shown in Figure 5.5.6. Type in the e-mail address to which results are to be sent, ignore the password box, and upload the coordinate file. Click on the "Submit query" button. The results will be sent to the e-mail address provided on the submission page. Type carefully. To submit coordinates by e-mail: 1b. Send an e-mail message containing the PDB coordinates in plain text to dali@ebi.ac.uk. The submission will fail unless the message is plain text. Encoded messages (e.g., MIME or BinHex) are rejected by the server. 2b. An e-mail with the results may be expected within a few days of submission. In case of longer delays, notify dali-help@ebi.ac.uk. The comparison is carried out against a representative subset of PDB structures. The set is constructed so that the sequence identity between any two chains in the set shou
203. mpirically chosen Z-score threshold of 2. This captures most cases of topological similarity of globular domains. However, in some fold types structural similarities between parts of globular domains also score above this threshold. Known similarity not reported: The Dali server currently reports similarities only to PDB25 representatives. The purpose of using PDB25 is to suppress the redundancy of output due to multiple structure determinations of mutants or of the same protein in slightly differing conditions. Thus, a particular PDB entry, known to be structurally similar to the query, might appear to be missing from the output list only because the representative structure is a different PDB entry. The Dali database reports similarities between PDB90 representatives. The PDB90 representatives for any PDB entry can be found by using the search functionality on the homepage of the Dali database (http://www.bioinfo.biocenter.helsinki.fi/dali). Empty result: The Dali database includes all peptide chains from the PDB except Cα-only entries and chains that are shorter than thirty residues. DaliLite requires that the backbone atoms (N, Cα, C, O) must be complete. The user can build a complete backbone model from the Cα trace using the MaxSprout Server. The Dali server runs MaxSprout automatically if only a Cα trace is submitted. The submission to the Dali server will fail unless the message
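Because an incomplete backbone is one listed cause of an empty result, it can be useful to check a coordinate file before submitting it. The following is a minimal sketch under stated assumptions: it reads only `ATOM` records, relies on the fixed-column layout of the PDB format (atom name in columns 13-16, chain in column 22, residue number in columns 23-26, insertion code in column 27), and ignores alternate locations. The function name is hypothetical and this is not how DaliLite itself validates input.

```python
BACKBONE_ATOMS = frozenset(("N", "CA", "C", "O"))


def incomplete_backbone_residues(pdb_text):
    """Return residues, in file order, that lack any of N, CA, C, O.

    Each residue is identified as (chain, residue number, insertion code).
    """
    atoms_by_residue = {}
    order = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, REMARK, and other records
        atom_name = line[12:16].strip()
        key = (line[21:22], line[22:26].strip(), line[26:27].strip())
        if key not in atoms_by_residue:
            atoms_by_residue[key] = set()
            order.append(key)
        atoms_by_residue[key].add(atom_name)
    return [key for key in order if not BACKBONE_ATOMS <= atoms_by_residue[key]]
```

A file in which every residue is reported as incomplete and only `CA` atoms are present is likely a Cα trace, the case for which the text recommends MaxSprout.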
204. mplates (i.e., <25% sequence identity). Distinguishing between a model based on an incorrect template and a model based on an incorrect alignment with a correct template is difficult. In both cases, the evaluation methods will predict an unreliable model. The conservation of the key functional or structural residues in the target sequence increases the confidence in a given fold assignment. Predicting the model accuracy: The accuracy of the predicted model determines the information that can be extracted from it. Thus, estimating the accuracy of a model in the absence of the known structure is essential for interpreting it. Initial assessment of the fold: As discussed earlier, a model calculated using a template structure that shares more than 30% sequence identity with the target can generally be expected to have an accurate overall structure. However, when the sequence identity is lower, the first aspect of model evaluation is to confirm whether or not a correct template was used for modeling. It is often the case, when operating in this regime, that the fold assignment step produces only false positives. A further complication is that at such low similarities the alignment generally contains many errors, making it difficult to distinguish between an incorrect template on one hand and an incorrect alignment with a correct template on the other hand. There are several methods that use 3-D profiles and statist
205. n Figure 5.2.10 A Superimpose view using RasMol. The model is in blue and the template is in green. This black and white facsimile of the figure is intended only as a placeholder; for full-color version of figure go to http://www.interscience.wiley.com/c_p/colorfigures.htm. Figure 5.2.11 The model viewed after clicking the View Target button. This black and white facsimile of the figure is intended only as a placeholder; for full-color version of figure go to http://www.interscience.wiley.com/c_p/colorfigures.htm. Screenshot of the Homology Modeling Service Home Page in Microsoft Internet Explorer: Modeling Service for Protein. Full Automatic Modeling System (FAMS). Department of Biomolecular Design, School of Pharmaceutical Sciences, Kitasa
206. n 5.5.1 where $i$ and $j$ label residues, $L$ is the number of matched pairs (the size of each substructure), and $\phi$ is a similarity measure based on some pairwise relationship, in this case on the Cα-Cα distances $d^A_{ij}$ and $d^B_{ij}$. Unmatched residues do not contribute to the overall score. For a given functional form of $\phi(i,j)$, the largest value of $S$ corresponds to the optimal set of residue equivalences. Structural similarity algorithms, in this case, search for the largest common substructure between two proteins, but one needs to define a similarity measure that balances two contradictory requirements: maximizing the number of equivalenced residues and minimizing structural deviations. The use of relative rather than absolute deviations of equivalent distances is tolerant to the cumulative effect of gradual geometrical distortions. In Dali, the residue-pair score $\phi$ has the form of the equation: $\phi(i,j) = \left( \theta - \frac{|d^A_{ij} - d^B_{ij}|}{d^*_{ij}} \right) w(d^*_{ij})$ (Equation 5.5.2), where $d^*_{ij}$ is the average of $d^A_{ij}$ and $d^B_{ij}$, $\theta$ is the similarity threshold, and $w$ is an envelope function. Dali uses the value of $\theta$ equal to 0.2. Since pairs in the long-distance range are abundant but less discriminative, their contribution is weighted down by the envelope function $w(r) = \exp(-r^2/\alpha^2)$, where $\alpha = 20$ Å, calibrated on the size of a typical domain. Alignments generated using the similarity measure of Equation 5.5.2 are reported, imposing the constraint of strictly sequential alignment. The resulting raw Dali sco
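To make the scoring scheme concrete, here is a minimal sketch of the raw score as a double sum of residue-pair scores over the matched residues. Several points are assumptions rather than statements from the text: the envelope is taken to be the Gaussian exp(-r²/α²) with α = 20 Å, the diagonal terms (where both distances are zero) are assigned the value θ, and the distance matrices are plain nested lists. The function names are illustrative and this is not the Dali program's code.

```python
import math

THETA = 0.2   # similarity threshold stated in the text
ALPHA = 20.0  # envelope width in Angstroms (assumed Gaussian envelope)


def envelope(r):
    """Down-weights long-distance pairs: w(r) = exp(-r^2 / alpha^2)."""
    return math.exp(-(r * r) / (ALPHA * ALPHA))


def pair_score(d_a, d_b):
    """Residue-pair score for one pair of equivalent distances.

    Uses the relative deviation |dA - dB| / d*, where d* is the mean of
    the two distances. For d* == 0 (a residue paired with itself) the
    score is set to THETA, which is an assumption of this sketch.
    """
    d_star = 0.5 * (d_a + d_b)
    if d_star == 0.0:
        return THETA
    return (THETA - abs(d_a - d_b) / d_star) * envelope(d_star)


def dali_raw_score(dist_a, dist_b, equivalences):
    """Sum pair scores over all pairs of matched residues.

    dist_a, dist_b: square C-alpha distance matrices (lists of lists).
    equivalences: list of (index_in_a, index_in_b) matched residues.
    """
    total = 0.0
    for ia, ib in equivalences:
        for ja, jb in equivalences:
            total += pair_score(dist_a[ia][ja], dist_b[ib][jb])
    return total
```

With θ = 0.2, a pair of distances contributes positively only when they differ by less than 20% of their mean, which is how the score trades off alignment size against structural deviation.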
207. n algorithms. The search algorithms include the minimum perturbation method (Fine et al., 1986), molecular dynamics simulations (Bruccoleri and Karplus, 1990; van Vlijmen and Karplus, 1997), genetic algorithms (Ring et al., 1993), Monte Carlo and simulated annealing (Higo et al., 1992; Collura et al., 1993; Abagyan and Totrov, 1994), multiple-copy simultaneous search (Zheng et al., 1993), self-consistent field optimization (Koehl and Delarue, 1995), and enumeration based on graph theory (Samudrala and Moult, 1998). The accuracy of loop predictions can be further improved by clustering the sampled loop conformations and partially accounting for the entropic contribution to the free energy (Xiang et al., 2002). Another way to improve the accuracy of loop predictions is to consider the solvent effects. Improvements in implicit solvation models, such as the Generalized Born solvation model, motivated their use in loop modeling. The solvent contribution to the free energy can be added to the scoring function for optimization, or it can be used to rank the sampled loop conformations after they are generated with a scoring function that does not include the solvent terms (Fiser et al., 2000; Felts et al., 2002; de Bakker et al., 2003; DePristo et al., 2003). Loop modeling in MODELLER: The loop-modeling module in MODELLER implements the optimization-based approach (Fiser et al., 2000; Fiser and Sa
208. n personal computers, the program will appear as a RasMol icon. On Linux machines, the program will appear as a file with a name like rasmol_8BIT or rasmol_32BIT. 3. On workstations, ensure that the permission is set correctly for an executable file, for instance, with the command: chmod a+x rasmol_32BIT. DOWNLOADING COORDINATES FROM THE PROTEIN DATA BANK. The Protein Data Bank (UNIT 1.9) is the primary repository of protein structure data. It is designed for easy searching and downloading. This protocol describes how to download the coordinates of hemoglobin. Necessary Resources. Hardware: The Protein Data Bank may be accessed on a variety of computer hardware, including personal computers. Software: An Internet browser is required. 1. On the main PDB WWW page (http://www.pdb.org), type 2hhb in the Search the Archive box, then hit the Search button. This will load the Structure Explorer page for the structure. 2. Click on the Download/Display File link on the left side. 3. Click on the link for "complete with coordinates" in the "PDB" and "TEXT" format. 4. Click the "Save full entry to disk" button. This will download the file 2HHB.pdb to the local computer. Coordinates for thousands of other biomolecules at the Protein Data Bank may be accessed in a similar way. On the main PDB WWW page, one may use the Search the Archive box to search the database using the names of molecules, authors, molecule types, and a variety of different searchin
209. n structure models in GTOP (Genomes TO Protein structures and functions) alignment calculated by FAMS. Both GTOP and FAMSBASE are projects of the Japanese government. The basic FAMS algorithm consists of a database search and simulated annealing. The first step obtains the Cα coordinates; the second step, the backbone; the third step, side chains; and the last step, all atoms. The effectiveness of the software was highlighted by its performance in the CAFASP2 and CAFASP3 competitions (Fischer et al., 2001), especially in terms of side-chain accuracy, with good performance in regard to the backbone as well. CAFASP (Critical Assessment of Fully Automated Structure Prediction) is a competition for determining the best software of this kind. Another competition, CASP (Critical Assessment of Techniques for Protein Structure Prediction), determines the best researcher in this area. CASP experiments were started in 1994 as CASP1, and continued biennially through to 2002 as CASP5. CAFASP experiments were started at the same time as CASP3, beginning with CAFASP1, and hence CAFASP3 was running in 2002. Results from the comparative modeling section of CASP5 suggested that fully automated building procedures were less accurate than procedures with human intervention (Iwadate et al., 2001). Human intervention worked effectively on CASP5 and the assessments have highlighted the algorithmic im
210. n structures is shown in Figure 5.5.1. Screenshot of the DaliLite page (Pairwise comparison of protein structures): "DaliLite is a program for pairwise structure comparison. Compare your structure (first structure) to a reference structure (second structure)." First Structure: PDB entry code 1qku, Chain ID A, or upload a file in PDB format. Second Structure: PDB entry code 1k4w, or upload a file in PDB format. Using Dali for Structural Comparison of Proteins. Figure 5.5.1 Submission page of the DaliLite server. Supplement 14. 2. Input First and Second Structures in the submission page (Fig. 5.5.1) as PDB entry codes (for known structures) or upload user-specific coordinate files in PDB (UNIT 1.9) format. For example, to compare the structures of 1qku (estrogen nuclear receptor ligand-binding domain) and 1k4w (orphan nuclear receptor ROR-beta ligand-binding domain), enter the PDB identifiers in the "PDB entry code" boxes as shown in Figure 5.5.1, or enter the pdb fil
211. nari, J.A., Cohen, F.E., and McKerrow, J.H. 1997. Leishmania major: Molecular modeling of cysteine proteases and prediction of new nonpeptide inhibitors. Exp. Parasitol. 87:212-221. Sheng, Y., Sali, A., Herzog, H., Lahnstein, J., and Krilis, S.A. 1996. Site-directed mutagenesis of recombinant human beta 2-glycoprotein I identifies a cluster of lysine residues that are critical for phospholipid binding and anti-cardiolipin antibody activity. J. Immunol. 157:3744-3751. Shenkin, P.S., Yarmush, D.L., Fine, R.M., Wang, H.J., and Levinthal, C. 1987. Predicting antibody hypervariable loop conformation. I. Ensembles of random conformations for ringlike structures. Biopolymers 26:2053-2085. Shi, J., Blundell, T.L., and Mizuguchi, K. 2001. FUGUE: Sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol. 310:243-257. Sibanda, B.L., Blundell, T.L., and Thornton, J.M. 1989. Conformation of beta-hairpins in protein structures: A systematic classification with applications to modelling by homology, electron density fitting and protein engineering. J. Mol. Biol. 206:759-777. Sippl, M.J. 1990. Calculation of conformational ensembles from potentials of mean force: An approach to the knowledge-based prediction of local structures in globular proteins. J. Mol. Biol. 213:859-883. Sippl, M.J. 1993. Recognition of
212. nates in PDB format as input. The comparison is usually quite fast, and results should be returned after about one minute. A search against all known structures takes much longer and can be performed using the DALI Server (Basic Protocol 2). This server is routinely used by protein crystallographers to compare a newly solved structure to known structures in the database in order to detect possible evolutionary relationships. The structure neighbors of proteins already in the PDB (Protein Data Bank) can be found in the Dali database. Its Web interface allows browsing of the hierarchical classification of protein structures based on all-against-all comparisons of known structures (Basic Protocol 3, Dali database). Table 5.5.1 Overview of Dali Resources and Their Relations. Input: DaliLite, two (lists of) PDB structures; Dali server, one PDB structure; Dali database, all PDB structures; ADDA database, all known protein sequences. Steps: DaliLite, pairwise structure comparison; Dali server, database search using cascaded algorithms; Dali database, remove redundancy, all-against-all structure comparison, domain decomposition, clustering; ADDA database, remove redundancy, all-against-all sequence comparison, domain decomposition, clustering. Output: DaliLite, structure neighbors of query; Dali server, structure neighbors of query; Dali database, protein fold classification; ADDA database, protein family classification. Protocol: Basic Protocol 1 (DaliLite); Basic Protocol 2 (Dali server); Basic Protocol 3 (Dali database); linked to Dali database (ADDA database); Alternate Protocols 1 and 2; Support Protocol.
213. nd       MyProtein GLY   Create a new chi param file  also see steps 10a to 15a and 10b to 15b  in which  the molecule name is GLY and all the amino acids are glycine  Fig  5 3 7   Leave all    other parameters exactly as they are in all the other variants  parameter files  including  the length of the sequence   Save that file in   MyProtein GLY  directory     Current Protocols in Bioinformatics       name of the molecule  GLY  number of helices B j  bomooligomer   irc C false  molecular structure information for helix 1  sequence  GLY GLY GLY GLY GAY GLY GLY GLY GLY GLY  T oe mee ene OE SE ET AES  More Lines   residue number at start of sequence fir  initial rotation offset around helix axis ho  direction of helix   Cup S down  initial translational offset for helix along the z axis Bo    NEENENMNMNMEMMMME EE Search parameters O O O OOOO O O     extent of the search  a full search will sample all pairwise interactions a symmetric search will limit the search to   2 full    symmetne             Figure 5 3 7 Creating a glycine parameter file    26  Change directory     gt  cd   MyProtein GLY   and run         chi create    This will create the files   MyProtein GLY GLY pdb and   MyProtein GLY   GLY   psf  See step 24 annotation for explanation     27  Change directory to the parent directory     gt  cd   MyProtein   and copy the following files     gt  cp   MyProtein GLY GLY p    MyProtein     28  Create a variants list  i e   edit a text file named list  not list txt  that w
214. nd Merino, W., Zhang, Q., Knezevich, C., Xie, L., Chen, L., Feng, Z., Green, R.K., Flippen-Anderson, J.L., Westbrook, J., Berman, H.M., and Bourne, P.E. 2005. The RCSB Protein Data Bank: A redesigned query system and relational database based on the mmCIF schema. Nucl. Acids Res. 33:D233-D237. Dietmann, S., Park, J., Notredame, C., Heger, A., Lappe, M., and Holm, L. 2001. A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3. Nucl. Acids Res. 29:55-57. Eddy, S.R. 1998. Profile hidden Markov models. Bioinformatics 14:755-763. Edgar, R.C. 2004. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucl. Acids Res. 32:1792-1797. Edgar, R.C. and Sjolander, K. 2004. A comparison of scoring functions for protein sequence profile alignment. Bioinformatics 20:1301-1308. Enyedy, I.J., Ling, Y., Nacro, K., Tomita, Y., Wu, X., Cao, Y., Guo, R., Li, B., Zhu, X., Huang, Y., Long, Y.Q., Roller, P.P., Yang, D., and Wang, S. 2001. Discovery of small-molecule inhibitors of Bcl-2 through structure-based computer screening. J. Med. Chem. 44:4313-4324. Eswar, N., John, B., Mirkovic, N., Fiser, A., Ilyin, V.A., Pieper, U., Stuart, A.C., Marti-Renom, M.A., Madhusudhan, M.S., Yerkovich, B., and Sali, A. 2003. Tools for comparative protein structure modeling and analysis. Nucl. Acids Res. 31:3375-3380. Eyrich, V.A., Marti-Renom, M.A., Przybylski
215. nd structure (chain from mol2) is renamed S. 6. To view the full superimposition, either open both files under the heading "PDB Files (mol2 is rotated/translated to mol1 position)" in the PDB viewer, or concatenate the two files and view the resulting file. The second option preserves ligands that might have been co-crystallized with the protein as well as showing quaternary structure interactions. Supplement 14. DaliLite: Results of Structure Comparison. Each chain of mol1 is compared structurally to each chain of mol2 using the DaliLite program. The Dali method optimises a weighted sum of similarities of intramolecular distances. Sequence identity and the root-mean-square deviation of C-alpha atoms after rigid-body superimposition are reported for your information only; they are ignored by the structural alignment method. Suboptimal alignments do not overlap the optimal alignment or each other. Suboptimal alignments detected by the program are reported if the Z-score is above 2; they may be of interest if there are internal repeats in either structure. In the C-alpha traces, the chains of the first and second structure are renamed 'Q' and 'S', respectively. The best match to each chain in the second structure is highlighted in the table below. Z-scores below 2 are not significant. First Structur
216. nd the other in the lower triangle. Aligned: Structural alignment identifies a one-to-one correspondence between a subset of residues. The respective submatrices of the distance matrix display similar contact patterns. identified based on structural and sequence analysis (Holm and Sander, 1997); several blind fold predictions have since been verified by experimental structure determination. Comparison to other techniques: Dali was ranked at the top among seven protein structure comparison methods and two sequence comparison programs that were evaluated on their ability to detect either protein homologues or domains with the same topology (fold) as defined by the CATH structure database (Novotny et al., 2004). Critical Parameters: The Dali program has been run successfully with default parameters since its inception (Holm and Sander, 1993). The results usually agree quite well with human experts' assessments. For example, the dendrogram of structural similarities by Dali has similar topology to the SCOP hierarchical classification based on visual analysis and biological knowledge (Dietmann and Holm, 2001). While the authors strongly advise against changing parameter values from their default values, a description of the numerical parameters that go into the algorithms is given in the appendix. Troubleshooting. Similarity not reported: The Dali system reports only similarities above an e
217. 8. In the Molecule List Browser, double-click on the "F" flag on the left of "human aquaporin" to fix the human aquaporin molecule. Return to the OpenGL Display window and toggle your mouse around. You can see that only the yellow E. coli aquaporin moves. Double-click on the "F" flag for human aquaporin again to release it. One thing to notice about the "F" flag is that, although it may seem that one molecule has been moved relative to another when one of the molecules is fixed, the difference is only apparent. The internal coordinates of molecules are not changed by the rotation, translation, and scaling motions. To change the coordinates of atoms in a molecule you need to use the text command interface (discussed in Basic Protocol 6, step 4) or the atom move picking modes (by choosing Mouse → Move in the VMD Main menu). Other features in the Molecule List Browser include the Molecule ID (ID), Top (T), Active (A), and Drawn (D). Molecule ID is a number, starting from 0, assigned to each molecule when it is loaded into VMD, and permits VMD to recognize each molecule internally. You also refer to molecules by their Molecule IDs in the text command interface. Top flag (T) indicates the default molecule in VMD operations, for example when resetting the VMD OpenGL view and when playing molecule trajectories. There can be only one top molecule at a time. Active flag (A) indicates if the trajector
218. ng Modeller. Supplement 15. known template structure; (ii) alignment of the target sequence and the template(s); (iii) building a model based on the alignment with the chosen template(s); and (iv) predicting model errors. There are several computer programs and Web servers that automate the comparative modeling process (Table 5.6.1). The accuracy of the models calculated by many of these servers is evaluated by EVA-CM (Eyrich et al., 2001), LiveBench (Bujnicki et al., 2001), and the biennial CASP (Critical Assessment of Techniques for Protein Structure Prediction; Moult, 2005; Moult et al., 2005) and CAFASP (Critical Assessment of Fully Automated Structure Prediction) experiments (Rychlewski and Fischer, 2005; Fischer, 2006). While automation makes comparative modeling accessible to both experts and nonspecialists, manual intervention is generally still needed to maximize the accuracy of the models in the difficult cases. A number of resources useful in comparative modeling are listed in Table 5.6.1. This unit describes how to calculate comparative models using the program MODELLER (Basic Protocol). The Basic Protocol goes on to discuss all four steps of comparative modeling (Figure 5.6.1), frequently observed errors, and some applications. The Support Protocol describes how to download and install MODELLER. MODELING LACTATE DEHYDROGENASE FROM TRICHOMONAS VAGINALIS (TvLDH) BASED ON A SINGLE TEMPLATE USING MODELLER. MODELLER
219. ng Cα atoms in the superimposed native structure). Errors in comparative models: As the similarity between the target and the templates decreases, the errors in the model increase. Errors in comparative models can be divided into five categories (Sanchez and Sali, 1997a,b; Fig. 5.6.12), as follows. Errors in side-chain packing (Fig. 5.6.12A): As the sequences diverge, the packing of side chains in the protein core changes. Sometimes even the conformation of identical side chains is not conserved, a pitfall for many comparative modeling methods. Side-chain errors are critical if they occur in regions that are involved in protein function, such as active sites and ligand-binding sites. Distortions and shifts in correctly aligned regions (Fig. 5.6.12B): As a consequence of sequence divergence, the main-chain conformation changes, even if the overall fold remains the same. Therefore, it is possible that in some correctly aligned segments of a model Figure 5.6.12 Typical errors in comparative modeling. (A) Errors in side-chain packing. The Trp 109 residue in the crystal structure of mouse cellular retinoic acid binding protein (red) is compared with its model (green). (B) Distortions and shifts in correctly aligned regions. A region in the crystal structure of mouse cellular retinoic acid binding protein (red) is compared with its model (green) and with the template fatty acid binding protein (blue). (C) Errors in r
220. nged in Graphics → Colors in the VMD Main menu. The shortcut keys for labels are 1 (Atoms) and 2 (Bonds). You can use these instead of the Mouse menu. Be sure the OpenGL Display window is active when using these shortcuts. The labels can be used not only for displaying, but also for obtaining quantitative information. In VMD Main menu, select Graphics → Labels. On the top left-hand side of the window, there is a pull-down menu where you can choose the type of label (Atoms, Bonds, Angles, and Dihedrals). For now, keep it in Atoms. You can see the list of atoms for which you have made a label. Figure 5.7.27 Labels in VMD. For the color version of this figure go to http://www.currentprotocols.com. Supplement 24. 9. Click on one of the atoms. You can see all the information of the atom displayed on the bottom half of the Labels window. This information is useful to make selections; it corresponds to the current frame, and is updated as the frame is changed. 10. You can also delete, hide, or show the atom label by clicking on the corresponding button on the top of the Labels window. 11. In the Labels window, choose label type Bonds, and select the "bond" (distance) you labeled (Fig. 5.7.27). The information given corresponds to only the first atom in the bond, but the number in the Value field corresponds to the length of the bond in Angstroms. 12. Click on t
221. nse agreement. The key will be e-mailed to the address provided.
4. Open a terminal or console and change to the directory containing the downloaded distribution. The distributed file is a compressed archive file called modeller-8v2.tar.gz.
5. Unpack the downloaded file with the following commands:
    gunzip modeller-8v2.tar.gz
    tar -xvf modeller-8v2.tar
6. The files needed for the installation can be found in a newly created directory called modeller-8v2. Move into that directory and start the installation with the following commands:
    cd modeller-8v2
    ./Install
7. The installation script will prompt the user with several questions and suggest default answers. To accept the default answers, press the Enter key. The various prompts are briefly discussed below.
a. For the prompt below, choose the appropriate combination of the machine architecture and operating system. For this example, choose the default answer by pressing the Enter key.
    The currently supported architectures are as follows:
    1) Linux x86 PC (e.g., RedHat, SuSe).
    2) SUN Inc. Solaris workstation.
    3) Silicon Graphics Inc. IRIX workstation.
    4) DEC Inc. Alpha OSF/1 workstation.
    5) IBM AIX OS.
    6) Apple Mac OS X 10.3.x (Panther).
    7) Itanium 2 box (Linux).
    8) AMD64 (Opteron) or EM64T (Xeon64) box (Linux).
    9) Alternative Linux x86 PC binary (e.g., for FreeBSD).
    Select the type of your computer from the list above [1]:
b. For the prompt below, tell the installer where
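The unpacking in step 5 can also be scripted. The following is a minimal sketch, assuming Python 3 and its standard `tarfile` module are available; the helper name `unpack_archive` is hypothetical and is not part of the MODELLER distribution. It reads the gzip-compressed archive directly, so the separate `gunzip` step is not needed.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive: Path, dest: Path) -> list:
    """Extract a gzip-compressed tar archive into dest.

    Returns the member names stored in the archive so the caller can
    check that the expected directory (e.g. modeller-8v2) was created.
    """
    dest.mkdir(parents=True, exist_ok=True)
    # "r:gz" opens the .tar.gz directly (equivalent to gunzip + tar -xvf).
    with tarfile.open(archive, mode="r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=dest)
    return names


if __name__ == "__main__":
    # Assumed file name, taken from step 4 of the protocol.
    archive_path = Path("modeller-8v2.tar.gz")
    if archive_path.exists():
        for name in unpack_archive(archive_path, Path(".")):
            print(name)
```

The installation script itself (`./Install`) is interactive, so it is still run by hand from the unpacked directory.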
222. nt Options window.
In the Stamp Alignment Options window, choose Align the following: All Structures, then go to the bottom of the menu and press OK.
The molecules have been aligned. You can see the alignment both in the OpenGL window and in the MultiSeq window (Fig. 5.7.22). Your alignment in the OpenGL window will not immediately resemble Figure 5.7.22. When MultiSeq completes an alignment, it creates a new representation for all the aligned proteins in the NewCartoon representation with the same default coloring method and hides all other representations created previously. Let us give different colors to different aquaporins to distinguish them.
Open your Graphical Representations window, and you should see two representations for each molecule: the first one created when VMD loaded the molecule (which is now hidden), and the second one created automatically by MultiSeq. Select "0: 1fqy.pdb" in the Selected Molecule pull-down menu on top and highlight the bottom representation by clicking on it. Change the color for this representation by selecting ColorID → 1 red for Coloring Method.
In the Graphical Representations window, select "1: 1rc2.pdb" in the Selected Molecule pull-down menu on top and highlight the bottom representation by clicking on it. Select ColorID → 4 yellow for Coloring Method.
In the Graphical Representations window, select "2: 1lda.pdb" in the Selected Molecule pull-down menu on top and highlight the bott
223. ntally known structure. To obtain a particular model, select one line by clicking on a template ID (shown in the PSIBlast column in this figure).
Figure 5.2.9 The amino acid alignment view page (PSIBlast result for the target against reference 1G29, with alignment coloring options, a 3D viewer selector, and View Target, View Reference, and Superimpose buttons above the target-reference sequence alignment). To display the selected model, click the View Target button. Both the model and the template will be displayed by clicking the Superimpose butto
224. ocedure can be time consuming, it can significantly improve the accuracy of the resulting comparative models in difficult cases (John and Sali, 2003). Importance of an accurate alignment. Regardless of the method used, searching in the twilight and midnight zones of the sequence-structure relationship often results in false negatives, false positives, or alignments that contain an increasingly large number of gaps and alignment errors. Improving the performance and accuracy of methods in this regime remains one of the main tasks of comparative modeling today (Moult, 2005). It is imperative to calculate an accurate alignment between the target-template pair, as comparative modeling can almost never recover from an alignment error (Sanchez and Sali, 1997a). Template selection. After a list of all related protein structures and their alignments with the target sequence has been obtained, template structures are prioritized depending on the purpose of the comparative model. Template structures may be chosen based purely on the target-template sequence identity, or on a combination of several other criteria, such as experimental accuracy of the structures (resolution of X-ray structures, number of restraints per residue for NMR structures), conservation of active-site residues, holo-structures that have bound ligands of interest, and prior biological information that pertains to the solvent, pH, and quaternary contacts. It is not n
225. ody residue-level knowledge-based energy score combined with sequence profile and secondary structure information for fold recognition. Proteins 55:1005-1013.
Zhou, H. and Zhou, Y. 2005. Fold recognition by combining sequence profiles derived from evolution and from depth-dependent structural alignment of fragments. Proteins 58:321-328.
Internet Resources
http://www.salilab.org/modeller/
Eswar, N., Madhusudhan, M.S., Marti-Renom, M.A., and Sali, A. 2005. MODELLER, A Protein Structure Modeling Program, Release 8v2.
Contributed by Narayanan Eswar, Ben Webb, Marc A. Marti-Renom, M.S. Madhusudhan, David Eramian, Min-yi Shen, Ursula Pieper, and Andrej Sali, University of California at San Francisco, San Francisco, California
Using VMD: An Introductory Tutorial
Jen Hsin, Anton Arkhipov, Ying Yin, John E. Stone, and Klaus Schulten
Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois; Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, Illinois
ABSTRACT
VMD (Visual Molecular Dynamics) is a molecular visualization and analysis program designed for biological systems such as proteins, nucleic acids, lipid bilayer assemblies, etc. This unit will serve as an introductory VMD tutorial. We will present several step-by-step examples of some of VMD's most popular features, including visualizing molecules in three dimensions with 
226. oe-mbi.ucla.edu; http://searchlauncher.bcm.tmc.edu; http://blocks.fhcrc.org; http://www2.ebi.ac.uk/clustalw/; ftp://iole.swmed.edu/pub/compass/
Table 5.6.1 Programs and Web Servers Useful in Comparative Protein Structure Modeling, continued
Name
Target-template alignment (continued)
    FUGUE (Shi et al., 2001)
    MULTALIN (Corpet, 1988)
    MUSCLE (UNIT 6.9; Edgar, 2004)
    SALIGN (Eswar et al., 2003)
    SEA (Ye et al., 2003)
    TCOFFEE (UNIT 3.8; Notredame et al., 2000)
    USC SEQALN (Smith and Waterman, 1981)
Modeling
    3D-JIGSAW (Bates et al., 2001)
    COMPOSER (Sutcliffe et al., 1987a)
    CONGEN (Bruccoleri and Karplus, 1990)
    ICM (Abagyan and Totrov, 1994)
    JACKAL (Petrey et al., 2003)
    DISCOVERY STUDIO
    MODELLER (Sali and Blundell, 1993)
    SYBYL
    SCWRL (Canutescu et al., 2003)
    SNPWEB (Eswar et al., 2003)
    SWISS-MODEL (Schwede et al., 2003)
    WHAT IF (Vriend, 1990)
Prediction of model errors
    ANOLEA (Melo and Feytmans, 1998)
    AQUA (Laskowski et al., 1996)
    BIOTECH (Laskowski et al., 1998)
    ERRAT (Colovos and Yeates, 1993)
    PROCHECK (Laskowski et al., 1993)
    PROSAII (Sippl, 1993)
    PROVE (Pontius et al., 1996)
    SQUID (Oldfield, 1992)
    VERIFY3D (Luthy et al., 1992)
    WHATCHECK (Hooft et al., 1996)
Methods evaluation
    CAFASP (Fischer et al., 2001)
    CASP (Moult et al., 2003)
    CASA (Kahsay et al
227. Homologous proteins often share significant functional similarities. An attempt should be made to place the query structure in the context of a fold similarity dendrogram, as in Figure 5.5.6, before transferring function. There is always a best hit. Reciprocal nearest neighbors suggest more similar functions than if the query protein joins a whole branch of functionally diverse proteins. For example, in the receptor dendrogram (Fig. 5.5.6), sex hormone receptors form one subcluster while the orphan receptor is about equidistant from all the other receptors. RMSD is a measure of the average deviation in distance between aligned alpha carbons. For sequences sharing 50% identity, this should be around 1.0 Å. Dali maximizes a geometrical similarity score, which is defined in terms of similarities of intramolecular distances, and is thus not primarily aiming to generate alignments with low RMSD. The RMSD and number of equivalent residues (NE) are reported because they are traditional measures. Note that an alignment is considered better if it has both a smaller RMSD and a larger NE. If both RMSD and NE are smaller or both are larger, it is not possible to establish an order between the alignments. It is generally assumed that if two sequences share over 40% identity, then they are unambiguously hom
228. ologous. However, two distantly related proteins may share very low sequence identity but still be homologous, and conversely, two sequences may locally share as much as 30% identity but be unrelated. Therefore, the percentage of sequence identity is only a guide. In lieu of numbers, it is often informative to inspect, using RasMol or another graphics program, whether the structurally equivalent regions form a continuous, compact structural core. If there are many known structures in a superfamily, secondary structure elements will line up consistently in the multiple structure alignment views (Fig. 5.5.11). Check especially for the conservation of known active-site residues. Conservation profiles can be studied in multiple sequence alignments of protein families in sequence classification databases such as the Automatic Data Decomposition Algorithm (ADDA) at http://www.bioinfo.biocenter.helsinki.fi/sqgraph/pairsdb or PFAM (http://www.sanger.ac.uk/Pfam). Enzyme superfamilies have sharp signatures, but binding domains can have very little sequence similarity. Without a sequence signature, it is harder to establish homology.
COMMENTARY
Background Information
The rapidly growing number of known tertiary structures makes protein structure comparison important. In the center of biological interest are evolutionary relationships inferred from quantifiable similarities between proteins. Sequence similarity searches are able to detect evo
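The RMSD and NE criteria described in the Dali discussion above can be made concrete with a short sketch. This is an illustration under stated assumptions, not Dali's implementation: the two coordinate lists are assumed to be already superimposed and paired one-to-one by the alignment, RMSD is computed as the usual root-mean-square deviation, and the function names `rmsd` and `better_alignment` are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Root-mean-square deviation between paired, superimposed C-alpha atoms."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(a, b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(a))


def better_alignment(rmsd1: float, ne1: int,
                     rmsd2: float, ne2: int) -> Optional[int]:
    """Return 1 or 2 for the better alignment, or None if they cannot be ordered.

    An alignment is better only if it has BOTH a smaller RMSD and a
    larger number of equivalent residues (NE); otherwise no order is
    established (ties are treated as unordered in this sketch).
    """
    if rmsd1 < rmsd2 and ne1 > ne2:
        return 1
    if rmsd2 < rmsd1 and ne2 > ne1:
        return 2
    return None
```

For example, `better_alignment(1.0, 100, 2.0, 90)` selects the first alignment, whereas `better_alignment(1.0, 80, 2.0, 90)` returns `None` because the lower RMSD was bought with fewer equivalent residues.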
229. ologs under 40% sequence identity (Park et al., 1998; Lindahl and Elofsson, 2000; Sauder et al., 2000). The resulting profile-sequence alignments correctly align approximately 43% to 48% of residues in the 0% to 40% sequence identity range (Sauder et al., 2000; Marti-Renom et al., 2004); this number is almost twice as large as that of the pairwise sequence methods. Frequently used programs for profile-sequence alignment are PSI-BLAST (Altschul et al., 1997), SAM (Karplus et al., 1998), HMMER (Eddy, 1998), and BUILD_PROFILE (Eswar, 2005). Profile-profile alignment methods. As a natural extension, the profile-sequence alignment methods have led to profile-profile alignment methods that search for suitable template structures by scanning the profile of the target sequence against a database of template profiles, as opposed to a database of template sequences. These methods have proven to include the most sensitive and accurate fold assignment and alignment protocols to date (Edgar and Sjolander, 2004; Marti-Renom et al., 2004; Ohlson et al., 2004; Wang and Dunbrack, 2004). Profile-profile methods detect ~28% more relationships at the superfamily level and improve the alignment accuracy by 15% to 20%, compared to profile-sequence methods (Marti-Renom et al., 2004; Zhou and Zhou, 2005). There are a number of variants of profile-profile alignment methods that differ in the scoring functions
230. olor version of this figure go to http://www.currentprotocols.com.
DOWNLOADING VMD
Before starting, the current version of VMD needs to be downloaded. This tutorial was written for VMD version 1.8.6. VMD supports all major computer platforms and can be downloaded from the VMD homepage, http://www.ks.uiuc.edu/Research/vmd. Follow the instructions online to install. Once VMD is installed, to start VMD: if using Mac OS X, double-click on the VMD application icon in the Applications directory; if using Linux and SUN, type vmd in a terminal window; or if using Windows, select Start → Programs → VMD.
When VMD starts, by default three windows will open: the VMD Main window, the OpenGL Display window, and the VMD Console window (or a Terminal window on a Mac). To end a VMD session, go to the VMD Main window and choose File → Quit. You can also quit VMD by closing the VMD Console window or the VMD Main window.
TOPICS AND FILES
This unit contains six sections. Each section acts as an independent tutorial for a specific topic (Working with a Single Molecule; Trajectories and Movie Making; Scripting in VMD; Working with Multiple Molecules; Comparing Protein Structures and Sequences with the MultiSeq Plugin; and Data Analysis in VMD). For readers with no prior experience with VMD, we suggest they work through the sections in the order they are presented. Readers already famil
231. ols will show up in the lower right-hand side corner. Use these controls to change the Sphere Scale to 0.5 and the Sphere Resolution to 13. Note that the higher the resolution, the slower the display of the molecule will be.
17. Press the Default button. This returns the screen to the default properties of the chosen drawing method.
Other popular representations include CPK and Licorice. In CPK, like in old chemistry ball-and-stick kits, each atom is represented by a sphere and each bond is represented by a thin cylinder (radius and resolution of both the sphere and the cylinder can be modified). The Licorice drawing method also represents each atom as a sphere and each bond as a cylinder, but the sphere and the cylinder have the same radii.
Using the Tube style drawing method
The previous representations visualize micromolecular details of the protein by displaying every single atom. More general structural properties can be demonstrated better by using more abstract drawing methods.
18. Choose the Tube style under Drawing Method, which shows the backbone of the protein. Set the Radius to 0.8. The result should be similar to Figure 5.7.6.
Using the NewCartoon drawing method
The last drawing method described here is NewCartoon. It gives a simplified representation of a protein based on its secondary structure. Helices are drawn as coiled ribbons, β sheets as solid, flat arrows, and all other structures
232. om representation by clicking on it. Select ColorID → 11 purple for Coloring Method.
In the Graphical Representations window, select "3: 1j4n.pdb" in the Selected Molecule pull-down menu on top and highlight the bottom representation by clicking on it. Select ColorID → 12 lime for Coloring Method. Close the Graphical Representations window.
Now your OpenGL window should look similar to Figure 5.7.22, and you can see that the alignment is good, as the four aquaporin structures are very similar.
You can get more information about the alignment in the MultiSeq window by highlighting the molecules you wish to compare.
11. In the MultiSeq window, highlight 1fqy by clicking on it.
12. To highlight another molecule without unhighlighting 1fqy, you need to Ctrl-click (or command-click on a Mac) on that molecule. Highlight 1rc2 by clicking on it while holding down the Ctrl key on the keyboard (or the command key on a Mac).
When both 1fqy and 1rc2 are highlighted, you should see at the lower left corner of the MultiSeq window a line of text: QH: 0.6442, RMSD: 2.3043, Percent Identity: 30.28. Note that the values you obtain might be a little different depending on whether your MultiSeq database is updated, but they should be close to the ones given here.
The QH value is a metric for structural homology. It is an adaptation of the Q value that measures structural conservation (Eastwood et al., 2001). Q 
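To give a feel for what a percent identity such as the 30.28 reported above measures, here is a minimal sketch. It assumes one common definition (identical residues divided by the number of alignment columns in which neither sequence has a gap); the text does not state which denominator MultiSeq uses, so the numbers from this function need not match MultiSeq's exactly. The function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of aligned (gap-free) columns in which the residues match.

    seq_a and seq_b are two rows of the same alignment, so they must
    have the same length; columns with a gap in either row are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(x, y) for x, y in zip(seq_a, seq_b) if x != gap and y != gap]
    if not aligned:
        return 0.0
    matches = sum(1 for x, y in aligned if x == y)
    return 100.0 * matches / len(aligned)


if __name__ == "__main__":
    # Four gap-free columns (A/A, C/C, D/E, F/F), three of them identical.
    print(percent_identity("ACD-F", "ACEGF"))
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, which is one reason identity values from different programs are not directly comparable.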
233. on
Software
Web browser: Internet Explorer v. 5.0 or later or Netscape v. 4.7 or later for Windows; Internet Explorer v. 4.5 or later for Macintosh
1. Log in to FAMSBASE as follows.
a. Go to the URL of FAMSBASE (http://famsbase.bio.nagoya-u.ac.jp/famsbase/). Figure 5.2.2 shows the login page of FAMSBASE.
b. Enter a login name and password. If accessing the database for the first time, obtain a login name and a password by clicking the link labeled "For the first user". Alternatively, click on the "Public Login" hyperlink.
After logging in, one arrives at the FAMSBASE search page. Figure 5.2.3 shows the upper part of the search page; Figure 5.2.4 shows the lower part. "Public Login" only provides sufficient access to determine whether or not a model exists in FAMSBASE. Individuals who select "Public Login" cannot view structures.
UNIT 5.2. Contributed by Hideaki Umeyama and Mitsuo Iwadate. Current Protocols in Bioinformatics (2003) 5.2.1-5.2.16. Copyright © 2003 by John Wiley & Sons, Inc.
2. Specify search criteria.
a. Species: The upper part of the search page (Fig. 5.2.3, Section 1) lists 41 species whose genome ORFs have been determined. The check boxes on the left-hand side of the query form allow the user to specify which species should be included in the search. It is possible to select multiple species. Figure 5.2.5 shows an example 
234. ons of RasMol are available on the WWW at http://www.bernstein-plus-sons.com/software/rasmol/. Downloading and installation instructions are given in Support Protocol 1.
Files
Coordinate files are read in a variety of formats, including PDB, Mol2, CHARMm, and mmCIF. The program deals gracefully with a number of variations of these files, including files containing coordinates for multiple conformers or multiple models. In this example, coordinates for hemoglobin (2HHB.pdb), obtained from the Protein Data Bank (PDB; UNIT 1.9), are used; instructions for downloading the PDB coordinate file are given in Support Protocol 2.
1. Three different problems with biological units may be encountered as one goes to the Protein Data Bank for coordinates. First, the coordinate file may include only a portion of the physiologically active complex. The examples in this unit have been using the deoxygenated form of hemoglobin so far, which has four protein chains. However, an overview picture of PDB entry 1hho, the oxygenated form of hemoglobin, will look like Figure 5.4.10.
2. Notice that there are only two chains in the file, even though it is known that hemoglobin is active with four chains. This is due to the details of the crystallographic experiment, where the two halves of the protein are crystallographically identical in the structure, so the researchers on
235. Figure 5.4.1 RasMol running on the computer display (the screenshot shows the RasMol startup banner and the summary for HEMOGLOBIN (DEOXY), classification OXYGEN TRANSPORT, database code 2HHB). The viewer window is at upper left, behind the Command Line window at lower right.
b. Using options in the Colours menu, the structure may be colored using traditional atomic colors or several other schemes that highlight different characteristics.
c. In the Options menu, slab mode may be used to cut away the nearest portions of the molecule, and specular highlights and shadows may be toggled on and off.
d. The Settings menu makes it possible to choose an action that will be performed when clicking on a portion of the molecule (e.g., measuring distances between atoms).
e. Finally, the Export menu makes it possible to save images from the graphics window.
5. The Command Line window (labeled "Terminal" in Fig. 5.4.1) allows direct control of all of the commands available in RasMol. A few of the most common commands will be used in this unit. The user manu
236. ormance at the potential cost of accuracy or precision. Many programs use a hierarchical approach, where promising seeds for alignment are identified using local criteria based on dynamic programming, distance difference matrices, maximal common subgraph detection, fragment matching, geometric hashing, unit-vector comparison, or local-geometry matching (reviewed by Sierk and Kleywegt, 2004). The initial set of correspondences is then optimized globally using methods such as double dynamic programming, Monte Carlo algorithms or simulated annealing, a genetic algorithm, or combinatorial searching. Recently, it has been proved that brute-force, exhaustive scanning of the six degrees of freedom from rotations and translations in rigid-body superimposition leads to a polynomial-time approximation algorithm for the problem of determining the maximum number of Cα atom pairs that can be superimposed within a given RMSD at a given error. However, this solution is too computationally demanding for practical application (Kolodny and Linial, 2004). The Dali method is based on a sensitive measure of geometrical similarities defined as a weighted sum of similarities of intramolecular distances (see the appendix for details). Three-dimensional shape is described with a matrix of al
237. ormation of polypeptide segments in proteins by systematic search. Proteins 1:146-163.
Moult, J., Fidelis, K., Zemla, A., and Hubbard, T. 2003. Critical assessment of methods of protein structure prediction (CASP)-round V. Proteins 53:334-339.
Moult, J., Fidelis, K., Rost, B., Hubbard, T., and Tramontano, A. 2005. Critical assessment of methods of protein structure prediction (CASP)-round 6. Proteins 61:3-7.
Nagarajaram, H.A., Reddy, B.V., and Blundell, T.L. 1999. Analysis and prediction of inter-strand packing distances between beta-sheets of globular proteins. Protein Eng. 12:1055-1062.
Needleman, S.B. and Wunsch, C.D. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48:443-453.
Notredame, C., Higgins, D.G., and Heringa, J. 2000. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302:205-217.
Ohlson, T., Wallner, B., and Elofsson, A. 2004. Profile-profile methods provide improved fold-recognition: A study of different profile-profile alignment methods. Proteins 57:188-197.
Oldfield, T.J. 1992. SQUID: A program for the analysis and display of data from crystallography and molecular dynamics. J. Mol. Graph. 10:247-252.
Oliva, B., Bates, P.A., Querol, E., Aviles, F.X., and Sternberg, M.J. 1997. An automated classification of the structure of protein loops. J. Mol. Biol. 266:814-830.
Panchenko 
238. orresponding distances between aligned residues in the template and the target structures are similar. These homology-derived restraints are usually supplemented by stereochemical restraints on bond lengths, bond angles, dihedral angles, and nonbonded atom-atom contacts that are obtained from a molecular mechanics force field. The model is then derived by minimizing the violations of all the restraints. This optimization can be achieved either by distance geometry or real-space optimization. For example, an elegant distance geometry approach constructs all-atom models from lower and upper bounds on distances and dihedral angles (Havel and Snow, 1991). Comparative protein structure modeling by MODELLER. MODELLER, the authors' own program for comparative modeling, belongs to this group of methods (Sali and Blundell, 1993; Sali and Overington, 1994; Fiser et al., 2000; Fiser et al., 2002). MODELLER implements comparative protein structure modeling by satisfaction of spatial restraints. The program was designed to use as many different types of information about the target sequence as possible. Homology-derived restraints. In the first step of model building, distance and dihedral angle restraints on the target sequence are derived from its alignment with template 3-D structures. The form of these restraints was obtained from a statistical analysis of the relationships between si
239. ors the chain green.)
    RasMol> select *:D
(This selects chain D.)
    RasMol> color [50,255,150]
(This colors the chain aqua.)
    RasMol> select ligand
(This selects the heme groups.)
    RasMol> color [255,100,100]
(This colors the hemes pink.)
The display should look like Figure 5.4.14. Notice how the color differences are still apparent, but they do not distract from the inter-relationship of the subunits within the entire structure.
To get an impression of the limitations of saturated colors, now type:
    RasMol> select ligand
    RasMol> color red
Notice how the saturated red causes confusion between the heme group and the surrounding protein chain. The impression of the heme being buried in a pocket is not as clear. However, if the goal is to focus all attention on the hemes, this bright red might be the best choice.
Use of the color command takes some practice in order to come up with the desired color. The values in the brackets are the intensity of red, green, and blue, with ranges from 0 to 255. The easiest way to start is to begin with a saturated color, and then modify it to give the desired color. In most cases, it will take a few experiments to get the proper color. Here is an example when looking for a peach color. First type:
    RasMol> select ligand
Figure 5.4.14 An alternate coloring scheme for hemoglobin. For the color version of this figure go to http://ww
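The "start from a saturated color and modify it" advice above can be explored numerically before typing commands into RasMol. The sketch below is only an aid for experimenting: it blends a red-green-blue triple toward white and formats the result in the bracketed form taken by the color command. The helper names `tint` and `rasmol_color` are hypothetical, and blending toward white is just one of several ways to soften a color.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def tint(rgb: RGB, amount: float) -> RGB:
    """Blend a color toward white.

    amount = 0.0 leaves the color unchanged; amount = 1.0 gives white.
    Each channel must be in the 0-255 range used by the color command.
    """
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be between 0 and 1")
    if any(c < 0 or c > 255 for c in rgb):
        raise ValueError("channel values must be between 0 and 255")
    r, g, b = (round(c + (255 - c) * amount) for c in rgb)
    return (r, g, b)


def rasmol_color(rgb: RGB) -> str:
    """Format a triple as a RasMol color command, e.g. color [255,100,100]."""
    return "color [{},{},{}]".format(*rgb)


if __name__ == "__main__":
    saturated_red = (255, 0, 0)
    # Print a few progressively softer reds to try at the RasMol prompt.
    for amount in (0.0, 0.2, 0.4, 0.6):
        print(rasmol_color(tint(saturated_red, amount)))
```

Tinting saturated red by 0.4 gives [255,102,102], close to the pink [255,100,100] used for the hemes above.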
240. ove the accuracy, particularly of the soft variables (torsion angles). Since this program is fully automated, it has some appeal for less sophisticated users who may not be willing or able to try different strategies to obtain a suitable model. I have always believed that, although integral membrane protein structures are the most difficult type to determine experimentally, they ought to be among the easiest to model. In general, their topologies are much simpler than those of soluble proteins; for example, mixed α-helical and β-sheet domains in the membrane are essentially unknown. Membrane-spanning domains tend to be either bundles of α helices or barrels of antiparallel β strands, both of which are relatively easy to recognize in amino acid sequences. Although the available database of membrane protein structures is still quite limited, enough patterns have already begun to emerge to give confidence that this type of modeling will eventually become common. Considering that over half of all known drugs target integral membrane proteins (mostly G protein-coupled receptors and ion channels), it is also likely that such modeling will have considerable practical importance. In the third unit in this chapter, a collaborative team from the Hebrew University in Jerusalem and the Lawrence Berkeley Laboratory in California describes a tool for predicting the structures of simple α-helical bundle membrane proteins (UNIT 5.3). By running a global molecular 
241. ovide a quick understanding of the overall shape, the number of chains and how they are folded, and the location of any ligands or prosthetic groups. This representation is also commonly used in publications to give an overall summary of the structure of the protein. This overview representation will display the protein chains as backbones (or ribbons, if preferred), with different colors on each chain. The ligands are drawn with spacefilling spheres to make them easy to find.
1. Restart RasMol with the 2hhb coordinate set (see Basic Protocol 1). This will give the wireframe representation.
2. In the Command Line window, type the following series of commands:
    RasMol> wireframe off
(This turns off the default representation.)
    RasMol> select ligand
(This selects just the ligand.)
    RasMol> cpk
(This displays the ligand with spheres.)
    RasMol> select protein
(This selects just the protein.)
    RasMol> backbone 100
(This displays the protein with a thick backbone.)
    RasMol> color chain
(This colors each chain a different color.)
The display should look like Figure 5.4.6.
Figure 5.4.6 A quick overview representation of hemoglobin. For the color version of this figure go to http://www.currentprotocols.com.
3. Rotate the
242. owsers (pre-Internet Explorer 5.x or pre-Netscape 4.5x) or Opera. Please use email submission to dali@ebi.ac.uk instead.
Figure 5.5.6 Submission page proper of the Dali server.
BASIC PROTOCOL 3
USING THE Dali DATABASE TO INVESTIGATE FAMILIAL RELATIONS AMONG THE UNIVERSE OF PROTEIN FOLDS
The Dali database is based on exhaustive all-against-all 3D structure comparison of protein structures currently in the Protein Data Bank (PDB). The classification and alignments are automatically maintained and continuously updated using the Dali search engine. The database currently contains 10,562 representative structures (May 2006). This protocol describes how to search for familial relationships among the known set of protein folds.
Necessary Resources
Hardware
Computer connected to the Internet
Software
Internet browser, e.g., Internet Explorer (http://www.microsoft.com), Netscape (http://browser.netscape.com), or Firefox (http://www.mozilla.org/firefox)
RasMol (UNIT 5.4; downloadable from http://www.bernstein-plus-sons.com/software/rasmol) or other PDB viewer
243. page contains links to Frequently Asked Questions (FAQ; http://salilab.org/modeller/FAQ.html), tutorial examples (http://salilab.org/modeller/tutorial/), an online version of the manual (http://salilab.org/modeller/manual/), and user-editable Wiki pages (http://salilab.org/modeller/wiki/) to exchange tips, scripts, and examples.

COMMENTARY

Background Information

As stated earlier, comparative modeling consists of four main steps: fold assignment, target-template alignment, model building, and model evaluation (Marti-Renom et al., 2000; Fig. 5.6.1).

Fold assignment and target-template alignment

Although fold assignment and sequence-structure alignment are logically two distinct steps in the process of comparative modeling, in practice almost all fold assignment methods also provide sequence-structure alignments. In the past, fold assignment methods were optimized for better sensitivity in detecting remotely related homologs, often at the cost of alignment accuracy. However, recent methods simultaneously optimize both the sensitivity and alignment accuracy. Therefore, in the following discussion, fold assignment and sequence-structure alignment will be treated as a single procedure, explaining the differences as needed.

Fold assignment

The primary requirement for comparative modeling is the identification of one or more known template structures with detectable similarity to the target sequence. The identificat
244. part of the structure. Figure 5.4.9 shows an example. The backbone representation used for the protein and the wireframe used for the histidine have a similar look, so the viewer automatically treats them as part of the same structure, even though the coloring scheme is different between the backbone and the sidechain. The heme is shown in spacefilling, so it is distinguished as a different molecule.

5. To see how this works, restart RasMol with the file 2hhb, and type:

RasMol> select all
RasMol> wireframe off
This turns off the default representation.

RasMol> select protein
RasMol> backbone 100
This uses a thick protein backbone.

RasMol> color [100,150,255]
This colors the protein backbone blue-green.

RasMol> select ligand
RasMol> cpk
This uses spheres for the heme.

Figure 5.4.15 A close-up image of the histidine-iron interaction in hemoglobin. For the color version of this figure, go to http://www.currentprotocols.com.

Figure 5.4.16 An alternate close-up image of the histidine-iron interaction in hemoglobin. For the color version of this figure, go to http://www.currentprotocols.com.

RasMol> select HIS92:D and (sidechain or alpha)
RasMol> wireframe 100
This uses wireframe for the histidine.

RasMol> color cpk
This colors the h
245. provement of sequence alignments. However, fully automated procedures are essential and, indeed, have been used for large-scale genome modeling. CAFASP3 assessments did not judge human intervention, but only software performance. The use of typical alignment software such as FASTA (UNIT 3.9), BLAST (UNITS 3.3 & 3.4), or PSI-BLAST to determine which modeling software demonstrates the best performance is very important, and the results are of interest not only to computational biologists but also to biologists at the laboratory bench.

Suggestions for Further Analysis

It is currently not possible to access the FAMS server. However, the authors expect that in the future, researchers will be able to submit novel sequences directly to FAMS in order to obtain structure predictions (see Fig. 5.2.12 for the FAMS Web page).

Literature Cited

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410.

Fischer, D., Elofsson, A., Rychlewski, L., Pazos, F., Valencia, A., Rost, B., Ortiz, A.R., and Dunbrack, R.L., Jr. 2001. CAFASP2: The second critical assessment of fully automated structure prediction methods. Proteins 45:171-183.

Iwadate, M., Ebisawa, K., and Umeyama, H. 2001. Comparative modeling of CAFASP2 competition. Chem-Bio Informatics J. 1:136-148.

Ogata, K. and Umeyama, H. 2000. An automatic homology modeling method consisting of dat
246. r adipocyte fatty acid binding protein with its actual structure (left). (B) A putative proteoglycan-binding patch was identified on a medium-accuracy comparative model of mouse mast cell protease 7 (right), modeled based on its 39% sequence identity to the crystallographic structure of bovine pancreatic trypsin (2ptn) that does not bind proteoglycans. The prediction was confirmed by site-directed mutagenesis and heparin-affinity chromatography experiments (Matsumoto et al., 1995). Typical accuracy of a comparative model in this range of sequence similarity is indicated by a comparison of a trypsin model with the actual structure. (C) A molecular model of the whole yeast ribosome (right) was calculated by fitting atomic rRNA and protein models into the electron density of the 80S ribosomal particle, obtained by electron microscopy at 15 Å resolution (Spahn et al., 2001). Most of the models for 40 out of the 75 ribosomal proteins were based on template structures that were approximately 30% sequentially identical. Typical accuracy of a comparative model in this range of sequence similarity is indicated by a comparison of a model for a domain in L2 protein from B. stearothermophilus with the actual structure (17 2). For the color version of this figure, go to http://www.currentprotocols.com.

Comparative Protein Structure Modeling Using Modeller
247. ral detail. The Fold Index lists all chains in PDB90 ordered by structural similarity. The order is that of a dendrogram derived in the hierarchical clustering. Fold types are indexed. A "heavier" branch with more members is listed above a branch with fewer members. Domains that are structural neighbors are found next to each other. Fold types with similar structural motifs are also found next to each other.

2. Enter into the fold classification from the FOLD INDEX, or enter a PDB identifier or text term (protein name or keyword) that occurs in the COMPND records of the PDB entries into the text box under Search for PDB Identifier or Protein (Fig. 5.5.7).

More sophisticated queries should be performed using specialized search engines such as Entrez at NCBI (http://www.ncbi.nlm.nih.gov) or SRS (http://srs.ebi.ac.uk).

3. For example, type estradiol receptor into the text box. Figure 5.5.8 shows the result for this query.

The leftmost column shows that there are two PDB entries for estradiol receptor, namely 1qkt and 1qku. The latter has three chains named A, B, and C. The second column indicates that the chain 1qkuA is representative of all the chains in the PDB90 set, which retains a single representative for clusters of very similar proteins. The third column shows that 1qkuA belongs to domain fold class 1060. Fold class indices are not stable, i.e., they may change between updates of the Dali database.

4. Click on a link in the Fold column to show
248. rding to their sequence and identified the conserved residues, found mainly inside the pore (Fig. 5.7.25). Since aquaporin facilitates water transport across the membrane, these conserved residues are most likely the ones that carry out this function.

Importing FASTA files for sequence alignment

Often the structure of a protein is not available, but its sequence is. You can analyze a protein in MultiSeq without its structure by loading its sequence information in the FASTA file format. If you do not have the FASTA file of a protein but you have its sequence, you can create a FASTA file easily with any text editor of your choice.

Figure 5.7.24 Result of a sequence alignment of the four aquaporins, colored by sequence identity. For the color version of this figure, go to http://www.currentprotocols.com.

Figure 5.7.25 Top view of the aligned aquaporins colored by sequence conservation. The conserved residues are located mostly inside the aquaporin pore. For the color version of this figure, go to http://www.currentprotocols.com.

4. Find the provided FASTA sequence file spinach_aqp.fasta and open it with a text editor.

A FASTA file contains a header that starts with ">" followed by the name of the protein. In the next line is the protein sequence in a one-letter amino acid code. You can cr
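The FASTA layout described above (a ">" header line followed by the one-letter sequence) is simple enough to generate programmatically as well as by hand. The sketch below is an illustration only: the function name, the example protein name, and the short made-up sequence are all hypothetical, and wrapping the sequence at 60 characters is a common convention rather than something the protocol requires.

```python
def format_fasta(name, sequence, width=60):
    """Return one FASTA record: a '>' header line, then the sequence.

    Whitespace inside the sequence is removed and letters are upper-cased;
    long sequences are wrapped at `width` characters per line.
    """
    cleaned = "".join(sequence.split()).upper()
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical name and made-up sequence, for illustration only.
    print(format_fasta("example_protein", "mskevseeaq ahqhgkdyvd"), end="")
```

Writing the returned text to a file with a .fasta extension should give a file that can be opened in a text editor and compared against spinach_aqp.fasta.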
249. re describing the structural similarity is given by

$$S(A,B) = \sum_{i \in \mathrm{core}} \sum_{j \in \mathrm{core}} \left( 0.2 - \frac{\left| d^{A}_{ij} - d^{B}_{ij} \right|}{d^{*}_{ij}} \right) \exp\left[ -\left( \frac{d^{*}_{ij}}{20\,\text{Å}} \right)^{2} \right]$$

Equation 5.5.3

where values of constants in the equation are explicitly inserted. The core is defined as a set of equivalences between residues in A and B proteins, which is analogous to a sequence alignment.

For a random pairwise comparison, the expected Dali score (Equation 5.5.3) increases with the number of residues in the compared proteins. In order to describe the statistical significance of a pairwise comparison score S(A,B), the Dali server uses the Z-score, defined as

$$Z(A,B) = \frac{S(A,B) - m(L)}{0.5\, m(L)}$$

Equation 5.5.4

where the denominator is an estimation of the average standard deviation of scores for various lengths of protein chains. The approximate experimental relation between the mean score, m, and the average length (with L < 400)

$$L = \sqrt{L_A L_B}$$

Equation 5.5.5

of two proteins is given by

$$m(L) = 7.95 + 0.71 L + 2.59 \times 10^{-4} L^{2} - 1.92 \times 10^{-6} L^{3}$$

Equation 5.5.6

The Z-score is computed for every possible pair of domains, and the highest value is reported as the Z-score of the protein pair.

Possible domains are determined by the PUU algorithm (parser for Protein Unfolding Units). The algorithm recursively cuts a structure into smaller
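The Z-score definition in Equations 5.5.4 to 5.5.6 can be tried out numerically. The following Python sketch is a hedged illustration, not the Dali server's code: the function names are invented here, the signs of the polynomial terms (plus on the linear and quadratic terms, minus on the cubic term) are an assumption based on the published Dali relation because the extracted text lost them, and no special handling is attempted for L of 400 or more.

```python
import math


def mean_score(length):
    """Empirical mean Dali score m(L) (Equation 5.5.6).

    The signs of the terms are assumed; the formula is only stated
    for average lengths L below 400.
    """
    return (7.95
            + 0.71 * length
            + 2.59e-4 * length ** 2
            - 1.92e-6 * length ** 3)


def dali_z_score(raw_score, len_a, len_b):
    """Z-score of a raw pairwise score S(A,B) (Equations 5.5.4 and 5.5.5)."""
    length = math.sqrt(len_a * len_b)   # L = sqrt(L_A * L_B)
    mean = mean_score(length)
    return (raw_score - mean) / (0.5 * mean)


if __name__ == "__main__":
    # Made-up raw score for two 100-residue chains.
    print(round(dali_z_score(159.24, 100, 100), 2))
```

Because the standard deviation is modeled as half the mean, a raw score equal to twice m(L) corresponds to a Z-score of 2 under these assumptions.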
250. re available on the WWW at http://www.bernstein-plus-sons.com/software/rasmol/. Downloading and installation instructions are given in Support Protocol 1.

Files

Coordinate files are read in a variety of formats, including PDB, Mol2, CHARMm, and mmCIF. The program deals gracefully with a number of variations of these files, including files containing coordinates for multiple conformers or multiple models. In this example, coordinates for hemoglobin (2HHB.pdb) obtained from the Protein Data Bank (PDB; UNIT 1.9) are used; instructions for downloading the PDB coordinate file are given in Support Protocol 2.

Contributed by David S. Goodsell. Current Protocols in Bioinformatics (2005) 5.4.1-5.4.23. Copyright © 2005 by John Wiley & Sons, Inc.

UNIT 5.4

BASIC PROTOCOL 1

Display hemoglobin with RasMol

1. Download and install RasMol on the local machine (Support Protocol 1).

2. Download the PDB coordinate file 2HHB.pdb from the PDB (UNIT 1.9) as described in Support Protocol 2.

3a. On Unix and Linux machines: Type rasmol 2HHB.pdb at the prompt. This will start RasMol and load the coordinates from the file 2HHB.pdb.

3b. On personal computers: Double-click on the RasMol icon. This will launch RasMol. Next, select Open from the File menu to load the coordinates from the file 2
251. re modeled based on their intrinsic conformational preferences and on the conformation of the equivalent side chains in the template structures (Sutcliffe et al., 1987b). Finally, the stereochemistry of the model is improved either by a restrained energy minimization or a molecular dynamics refinement. The accuracy of a model can be somewhat increased when more than one template structure is used to construct the framework and when the templates are averaged into the framework using weights corresponding to their sequence similarities to the target sequence (Srinivasan and Blundell, 1993). Possible future improvements of modeling by rigid-body assembly include incorporation of rigid-body shifts, such as the relative shifts in the packing of α-helices and β-sheets (Nagarajaram et al., 1999). Two other programs that implement this method are 3D-JIGSAW (Bates et al., 2001) and SWISS-MODEL (Schwede et al., 2003).

Modeling by segment matching or coordinate reconstruction

The basis of modeling by coordinate reconstruction is the finding that most hexapeptide segments of protein structure can be clustered into only 100 structurally different classes (Jones and Thirup, 1986; Claessens et al., 1989; Unger et al., 1989; Levitt, 1992; Bystroff and Baker, 1998). Thus, comparative models can be constructed by using a subset of atomic positions from template structures as guiding positions to identify and assemble short, all-atom 
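The template-averaging idea mentioned above for rigid-body assembly (weights corresponding to each template's sequence similarity to the target) can be sketched in a few lines. This is an illustration under stated assumptions, not the scheme of Srinivasan and Blundell: the function name is invented, the templates are assumed to be already superposed with equivalent atoms in the same order, and the weights are simply the sequence identities normalized to sum to one.

```python
def average_templates(templates, identities):
    """Average superposed template coordinates into one framework.

    templates  -- list of coordinate lists, one per template; each is a
                  list of (x, y, z) tuples for equivalent atoms
    identities -- sequence identity of each template to the target,
                  used here (as an assumption) as unnormalized weights
    """
    total = float(sum(identities))
    weights = [value / total for value in identities]
    framework = []
    for atom in range(len(templates[0])):
        point = tuple(
            sum(w * coords[atom][axis] for w, coords in zip(weights, templates))
            for axis in range(3)
        )
        framework.append(point)
    return framework


if __name__ == "__main__":
    # Two toy one-atom "templates" with 30% and 90% identity to the target.
    print(average_templates([[(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]], [30, 90]))
```

In this toy case the averaged atom lies three quarters of the way toward the more similar template, which is the intended effect of similarity weighting.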
252. re modeling assume increasing importance. For those who have yet to try their hand at such endeavors, the encouraging news is that the tools are getting easier to use as well as more accurate. Dip into the protocols in this chapter and see.

LITERATURE CITED

Hegyi, H. and Gerstein, M. 2001. Annotation transfer for genomics: Measuring functional divergence in multi-domain proteins. Genome Res. 11:1632-1640.

Hou, J., Jun, S.R., Zhang, C., and Kim, S.H. 2005. Global mapping of the protein structure space and application in structure-based inference of protein function. Proc. Natl. Acad. Sci. U.S.A. 102:3651-3656.

Kim, Y., Yakunin, A.F., Kuznetsova, E., Xu, X., Pennycooke, M., Gu, J., Cheung, F., Proudfoot, M., Arrowsmith, C.H., Joachimiak, A., Edwards, A.M., and Christendat, D. 2004. Structure- and function-based characterization of a new phosphoglycolate phosphatase from Thermoplasma acidophilum. J. Biol. Chem. 279:517-526.

Sadreyev, R.I. and Grishin, N.V. 2006. Exploring dynamics of protein structure determination and homology-based prediction to estimate the number of superfamilies and folds. BMC Struct. Biol. 6:6.

Contributed by Gregory A. Petsko, Brandeis University, Waltham, Massachusetts

FAMS and FAMSBASE for Protein Structure

The computer program FAMS (Full Automatic Modeling System; Ogata et al., 2000; Iwadate et al., 2001) p
253. Using VMD: An Introductory Tutorial

Residue 76 is at the protein's C-terminus, which is extended towards the solvent and is quite flexible, while residue 10 is at the surface of the globular part of ubiquitin. The difference in their dynamics with respect to the rest of the protein is immediately obvious when our newly obtained data are plotted (Fig. 5.7.30): the distance of residue 76 from the protein's center is substantially greater than that of residue 10, and the distribution of the distance is noticeably wider due to the flexibility of the C-terminus. This is just a simple example of scripting for the analysis of a trajectory. Similar, but usually much more complex, customized scripts are routinely employed by VMD users to perform many kinds of analysis.

8. Quit VMD.

COMMENTARY

Background Information

VMD has been developed by the Theoretical and Computational Biophysics Group at the University of Illinois at Urbana-Champaign. Throughout its development, many features have been added, and user-specific functions can be implemented through embedded scripting languages like Python and Tcl, providing a wide spectrum of tools for the scientific community. Specifically, VMD is most suitable for high-resolution visualization and image rendering, preparation of molecular dynamics simulation systems an
254. require entirely different approaches to the subject, and may be best served with two entirely different molecular graphics programs.

Parameters to define when beginning a project include:

The medium of presentation. Interactive display will allow the use of very complex representations, whereas print media require simpler representations that will be comprehensible in a still image.

The audience. Images created for molecular biologists typically can be far more complex than images created for the lay audience or for researchers in other fields. Researchers are often willing to spend more time with an image to ferret out all of the details.

Set Achievable Goals

When designing a representation for a given goal, it is important to set achievable goals. It is rarely possible to show many concepts in a single figure. Instead, it is often best to pick one concept and create a representation that best serves that goal. For instance, the overview representation given above is only good for one purpose: to give an overview of the protein fold and the location of ligand-binding sites. If the details of the ligand-binding site were added, perhaps by adding all of the sidechains that interact with the ligand, the representation would suffer. The binding site would become too complex and would distract from the global features of the protein, and the details of the active site would be so small that they would not be comprehensible. The better approach is
255. right i j pdb

One may wish to check for errors during the run by screening the log file with the following command:

grep -i err chi search log | more

If no errors are found, it is best to delete the log files, since they can be very large (several megabytes).

19. To calculate the Cα RMSD between all of the structures, type:

chi rmsd -verbose

This process is relatively time consuming: roughly 0.1 sec per comparison (i.e., 93 min for 286 structures). The output is a single file:

MyProtein variantA results rmsd out

This file contains a list of structures and the Cα RMSDs between them. Note that the file only lists those structures that are lower than the RMSD threshold plus 1 Å.

Note that when the number of structures increases to a certain point, it is the RMSD calculation that consumes the largest amount of CPU time. This is because the time required for molecular dynamics simulations scales linearly with the number of structures generated (576 structures take only twice the amount of time as 288 structures), whereas the RMSD calculation scales with the square of the number of structures (comparison of 576 structures takes 4 times longer than 288 structures). The chi rmsd script is therefore best suited to cases where the number of structures is approximately 2000 or less. If interested in simulating a larger system, one can contact the authors (arkin@cc.huji.ac.il) for alternative scripts to chi rmsd. These scripts reduce the 
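The linear-versus-quadratic scaling argument above can be checked with a short calculation. The sketch below is an illustration only; the function names are invented, and counting each unordered pair of structures once is an assumption about how the all-against-all comparison is organized, since the text states only that the cost grows with the square of the number of structures.

```python
def pair_count(n_structures):
    """Number of unordered pairs among n structures (assumed comparison count)."""
    return n_structures * (n_structures - 1) // 2


def relative_cost(n_small, n_large):
    """Return (simulation ratio, comparison ratio) between two run sizes.

    Simulation time is taken as linear in the number of structures;
    comparison time is taken as proportional to the number of pairs.
    """
    simulation_ratio = n_large / n_small
    comparison_ratio = pair_count(n_large) / pair_count(n_small)
    return simulation_ratio, comparison_ratio


if __name__ == "__main__":
    sim, cmp_ratio = relative_cost(288, 576)
    print(f"simulation x{sim:.2f}, RMSD comparison x{cmp_ratio:.2f}")
```

Doubling the number of structures doubles the simulation work but roughly quadruples the number of pairwise comparisons, which is why the RMSD step eventually dominates.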
256. rmation from template structures; evaluate the model (flowchart labels). Plot panel: pseudo energy versus residue index.

Figure 5.6.1 Steps in comparative protein structure modeling. See text for details. For the color version of this figure, go to http://www.currentprotocols.com.

Contributed by Narayanan Eswar, Ben Webb, Marc A. Marti-Renom, M.S. Madhusudhan, David Eramian, Min-yi Shen, Ursula Pieper, and Andrej Sali. Current Protocols in Bioinformatics (2006) 5.6.1-5.6.30. Copyright © 2006 by John Wiley & Sons, Inc.

Table 5.6.1 Programs and Web Servers Useful in Comparative Protein Structure Modeling

Name    World Wide Web address

Databases
BALIBASE (Thompson et al., 1999)
CATH (Pearl et al., 2005)
DBALI (Marti-Renom et al., 2001)
GENBANK (Benson et al., 2005)
GENECENSUS (Lin et al., 2002)
MODBASE (Pieper et al., 2004)
PDB (UNIT 1.9; Deshpande et al., 2005)
PFAM (UNIT 2.5; Bateman et al., 2004)
SCOP (Andreeva et al., 2004)
SWISSPROT (Boeckmann et al., 2003)
UNIPROT (Bairoch et al., 2005)

Template search
123D (Alexandrov et al., 1996)
3D PSSM (Kelley et al., 2000)
BLAST (UNIT 3.4; Altschul et al., 1997)
DALI (UNIT 5.5; Dietmann et al., 2001)
FASTA (UNIT 3.9; Pearson, 2000)
FFAS03 (Jaroszewski et al., 2005)
PREDICTPROTEIN (Rost and Liu, 2003)
PROSPECTOR (Skolnick and Kihara, 
257. rotein and ligand will select no atoms, since there are no common atoms that are in both the set of protein atoms and the set of ligand atoms. Selection of an appropriate set of atoms is probably the most difficult, and the most useful, aspect of RasMol usage.

b. Rotate the display and notice the following: (1) Spacefilling representations show the bulk of the protein. Notice the way the different subunits interdigitate, and the way the heme slots into a form-fitting groove. (2) Many people find it difficult to identify individual amino acids in spacefilling representations, even if they are colored by atom type.

8. Backbone and Ribbon Diagrams. Two schematic representations are commonly used to display the topology of a protein chain. In a backbone representation, cylinders are drawn between successive alpha carbon positions. In a ribbon diagram, a helical ribbon is used to display alpha helices, a large flat arrow is used to display beta sheets, and smooth tubes are used to display other portions of the chain. Ribbon diagrams are excellent for presentation of protein folding, and are currently the most common representation used in journal publications. The following describes how to create backbone and ribbon diagrams.

Figure 5.4.4 B
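The remark earlier in this passage, that combining the protein and ligand sets with "and" selects no atoms, is ordinary set intersection. The toy Python sketch below illustrates that logic only; the atom records and residue names are made up, and RasMol's own predefined sets are built from the loaded coordinate file rather than from a list like this.

```python
# Hypothetical atom records (serial number, residue name), for illustration.
ATOMS = [
    (1, "VAL"), (2, "LEU"), (3, "HIS"),  # atoms in protein residues
    (4, "HEM"), (5, "HEM"),              # atoms in the heme ligand
]
PROTEIN_RESIDUES = {"VAL", "LEU", "HIS"}

protein = {serial for serial, residue in ATOMS if residue in PROTEIN_RESIDUES}
ligand = {serial for serial, residue in ATOMS if residue == "HEM"}

both = protein & ligand     # analogous to "select protein and ligand"
either = protein | ligand   # analogous to "select protein or ligand"

if __name__ == "__main__":
    print(len(both), len(either))
```

Here "and" keeps only atoms that belong to both sets, so the result is empty, whereas "or" keeps atoms that belong to either set.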
258. rowser

Aquaporins are membrane channel proteins found in a wide range of species, from bacteria to plants to humans. They facilitate water transport across the cell membrane and play an important role in the control of cell volume and transcellular water traffic. Many aquaporin protein structures are available in the Protein Data Bank, including a human aquaporin (PDB code 1FQY; Murata et al., 2000) and an E. coli aquaporin (PDB code 1RC2; Savage et al., 2003). To practice dealing with multiple proteins in VMD, let us load both aquaporin structures.

Loading multiple molecules

1. Start a new VMD session. In the VMD Main window, choose File → New Molecule...

The Molecule File Browser window should appear on your screen.

2. Use the Browse... button to find the file 1fqy.pdb. When you select the file, you will be back in the Molecule File Browser window. Press the Load button to load the molecule. The coordinate file of human aquaporin AQP1 should now be loaded and can be seen in the OpenGL window.

3. In the Molecule File Browser, make sure you choose New Molecule in the "Load files for:" pull-down menu on the top. Use the Browse... button to find the file 1rc2.pdb and press Load. Close the Molecule File Browser window.

You have just loaded two molecules. Any number of molecules can be loaded and displayed in VMD simultaneously by repeating the previous step. VMD can load as many molecules as the memory of your computer allows.

Take a look a
259. rthographic mode tends to be more useful for analysis (because alignment is easy to see), while perspective mode is often used for producing figures and stereo images.

Another way VMD can represent depth is through so-called "depth cueing." Depth cueing is used to enhance three-dimensional perception of molecular structures, particularly with orthographic projections.

16. Choose Display → Depth Cueing in the VMD Main window.

When depth cueing is enabled, objects further from the camera are blended into the background. Depth cueing settings are found in Display → Display Settings... Here, one can choose the functional dependence of the shading on distance, as well as some parameters for this function. To see the depth cueing effect better, you might want to hide the representation with the Surf drawing method.

17. Finally, VMD can also produce stereo images. In the VMD Main window, look at the Display → Stereo menu, which shows many different choices. Choose SideBySide (remember to return to Perspective mode for a better result). The result should look like Figure 5.7.13.

18. Turn off stereo image by selecting Display → Stereo → Off in the VMD Main window. Also, turn off depth cueing by unselecting the Display → Depth Cueing checkbox in the VMD Main window.
260. ry. Here, we will use the default option, Snapshot. One can also choose the output file format for the movie in menu item Format.

4. Select the Rock and Roll option in the Movie Settings menu in the VMD Movie Generator window. Set the working directory to any convenient directory of your choice, give your movie a name, and click Make Movie.

5. Once rendering is finished, open and view the movie with your favorite application. This movie setting is good for showing one side of your system primarily.

If you cannot successfully make movies with VMD, it is possible that you are missing some software required for generating movies. All of the required software is freely available; to find what software you need, please see the VMD Movie Plugin page at http://www.ks.uiuc.edu/Research/vmd/plugins/vmdmovie/.

Making trajectory movies

6. Now, we will make a movie of the trajectory. In the VMD Movie Generator window, select Movie Settings → Trajectory, give this one a different name, and click Make Movie.

Note that the length of the movie is set automatically (24 frames per second). For a trajectory, the duration of the movie can be decreased, but cannot be increased.

7. Try out different options in the VMD Movie Generator window. Once you are done, quit VMD.

SCRIPTING IN VMD

VMD provides embedded scripting languages (Python and Tcl) for the purpose of user extensibility. In this section, we will discuss the basic features of the Tcl scripting 
261. s.com.

The other renderers (e.g., POV3 and Tachyon) reprocess everything, so it may not look exactly as it does in the OpenGL window. In particular, they do not "clip" (or hide) objects very near the camera. If you select Display → Display Settings... in the VMD Main window, you can set Near Clip to 0.01 to get a better idea of what will appear in your rendering.

24. Quit VMD.

BASIC PROTOCOL 3

WORKING WITH TRAJECTORIES AND MAKING MOVIES

Time-evolving coordinates of a system are called trajectories. They are most commonly obtained from simulations of molecular systems, but can also be generated by other means and for different purposes. Upon loading a trajectory into VMD, one can see a movie of how the system evolves in time and analyze various features throughout the trajectory. This section will introduce the basics of working with trajectory data in VMD. You will also learn how to analyze trajectory data in Basic Protocols 14, 15, and 16.

Necessary Resources

Hardware

Computer

Software

VMD, and a movie player program

Files

ubiquitin.psf and pulling.dcd, which can be downloaded from http://www.currentprotocols.com

Working with Trajectories

Trajectory files are commonly binary files that contain several sets of coordinates for the system. Each 
262. s in the PDB. Protein Eng. 11:411-414.

Mirkovic, N., Marti-Renom, M.A., Sali, A., and Monteiro, A.N.A. 2004. Structure-based assessment of missense mutations in human BRCA1: Implications for breast and ovarian cancer predisposition. Cancer Res. 64:3790-3797.

Misura, K.M. and Baker, D. 2005. Progress and challenges in high-resolution refinement of protein structure models. Proteins 59:15-29.

Misura, K.M., Chivian, D., Rohl, C.A., Kim, D.E., and Baker, D. 2006. Physically realistic homology models built with ROSETTA can be more accurate than their templates. Proc. Natl. Acad. Sci. U.S.A. 103:5361-5366.

Miwa, J.M., Ibanez-Tallon, I., Crabtree, G.W., Sanchez, R., Sali, A., Role, L.W., and Heintz, N. 1999. lynx1, an endogenous toxin-like modulator of nicotinic acetylcholine receptors in the mammalian CNS. Neuron 23:105-114.

Modi, S., Paine, M.J., Sutcliffe, M.J., Lian, L.Y., Primrose, W.U., Wolf, C.R., and Roberts, G.C. 1996. A model for human cytochrome P450 2D6 based on homology modeling and NMR studies of substrate binding. Biochemistry 35:4540-4550.

Moult, J. 2005. A decade of CASP: Progress, bottlenecks and prognosis in protein structure prediction. Curr. Opin. Struct. Biol. 15:285-289.

Moult, J. and James, M.N. 1986. An algorithm for determining the conf
263. s may be reported by protein pair.

DOWNLOADING AND INSTALLING THE DaliLite STAND-ALONE PROGRAM

DaliLite is a stand-alone program package that can help researchers compare large numbers of protein structures for specialized projects efficiently and locally. The DaliLite distribution is a self-contained package of scripts and programs written in Perl and Fortran 77. It has been tested on the Linux operating system (RedHat distribution, version 6.0; http://www.redhat.com) and on Cygwin, a Linux-like environment for Microsoft Windows (http://cygwin.com). The program code is distributed to academic users. Commercial use is prohibited.

Necessary Resources

Hardware

Computer that operates the Linux operating system (e.g., Sun, Alpha, Silicon Graphics, PC)

Software

Fortran 77 compiler (http://www.gnu.org/software/fortran/fortran.html)
Perl interpreter (Perl v. 5.0 or higher; http://www.perl.org)
Cygwin (http://cygwin.com; optional)

1. Download the academic license agreement from http://www.bioinfo.biocenter.helsinki.fi/dali_lite/downloads, and print, sign, and fax it to the address indicated.

2. Download the DaliLite program package by clicking on the link at the top of the above Web page.

The current distribution version, as of this writing, is 2.4.2.

Complete instructions for compilation and installation are available in the INSTALL file included in the DaliLite distribution, as well as instructions for where to obtain the 
264. s with little secondary structure (Falicov and Cohen, 1996)

Parsi: Sensitive branch-and-bound alignment algorithm (Holm and Sander, 1996)

Dalicon: Refine all alignments generated by the above methods, with different objective functions, using a Monte Carlo algorithm that maximizes the Dali score (Holm and Sander, 1993)

Figure 5.5.13 Format of the DCCP file. Example header line: DCCP 1 93.9 1.8 33 3.6 39 1 1ppt 1bba. Labeled fields include the raw similarity score, the root mean square deviation (in angstroms) of alpha carbons, the number of structurally equivalent residues, the identity, the number of aligned blocks, the first structure, and the second structure. The alignment lines list the start and end residues of each aligned block in the first structure, followed by the start and end residues of each aligned block in the second structure.

8. Prepare a list of chain identifiers in a file to perform a pairwise comparison of the query to each structure in the list. For example, the list file (mylist) may have the following contents:

1bf6A
1j79A
1a4mA
1k70A
3ubpC

9. To compare 3ubpC against each entry in the list file, enter the following user input after the Linux prompt:

Linux prompt> perl DaliLite -list 3ubpC mylist

10. For all-against-all comparison, enter the following user input after the Linux prompt:

Linux prompt> perl DaliLite -AllAll mylist

S
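Steps 8 to 10 above (a list file of chain identifiers, then a one-against-list or all-against-all run) lend themselves to a small helper when many lists have to be prepared. The Python sketch below is an illustration only: the function names are invented, the script merely builds the list-file text and the argument lists without running anything, and the option spellings -list and -AllAll are an assumed reading of the garbled command lines in the extracted text, so they should be checked against the DaliLite documentation before use.

```python
def list_file_text(chain_ids):
    """Return list-file contents: one chain identifier per line."""
    return "\n".join(chain_ids) + "\n"


def one_against_list(query, list_file):
    """Argument list for comparing one query chain against a list file.

    The "-list" option spelling is an assumption (see lead-in).
    """
    return ["perl", "DaliLite", "-list", query, list_file]


def all_against_all(list_file):
    """Argument list for an all-against-all run over a list file.

    The "-AllAll" option spelling is an assumption (see lead-in).
    """
    return ["perl", "DaliLite", "-AllAll", list_file]


if __name__ == "__main__":
    chains = ["1bf6A", "1j79A", "1a4mA", "1k70A", "3ubpC"]
    print(list_file_text(chains), end="")
    print(" ".join(one_against_list("3ubpC", "mylist")))
    print(" ".join(all_against_all("mylist")))
```

Keeping the commands as argument lists makes it straightforward to hand them to a process launcher later, once the option names have been confirmed.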
265. s with the 90 to 100 region, it is reported to be of high energy by DOPE. Note that a region of high energy indicated by DOPE does not necessarily indicate an actual error, especially when it highlights an active site or a protein-protein interface. However, in this case, the same active-site loops have a better profile in the template structure, which strengthens the argument that the model is probably incorrect in the active-site region. Resolution of such problems is beyond the scope of this unit, but is described in a more advanced modeling tutorial available at http://salilab.org/modeller/tutorial/advanced.html.

Figure 5.6.11 A comparison of the pseudo-energy profiles of the model (red) and the template (green) structures. [Plot: DOPE score versus residue index.] For the color version of this figure go to http://www.currentprotocols.com.

SUPPORT PROTOCOL

OBTAINING AND INSTALLING MODELLER

MODELLER is written in Fortran 90 and uses Python for its control language. All input scripts to MODELLER are, hence, Python scripts. While knowledge of Python is not necessary to run MODELLER, it can be useful in performing more advanced tasks. Precompiled binaries for MODELLER can be downloaded
266. sary Resources

Hardware

Hardware requirements are defined by the platforms that are officially supported by CNSsolve, i.e., one of the following computers:
SGI (R4000 and later) running IRIX 4.0.5 or later
HP (PA-Risc) running HP-UX 9.05 or later
DEC Alpha running OSF1/Digital Unix/Tru64 Unix
PC (i386, i486, i586, or i686) running Linux or Windows 98 or NT or higher

Additionally, CNSsolve provides unsupported installations for other systems:
Convex running ConvexOS
Cray (J90, YMP, C90, T90) running Unicos
Cray T3E (single CPU) running Unicosmk
IBM RS/6000 running AIX
Sun running SunOS
Unix systems with g77/gcc (EGCS 1.1)
Windows 98 or NT (or higher) systems with g77/gcc (EGCS 1.1)
A Macintosh OS X port is also available; contact the authors for details (arkin@cc.huji.ac.il).

Software

CNSsolve (available free of charge for academic users at http://cns.csb.yale.edu)
CHI (available from Paul D. Adams, PDAdams@lbl.gov)
Perl (Perl is a component of nearly all standard Unix distributions. It is available free of charge at www.perl.org. Install according to the instructions on the Web page.)
Three Perl scripts: (1) ak_cluster.pl, (2) compare_rmsd.pl, and (3) to_gly.pl, available from the authors (arkin@cc.huji.ac.il)
A CNSsolve input script (cns.inp) available from the authors (arkin@cc.huji.ac.il)
A standard text editor (e.g., jot, notepad, or nedit)
A Web browser
Software to perform multiple sequence alignment (e.g., Clustal X, C
267. sb.org/pdb/software-list.html
Contains links to many molecular graphics programs and provides access to macromolecular coordinates.

Contributed by David S. Goodsell, The Scripps Research Institute, La Jolla, California

UNIT 5.5
Using Dali for Structural Comparison of Proteins

Dali (distance matrix alignment) is a tool for both pairwise structure comparison and structure database searching. It is equipped with a Web interface to easily view the results, multiple alignments, and three-dimensional (3D) superimpositions of structures. The method is fully automated and very sensitively identifies common structural cores and structural resemblances. Dali uses the 3D Cartesian coordinates of the Cα atoms of each protein to calculate residue-residue distance matrices. A similarity score for these sets is defined as a weighted sum of equivalent intramolecular distances, resulting in a scored list of all important structural alignments. This method allows for any length of gaps in the sequence (i.e., insertions or deletions) and detects similarities involving geometrical distortions.

Dali is easily accessible through Web servers, and Table 5.5.1 outlines the relationships of Dali resources. The DaliLite server can be used to compare two known structures to each other and visualize their superimposition (Basic Protocol 1). This server requires two sets of atomic coordi
268. se of the lack of solved structures. On the other hand, other modeling methods are relatively easy to implement for membrane proteins, compared to water-soluble proteins, due to the overall simplicity of membrane proteins, in particular those formed from α-helical bundles. Furthermore, assignment of the different helices in an α-helical bundle, the more abundant and pharmaceutically important family, is relatively straightforward. Thus, it can be concluded that while the structures of α-helical membrane proteins are the most difficult to determine experimentally, fortunately they are the easiest to predict computationally.

Despite the apparent ease with which it is possible to simulate membrane proteins using molecular dynamics, there is one issue that can potentially present difficulty: the presence or absence of a lipid bilayer. In the simulations of membrane proteins using molecular dynamics in CHI, no lipids or solvent molecules are employed, because of the prohibitive computational cost. However, it is possible to argue that the most important stabilizing force in any oligomeric bundle will be the interaction between the helices themselves (Torres et al., 2001). Thus, there is some justification in the simulation procedure described here, although the lack of a lipid environment should always be borne in mind.

Critical Parameters and Troubleshooting

The underlying premise of
269. segments that fit these guiding positions. The guiding positions usually correspond to the Cα atoms of the segments that are conserved in the alignment between the template structure and the target sequence. The all-atom segments that fit the guiding positions can be obtained either by scanning all known protein structures, including those that are not related to the sequence being modeled (Claessens et al., 1989; Holm and Sander, 1991), or by a conformational search restrained by an energy function (Bruccoleri and Karplus, 1987; van Gelder et al., 1994). This method can construct both main-chain and side-chain atoms, and can also model unaligned regions (gaps). It is implemented in the program SegMod (Levitt, 1992). Even some side-chain modeling methods (Chinea et al., 1995) and the class of loop-construction methods based on finding suitable fragments in the database of known structures (Jones and Thirup, 1986) can be seen as segment-matching or coordinate-reconstruction methods.

Modeling by satisfaction of spatial restraints

The methods in this class begin by generating many constraints or restraints on the structure of the target sequence, using its alignment to related protein structures as a guide. The procedure is conceptually similar to that used in determination of protein structures from NMR-derived restraints. The restraints are generally obtained by assuming that the c
270. sessment of fully automated structure prediction methods. Proteins 5:171-183.

Fiser, A. 2004. Protein structure modeling in the proteomics era. Expert Rev. Proteomics 1:97-110.

Fiser, A. and Sali, A. 2003a. Modeller: Generation and refinement of homology-based protein structure models. Methods Enzymol. 374:461-491.

Fiser, A. and Sali, A. 2003b. ModLoop: Automated modeling of loops in protein structures. Bioinformatics 19:2500-2501.

Fiser, A., Do, R.K., and Sali, A. 2000. Modeling of loops in protein structures. Protein Sci. 9:1753-1773.

Fiser, A., Feig, M., Brooks, C.L. 3rd, and Sali, A. 2002. Evolution and physics in comparative protein structure modeling. Acc. Chem. Res. 35:413-421.

Gao, H., Sengupta, J., Valle, M., Korostelev, A., Eswar, N., Stagg, S.M., Van Roey, P., Agrawal, R.K., Harvey, S.C., Sali, A., Chapman, M.S., and Frank, J. 2003. Study of the structural dynamics of the E. coli 70S ribosome using real-space refinement. Cell 113:789-801.

Godzik, A. 2003. Fold recognition methods. Methods Biochem. Anal. 44:525-546.

Gough, J., Karplus, K., Hughey, R., and Chothia, C. 2001. Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J. Mol. Biol. 313:903-919.

Greer, J. 19
271. set of coordinates corresponds to one frame in time. An example of a trajectory file is a DCD file generated by the molecular dynamics program NAMD (Phillips et al., 2005).

Load trajectories

Trajectory files do not contain the information about the system that is stored in protein structure files (PSF). Therefore, we first need to load the PSF file, and then add the trajectory data to this file.

1. Start a new VMD session. In the VMD Main window, select File → New Molecule...

The Molecule File Browser window will appear on your screen.

2. Use the Browse... button to find the file ubiquitin.psf. When you select this file, you will be back in the Molecule File Browser window. Press the Load button to load the molecule.

3. In the Molecule File Browser window, make sure that ubiquitin.psf is selected in the "Load files for:" pull-down menu on top, and click on the Browse button. Browse for pulling.dcd.

Note the options available in the Molecule File Browser window: one can load trajectories starting and finishing at chosen frames, and adjust the stride between the loaded frames. Leave the default settings so that the whole trajectory is loaded.

4. Click on the Load button in the Molecule File Browser window.

Figure 5.7.15 Animation tools in the VMD main menu (labeled: animation tools, frame number slider, play forward). The tools allow one to go over frames of the trajectory, e.g., using the slid
272. si method). If the query structure has few secondary structure elements, the program automatically switches to the Soap method. Monte Carlo optimization is used for refinement (see Table 5.5.2).

6. DaliLite has three main options for alignment. The simplest is pairwise alignment (the align option), which takes two chain identifiers as arguments, for example:

    Linux prompt> perl DaliLite -align 3ubpC 1gkpA

The arguments are the unique identifier with the chain identifier appended. Alignment data are automatically output to alignment files (<code>.dccp).

7. An optimal and a number of suboptimal structural alignments are reported for each pair of structures. Similarities with a Z-score below zero are omitted from the output. The format is shown and explained in Figure 5.5.13.

Figure 5.5.12 Format of the DAT file. [Figure residue: example header line ">>>> 1xg8A 108 7 3 4 EHEHEEH". Field labels, from left to right after the ">>>>" marker: chain identifier; number of residues; total number of secondary structure elements; number of helices (H); number of beta strands (E); order of secondary structure elements.]

Table 5.5.2 Program Modules of the Dali Suite

Program | Purpose | Reference
DSSP | Parse PDB entry, define secondary structure elements | Kabsch and Sander (1983)
PUU | Derive a tree of compact substructures to guide alignment | Holm and Sander (1994)
Wolf | Very fast filter to identify obvious similarities | Holm and Sander (1995)
Soap | Align structure
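The DAT header line shown in Figure 5.5.12 can be read programmatically. Below is a minimal sketch, assuming the field order is chain identifier, number of residues, total number of secondary structure elements, number of helices, number of strands, and the element order string; the class and function names are hypothetical and are not part of DaliLite.

```python
from dataclasses import dataclass


@dataclass
class DatHeader:
    chain_id: str
    n_residues: int
    n_sse: int        # total number of secondary structure elements
    n_helices: int    # count of "H" in sse_order
    n_strands: int    # count of "E" in sse_order
    sse_order: str    # e.g. "EHEHEEH"


def parse_dat_header(line):
    """Parse a header such as '>>>> 1xg8A 108 7 3 4 EHEHEEH'.

    Assumption: the field order follows the labels of Figure 5.5.12.
    """
    fields = line.split()
    if len(fields) != 7 or fields[0] != ">>>>":
        raise ValueError(f"not a DAT header line: {line!r}")
    chain_id, n_res, n_sse, n_helices, n_strands, order = fields[1:]
    header = DatHeader(chain_id, int(n_res), int(n_sse),
                       int(n_helices), int(n_strands), order)
    # Cross-check the counts against the element order string.
    if (len(order) != header.n_sse
            or order.count("H") != header.n_helices
            or order.count("E") != header.n_strands):
        raise ValueError(f"inconsistent element counts in: {line!r}")
    return header


header = parse_dat_header(">>>> 1xg8A 108 7 3 4 EHEHEEH")
print(header)
```

The cross-check is a cheap way to catch a wrong guess about the field order: if helices and strands were swapped, the counts would not match the order string for this example.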
273. structures using the stand-alone version of DaliLite. It performs the structural comparisons between all pairs of two user-provided lists of structures. The results are stored in an internal alignment format that can be processed by computer programs for further statistical analysis. There is an option to reformat the results as "human-readable" output.

Necessary Resources

Hardware

Computer that operates the Linux operating system (Sun, Alpha, Silicon Graphics, PC)

Software

DaliLite program (see Support Protocol)
Perl interpreter, Perl v. 5.0 or higher (http://www.perl.org)
Internet browser, e.g., Internet Explorer (http://www.microsoft.com), Netscape (http://browser.netscape.com), or Firefox (http://www.mozilla.org/firefox)

Files

Protein structures in PDB format

1. Download and install DaliLite as described in the Support Protocol.

Prepare structures

2. Prepare all structures that one wants to compare using the readbrk option, supplying a unique identifier for the structure as the second argument, as follows:

    Linux prompt> perl DaliLite -readbrk <pdbfile> <pdbid>

The identifier must be in PDB style, i.e., four characters long, as shown in the examples below:

    DaliLite -readbrk 3ubp.brk 3ubp
    DaliLite -readbrk /data/pdb/3ubp.brk 3ubp
    DaliLite -readbrk /data/pdb/pdb3ubp.ent 3ubp

These structural data are stored in a DAT subdirectory under the DaliLite home directory.
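Because the identifier passed to readbrk must be four characters long, and the alignment options take that identifier with a chain identifier appended, it can help to check identifiers before launching a batch of runs. The sketch below is illustrative only: the function names are hypothetical, and the restriction to alphanumeric characters is an assumption, since the text only states the length.

```python
def is_structure_id(identifier):
    """True for a four-character identifier such as '3ubp'.

    Assumption: characters are alphanumeric; the text only requires
    that the identifier be four characters long.
    """
    return len(identifier) == 4 and identifier.isalnum()


def split_chain_identifier(chain_identifier):
    """Split '3ubpC' into ('3ubp', 'C').

    Assumption: the chain identifier is a single trailing character.
    """
    if len(chain_identifier) != 5 or not is_structure_id(chain_identifier[:4]):
        raise ValueError(f"unexpected chain identifier: {chain_identifier!r}")
    return chain_identifier[:4], chain_identifier[4]


for candidate in ["3ubp", "pdb3ubp", "3ub"]:
    print(candidate, is_structure_id(candidate))

print(split_chain_identifier("3ubpC"))
```

A file name such as pdb3ubp.ent is therefore fine as the first argument, but its stem would not be accepted as the identifier in the second position.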
274. sually have a larger impact on the model accuracy, especially for models based on low sequence identity to the templates. However, it is important that a modeling method allow a degree of flexibility and automation to obtain better models more easily and rapidly. For example, a method should allow for an easy recalculation of a model when a change is made in the alignment. It should also be straightforward enough to calculate models based on several templates, and should provide tools for incorporation of prior knowledge about the target (e.g., cross-linking restraints, predicted secondary structure) and allow ab initio modeling of insertions (e.g., loops), which can be crucial for annotation of function.

Loop modeling

Loop modeling is an especially important aspect of comparative modeling in the range from 30% to 50% sequence identity. In this range of overall similarity, loops among the homologs vary while the core regions are still relatively conserved and aligned accurately. Loops often play an important role in defining the functional specificity of a given protein, forming the active and binding sites. Loop modeling can be seen as a mini protein-folding problem, because the correct conformation of a given segment of a polypeptide chain has to be calculated mainly from the sequence of the segment itself. However, loops are generally too short to provide sufficient information about their local fold. Even identical dec
275. t, S., and Schneider, M. 2003. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucl. Acids Res. 31:365-370.

Boissel, J.P., Lee, W.R., Presnell, S.R., Cohen, F.E., and Bunn, H.F. 1993. Erythropoietin structure-function relationships: Mutant proteins that test a model of tertiary structure. J. Biol. Chem. 268:15983-15993.

Bowie, J.U., Luthy, R., and Eisenberg, D. 1991. A method to identify protein sequences that fold into a known three-dimensional structure. Science 253:164-170.

Braun, W. and Go, N. 1985. Calculation of protein conformations by proton-proton distance constraints: A new efficient algorithm. J. Mol. Biol. 186:611-626.

Brenner, S.E., Chothia, C., and Hubbard, T.J. 1998. Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc. Natl. Acad. Sci. U.S.A. 95:6073-6078.

Browne, W.J., North, A.C., Phillips, D.C., Brew, K., Vanaman, T.C., and Hill, R.L. 1969. A possible three-dimensional structure of bovine alpha-lactalbumin based on that of hen's egg white lysozyme. J. Mol. Biol. 42:65-86.

Bruccoleri, R.E. and Karplus, M. 1987. Prediction of the folding of short polypeptide segments by uniform conformational sampling. Biopolymers 26:137-168.

Bruccoleri, R.E. and Karplus, M. 1990. Conformational sampling using high-temperature molecular dynamics. Biopolymers 29:1847-1862.

Bujnicki, J.M., Elofsson, A., Fischer, D., and
276. t. Secondary structure definitions are shown below the amino acid sequences.

[Screenshot residue: Dali Database page "select structural neighbours of 1qkuA" in Microsoft Internet Explorer, with the buttons Structure Alignment, Structure-Sequence Alignment, 3D Superimposition, PDB Format, and Reset Selection, above a table of structural neighbours with the columns neighbour, Z, %ide, rmsd, lali, lseq2, PDB, and compound. The listed neighbours are nuclear receptor structures, beginning with the query itself (estradiol receptor) and followed by estrogen receptor beta, estrogen-related receptor gamma, progesterone receptor, glucocorticoid receptor, and others.]
277. t script file that searches for templates against a database of nonredundant PDB sequences.

3. Reads a file (in text format) containing nonredundant PDB sequences into the sdb database. The sequences can be found in the file pdb_95.pir. This file is also in the PIR format. Each sequence in this file is representative of a group of PDB sequences that share 95% or more sequence identity to each other and have less than 30 residues or 30% sequence length difference.

4. Writes a binary machine-independent file containing all sequences read in the previous step.

5. Reads the binary format file back in for faster execution.

6. Creates a new "alignment" object (aln), reads the target sequence TvLDH from the file TvLDH.ali, and converts it to a profile object (prf). Profiles contain similar information to alignments, but are more compact and better for sequence database searching.

7. prf.build() searches the sequence database (sdb) with the target profile (prf). Matches from the sequence database are added to the profile.

8. prf.write() writes a new profile containing the target sequence and its homologs into the specified output file (file build_profile.prf; Fig. 5.6.4). The equivalent information is also written out in standard alignment format.

The profile.build() command has many options (see Internet Resources for MODELLER Web site). In this example, rr_file is set to use the BLOSUM62 similarity matrix (file blosum62.sim.mat provided in
278. t your VMD Main window, which should look like Figure 5.7.19. Within the VMD Main menu you can find the Molecule List Browser (circled in Fig. 5.7.19), which shows the global status of the loaded molecules. The Molecule List Browser displays information about each molecule, including Molecule ID (ID); the four Molecule Status Flags (T, A, D, and F, which stand for Top, Active, Drawn, and Fixed); name of the molecule (Molecule); number of atoms in the molecule (Atoms); number of frames loaded in the molecule (Frames); and the volumetric data loaded (Vol). Let us first start with the Molecule column. By default, the Molecule column displays file names of the molecules loaded in VMD, but you can change the molecule names to recognize them more easily.

Figure 5.7.19 The Molecule List Browser. [Figure labels: Molecule List Browser; Molecule Status Flags.]

Changing molecule names

4. In the VMD Main menu, double-click on 1fqy.pdb in the Molecule column. A window will pop up with the message "Enter a new name for molecule 0:". Type in human aquaporin, and click OK (or press enter). In the VMD Main menu, the first molecule now has the name human aquaporin.

5. Repeat the previous step for the E. coli aquaporin by double-clicking the 1rc2.pdb molecule name
279. tart a new VMD session. Open the Molecule File Browser window by choosing the File → New Molecule... menu item in the VMD Main window. In the Molecule File Browser window, use the Browse... button to find and select the file 1fqy.pdb. Press Load to load the molecule.

2. Load the remaining aquaporins, 1rc2, 1lda, and 1j4n. Make sure that each pdb file is loaded into a new molecule. Close the Molecule File Browser window when you have loaded all four molecules. Your VMD Main menu should look like Figure 5.7.21 when all four aquaporins are loaded.

Aligning the molecules

3. Within the VMD Main window, choose the Extensions menu and select Analysis → MultiSeq.

The MultiSeq window, with the window name untitled.multiseq showing at the top, should now be open. You may be asked to update some databases in a pop-up window if this is the first time you use MultiSeq. If this is the case, simply click Yes and wait for MultiSeq to finish downloading. When MultiSeq starts, your MultiSeq window should display a list of the four aquaporin protein structures and a list of two nonprotein structures. The nonprotein structures are detergent molecules used in crystallizing the aquaporin proteins, and will not be needed for structure or sequence alignment. You can tell MultiSeq to discard molecules you are not interested in.

4. In the MultiSeq window, select the 1lda_X detergent molecule by clicking on it. This will highlight the entire row of 1lda_X. Remov
280. tative structure for each cluster, issue the following command:

    chi average

This process is moderately time consuming, taking a few hours. The output is both a file depicting the results of the program,

    MyProtein/variantA/results/average.out

which includes the orientational parameters and energy of each cluster average, and the structure for each cluster average,

    MyProtein/variantA/results/clusterN.pdb

where N is the number of the cluster.

Find a "complete set"

23. Repeat GMDS (steps 16 to 22) for all the variants.

Remember that each variant's search should be undertaken in its specific subdirectory (e.g., MyProtein/variantB, MyProtein/variantC).

The following steps are in preparation for comparing clusters of different variants and are not part of the "standard" CHI package. The process starts with creating a virtual GLY variant. Selecting the right cluster, which is the one that exists in all the variants, depends on comparing the RMSD between all the structures obtained in the previous steps. Comparing the RMSD for all the atoms of every two variants is impossible, because the variants differ at one or more of their amino acids. However, one may avoid this problem by comparing only the RMSD of their backbones. Therefore, a virtual variant, whose sequence is composed only of glycine, should be created.

24. Create a new subdirectory named GLY in the upper directory, using, e.g., the comma
281. the MODELLER distribution). Accordingly, the parameters matrix_offset and gap_penalties_1d are set to the appropriate values for the BLOSUM62 matrix. For this example, only one search iteration is run, by setting the parameter n_prof_iterations equal to 1. Thus, there is no need to check the profile for deviation (check_profile set to False). Finally,

    # Number of sequences: 30
    # Length of profile: 335
    # N_PROF_ITERATIONS: 1
    # GAP_PENALTIES_1D: -900.0 -50.0
    # MATRIX_OFFSET: 0.0
    # RR_FILE: ${MODINSTALL8v0}/modlib/as1.sim.mat
    1 TvLDH S 0 335 1 335 0 0 0 0. 0.0
    2 1ab5z X 1 312 75 242 63 229 164 28. 0.83E-08
    3 1b8pA X 1 327 jJ 331 6 325 316 42. 0.0
    4 i1bdmA X 1 318 1 325 1 310 309 45. 0.0
    5 1t2dA X 1 315 5 256 4 250 238 25. 0.66E-04
    6 lcivA X 1 374 6 334 33 358 325 35. 0.0
    7 2cm X 1 312 r 320 3 303 289 27. 0.16E-05
    8 1o6zA X 1 303 7 320 3 287 278 26. 0.27E-05
    9 1ur5A X 1 299 13 191 9 T7 158 31. 0.25E-02
    10 lguzA X q 305 I3 301 8 280 265 25. 0.28E-08
    11 lgv A X 1 301 13 323 8 289 274 26. 0.28E-04
    12 1hyeA X l 307 d 191 3 183 173 29. 0.14E-07
    13 1liOzA X 1 332 85 300 94 304 207 25. 0.66E-05
    14 1lilOA X 1 331 85 295 93 298 196 26. 0.86E-05
    15 lidna X 1 316 78 298 73 301 214 26. 0.19E-03
    16 61dh X I 329 47 301 56 302 244 23. 0.17E-02
    17 21dx X T 331 66 306 67 306 227 26. 0.25E-04
    18 51dh X I 333 85 300 94 304 207 26. 0.30E-05
282. the indices of these atoms, make a selection including these two atoms by typing in the TkConsole window:

    set sel [atomselect top "resid 48 76 and name CA"]

5. Get the indices by typing the following line in the TkConsole window:

    $sel get index

This command should give the indices 770 1242.

Note that the atom numbers of these atoms in the pdb file are 771 and 1243. This is because VMD starts counting atom indices from zero. This is only the case for index, since VMD does not read it from the PDB file. Other keywords, such as resid, are consistent with the PDB file.

6. In the Graphical Representations window, create a representation for the selection index 770 1242, with VDW as drawing method.

7. Now that you can see the two α-carbons, choose the Mouse → Label → Bonds menu item from the VMD Main menu. Click on each atom one after the other.

You should get a line connecting the two atoms (Fig. 5.7.27). The number appearing next to the line is the distance between the two atoms in angstroms. The value of the distance displayed corresponds to the current frame. Try playing the trajectory: you will see that the label is modified automatically as the distance between the atoms changes. Note that the appearance of the line (its color), as well as the appearance of essentially all other objects in VMD, can be cha
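The off-by-one relationship between PDB atom numbers and VMD indices, and the distance that the bond label reports, can both be reproduced outside VMD. This is a minimal sketch under two stated assumptions: atom serial numbers in the PDB file start at 1 and are contiguous, and the coordinates used in the usage lines are made up for illustration rather than taken from the ubiquitin trajectory.

```python
import math


def serial_to_index(serial, first_serial=1):
    """Convert a PDB atom serial number to a zero-based atom index.

    Assumption: serial numbers are contiguous and start at first_serial.
    """
    return serial - first_serial


def distance(atom_a, atom_b):
    """Euclidean distance between two (x, y, z) positions in angstroms."""
    return math.dist(atom_a, atom_b)


# The two alpha-carbons from the text: serials 771 and 1243.
indices = [serial_to_index(s) for s in (771, 1243)]
print(indices)

# Hypothetical coordinates, only to show the distance calculation.
print(distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
```

Recomputing the distance for every frame of a trajectory gives the same quantity that the label updates while the trajectory plays.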
283. to University, 9-1 Shirokane 5, Minato-ku, Tokyo 108-8642, JAPAN. PHONE: 81-3-3444-6161. FAX: 81-3-3446-9553.

Figure 5.2.12 The FAMS Web page. The server status is displayed in the upper right-hand corner.

Modeling Membrane Proteins Utilizing Information from Silent Amino Acid Substitutions

Transmembrane α-helical bundles represent a simple topology that can be described by a relatively small number (n) of parameters: (1) helix tilt, (2) rotational position, and (3) register (Fig. 5.3.1). Thus, for any hetero-oligomer, 3 × n parameters are needed to describe the overall structure, while for any symmetrical homo-oligomer only 2 parameters are generally sufficient to describe the structure: helix tilt (β) and rotational pitch angle (φ).

Due to the reduced number of degrees of freedom, it is possible to exhaustively search each of the above parameters computationally in a procedure for which the name Global Molecular Dynamics Search (GMDS) has been coined (Adams et al., 1995). GMDS has been automated by a comprehensive series of task files and modules, written by Paul D. Adams, called CHI (CNS searching of Helix Interactions; Adams et al., 1995), to be used in the general computational structural biology software suite CNS (Crystallography and NMR System).

Depending on the parameters used, CHI routinely yields several
284. [Screenshot residue: FAMS query form with a File Upload field and an "Or Enter query sequence" text box containing an example amino acid sequence.]

Figure 5.2.7 If an amino acid sequence is of interest, input the sequence in the large text box as shown here.

[Screenshot residue: search results page reporting "Searched ORFs: 4 Hits Found", with four entries each annotated as "ATP binding component of a transporter".]

Figure 5.2.8 A model list with annotations, model lengths (number of amino acids), and identity percentages of amino acid sequence alignments with experime
285. toms in sel, and the second containing the corresponding maxima.

Once you are done with a selection, it is always a good idea to delete it to save memory:

    $sel delete

Sourcing Scripts

When performing a task that requires many lines of commands, instead of typing each line in the Tk Console window, it is usually more convenient to write all the lines into a script file and load it into VMD. This is very easy to do. Just use any text editor to write your script file, and in a VMD session, use the command source filename to execute the file. You should have downloaded a simple script, beta.tcl, with this unit. We will execute it in VMD as an example. The script beta.tcl sets the colors of residues LYS and GLY to a different color from the rest of the protein by assigning them a different beta value, a trick you have already learned in Basic Protocol 6, steps 5 to 9.

In the Tk Console window, type source beta.tcl and observe the color change. You should see that the protein is mostly a collection of red spheres, with some residues shown in blue. The blue residues are the LYS and GLY residues in the ubiquitin. Take a quick look at the script beta.tcl. Using any text editor of your choice, open the file beta.tcl. There are six lines in this file, and each line represents a Tcl command line that you have used before. Close the text editor when you are done.

The .vmd file you saved in Basic Protocol 1 (step 47) is actually a series of commands. You
286. topology of the fold or folding motif is conserved. (Topology here means the relative location of helices and strands and the loop connections between them.) Deviations can be even larger and qualitatively different when structural similarity is the result of convergent rather than divergent evolution. In particular, convergent evolution may result in similar 3D folds that differ in the topology of loop connections. The modular architecture of proteins presents another complication. Large proteins can be decomposed into semiautonomous, globular folding units called domains. Domains are often evolutionarily mobile modules and may carry specific biological functions. Because a common domain may be surrounded by completely unrelated domains, most structure comparison methods search for local similarities.

Given a measure of similarity or distance, the algorithmic problem is to find the set of corresponding points in two structures that optimize this target function. Just as there is much latitude in the formulation of the structure comparison problem, many different types of optimization algorithms have been employed. Similarity measures of the sum-of-pairs form and subgraph isomorphism formulations of the structure comparison problem belong to the NP-complete class of problems, and one has to resort to heuristics for practical algorithms. Heuristic approaches do not aim for provably correct solutions, gaining computational perf
287. tput file myoutput.dat, either by a text editor of your choice, or the command less in a terminal window on a Mac or Linux machine.

Working with a Molecule Using Tcl Text Commands

Anything that can be done in the VMD graphical interface can also be done with text commands. This allows scripts to be written that can automatically load molecules, create representations, analyze data, make movies, etc. Here, we will go through some simple examples of what can be done using the scripting interface in VMD.

Loading molecules with text commands

1. In the VMD TkConsole window, type the command mol new 1ubq.pdb and hit enter.

As you can see, this command performs the same function as described at the beginning of Basic Protocol 1, namely, loading a new molecule with file name 1ubq.pdb.

If you see the error message Unable to load file '1ubq.pdb' using file type 'pdb', you might not be in the correct directory that contains the file 1ubq.pdb. You can use the standard Unix commands in the VMD TkConsole window to navigate to the correct directory.

When you open VMD, by default a vmd console window appears. The vmd console window tells you what's going on within the VMD session that you are working on. Take a look at the vmd console window. It should tell you a molecule has been loaded, as well as some of its basic properties like number of atoms, bonds, residues, etc. The Tcl commands that you enter in the VMD TkConsole window can also be ent
288. unless noted otherwise. In order to create the starting structure, run:

    chi create  verbose

This is a fast process, taking a few minutes. The output files will be:

    MyProtein variantA variantA psf
    MyProtein variantA variantA  pdb
    MyProtein variantA chi_create log
    MyProtein variantA results create out

All of these files are accessory files that CHI uses. One might want to search for an error in the log file by issuing the following command:

    grep -i err chi_create.log | more

18. Run the searching algorithm, to create all the structures, using the following command:

    chi search  verbose

The number of structures is:

    (end - start)/increment × handedness × trials = (360/10) × 2 × 4 = 288

This process is time consuming (typically many hours). As an example, simulating a bundle of 5 helices, each composed of 28 amino acids, takes 20 to 30 min per structure on a DEC Alpha 433 AU (a relatively slow machine nowadays).

This will produce the following output:

    MyProtein variantA results search out

which contains the results of the simulation (energy and orientational parameters for each structure) and the pdb for each structure simulated.

The names of the pdb files are as follows:

    MyProtein variantA results left i j pdb

where i is the initial angle of rotation and j is the trial number. Right-handed structures will be designated similarly:

    MyProtein variantA results 
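The structure count given for the search step is simple arithmetic, and it can be worth checking before committing to a long run. The sketch below is not part of CHI: the function name and its parameters are hypothetical, the 0 to 360 degree range is an assumption consistent with the 288 structures quoted in the text, and the helper assumes the rotation range divides evenly by the increment.

```python
def count_search_structures(start_deg, end_deg, increment_deg, handedness=2, trials=4):
    """Number of structures a search would generate (hypothetical helper, not a CHI API)."""
    span = end_deg - start_deg
    if increment_deg <= 0 or span % increment_deg != 0:
        raise ValueError("increment must be positive and divide the angle range evenly")
    # One starting structure per rotation step, per handedness, per trial.
    return (span // increment_deg) * handedness * trials


if __name__ == "__main__":
    # Assumed example values: 0 to 360 degrees in 10 degree steps,
    # left- and right-handed bundles, 4 trials each.
    print(count_search_structures(0, 360, 10))
```

Halving the increment doubles the count, so the per-structure time quoted in the text scales the total run time in the same proportion.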
289. uts $variable - prints out the value of variable

Also, $variable refers to the value of variable.

3. Try the expr command by entering the following lines in the VMD TkConsole window:

    expr 3 - 8
    set x 10
    expr -3 * $x

The expr command performs mathematical operations.

expr expression - evaluates a mathematical expression

4. Enter the following example in the VMD TkConsole window:

    set result [expr -3 * $x]
    puts $result

By using brackets, you can embed Tcl commands into others. A bracketed expression will automatically be substituted by the return value of the expression inside the brackets.

[expression] - represents the result of the expression inside the brackets

5. Let us calculate the values of -3*x for integers x from 0 to 10 and output the results into a file named myoutput.dat:

    set file [open "myoutput.dat" w]
    for {set x 0} {$x <= 10} {incr x} {
        puts $file [expr -3 * $x]
    }
    close $file

Here, you have tried the loop feature of Tcl. Tcl provides an iterated loop similar to the "for" loop in C. The for command in Tcl requires four arguments: an initialization, a test, an increment, and the block of code to evaluate. The syntax of the for command is:

    for {initialization} {test} {increment} {commands}

Take a look at the ou
290. w.currentprotocols.com then modify it using the following steps:

a. Start with saturated red:

    RasMol> color [255,0,0]

b. Try a little more green to get bright orange:

    RasMol> color [255,100,0]

c. Now raise everything by 50 to get a lighter color (except red, because it is already at the maximum of 255):

    RasMol> color [255,150,50]

d. Raise by 50 more to get the pastel peach:

    RasMol> color [255,200,100]

Combinations of representations

The best picture for most applications will be composed of a number of different representations. For instance, the overview representation shown above uses backbones for the proteins and spacefilling representations for the ligands. The backbones are simple, showing at a glance the whole structure and the relationships between the protein chains. The ligands, however, are small, so the bulky spacefilling representation is used to make sure that they stand out in a complex structure. Most molecular graphics programs give considerable flexibility in the modification of these representations. For instance, it is possible to vary the diameter of the cylinders used in wireframe representations and add small balls at the atom positions, to help distinguish different parts of the structure. One way to improve the clarity of a given picture is to stick to a common representation for each 
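The RasMol color adjustments in steps b to d amount to adding an offset to each RGB channel and capping the result at 255. As a minimal sketch (both helper names are hypothetical and are not RasMol commands), the same progression can be reproduced like this:

```python
def lighten(rgb, amount=50):
    """Raise every channel by `amount`, capping each at the 8-bit maximum of 255."""
    return tuple(min(255, channel + amount) for channel in rgb)


def rasmol_color(rgb):
    """Format an RGB triple as it would be typed at the RasMol prompt."""
    return "color [{},{},{}]".format(*rgb)


if __name__ == "__main__":
    orange = (255, 100, 0)      # step b: saturated red plus some green
    lighter = lighten(orange)   # step c: everything up by 50, red stays at 255
    peach = lighten(lighter)    # step d: up by 50 again
    for rgb in (orange, lighter, peach):
        print(rasmol_color(rgb))
```

Capping each channel rather than letting it overflow is what keeps red at 255 while green and blue continue to rise.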
291. xample.tar.gz (Unix/Linux) or http://salilab.org/modeller/tutorial/basic-example.zip (Windows).

    >P1;TvLDH
    sequence:TvLDH:::::::0.00: 0.00
    EGFKVNDWLREKLDFTEKDLFHEKEIALNHLAQGG
    MSEAAHVLITGAAGOIGYILSHWIASGELYGDRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGFVATTDPKA
    AFKDIDCAFLVASMPLKPGOVRADLISSNSVIFKNTGEYLSKWAKPSVKVLVIGNPDNTNCEIAMLHAKNLKPEN
    FSSLSMLDONRAYYEVASKLGVDVKDVHDIIVWGNEHGESMVADLTOATFTKEGKTOKVVDVLDHDYVFDTFFKKI
    GHRAWDILEHRGFTSAASPTKAAIOHMKAWLFGTAPGEVLSMGIPVPEGNPYGIKPGVVFSFPCNVDRI    EGKIHVV

Figure 5.6.2 File TvLDH.ali. Sequence file in PIR format.

Background to TvLDH

A novel gene for lactate dehydrogenase (LDH) was identified from the genomic sequence of Trichomonas vaginalis (TvLDH). The corresponding protein had higher sequence similarity to the malate dehydrogenase of the same species (TvMDH) than to any other LDH. The authors hypothesized that TvLDH arose from TvMDH by convergent evolution relatively recently (Wu et al., 1999). Comparative models were constructed for TvLDH and TvMDH to study the sequences in a structural context and to suggest site-directed mutagenesis experiments to elucidate changes in enzymatic specificity in this apparent case of convergent evolution. The native and mutated enzymes were subsequently expressed and their activities compared (Wu et al., 1999).

Searching structures related to TvLDH

Conversion of sequence to PIR
292. y between the six possible templates (file compare.py; Fig. 5.6.5).

In compare.py, the alignment object aln is created and MODELLER is instructed to read into it the protein sequences and information about their PDB files. By default, all sequences from the provided file are read in, but in this case, the user should restrict it to the selected six templates by specifying their align codes. The command malign() calculates their multiple sequence alignment, which is subsequently used as a starting point for creating a multiple structure alignment by malign3d(). Based on this structural alignment, the compare_structures() command calculates the RMS and DRMS deviations between atomic positions and distances, differences between the main-chain and side-chain dihedral angles, percentage sequence identities, and several other measures. Finally, the id_table() command writes a file (family.mat) with pairwise sequence distances that can be used as input to the dendrogram() command (or the clustering programs in the PHYLIP package; Felsenstein, 1989). dendrogram() calculates a clustering tree from the input matrix of pairwise distances, which helps visualize differences among the template candidates. Excerpts from the log file (compare.log) are shown in Figure 5.6.6.

The objective of this step is to select the most appropriate single template structure from all the possible templates. The dendrogram in Figure 5.6.6 shows that 1civ:A and 7mdh:A 
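To make the pairwise identity measure concrete, the sketch below computes percent sequence identity for two rows of an alignment as the number of identical aligned residues divided by the length of the shorter sequence. It is plain Python rather than MODELLER code. The function name, the gap character "-", and the toy sequences are assumptions for illustration, and the normalization by the shorter length is assumed to match the id/min(length) convention shown in the log excerpt.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity of two aligned rows: identical residues / shorter ungapped length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Count columns where both rows carry the same residue (gap columns never count).
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    len_a = sum(1 for a in row_a if a != gap)
    len_b = sum(1 for b in row_b if b != gap)
    shorter = min(len_a, len_b)
    if shorter == 0:
        raise ValueError("at least one sequence is empty")
    return 100.0 * identical / shorter


if __name__ == "__main__":
    # Toy alignment, not taken from the TvLDH template set.
    print(round(percent_identity("MKV-LAAG", "MRVQLA-G"), 1))
```

A distance suitable for clustering could then be taken as 100 minus this value, although whether that matches the exact contents of the distance file written by MODELLER is not confirmed here.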
293. y of the given molecule is updated when using animation tools described in Basic Protocol 3.

Finally, the Drawn flag (D) indicates whether the given molecule is displayed in the OpenGL window. Let us try out the Top and Drawn flags.

Make sure no molecule is fixed. By default, the last molecule loaded in VMD is the top molecule, so you can check and see that there is a "T" displayed for the E. coli aquaporin in the VMD Main menu.

Reset the view by pressing the = key on the keyboard while keeping the OpenGL Display window active. Note that the yellow E. coli aquaporin is now placed in the center of the OpenGL Display window.

Switch the top molecule by double-clicking on the empty "T" flag for the human aquaporin molecule in the VMD Main menu. A "T" should appear for the human aquaporin, while the "T" for E. coli disappears. Go to the OpenGL Display window and reset the view again. You can see that this time the red human aquaporin is placed in the center of the OpenGL Display window.

In the VMD Main menu, try hiding a molecule by double-clicking on its "D" flag. You can display the molecule again by double-clicking its "D" flag again.

Aligning Molecules with the measure fit Command

When you look at your OpenGL Display window, you can see that the two aquaporins are very similar in structure. But it is difficult to detect their slight structural differences, as the two proteins are placed apart. We will now try out a very useful Tc
    