        ParsePDB.pm
         Contents
1. Atoms => [
      {
         AltLoc        => "A",
         AtomNumber    => 3,
         AtomType      => "CA",
         ChainLabel    => "A",
         InsResidue    => "",
         Occupancy     => 0.0,
         Race          => "ATOM",
         ResidueLabel  => "ALA",
         ResidueNumber => 1,
         Rest          => "C",
         Temp          => 27.84,
         x             => 10.309,
         y             => 53.910,
         z             => 25.295,
      },
      {
         AltLoc        => "A",
         AtomNumber    => 4,
         AtomType      => "O",
         ChainLabel    => "A",
         InsResidue    => "",
         Occupancy     => 0.0,
         Race          => "ATOM",
         ResidueLabel  => "ALA",
         ResidueNumber => 1,
         Rest          => "O",
         Temp          => 27.80,
         x             => 9.414,
         y             => 53.366,
         z             => 24.654,
      },
   ],
   InsResidue    => "",
   ResidueLabel  => "ALA",
   ResidueNumber => 1,

With ResidueIndex it is easy to loop over the whole chain residue by residue, and even over each atom in the residue:

   @ResidueIndex = $PDB->Get (Chain => 0, ResidueIndex => 1);

   foreach $Residue (@ResidueIndex) {
      print "$Residue->{ResidueLabel}\n";
      print "$Residue->{ResidueNumber}\n";
      print "$Residue->{Phi}, $Residue->{Psi}\n";

      foreach $Atom ( @{$Residue->{Atoms}} ) {
         print "$Atom->{AtomNumber}\n";
         print "$Atom->{Chain
2. 6 Count Number of Subgroups of the Protein
7 Retrieve a Part of the PDB with ->Get
   7.1 Parameters for ->Get
   7.2 Internal Versus External Identifiers
8 Write the Whole or Parts of the PDB
9 Retrieve Certain Information About the Protein
10 Renumbering Entries in the PDB
   10.1 Renumbering Inserted Residues
   10.2 Ignoring the TER
11 Generating CHARMM input files
   11.1 What it does
   11.2 What it does not do
12 Filtering the Data
   12.1 Keywords for filtering actions
   12.2 Inserted Residues
   12.3 Alternative Atom Locations
13 Other methods
   13.1 ->GetFASTA
   13.2 ->WriteFASTA
   13.3 ->AminoAcidConvert
   13.4 ->FormatLine
14 Speed issues
15 Error handling

1 Foreword

Despite the fact that there are several packages around with the ability to parse PDB files and do funny things with them (e.g. BIOPERL), it looked like there was a lack of really easy ones. Driven by the need of breaking a protein into its subgroups (mo
3. It might also be that it needs to be counted to the first residue (usually it is numbered as residue 0). In that case, ResidueStart => 0 can be given.

• Prosthetic Groups
Sometimes PDB files contain coordinates for non-peptide groups (crystallographic waters, haem groups, metal ions, etc.). CHARMM can deal with these, but it can be very difficult to do. If you need to include them, split them up into their own PDB files. That means that you might even have to split up a block of HETATMs into several files, e.g. waters, haem groups and other stuff.

• Disulphide bridges
PDB files hold information on disulphide bridges with the SSBOND keyword. CHARMM ignores this. You must specifically add these using the PATCH command, for example

   PATCH DISU prot 2 prot 11

creates a disulphide bond between residues 2 and 11 of the protein with the segment id "prot".

CAUTION: The residue numbers might have changed, since they are automatically renumbered so that they start counting at 1.

• Protonation state
You should consider carefully the protonation state of your titratable residues, e.g. a histidine could be protonated, in which case it should be changed from HSD to HSE. Assuming residue 29 in a segment named "prot" was a histidine that should be protonated, the following patch command would accomplish this:

   PATCH hs2 prot 29
   rename resn HSE sele resi 29 end

There's no easy way to decide if a residue should be modified. One w
4. The parser provides the possibility to handle errors via the try/catch/otherwise methodology. The possible errors are divided into IO (error opening, closing, writing a file, etc.), Config (wrong parameters given to a routine) and PDB (error due to crappy PDB format). The syntax for the try/catch block is as follows. It is quite picky: note the semicolon at the end and that there is no semicolon after the other blocks (it is actually only one single command).

   try {
      $PDB = ParsePDB->new (FileName => $File);
      $PDB->Parse;
   }
   catch Exception::Config with {
      $Error = shift;
      print "Error parsing the file\n$Error";
   }
   catch Exception::IO with {
      $Error = shift;
      print "An I/O Error occurred\n$Error";
   }
   catch Exception::PDB with {
      $Error = shift;
      print "Error parsing the file\n$Error";
   }
   finally {
      exit (1);
   };
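The dispatch on the three error categories can also be illustrated without Error.pm. The following is only a sketch using core Perl (eval and die with blessed objects), which is a different technique from the try/catch syntax above. The Demo::Exception classes and the run_guarded helper are hypothetical names invented for this example; they are not part of ParsePDB.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-ins for the three error categories (IO, Config, PDB).
# They are NOT the classes shipped with the parser.
package Demo::Exception {
    sub new  { my ($class, $text) = @_; return bless { text => $text }, $class }
    sub text { return $_[0]{text} }
}
package Demo::Exception::IO     { our @ISA = ('Demo::Exception') }
package Demo::Exception::Config { our @ISA = ('Demo::Exception') }
package Demo::Exception::PDB    { our @ISA = ('Demo::Exception') }

# Run a code reference and return the name of the error category that was
# raised, "ok" if nothing was raised, or "unknown" for any other error.
sub run_guarded {
    my ($code) = @_;
    my $ok = eval { $code->(); 1 };
    return 'ok' if $ok;

    my $error = $@;
    if (blessed($error)) {
        return 'IO'     if $error->isa('Demo::Exception::IO');
        return 'Config' if $error->isa('Demo::Exception::Config');
        return 'PDB'    if $error->isa('Demo::Exception::PDB');
    }
    return 'unknown';
}

print run_guarded(sub { die 'Demo::Exception::IO'->new('cannot open file') }), "\n";
print run_guarded(sub { die 'Demo::Exception::PDB'->new('bad ATOM line') }), "\n";
print run_guarded(sub { return 1 }), "\n";
```

As with the catch blocks, the class of the thrown object decides which branch handles the error; anything that is not one of the three categories falls through to a default.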
5. be given to the routine, which returns the formatted line that can be written to a PDB.

   @AtomIndex = $PDB->Get (Model => 0, Chain => 0, AtomIndex => 1);

   foreach $Atom (@AtomIndex) {
      # some fancy if condition here or changes to the atom hash
      $Line = $PDB->FormatLine (Atom => $Atom);
      print PDB $Line;
   }

14 Speed issues

There are a few things you should know if speed is an issue for your work. As long as you work with a single file of a normal size, the way you use the parser will not make a big difference. As soon as you have to process a load of PDBs with tens of thousands of atoms, you might want to consider some facts about the way the PDB is treated during the parsing.

First of all, remove everything you do not need. If you do not process HETATM, SIGATM, ANISOU or SIGUIJ entries, remove them even before parsing with the respective switches of ->new or the respective ->SetVariable methods. This speeds up the parsing even if none of these atoms are present, since the regular expressions which determine an atom become significantly smaller. And RegExes are pigs when it comes to speed.

The use of external numbers is an absolute no-no in speed-critical programs. Each external parameter is converted to the respective internal one prior to processing the request. This is reasonably fast for models and chains, but comes down to a loop over the atom lines for a residue or an atom number, until the correc
6. can be accessed either via an array (by looping over them) or via a hash using their atom types as keys.

   @Content = $PDB->Get (ResidueNumber => 12, ResidueIndex => 1);

   $Content[0] = {
      Atoms         => array reference with the parsed atom lines
      AtomTypes     => hash reference with the parsed atom lines
      InsResidue    => "",
      Phi           => 120.23,
      Psi           => 104.12,
      ResidueLabel  => "ALA",
      ResidueNumber => 12,
   }

Note that the phi and psi angles are not computed automatically. In order to have them in ResidueIndex, ->GetAngles (without any parameters) has to be executed once. The angles are then calculated for the whole file and saved.

In the above example a single residue is retrieved, having the external number 12, and therefore only one array element is returned. This can of course be used for a whole chain to conveniently loop over each residue and each atom within it. The atoms are available in the array Atoms, which comes in handy if every single one needs to be accessed one after another. If particular atoms are needed, like only the Cα carbon for example, they can be accessed via the AtomTypes hash. The same can be achieved directly via ->Get by requesting a specific atom type in a residue; however, if this is needed for every residue in a protein, the approach via the ResidueIndex is orders of magnitude more efficient.

The hash AtomTypes is only genera
7. => value
Renumber the returned ATOM and HETATM lines sequentially starting at value, or 1 if no value is given. See also RenumberAtoms.

• KeepInsertions => 0 | 1
If set to 0, the insertion codes of residues are considered during renumbering the residues. Please read "Filtering the data" for more information.

• SetChainLabel => "A" | "B" | ...
Changes the chain ID of all returned ATOM, SIGATM, ANISOU, SIGUIJ and HETATM lines. By giving a blank, the chain label can be removed.

CAUTION: This function should be used carefully. It does not care about multiple chains and will simply change every ID to the given value. To "renumber" the chains, use the "ChainStart" keyword instead.

• PDB2CHARMM => 1
Formats the returned array for use with CHARMM. Read further down for more information.

• CHARMM2PDB => 1
For use with CHARMM-created PDBs; replaces the atom types of CHARMM with the standard PDB labels. Read further down for more information.

• AtomLocations => First | All | None | "A" | "B" | etc.
If alternative atom locations for an atom are available, return either only the first, all or none of them. The switches are not case sensitive.
If "First" is requested, the filter returns all atoms with location "A" as well as the ones which have NO alternative location.
If you enter "A", ONLY atoms with altLoc "A" are retrieved. If you want to get also the ones with no altLoc, you have t
8. retrieve the respective PDB lines.

Although ->Get can handle external identifiers, the access is much more efficient (faster) via the internal ones, since the former have to be translated before the content can be retrieved.

   @Chain2 = $PDB->Get (Model => 0, Chain => 0);

7.1 Parameters for ->Get

The following parameters can be used to request a specific part of the protein from ->Get, including several possibilities to change it to certain needs.

• Model => 0 | 1 | 2 ... (internal value), ModelNumber => 1 | 2 | 3 ... (external value from the PDB)
The number of the model. The available model identifiers can be retrieved using ->IdentifyModels and ->IdentifyModelNumbers.

• Chain => 0 | 1 | 2 ... (internal value), ChainLabel => "A" | "B" | ... (external value)
The number of the chain. The available chain identifiers can be retrieved using ->IdentifyChains. Using the latter keyword (ChainLabel) is only possible if the chain IDs have been checked successfully and no duplicate or missing IDs have been found. This means that, if you want to access the chains via their ID, you have to add an if condition to check whether you can do so or not:

   $PDB->ChainLabelsValid;   # returns true if the chain IDs are OK
                             # returns false if missing or multiple chain IDs have been detected

To get the "real" chain IDs, use ->IdentifyChainLabels; to get the ID for a particular chain, use ->Get
9. ChainLabel (see further down).

• Residue => 1 | 2 | 3 ... (internal value), ResidueNumber => 1 | 2 | 3 ... (external value)
Returns the ATOM lines of a particular residue.

• ResidueLabel => "ALA"
Returns only alanine residues.

• Atom => 1 | 2 | 3 ... (internal value), AtomNumber => 1 | 2 | 3 ... (external value)
Returns the ATOM line of a particular atom.

• AtomType => "CA"
Returns only CA atoms. To distinguish between α-carbons and calcium, enter "Ca" for the latter. To retrieve only carbons, use Race => "ATOM", which filters out all HETATMs.

• Element => "C" | "O" | "N" | "P"
Returns only carbons, oxygens, etc. This will also get "CA" or "OXT". To refine the filter pattern, you need to edit the filter variables at the very beginning of ParsePDB.pm.

• Header => 0 | 1
Include the header (true/false). If a parsed content is requested via the AtomIndex keyword, the first six characters of the line are stored as Race (similar to atoms) and the remaining columns are stored in Rest. By default the header is NOT included.

• MinHeader => 0 | 1
Include just a minimal header (true/false). This is false by default. If set true, only lines beginning with HEADER, TITLE or COMPND are returned. This command overrides the value of HeaderRemark, that is, no remark will be added to a minimal header. If the choice of lines needs to be changed, this can be done in ParsePDB.pm at the beginning of
10. ChainLabelsValid returns true before accessing the chains via letters, otherwise you can only use numbers until the IDs have been corrected with ->RenumberChains.

   $PDB->SetChainLabelAsLetter (0);

• ChainSuffix => "_c" (default "_c")
Defines the suffix that is added to the base name by ->WriteChains.

   $PDB->SetChainSuffix ("_c");

• ModelSuffix => "_m" (default "_m")
Defines the suffix that is added to the base name by ->WriteModels.

   $PDB->SetModelSuffix ("_m");

• HeaderRemark => 0 | 1 (default 1)
If enabled, a remark is added to the header stating that the file has been changed by the parser and that the header and footer information might not be valid any more. This is done by ->Get and ->Write, but not if the header is requested using the method ->GetHeader.

By default, the remark lines are added after HEADER, COMPND or TITLE, depending on which line is found last. That is, the comment will be inserted as the first REMARK lines. If another position is needed, e.g. directly after the HEADER line, then this has to be changed at the beginning of ParsePDB.pm under default values.

   $PDB->SetHeaderRemark (1);

• AtomLocations => First | All | None | "A" | "B" | etc. (default "All")
Tells ->Get and ->Write globally how atoms with alternative atom locations are to be handled. Please see "->Get" for more information. If you do not need the alternative locations at all, you can use RemoveAtomLocations
11. Label}\n";
      }
   }

The TER is not counted as part of the residue, that is, it will not be included in the last residue of a chain.

The filter arguments are cumulative and can be freely combined, for instance to filter out all CA atoms in all alanine residues. They only work on the atom lines; all other lines like MODEL, TER, etc. are also returned.

The ->Get routine returns the whole PDB with the MODEL/ENDMDL tags, and one single Model without them but with the TER terminators of the chains.

If the "Model" or "Chain" parameter is omitted, it depends what happens:

• no Model, no Chain: returns the whole file (without header and footer)
• Model, but no Chain: the whole model with all chains is returned
• Chain, but no Model: Model is assumed as "0" (e.g. if no MODEL tags are there at all)

7.2 Internal Versus External Identifiers

As previously mentioned, the use of internal identifiers (see 3) is preferable. First of all, it is easier to program, as e.g. the ResidueNumbers or AtomNumbers have to be retrieved first in order to loop over them and extract single objects. Second, for speed reasons, as it is quite elaborate for the parser to interconvert external and internal identifiers.

Atoms and residues always have an external number; however, PDBs with invalid numbering schemes are even found in the Protein Data Bank. The worst case is duplicate atom numbers, in which case only the first atom is
12. ParsePDB.pm

Package:         ParsePDB.pm
Author:          Benjamin Bulheller
Mail Address:    webmaster (at) bulheller.com
Website:         http://comp.chem.nottingham.ac.uk/parsepdb/
                 http://www.bulheller.com
Research Group:  Prof. Jonathan D. Hirst
                 School of Physical Chemistry
                 University of Nottingham
Funded by:       EPSRC
Date:            November 2005 - November 2008

Acknowledgments: Special thanks to Dr. Daniel Barthel for many, many discussions and help whenever needed.

Licence

Copyright © 2009 Benjamin Bulheller (www.bulheller.com)

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Contents

1 Foreword
2 Installation
3 Nomenclature and Naming Conventions
4 Initialization of a PDB Object
   4.1 Parameters explicitly for ->new
   4.2 Changeable Parameters for ->new
5 Identify Subgroups of the Protein for the Use in Loops
13. as used during the development of the PDB parser was 0.15.

After installation, the parser can be used in a script by including the library with the use command:

   use ParsePDB;

The function use searches Perl's library path for the given package. If you do not want to install the package globally on your system (for example if you do not possess root permissions), then you can also copy the .pm file to a folder in your home directory. If you for instance collect your packages in ~/bin/perllib/, you can add this directory via

   use lib "$ENV{HOME}/bin/perllib/";
   use ParsePDB;

This would also work for Error.pm.

Another nice trick is to add the folder the executed script is actually living in. The package FindBin is usually included in standard Perl installations and sets the variable $Bin to the folder of the script, which can then be included via lib:

   use FindBin qw($Bin);
   use lib "$Bin";

3 Nomenclature and Naming Conventions

For the access to the single elements of the PDB (models, chains, residues, atoms or even specific atoms of a certain type), there are some naming conventions which are followed throughout the parser. It is important to differentiate between two things:

• external values
These values are read directly from the PDB. This means that the first item in a list (a residue or an atom) might not necessarily start at 1 or be sequential, and that the values change after renumbering the file. Chains might not be accessibl
14. ay is to check if there are any H-bond donors or acceptors close to specific atoms (e.g. ring nitrogens in histidine). There may also be comments in the PDB file.

12 Filtering the Data

12.1 Keywords for filtering actions

The returned data of ->Get and ->Write can be filtered using the keywords Model, ModelNumber, Chain, ChainLabel, Residue, ResidueNumber, ResidueLabel, Atom, AtomNumber, AtomType, Element and Race. Please see ->Get for more information.

   ATOM    554  CA  ILE B   4      55.013  57.563  15.473  6.00  5.58

   ATOM   Race
   554    AtomNumber (external); the internal counterpart is Atom
   CA     AtomType => "CA", Element => "C"
   ILE    ResidueLabel => "ILE"
   B      ChainLabel (external); the internal counterpart is Chain
   4      ResidueNumber (external); the internal counterpart is Residue

12.2 Inserted Residues

If evolution has found it clever for some reason to insert some residues in a protein, we're left with a mutant that has the same amino acid sequence as its "parent" except for a few residues somewhere in the middle. To maintain comparability, these additional residues often have the same residue number with an additional letter to distinguish between them.

When renumbering the residues with the parameter "KeepInsertions => 0", the insertion codes will be removed and all residues numbered with a different number. "KeepInsertions => 1" (default) wi
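The "KeepInsertions => 0" case described in 12.2 can be sketched as follows. This is an illustration only, not the parser's own code: renumber_without_insertions is a hypothetical helper, the atom hashes only carry the two fields needed here, and the sketch assumes that all atoms belong to one chain and appear in file order.

```perl
use strict;
use warnings;

# Give every residue a new sequential number and drop the insertion code.
# A new residue starts whenever the pair (ResidueNumber, InsResidue) changes.
sub renumber_without_insertions {
    my ($atoms, $start) = @_;
    $start = 1 unless defined $start;

    my $number   = $start - 1;
    my $previous = '';
    my @renumbered;

    for my $atom (@$atoms) {
        my $key = "$atom->{ResidueNumber}|$atom->{InsResidue}";
        $number++ if $key ne $previous;
        $previous = $key;
        # Copy the atom hash so the input stays untouched.
        push @renumbered, { %$atom, ResidueNumber => $number, InsResidue => '' };
    }
    return \@renumbered;
}

# Residues 52, 52A, 52B and 53; residue 52 has two atoms.
my @Atoms = (
    { ResidueNumber => 52, InsResidue => ''  },
    { ResidueNumber => 52, InsResidue => ''  },
    { ResidueNumber => 52, InsResidue => 'A' },
    { ResidueNumber => 52, InsResidue => 'B' },
    { ResidueNumber => 53, InsResidue => ''  },
);

my $Result = renumber_without_insertions(\@Atoms, 1);
print join(' ', map { $_->{ResidueNumber} } @$Result), "\n";
```

Comparing the pair of number and insertion code, rather than the number alone, is what lets 52, 52A and 52B end up as three different residues.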
15. dard header line is added before a new chain begins, and a newline character is added if the sequence line with the one-letter codes of the amino acids is 80 characters in length.

The method only processes a chain if there are ATOMs in it, that is, HETATM chains are ignored. If a residue label cannot be converted into the one-letter code, a warning is issued by AminoAcidConvert.

   @FASTA = $PDB->GetFASTA (Model => 0);

13.2 ->WriteFASTA

Writes a FASTA file of the protein. If a file name is provided, the extension .fasta is added. If the file name is omitted, the base name of the original PDB file is taken. The method returns the array with the FASTA lines, in case they are needed for anything else, even if only for checking whether anything was written at all.

   $PDB->WriteFASTA (Model => 0, FileName => "4mbn");

13.3 ->AminoAcidConvert

Converts a 1-letter code into a 3-letter code and vice versa.

   $Code = $PDB->AminoAcidConvert ($Code);

13.4 ->FormatLine

Returns a string in PDB format from a given atom hash. For many problems, working with the AtomIndex is by far easier and faster than retrieving the complete PDB line, e.g. for more complex filtering actions than possible via ->Get. However, if a PDB needs to be written, retrieving the formatted PDB line as well is cumbersome and time consuming, especially if it needs to be tweaked somehow. In such a case, the filtered and/or changed atom hash from the atom index can
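The conversion behind the FASTA methods (three-letter residue labels to one-letter codes, with the sequence broken into lines of 80 characters) can be sketched as follows. This is not the code of ->GetFASTA or ->AminoAcidConvert; to_fasta and %ThreeToOne are hypothetical names, the table covers only the 20 standard amino acids, and writing "X" for an unknown label is an assumption of this sketch.

```perl
use strict;
use warnings;

# Three-letter to one-letter codes of the 20 standard amino acids.
my %ThreeToOne = (
    ALA => 'A', ARG => 'R', ASN => 'N', ASP => 'D', CYS => 'C',
    GLN => 'Q', GLU => 'E', GLY => 'G', HIS => 'H', ILE => 'I',
    LEU => 'L', LYS => 'K', MET => 'M', PHE => 'F', PRO => 'P',
    SER => 'S', THR => 'T', TRP => 'W', TYR => 'Y', VAL => 'V',
);

# Return the FASTA lines (header first) for a list of residue labels.
sub to_fasta {
    my ($header, $labels, $width) = @_;
    $width ||= 80;

    my $sequence = '';
    for my $label (@$labels) {
        my $code = $ThreeToOne{ uc $label };
        if (!defined $code) {
            warn "Cannot convert residue label '$label'\n";
            $code = 'X';    # assumption: placeholder for unknown residues
        }
        $sequence .= $code;
    }

    # Cut the sequence into pieces of at most $width characters.
    my @chunks = $sequence =~ /(.{1,$width})/g;
    return (">$header", @chunks);
}

my @Lines = to_fasta('demo chain A', [ qw(MET LYS ALA ILE GLY) ]);
print "$_\n" for @Lines;
```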
16. dels and chains), ParsePDB has been coded with the intention to create a package that is powerful enough to handle PDBs with a fair amount of functions but is still easy to handle.

Keeping the complexity at a minimum, a protein can be read, parsed and its chains written into single files with just three commands, which are as easy as new, Parse and WriteChains. Given certain parameters, the atoms and residues can be counted, renumbered and filtered (i.e. just certain elements or residues can be extracted). Most of the command names are designed in such a way that they may take a little time to type, but are easy to remember and meaningful when being read.

The PDB parser is an integral part of the web interface DichroCalc, which can be freely used at http://comp.chem.nottingham.ac.uk/dichrocalc/.

Benjamin Bulheller

2 Installation

The PDB parser itself is a Perl package, indicated by the extension .pm. To install the package globally on your system, you can use the provided makefile to copy it to your library path. To do this, log in as root and follow the standard routine:

   perl Makefile.PL
   make
   make install

Since the package uses Error.pm to handle exceptions, this package needs to be installed, too. If it is not already installed, then make will issue a warning message about that. Error.pm can be found at http://search.cpan.org/ by searching for "Error"; it was written by Graham Barr and is maintained by Shlomi Fish. The version which w
17. e keyword.
The available numbers can be retrieved via ->IdentifyResidues and ->IdentifyResidueNumbers, and the returned values can be fed into ->Get via Residue => value or ResidueNumber => value, respectively.

   $ResidueLabel = $PDB->GetResidueLabel (Residue => 2);
   $ResidueLabel = $PDB->GetResidueLabel (ResidueNumber => 2);

• ->GetAtom
Returns the internal number of an external AtomNumber.

   $Atom = $PDB->GetAtom (AtomNumber => 17);

• ->GetAtomNumber
Returns the external number of an internal atom number.

   $AtomNumber = $PDB->GetAtomNumber (Atom => 0);

• ->GetAtomType
Returns the type of the atom with a given atom number. The available numbers can be retrieved via ->IdentifyAtoms, and the returned value can be fed into ->Get via AtomType => "type".

   $AtomType = $PDB->GetAtomType (Atom => 2);         # internal
   $AtomType = $PDB->GetAtomType (AtomNumber => 3);   # external

• ->GetElement
Returns the element of the atom with a given atom number. The available numbers can be retrieved via ->IdentifyAtoms, and the returned value can be given to ->Get via Element => "element".

   $Element = $PDB->GetElement (Atom => 5);

If you plan to use the ResidueIndex or AtomIndex later on and need the Element, you can call the routine without parameters, which will save the element information to the main hash, that is, it will then be available in the returned Re
18. e via their external ChainID, in case no external ChainID is given (which is quite common).

Overview of the numbers and labels of a PDB entry and their "name" in the parser:

   MODEL        1
   ATOM      1  N   LYS A   1      15.872   7.811  19.851  1.00 76.73
   ATOM      2  C   LYS A   1      15.332   7.443  18.561  1.00 99.86
   ATOM      3  CA  LYS A   1      14.650   6.096  18.757  1.00 72.69

   MODEL 1   ModelNumber (external); the internal counterpart is Model
   ATOM      Race
   3         AtomNumber (external); the internal counterpart is Atom
   CA        AtomType => "CA", Element => "C"
   LYS       ResidueLabel => "LYS"
   A         ChainLabel (external); the internal counterpart is Chain
   1         ResidueNumber (external); the internal counterpart is Residue

• internal numbers
Each item (model, chain, residue, atom) can be accessed via its sequential number in the domain (starting at 0). This number will never change for models or chains, although there is some logical ambiguity for atoms and residues. Residue 0 in the second chain is also Residue 20 if no chain is specified and the first chain contains 20 residues (taking into account that counting starts at 0). If a Residue is specified, the atom number is relative to that residue and can thus change for that very atom if no chain or no residue is given.

Although it was at first thought to be a nice idea to let internal numbers start at 1, it turned out to be much more versatile to start at 0, like everything else in Perl does, too. That way, for ins
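The ambiguity of internal residue numbers described above is plain index arithmetic. The following sketch shows the mapping between a chain-relative index and the index that applies when no chain is given; it is not parser code, and both function names are hypothetical.

```perl
use strict;
use warnings;

# Convert (chain, residue within that chain) into the internal residue
# index that applies when no chain is specified. All counts start at 0.
sub global_residue_index {
    my ($residues_per_chain, $chain, $residue) = @_;
    my $offset = 0;
    $offset += $residues_per_chain->[$_] for 0 .. $chain - 1;
    return $offset + $residue;
}

# The reverse: find chain and chain-relative index for a global index.
# Returns an empty list if the index is out of range.
sub local_residue_index {
    my ($residues_per_chain, $global) = @_;
    my $chain = 0;
    for my $count (@$residues_per_chain) {
        return ($chain, $global) if $global < $count;
        $global -= $count;
        $chain++;
    }
    return;
}

# First chain with 20 residues, second chain with 15 residues.
my @Counts = (20, 15);
print global_residue_index(\@Counts, 1, 0), "\n";           # residue 0 of chain 1
print join(', ', local_residue_index(\@Counts, 20)), "\n";  # and back again
```

With 20 residues in the first chain, residue 0 of the second chain maps to 20, matching the example in the text.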
19. esidues like metals or water to ensure the comparability of returned arrays (that for the same search parameters, like Model 0, Chain 0, a certain index will always belong to the same residue).

• ->IdentifyResidueNumbers
Returns an array with all external residue numbers, including the inserted residue tag if present. Beware of multiple numbers due to the restart of the numbering in every model (or even in every chain, depending on how crappy the file is). To be on the safe side, always specify model and chain or use ->RenumberResidues prior to ->IdentifyResidueNumbers. If model is omitted, 0 is taken as default value; if no chain is specified, all chains are processed.

   @ResidueNumbers = $PDB->IdentifyResidueNumbers (Chain => 0);

• ->IdentifyAtoms
Returns an array with all internal atom numbers of the requested model, chain or even residue.

   @AllAtoms = $PDB->IdentifyAtoms (Model => 1, Chain => 0);

• ->IdentifyAtomNumbers
Returns an array with all external atom numbers.

   @AllAtomNumbers = $PDB->IdentifyAtomNumbers (Chain => 0);

• ->IdentifyAtomTypes
Returns an array with all available atom types (e.g. "CA", "CB", "O") that can be used to filter the atoms with ->Get (AtomType => ...).

   @AllAtomTypes = $PDB->IdentifyAtomTypes (Chain => 0);

• ->IdentifyElements
Returns an array with all available atom elements (e.g. "C", "N", "O") that can be used to filter the atoms with ->Ge
20. eters for ->new are divided into two groups. The first one consists of parameters which can only be given to ->new directly, while the others can also be changed after the initialization of the object. The default values of all of the following switches can be altered in ParsePDB.pm at the very beginning of the code under "Default Values".

• FileName => "file.pdb"
The PDB file including path. The extension .pdb can be omitted.

• NoHETATM => 0 | 1 (default 0)
If set to "1", HETATM lines will be filtered out before parsing the file. This can be handy if you do not process HETATMs anyway, and it can save several checks whether a chain contains any ATOMs at all.

• NoANISIG => 0 | 1 (default 0)
If set to "1", SIGATM, SIGUIJ and ANISOU lines will be filtered out before parsing the file. If you do not process these atoms, it saves processing time (each atom needs to be compared against two strings only instead of five) and avoids checks for you.

4.2 Changeable Parameters for ->new

All the following parameters can be given to ->new or alternatively changed later on in the program using one of the ->SetVariable methods. This is mainly useful to avoid a ->new command that needs three lines to be viewed entirely.

• ChainLabelAsLetter => 0 | 1 (default 0)
Tells ->WriteChains whether the exported files should be named with the number of the chain or the actual chain ID letter. Check whether ->
21. he parameter FileName is mandatory; if it is omitted, the method throws a NoFile error.

Sometimes it is required to split a protein into its models or chains. Since that is a standard task, there are two methods for it:

   @AllModels = $PDB->WriteModels;   # extract all models in single files
   @AllChains = $PDB->WriteChains;   # extract all chains in single files

Most parameters of these two methods are identical with ->Get. Header and Footer are included by default. If the parameter "FileName" is omitted, the base name of the PDB is taken.

The suffix defined via ->new (ModelSuffix or ChainSuffix) is added plus the chain or model number, for example:

   file_c1.pdb
   file_c2.pdb
   file_c3.pdb

An array containing the names of the created files is returned.

->WriteChains can also write the chain IDs as letters instead of numbers. Numbers are the default; if letters are desired, ChainLabelAsLetter has to be set to 1 via ->new or ->SetChainLabelAsLetter. If ->ChainLabelsValid returns false, the setting is ignored and chains can only be accessed via their sequential numbers given by ->IdentifyChains.

If no Model is given to ->WriteChains, "0" is taken by default. The routine can just process one model at a time. To loop over all models, the information provided by the method ->IdentifyModels can be used. When all models are processed, something like (FileName => "$BaseName$Model") should be defined as fi
22. ith increasing age, sorry about that).

   @AllChains = $PDB->IdentifyChains (Model => 0);

• ->IdentifyChainLabels
Returns an array with the chain IDs of the chains in a certain model, e.g. ("A", "B", "C") if three chains are present. If no model is given, 0 is taken by default.

Check whether ->ChainLabelsValid returns true before accessing the chains via letters, otherwise you can only use numbers until the IDs have been corrected with ->RenumberChains.

It is strongly recommended to use the numbers given by ->IdentifyChains to access the chains rather than using letters.

   @AllChains = $PDB->IdentifyChainLabels (Model => 0);

• ->IdentifyResidues
Returns an array with all internal residue sequence numbers.

   @AllResidues = $PDB->IdentifyResidues (Model => 0, Chain => 0);

• ->IdentifyResidueLabels
Returns an array with all external residue labels (e.g. ALA, TYR, ...).

   @ResidueLabels = $PDB->IdentifyResidueLabels (Model => 0, Chain => 0);

This array represents the sequence of the amino acids in the requested chain. If one-letter codes are preferred, the parameter OneLetterCode may be set to 1:

   @ResidueLabels = $PDB->IdentifyResidueLabels (Model => 0, Chain => 0, OneLetterCode => 1);

Mind that OneLetterCode only makes sense when no hetero atoms are in the PDB, that is, for example, when NoHETATM is set to 1. The method will nevertheless return undef for "unknown" r
23. iven to ->Get, it is possible to access and use these readily processed entries directly; please see ->Get for more information. The line is cut into pieces according to the following scheme, taken from the Protein Data Bank Contents Guide, version 2.3:

    Columns   Field Name
    1-6       Race
    7-11      AtomNumber
    13-16     AtomType
    17        AltLoc
    18-20     ResidueLabel
    22        ChainLabel
    23-26     ResidueNumber
    27        InsResidue
    31-38     x
    39-46     y
    47-54     z
    55-60     Occupancy
    61-66     Temp
    67-80     Rest

The field names of the above table can be given to the various methods to filter the contents, for example to retrieve only atoms with a certain AtomType.

• The chain IDs are checked whether
    - every chain has an ID
    - no duplicate IDs are found within one model

  Missing or duplicate chain IDs cause a warning message that this can be corrected using ->RenumberChains. In this case, the chains can only be accessed via their internal number and the parser does not use or accept the real IDs at all.

  The chain IDs are processed case sensitive. (Have a look at 1FNT to see that this is really necessary.)

If the same PDB has to be read again (after changing something in the file, to return to the original version after having renumbered something, or for other reasons that require the object to be updated), this can be done using ->Reset; the object will then be re-parsed automatically.

    $PDB->Reset;

4.1 Parameters explicitly for ->new

The param
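As an illustration of the column scheme in the table above, here is a minimal sketch of how one ATOM line could be cut into a hash with substr. It is not taken from ParsePDB.pm; the helper name SplitAtomLine and the %Columns table are assumptions for this example, and only the column ranges come from the table.

```perl
use strict;
use warnings;

# Field name => [first column, last column], 1-based as in the table.
my %Columns = (
    Race          => [ 1,  6], AtomNumber   => [ 7, 11],
    AtomType      => [13, 16], AltLoc       => [17, 17],
    ResidueLabel  => [18, 20], ChainLabel   => [22, 22],
    ResidueNumber => [23, 26], InsResidue   => [27, 27],
    x             => [31, 38], y            => [39, 46],
    z             => [47, 54], Occupancy    => [55, 60],
    Temp          => [61, 66], Rest         => [67, 80],
);

# Hypothetical helper: returns a hash reference with one key per field.
sub SplitAtomLine {
    my ($Line) = @_;
    $Line = sprintf "%-80s", $Line;        # pad short lines to 80 columns
    my %Atom;
    for my $Field (keys %Columns) {
        my ($First, $Last) = @{ $Columns{$Field} };
        my $Value = substr ($Line, $First - 1, $Last - $First + 1);
        $Value =~ s/^\s+|\s+$//g;          # strip surrounding blanks
        $Atom{$Field} = $Value;
    }
    return \%Atom;
}

# Build a correctly aligned example line instead of typing the blanks by hand.
my $Line = sprintf "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f",
    "ATOM", 5, " CB", "A", "VAL", "A", 1, "", 1.224, 33.077, 8.946, 1.00, 13.10;
my $Atom = SplitAtomLine ($Line);
print "$Atom->{AtomType} of $Atom->{ResidueLabel} $Atom->{ResidueNumber}\n";
```

Fixed columns are used rather than splitting on whitespace because neighbouring fields such as AltLoc and ResidueLabel ("AVAL") are not separated by blanks.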
24. le name.

The following (additional) parameters can be given to ->WriteModels and ->WriteChains:

• ModelSuffix => "model"
  To change the default suffix "m".

• ChainSuffix => "chain"
  To change the default suffix "c".

• PDB2CHARMM => 1
  To write CHARMM input files. Read further down for more information.

• CHARMM2PDB => 1
  To convert the CHARMM-generated PDB format to the PDB standard. Read further down for more information.

9 Retrieve Certain Information About the Protein

• ->GetModel
  Returns the internal number of an external ModelNumber.

    $Model = $PDB->GetModel (ModelNumber => 4);

• ->GetModelNumber
  Returns the external number of an internal Model.

    $ModelNumber = $PDB->GetModelNumber (Model => 1);

• ->GetChain
  Returns the internal number of an external ChainLabel.

    $Chain = $PDB->GetChain (Model => 1, ChainLabel => "A");

• ->GetChainLabel
  Returns the "real" chain ID for a particular chain, undef if no ChainLabel is set.

    $ChainLabel = $PDB->GetChainLabel (Model => 1, Chain => 1);

• ->GetResidue
  Returns the internal number of an external ResidueNumber.

    $Residue = $PDB->GetResidue (Model => 1, ResidueNumber => 5);

• ->GetResidueLabel
  Returns the label of the residue with a given residue number (usually the amino acid). The number can be either the external one indicated by the keyword ResidueNumber, or the internal one by giving the Residu
25. ll keep the code and the same residue number.

    ATOM  157  N    ARG  36
    ATOM  158  CA   ARG  36
    ATOM  159  C    ARG  36
    ATOM  160  O    ARG  36
    ATOM  161  CB   ARG  36
    ATOM  162  CG   ARG  36
    ATOM  163  CD   ARG  36
    ATOM  164  NE   ARG  36
    ATOM  165  CZ   ARG  36
    ATOM  166  NH1  ARG  36
    ATOM  167  NH2  ARG  36
    ATOM  168  N    SER  36A
    ATOM  169  CA   SER  36A
    ATOM  170  C    SER  36A
    ATOM  171  O    SER  36A
    ATOM  172  CB   SER  36A
    ATOM  173  OG   SER  36A
    ATOM  174  N    GLY  36B
    ATOM  175  CA   GLY  36B
    ATOM  176  C    GLY  36B
    ATOM  177  O    GLY  36B
    ATOM  178  N    SER  36C
    ATOM  179  CA   SER  36C
    ATOM  180  C    SER  36C
    ATOM  181  O    SER  36C
    ATOM  182  CB   SER  36C
    ATOM  183  OG   SER  36C

    Inserted residues A, B, C (column 27)

A big problem which has to be taken into account are PDB files which use the inserted residue tag to mark alternative residues, that is, residues superimposed with others like ALA and LEU at the same position f
26. llowing:

    # if the dihedral angles are needed
    $PDB->GetAngles;

    @Models = $PDB->IdentifyModels;

    foreach $Model (@Models) {
        @Chains = $PDB->IdentifyChains (Model => $Model);

        foreach $Chain (@Chains) {
            # if the residues are not needed to be processed as a whole
            @AtomIndex = $PDB->Get (Model => $Model, Chain => $Chain, AtomIndex => 1);

            foreach $Atom (@AtomIndex) {
                # do some stuff with each atom
                print "$Atom->{x} $Atom->{y} $Atom->{z}\n";
            }

            # or alternatively, if the residues need to be processed one by one
            @ResidueIndex = $PDB->Get (Model => $Model, Chain => $Chain, ResidueIndex => 1);

            foreach $Residue (@ResidueIndex) {
                # do some stuff with each residue
                print "$Residue->{Phi} $Residue->{Psi}\n\n";

                foreach $Atom (@{$Residue->{Atoms}}) {
                    # do some stuff with each atom
                    print "$Atom->{x} $Atom->{y} $Atom->{z}\n";
                }
            }
        } # of foreach $Chain
    } # of foreach $Model

15 Error handling

The error handling of ParsePDB consists of two levels:

• warnings, which are reported and give hints for possible problems but do not cause the processing to be aborted
• errors, which are fatal and cause ParsePDB to abort the processing

If you are not familiar with the try, throw/catch methodology, have a look at the documentation of Error.pm. In case you do not make use of this, the program dies as any other program does, issuing the respective error message. However, if y
27. most methods, if no model number is specified. Therefore, if no MODEL tags are given or just one MODEL is present, the model number may be omitted.

• Inside each MODEL - ENDMDL domain (or the whole file if no MODEL is found): Every block of ATOM lines is regarded as one chain either
    - until a following TER or
    - a change of the chain ID.

  If HETATMs are following the ATOMs, there are two possibilities:

    1. one chain with ATOMs and HETATMs if
       - the blocks have the same chain IDs
       - both have no chain IDs and are not separated by a TER
    2. two chains, one with ATOMs and one with HETATMs, if
       - the blocks have different chain IDs
       - both blocks have no chain IDs and are separated by a TER

  Theoretically, there is the third possibility of the same (defined) chain ID and a TER in between. However, according to the PDB manual, a TER marks the end of a chain, thus it is given the higher priority in that case. A TER within a chain is more or less a violation of the PDB format, and in terms of the speed of parsing it is much faster to rely on the fact that a TER can only be at the very end of a chain.

  All chains inside one model are numbered sequentially as chain 0, 1, etc. and can be requested with this number. If the following check of the chain IDs is successful, they may also be addressed using the actual letter.

• Each line of the protein section is split into its entries during parsing. With parameters like AtomIndex and ResidueIndex g
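The chain detection rules above (a chain ends at a TER or when the chain ID changes) can be sketched in a few lines. This is only an illustration of the rules as described here, not the parser's actual routine; the helper name SplitChains is hypothetical, and TER lines are simply dropped from the result to keep the sketch short.

```perl
use strict;
use warnings;

# Hypothetical helper: split the lines of one model into chains.
# Returns a list of array references, one per chain, in file order.
sub SplitChains {
    my @Lines = @_;
    my (@Chains, @Current, $LastID);

    for my $Line (@Lines) {
        if ($Line =~ /^TER/) {                  # a TER always closes the chain
            push @Chains, [@Current] if @Current;
            @Current = ();
            next;
        }
        next unless $Line =~ /^(?:ATOM|HETATM)/;

        # chain ID is column 22; treat short lines as "no chain ID"
        my $ID = length ($Line) >= 22 ? substr ($Line, 21, 1) : " ";

        if (@Current and $ID ne $LastID) {      # change of the chain ID
            push @Chains, [@Current];
            @Current = ();
        }
        push @Current, $Line;
        $LastID = $ID;
    }
    push @Chains, [@Current] if @Current;
    return @Chains;
}
```

With this logic, HETATMs that follow the ATOMs without a TER and with the same (or no) chain ID stay in the same chain, while a TER or a different chain ID opens a new one, which matches the two possibilities listed above.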
28. o request "A". Please see "Filtering the data" for more information on the location indicators in the ATOM line.

• AtomIndex => 1

  This can be a very handy command, especially if the parser is only used to retrieve the chains one after another. AtomIndex tells ->Get to return not the original ATOM lines from the PDB but the fully parsed entries. That is, each line (atom) in this array is a hash in which the respective ATOM line is broken down into its parts (see also ->new for the used scheme).

    @Content = $PDB->Get (AtomNumber => 5);

    $Content[0] = "ATOM 5 CB AVAL A 1 1.224 33.077 8.946 1.00 13.10"

    @Content = $PDB->Get (AtomNumber => 5, AtomIndex => 1);

    $Content[0] = {
        Race          => "ATOM",
        AtomNumber    => "5",
        AtomType      => "CB",
        AltLoc        => "A",
        ResidueLabel  => "VAL",
        ChainLabel    => "A",
        InsResidue    => "",
        ResidueNumber => "1",
        x             => "1.224",
        y             => "33.077",
        z             => "8.946",
        Occupancy     => "1.0",
        Temp          => "13.10",
        Rest          => "1TBE 113",
    }

• ResidueIndex => 1

  This switch is even more powerful than AtomIndex and returns the parsed information of the latter divided up into residues while still providing the same information of AtomIndex on a per-residue basis. The parsed information for each individual residue is available and for each its atoms
29. ods can be called directly:

    $PDB->RenumberModels (ModelStart => 3);
    $PDB->RenumberChains (ChainStart => "G");
    $PDB->RenumberResidues (ResidueStart => 5);
    $PDB->RenumberAtoms (AtomStart => 5);

When renumbering chains, the letter is processed case sensitive. If the letter "Z" is reached during renumbering, "a" - "z" will be used, and "0" - "9" after the lowercase letters have been used up. Proteins larger than that require starting again with A-Z, which leads to duplicate chain labels, and some routines (e.g. the retrieval of chains using the chain label) will not work then. However, this is a shortcoming of the PDB standard, with the chain label being restricted to one character. The parser will not return a chain based on its chain label if multiple possibilities are found.

If the Start parameters are omitted, 1 and "A" (for chains) are taken by default.

Be aware that after renumbering, some information in the header and footer does not fit the atom numbers any more (e.g. SSBOND, HELIX, CONECT). The numeration of chains restarts in each model.

If the main content should not be altered, or for instance every retrieved chain must start at 1, the Start parameters can be given to ->Get, which then renumbers only the filtered content but not the main hash itself. That way, one can extract all alpha carbon atoms and still have a sequential numbering:

    @Content = $PDB->Get (AtomType => "CA", AtomS
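The order in which chain labels are handed out during renumbering (A-Z, then a-z, then 0-9, then A-Z again) can be sketched as follows. The helper name ChainLabels is hypothetical and the code is only an illustration of the described sequence, not the routine used by ->RenumberChains.

```perl
use strict;
use warnings;

# Label order as described above: upper case, lower case, digits.
my @Labels = ("A" .. "Z", "a" .. "z", 0 .. 9);

# Hypothetical helper: return $Count labels starting at $Start ("A" by default).
sub ChainLabels {
    my ($Count, $Start) = @_;
    $Start = "A" unless defined $Start;

    my $Total = scalar @Labels;
    my ($Offset) = grep { $Labels[$_] eq $Start } 0 .. $#Labels;
    die "Unknown start label '$Start'\n" unless defined $Offset;

    # wrap around after "9", which is where duplicate labels can appear
    return map { $Labels[ ($Offset + $_) % $Total ] } 0 .. $Count - 1;
}

print join (" ", ChainLabels (4, "Y")), "\n";
```

Only 62 distinct one-character labels exist in this scheme, so a model with more chains than that necessarily ends up with duplicates.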
30. om number as the atom before.

    $PDB->RenumberAtoms (IgnoreTER => 1);

Please see also further down, chapter 12 "Filtering the data".

11 Generating CHARMM input files

Working with CHARMM usually is a pain somewhere where you don't want it. At least the parser is able to solve some of the problems. The main pain is that CHARMM can only process one chain at a time. Although it is possible to give the (PDB2CHARMM => 1) parameter to ->Get and every related method, you will use ->WriteChains most of the time.

If you use another method, please remember to work with only one chain to produce a valid input file.

    @CHARMMFiles = $PDB->WriteChains (PDB2CHARMM => 1);

To convert the CHARMM PDB output files to standard format, use the related parameter CHARMM2PDB equivalently with ->Write or ->Get.

11.1 What it does

• All chains are written into separate files
• All residues are renumbered sequentially starting at 1 in each chain
• "HIS" is replaced with "HSD"
• the last O is renamed OT1, OXT is renamed OT2
• All atom types are replaced according to this list:
  http://www.bmrb.wisc.edu/ref_info/atom_nom.tbl

When more than one chain is detected, a warning will be issued that CHARMM is unable to process more than one chain at a time.

11.2 What it does not do

• Terminal acetyl groups
  Sometimes proteins have a terminal acetyl group on the amino end which needs to be patched to work with CHARMM
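The residue and terminal oxygen renaming rules listed under 11.1 can be illustrated with a small sketch that works on parsed atoms (hash references as returned with AtomIndex). The helper name ToCharmmNames is hypothetical, the full atom type table from the BMRB list is not applied, and "the last O" is read here as the O of the last residue in the chain, which is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: apply HIS -> HSD, last O -> OT1 and OXT -> OT2
# to the atoms of ONE chain. The input hashes are copied, not modified.
sub ToCharmmNames {
    return () unless @_;
    my @Atoms = map { +{ %$_ } } @_;
    my $LastResidue = $Atoms[-1]{ResidueNumber};

    for my $Atom (@Atoms) {
        $Atom->{ResidueLabel} = "HSD" if $Atom->{ResidueLabel} eq "HIS";

        if ($Atom->{AtomType} eq "OXT") {
            $Atom->{AtomType} = "OT2";
        }
        elsif ($Atom->{AtomType} eq "O"
               and $Atom->{ResidueNumber} eq $LastResidue) {
            $Atom->{AtomType} = "OT1";
        }
    }
    return @Atoms;
}
```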
31. or example. Without checking the atom distances or the header remarks, it is not possible to distinguish between inserted residues and alternative residues.

If it is important that the chain has no such alternatives, they can be reliably removed using the method ->RemoveInsertedResidues. It loops over all residues with an inserted residue tag, checks the atom distances with the neighbouring groups and removes residues only if they are superpositions.

    $PDB->RemoveInsertedResidues;

If it is really important that there are no superpositions and the PDB file might be crappy enough that the inserted residue tags are not reliable, the parameter Intensive can be set true, which will then cause a check of all residues in the protein.

    $PDB->RemoveInsertedResidues (Intensive => 1);

If superpositions are found, the residue with an InsResidue tag is removed; it does not matter whether the sequence is 36, 36A or the other way round. If none of them possesses an InsResidue marker, the second one is removed.

In all cases, a warning is issued, stating the ResidueNumbers (the external ones, for easy comparison in the PDB file) and which of them was removed.

12.3 Alternative Atom Locations

If some atoms showed a deviation during the structural elucidation of the protein via NMR or X-ray, the alternative locations are sometimes stated within the same model, instead of in a second one. The alternate location indicator i
32. ou catch the error in the main program, it will "survive" and can handle the error appropriately (or simply close open file handles before dying off as well).

15.1 Which error can happen where?

This listing can be helpful to decide for which error to check after several actions in your script. As you can see, this is only necessary after ->new (whose only error is due to user incompetence) and ->Parse. After that, the PDB and its parser should work smoothly together.

• ->new
    NoFile           If no file name was given or the file has not been found
    FileNotFound     If the file given via FileName was not found

• ->Parse and ->Reset
    IOError          If the file could not be opened, e.g. due to a permission problem
    CorruptFile      If the file is empty or no ATOM lines have been found in it

• ->Get and ->Write (and all methods using them)
    UnknownElement   If a specific element was requested via Element => "" that has not yet been specified in ParsePDB.pm
    BadParameter     If an external identifier and the respective internal identifier were given at the same time, e.g. Chain and ChainLabel. It is no problem of course to mix for instance Residue and AtomNumber.

• ->Write
    NoFile           If no file name has been given to ->Write
    IOError          If the file could not be opened, e.g. due to a permission problem

15.2 Methods for Error Handling

• ->Warning
  Returns true if a warning has been issued.
  Returns false if no wa
33. returned if it is tried to access them via the AtomNumber. For residues, the numbers may also contain letters in case of inserted residues like 34A. In most of the PDB files, only one model exists and this is usually not explicitly named, thus it possesses no external number.

The biggest problem are chain IDs. Very often, chains are not named at all (empty chain ID) or double IDs exist (like A, B, A, B). In both cases, ChainLabel cannot be used to access the chains. Every used model is checked right after parsing for those errors. Each use of ChainLabel triggers a check whether the chain IDs of the current model have passed this check and are valid. If they are invalid, a warning is issued, the parser stops processing and returns undef.

It is also possible for the user to determine whether external chain identifiers can be used for a model:

    if ($PDB->ChainLabelsValid (Model => 0)) { ... }
    else { ... }

As usual, the method defaults to model 0 if no parameters are given. If the chain labels are invalid, the chains need to be renumbered before they can be accessed via ChainLabel.

8 Write the Whole or Parts of the PDB

To write out specific parts of the PDB, the method ->Write is used. All parameters are identical with ->Get, so it is possible to write out for example only certain atom types or all alanine residues and so on.

    $PDB->Write (FileName => "file.pdb");

Header and Footer are included by default. T
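The two conditions under which chain IDs count as invalid (an empty ID, or the same ID occurring twice within one model) can be checked with a few lines of Perl. This is a stand-alone sketch of the rule only; the function CheckChainLabels is a hypothetical helper and is not the implementation behind ->ChainLabelsValid.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the chain IDs of one model in file order and
# returns 1 if every chain has an ID and no ID is used twice, else 0.
sub CheckChainLabels {
    my @Labels = @_;
    my %Seen;

    for my $Label (@Labels) {
        return 0 if not defined $Label or $Label !~ /\S/;   # missing ID
        return 0 if $Seen{$Label}++;                        # duplicate ID
    }
    return 1;
}

print CheckChainLabels ("A", "B", "A", "B") ? "valid\n" : "invalid\n";
```

The comparison is case sensitive, so "A" and "a" count as two different chains, in line with the way the parser treats chain IDs.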
34. rning has been issued.

    if ($PDB->Warning) { print "Uh oh...\n" }

• ->GetWarnings
  Only returns the warning messages as an array, no automatic output.

    if ($PDB->Warning) {
        @Warnings = $PDB->GetWarnings;
        print @Warnings;
    }

• ->PrintWarnings
  Prints out all warning messages and also returns an array.

    if ($PDB->Warning) { @Warning = $PDB->PrintWarnings }

To check for specific warnings (e.g. to tell the user explicitly why his crappy file is crap indeed), one can use the following methods.

    e.g. if ($PDB->Warning_NoChainLabel) { print "Watch out!" }

• ->Warning_NoENDMDL
  If a MODEL without a corresponding ENDMDL has been found

• ->Warning_NoChainLabel
  If no chain ID is given at all

• ->Warning_MultipleChainLabel
  If a certain chain ID has been found more than once. This can lead to problems when using letters to read chains: ->Get (Chain => "B"). If this warning has been reported, ->WriteChains ignores the setting of ->ChainLabelAsLetter and uses numbers.

• ->Warning_UnknownModel
  If the requested model is not defined

• ->Warning_UnknownChain
  If the requested chain is not defined

• ->Warning_UnknownChainLabel
  If the given chain ID could not be found in the PDB

• ->Warning_UnknownAminoAcid
  If a 1- or 3-letter code given to AminoAcidConvert has not been recognized

• ->Warning_InvalidAminoAcid
  If a code given to AminoAcidConvert does not have 1 or 3 letters
35. s usually "A" or "B", but has also been found as "1" and "2". The parser detects the used method automatically.

    ATOM  143  N    SER  18   7.902   9.621  14.878  1.00   8.15
    ATOM  144  CA   SER  18   6.436   9.552  14.567  1.00  10.68
    ATOM  145  C    SER  18   6.287   9.730  13.049  1.00  10.82
    ATOM  146  O    SER  18   7.124  10.246  12.306  1.00  11.90
    ATOM  147  CB   SER  18   5.687  10.570  15.337  1.00  14.98
    ATOM  148  OG  ASER  18   6.225  11.165  16.468  0.50  14.28
    ATOM  149  OG  BSER  18   6.181  11.830  15.086  0.50   9.85

    Alternative atom location A and B (column 17)

The alternative atom locations can be filtered using the AtomLocations keyword via ->new or ->SetAtomLocations:

    $PDB->SetAtomLocations ("First");

This will enable the filtering of the additional atoms and return only the first one. If these atoms are not needed anyway, they can be removed entirely with ->RemoveAtomLocations and the same keyword, which denotes the atoms which are kept:

    $PDB->RemoveAtomLocations (AtomLocations => "First");

This deletes all atoms with multiple locations and keeps just the first one. See ->new for more information. If no parameter is given, the default is taken, which is set to "All" and therefore does not do anything.

13 Other methods

13.1 ->GetFASTA

Returns an array with the FASTA format lines. A model number has to be provided, otherwise model 0 is taken as default. To process only a single chain, Chain or ChainLabel can be specified. A stan
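What the "First" setting amounts to can be sketched on parsed atoms: of several entries that describe the same atom of the same residue, only the first is kept. The helper name FirstLocations is hypothetical, the sketch assumes the atoms of a single chain, and it identifies "the same atom" by residue number, insertion tag and atom type, which is an assumption rather than a documented detail of the parser.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first location of every atom.
# Atoms are hash references with the field names used by AtomIndex.
sub FirstLocations {
    my %Seen;
    return grep {
        my $Key = join "|", @{$_}{qw(ResidueNumber InsResidue AtomType)};
        not $Seen{$Key}++;
    } @_;
}

my @Atoms = (
    { ResidueNumber => 18, InsResidue => "", AtomType => "CB", AltLoc => ""  },
    { ResidueNumber => 18, InsResidue => "", AtomType => "OG", AltLoc => "A" },
    { ResidueNumber => 18, InsResidue => "", AtomType => "OG", AltLoc => "B" },
);
print scalar (my @Kept = FirstLocations (@Atoms)), " atoms kept\n";
```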
36. sidueIndex and AtomIndex, respectively.

    $PDB->GetElement;

• ->GetCoordinates

  Returns a 2-dimensional array with the coordinates of the requested atoms (all parameters like Model, Chain and the filter commands are given to ->Get and all lines not beginning with ATOM, e.g. TER, are ignored).

  CAUTION: All coordinates are returned, regardless of how many chains are retrieved from ->Get. So be careful to specify a particular one.

    @Coordinates = $PDB->GetCoordinates (ChainLabel => "A");
    print "First Atom: x $Coordinates[0]->{x}, y $Coordinates[0]->{y}, z $Coordinates[0]->{z}\n";

• ->GetAngles

  Returns a hash with the φ and ψ angles of a chain or a residue. Remember that for the first residue of a chain, no φ angle is defined, whereas the last one does not have a ψ angle (the angles are then given as 360°).

  The angles are returned in an array (if only one residue was processed, only the first element is filled). The routine calculates the bond distance between two residues. If it is too big (for example between two chains or if a residue has been removed for some reason), the respective angle is given as 360°.

    @Angles = $PDB->GetAngles (Residue => 2);
    @Angles = $PDB->GetAngles (ResidueNumber => 2);
    print "Phi angle: $Angles[0]{Phi}\n";
    print "Psi angle: $Angles[0]{Psi}\n\n";

  If a certain model or chain is processed with the method, the calculated data is only returned, bu
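For readers who want to see what a dihedral angle calculation involves, the following self-contained sketch computes the torsion angle defined by four points. It uses the common atan2 formulation and is not the code used by ->GetAngles; all function names are hypothetical. By the usual convention, φ of a residue is the torsion C(previous residue)-N-CA-C and ψ is N-CA-C-N(next residue).

```perl
use strict;
use warnings;

# Small vector helpers on array references [x, y, z].
sub Diff { my ($u, $v) = @_; return [ map { $u->[$_] - $v->[$_] } 0 .. 2 ] }
sub Dot  { my ($u, $v) = @_; my $s = 0; $s += $u->[$_] * $v->[$_] for 0 .. 2; return $s }
sub Cross {
    my ($u, $v) = @_;
    return [ $u->[1] * $v->[2] - $u->[2] * $v->[1],
             $u->[2] * $v->[0] - $u->[0] * $v->[2],
             $u->[0] * $v->[1] - $u->[1] * $v->[0] ];
}

# Torsion angle in degrees for the four points p1-p2-p3-p4.
sub Dihedral {
    my ($p1, $p2, $p3, $p4) = @_;
    my ($b1, $b2, $b3) = (Diff ($p2, $p1), Diff ($p3, $p2), Diff ($p4, $p3));
    my ($n1, $n2)      = (Cross ($b1, $b2), Cross ($b2, $b3));

    my $y  = sqrt (Dot ($b2, $b2)) * Dot ($b1, $n2);
    my $x  = Dot ($n1, $n2);
    my $Pi = 4 * atan2 (1, 1);
    return atan2 ($y, $x) * 180 / $Pi;
}

# A planar zig-zag (trans) arrangement has a torsion of magnitude 180 degrees.
printf "%.1f\n", abs Dihedral ([1, 0, 0], [0, 0, 0], [0, 1, 0], [-1, 1, 0]);
```

Since atan2 only yields values between -180 and 180 degrees, a value of 360 can never be a real angle, which makes it usable as a marker for "not defined".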
37. t (Element => ""). To refine the filter pattern, you need to edit the filter variables at the very beginning of ParsePDB.pm.

    @AllElements = $PDB->IdentifyElements (Model => 0, Chain => 0);

6 Count Number of Subgroups of the Protein

• ->CountModels
  Returns the number of models in the PDB.

    $ModelNumber = $PDB->CountModels;

• ->CountChains
  Returns the number of chains in a model. If no model is given, 0 is taken by default.

    $ChainNumber = $PDB->CountChains (Model => 2);

• ->CountAtoms
  Returns the number of atoms in the specified part of the protein. If no model is given, 0 is taken by default. ATOM and HETATM lines are treated equally; if you do not want to process HETATMs, filter them out via NoHETATM => 1 (see ->new).

    $AtomNumber = $PDB->CountAtoms (Model => 0, Chain => 2);

  If you need HETATMs but want to determine the number of ATOMs or HETATMs in one model or chain, you can use the parameter Race => "ATOM".

    $AtomNumber = $PDB->CountAtoms (Model => 0, Chain => 2, Race => "ATOM");

• ->CountResidues
  Returns the number of residues. If no model is given, 0 is taken by default.

    $ResidueNumber = $PDB->CountResidues (Model => 0, Chain => 1);

7 Retrieve a Part of the PDB with ->Get

The method ->Get is the universal tool to retrieve content from the parsed PDB. The information gathered via the Identify or the GetIdentifier methods can be fed into ->Get to
38. t not saved for later use. Sometimes it is more handy (and faster) to simply calculate all angles in one go and retrieve them later on with other information. This can be achieved by calling the method without any parameters:

    $PDB->GetAngles;    # to calculate and save all dihedral angles

  The computed angles are then included in the ResidueIndex which can be retrieved from ->Get.

• ->GetSection
  To fetch certain information from header or footer. All lines starting with the given pattern are returned.

    @Section = $PDB->GetSection ("CONECT");

• ->GetResolution
  Returns the resolution in Angstroms that was used for building the model. If no REMARK 2 field or no resolution is given, undef is returned.

    $Resolution = $PDB->GetResolution;

• ->GetHeader
  Returns the header (everything until the first MODEL or ATOM). The parameter MinHeader => 1 can be given to get only a minimal header.

    @Header = $PDB->GetHeader;

• ->GetMinHeader
  This is a shortcut to retrieve a minimal header.

    @MinHeader = $PDB->GetMinHeader;

• ->GetFooter
  Returns the footer (everything after the last ATOM, HETATM, TER or ENDMDL). The parameter MinFooter => 1 can be given to get only a minimal footer.

    @Footer = $PDB->GetFooter;

• ->GetMinFooter
  This is a shortcut to retrieve a minimal footer.

    @MinFooter = $PDB->GetMinFooter;

10 Renumbering Entries in the PDB

To renumber the protein globally, the following meth
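The idea behind ->GetSection and ->GetResolution (pick the header lines that start with a pattern, then read the resolution from the REMARK 2 record) can be illustrated as follows. This is a sketch under the assumption that the record has the usual form "REMARK   2 RESOLUTION.    1.74 ANGSTROMS."; the helper names SectionLines and ResolutionOf are hypothetical and the code is not taken from ParsePDB.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: all lines that start with the given pattern.
sub SectionLines {
    my ($Pattern, @Lines) = @_;
    return grep { index ($_, $Pattern) == 0 } @Lines;
}

# Hypothetical helper: resolution in Angstroms, or undef if none is stated.
sub ResolutionOf {
    my @Lines = @_;
    for my $Line (SectionLines ("REMARK   2", @Lines)) {
        return $1 if $Line =~ /RESOLUTION\.\s+(\d+(?:\.\d+)?)\s+ANGSTROM/;
    }
    return;
}

my @Header = (
    "HEADER    EXAMPLE",
    "REMARK   2",
    "REMARK   2 RESOLUTION.    1.74 ANGSTROMS.",
);
my $Resolution = ResolutionOf (@Header);
print defined $Resolution ? "$Resolution\n" : "no resolution given\n";
```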
39. t one is found. This could have been solved faster by a hash key for each external number, but this would raise lots of problems with inserted residues, alternative atom locations and crappy PDB files, and would extremely slow down many methods like "Renumber" for example, and the idea was therefore discarded.

The access of whole chains is the fastest of all (compared to models, residues and atoms). In other words, the parser is quite optimized for that. Thus, extracting each single atom one by one would be about an order of magnitude slower than extracting the whole chain and looping over the lines in it. If a residue is requested, the parser has to determine the chain, then looks up the atoms which belong to that residue, extracts and filters the lines and returns them. In any case it speeds up processing if the chain is specified, so rather than looping over the whole model, loop over each chain and process the residues or atoms.

Two of the most powerful switches of ->Get are AtomIndex and ResidueIndex. They provide the possibility to extract the readily parsed atom lines instead of the original strings and to break a chain down into residues in just one parser query. Therefore, using these switches you can spare cutting the line yourself with substr or requesting lots of information like AtomType, AtomNumber and so on with the respective ->Get method.

As a ready-made code snippet, the fastest way to access each atom line in a file is the fo
40. tance an array with AtomTypes can be extracted and while looping over it, the current index of the array (e.g. $AtomTypes[5]) can be used to extract the respective atom, which possesses the internal number Atom 5.

Using the internal numbers is much more favorable than using the external identifiers. It is easier to program; the numbers always start at 0 and are always sequential. In addition to that, one does not need to worry about missing ChainIDs and format errors like that. Furthermore, processing is much faster (something like 10 times and more if it comes to accessing residues or atoms), since the external identifiers have to be translated and this translation may moreover be prone to bugs (like finding only the first match if an identifier is used twice for some reason). Inserted residues have the same external number and can only be accessed via the internal number.

4 Initialization of a PDB Object

The parser can be invoked by the command ParsePDB. To create a new object, the method new is required:

    $PDB = ParsePDB->new (FileName => "File");
    $PDB->Parse;

->Parse reads the whole PDB and splits it up into its subgroups. The routine for this is as follows:

• Look for blocks divided by MODEL tags. Every MODEL tag causes the parser to regard the following block as a new model; a terminating ENDMDL is not desperately needed.

  If no MODEL tag is found, the protein is regarded as model 0. This is also the default value for
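Why external identifiers are slower and can be ambiguous is easy to see in a sketch of the translation step: the list of external numbers has to be scanned until the first match is found. The helper name InternalNumber is hypothetical and only illustrates the principle described above.

```perl
use strict;
use warnings;

# Hypothetical helper: translate an external number into the internal
# (zero-based) index by scanning the list. Only the FIRST match is found;
# undef is returned if there is none.
sub InternalNumber {
    my ($External, @ExternalNumbers) = @_;
    for my $Index (0 .. $#ExternalNumbers) {
        return $Index if $ExternalNumbers[$Index] eq $External;
    }
    return;
}

# External residue numbers of a chain in which 36 is used twice.
my @ResidueNumbers = (35, 36, 36, 37);
print InternalNumber (36, @ResidueNumbers), "\n";
```

The second residue carrying the number 36 cannot be reached this way, which is the reason why such residues are only accessible via their internal number.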
41. tart => 5);

Only the returned array is renumbered, leaving the internal data untouched.

10.1 Renumbering Inserted Residues

When inserted residues are contained in the PDB, they usually have the same residue number as the residue before with an added inserted residue tag, e.g. residue 21 and 21A. If the insertion codes should be considered, that is if you want to preserve this numbering scheme explicitly, you can set the parameter KeepInsertions to 1:

    $PDB->RenumberResidues (KeepInsertions => 1);

Since "1" is the default for KeepInsertions, the parameter is more likely needed to turn this behaviour off, that is, to remove the inserted residue tags and have each residue numbered with a different number. Be aware that inserted residues might be superpositions instead of insertions. ->RemoveInsertedResidues checks the distances of the atoms of each residue with an InsResidue tag and removes only the ones which are indeed superpositions, whereas real insertions are kept. That means, after you have executed RemoveInsertedResidues, you can safely renumber the residues, discarding their InsResidue tag, with:

    $PDB->RenumberResidues (KeepInsertions => 0);

10.2 Ignoring the TER

According to the PDB manual, a TER line has its own atom number, i.e. it counts as an atom. If you for some reason need all atoms to be numbered sequentially without counting the TER, you can set IgnoreTER true. The TER line will then have the same at
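The effect of KeepInsertions on the numbering can be made concrete with a small sketch that works on a plain list of [ResidueNumber, InsResidue] pairs. The function below is a hypothetical stand-alone helper (it shares only its name with the parser method) and reflects one reading of the description above: with KeepInsertions => 1 a tagged residue keeps the number of the residue before it, with KeepInsertions => 0 every residue gets its own number and the tag is dropped.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helper, NOT the ParsePDB method.
# $Residues is a reference to a list of [ResidueNumber, InsResidue] pairs.
sub RenumberResidues {
    my ($Residues, %Args) = @_;
    my $Next = defined $Args{ResidueStart}   ? $Args{ResidueStart}   : 1;
    my $Keep = defined $Args{KeepInsertions} ? $Args{KeepInsertions} : 1;
    my @New;

    for my $Residue (@$Residues) {
        my ($Number, $Tag) = @$Residue;
        if ($Keep and $Tag ne "" and @New) {
            push @New, [ $New[-1][0], $Tag ];      # same number, keep the tag
        }
        else {
            push @New, [ $Next++, $Keep ? $Tag : "" ];
        }
    }
    return @New;
}

my @Chain = ([36, ""], [36, "A"], [36, "B"], [37, ""]);
print join (" ", map { "$_->[0]$_->[1]" } RenumberResidues (\@Chain, ResidueStart => 5)), "\n";
print join (" ", map { "$_->[0]$_->[1]" }
    RenumberResidues (\@Chain, ResidueStart => 5, KeepInsertions => 0)), "\n";
```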
42. ted if the atom types are unambiguous. That is, if one type is found more than once, the key AtomTypes of this residue will be removed since the integrity of the atom type cannot be assured. If used in a script, the absence of this key points to problems with the atom types and the array can be used instead to find out what is wrong in the PDB.

The following example shows the result of the query

    @Content = $PDB->Get (ResidueNumber => 1, ResidueIndex => 1);

for a residue containing two atoms with the atom types CA and O:

    $Content[0] = {
        AtomTypes => {
            CA => {
                AltLoc        => "A",
                AtomNumber    => "3",
                AtomType      => "CA",
                ChainLabel    => "A",
                InsResidue    => "",
                Occupancy     => "0.0",
                Race          => "ATOM",
                ResidueLabel  => "ALA",
                ResidueNumber => "1",
                Rest          => "...",
                Temp          => "27.84",
                x             => "10.309",
                y             => "53.910",
                z             => "25.295",
            },
            O => {
                AltLoc        => "A",
                AtomNumber    => "4",
                AtomType      => "O",
                ChainLabel    => "A",
                InsResidue    => "",
                Occupancy     => "0.0",
                Race          => "ATOM",
                ResidueLabel  => "ALA",
                ResidueNumber => "1",
                Rest          => "...",
                Temp          => "27.80",
                x             => "9.414",
                y             => "53.366",
                z             => "24.654",
            },
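How a per-residue entry with an Atoms array and an optional AtomTypes hash could be assembled is sketched below. The helper name BuildResidue is hypothetical and the code only mirrors the behaviour described above (AtomTypes is left out when an atom type occurs twice); it is not the parser's own routine.

```perl
use strict;
use warnings;

# Hypothetical helper: build the entry for one residue from its parsed
# atoms (hash references). Atoms always holds the atoms in file order;
# AtomTypes is only added if every atom type occurs exactly once.
sub BuildResidue {
    my @Atoms   = @_;
    my %Residue = (Atoms => [@Atoms]);
    my (%ByType, $Ambiguous);

    for my $Atom (@Atoms) {
        $Ambiguous = 1 if exists $ByType{ $Atom->{AtomType} };
        $ByType{ $Atom->{AtomType} } = $Atom;
    }
    $Residue{AtomTypes} = \%ByType unless $Ambiguous;
    return \%Residue;
}

my $Residue = BuildResidue ({ AtomType => "CA", x => "10.309" },
                            { AtomType => "O",  x => "9.414"  });
print "x of CA: $Residue->{AtomTypes}{CA}{x}\n" if exists $Residue->{AtomTypes};
```

A script should therefore test for the existence of the AtomTypes key before using it and fall back to looping over Atoms otherwise.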
43. the code under "default variables".

  If MinHeader is given, the Header keyword can be omitted.

• Footer => 0 | 1
  Include the footer true/false. By default the footer is NOT included.

• MinFooter => 0 | 1
  Include just a minimal footer true/false. This is false by default. If set true, only the line beginning with END is returned. If additional other lines are needed, this can be changed in ParsePDB.pm at the beginning of the code under "default variables".

  If MinFooter is given, the Footer keyword can be omitted.

• ModelStart => value
  Renumber the models in the returned content. This does not affect the main hash, that is, the content is renumbered after it was extracted. To renumber globally, see RenumberModels instead.

• ChainStart => letter
  Renumber the chain IDs of the returned ATOM and HETATM lines starting with the given letter. The letter is processed case sensitive. If the letter "Z" is reached during renumbering, the next chain will be "a", continuing with lowercase letters. After reaching z, the numbers 0-9 are used. Proteins larger than that require starting again with A-Z. This does not affect the main hash, that is, the content is renumbered after it was extracted. To renumber globally, see RenumberChains instead.

• ResidueStart => value
  Renumber the residue numbers of the returned ATOM and HETATM lines sequentially starting at value (or 1 by default). See also RenumberResidues.

• AtomStart
44. to get rid of them.

    $PDB->SetAtomLocations ("All");

• Verbose => 0 | 1 (default 1)
  Turn verbose mode off or on. If enabled, all warnings (i.e. wrong ChainLabel) are printed.

    $PDB->SetVerbose (1);

5 Identify Subgroups of the Protein for the Use in Loops

The Identify methods return the requested identifiers of a specific domain in the PDB. This domain (model, chain, residue) can be narrowed using the internal or external identifiers. The returned list can then be used in a loop to use it with ->Get.

• ->IdentifyModels
  Returns an array with the internal identifiers of the models, i.e. (0, 1, 2) if three models are present.

    @AllModels = $PDB->IdentifyModels;

• ->IdentifyModelNumbers
  Returns an array with the external identifiers of the models, i.e. (1, 2, 3) if three models are present. These numbers may change if the file is renumbered via ->RenumberModels.

    @AllModelNumbers = $PDB->IdentifyModelNumbers;

• ->IdentifyChains
  Returns an array with the internal identifiers of the chains in a certain model, i.e. (0, 1, 2) if three chains are present. If no model is given, 0 is taken by default. These numbers represent the chains in the order as they occur in the PDB. Accessing the chains via their sequential numbers is in any case more secure than using the chain IDs and works definitely with EVERY file, no matter how crappy its format turns out to be. (I'm sorry if I keep on repeating myself, it comes w
    