
ParsePDB.pm

1. Atoms => [
      {
         AltLoc        => "A",
         AtomNumber    => 3,
         AtomType      => "CA",
         ChainLabel    => "A",
         InsResidue    => "",
         Occupancy     => 0.0,
         Race          => "ATOM",
         ResidueLabel  => "ALA",
         ResidueNumber => 1,
         Rest          => "...",
         Temp          => 27.84,
         x             => 10.309,
         y             => 53.910,
         z             => 25.295,
      },
      {
         AltLoc        => "A",
         AtomNumber    => 4,
         AtomType      => "O",
         ChainLabel    => "A",
         InsResidue    => "",
         Occupancy     => 0.0,
         Race          => "ATOM",
         ResidueLabel  => "ALA",
         ResidueNumber => 1,
         Rest          => "...",
         Temp          => 27.80,
         x             => 9.414,
         y             => 53.366,
         z             => 24.654,
      },
   ],
   InsResidue    => "",
   ResidueLabel  => "ALA",
   ResidueNumber => 1,

With ResidueIndex it is easy to loop over the whole chain residue by residue, and even over each atom in the residue:

    @ResidueIndex = $PDB->Get (Chain => 0, ResidueIndex => 1);
    foreach $Residue (@ResidueIndex) {
       print "$Residue->{ResidueLabel}\n";
       print "$Residue->{ResidueNumber}\n";
       print "$Residue->{Phi} $Residue->{Psi}\n";
       foreach $Atom ( @{$Residue->{Atoms}} ) {
          print "$Atom->{AtomNumber}\n";
          print "$Atom->{Chain
2. 6 Count Number of Subgroups of the Protein
7 Retrieve a Part of the PDB with ->Get
   7.1 Parameters for ->Get
   7.2 Internal Versus External Identifiers
8 Write the Whole or Parts of the PDB
9 Retrieve Certain Information About the Protein
10 Renumbering Entries in the PDB
   10.1 Renumbering Inserted Residues
   10.2 Ignoring the TER
11 Generating CHARMM input files
   11.1 What it does
   11.2 What it does not do
12 Filtering the Data
   12.1 Keywords for filtering actions
   12.2 Inserted Residues
   12.3 Alternative Atom Locations
13 Other methods
   13.1 ->GetFASTA
   13.2 ->WriteFASTA
   13.3 ->AminoAcidConvert
   13.4 ->FormatLine
14 Speed issues
15 Error handling

1 Foreword

Despite the fact that there are several packages around with the ability to parse PDB files and do funny things with them (e.g. BIOPERL), it looked like there was a lack of really easy ones. Driven by the need of breaking a protein into its subgroups mo
3. It might also be that the first residue needs to be counted in; usually it is numbered as residue 0. In that case ResidueStart => 0 can be given.

- Prosthetic Groups
  Sometimes PDB files contain coordinates for non-peptide groups (crystallographic waters, haem groups, metal ions, etc.). CHARMM can deal with these, but it can be very difficult to do. If you need to include them, split them up into their own PDB files. That means that you might even have to split up a block of HETATMs into several files, e.g. waters, haem groups and other stuff.

- Disulphide bridges
  PDB files hold information on disulphide bridges in SSBOND records. CHARMM ignores this. You must specifically add these using the PATCH command, for example

      PATCH DISU prot 2 prot 11

  creates a disulphide bond between residues 2 and 11 of the protein with the segment id prot.
  CAUTION: The residue numbers might have changed, since they are automatically renumbered so that they start counting at 1.

- Protonation state
  You should consider carefully the protonation state of your titratable residues, e.g. a histidine could be protonated, in which case it should be changed from HSD to HSE. Assuming residue 29 in a segment named prot was a histidine that should be protonated, the following patch command would accomplish this:

      PATCH hs2 prot 29
      rename resn HSE sele resi 29 end

  There's no easy way to decide if a residue should be modified. One w
4. The parser provides the possibility to handle errors via the try/catch/otherwise methodology. The possible errors are divided into IO (error opening, closing or writing a file, etc.), Config (wrong parameters given to a routine) and PDB (error due to a malformed PDB file). The syntax for the try/catch block is as follows. It is quite picky: note the semicolon at the very end and that there is no semicolon after the other blocks (the whole construct is actually one single command).

    try {
       $PDB = ParsePDB->new (FileName => $File);
       $PDB->Parse;
    }
    catch Exception::Config with {
       $Error = shift;
       print "Error parsing the file\n$Error";
    }
    catch Exception::IO with {
       $Error = shift;
       print "An I/O Error occurred\n$Error";
    }
    catch Exception::PDB with {
       $Error = shift;
       print "Error parsing the file\n$Error";
    }
    finally {
       exit 1;
    };
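The three-way split into IO, Config and PDB errors can be tried out without ParsePDB or Error.pm installed. The following is only a minimal sketch in core Perl: it swaps Error.pm's try/catch syntax for plain eval/die, and the classes Demo::Error::IO, Demo::Error::Config and Demo::Error::PDB as well as the routines risky_parse and classify are hypothetical stand-ins, not names taken from the parser.

```perl
use strict;
use warnings;

# Hypothetical stand-in exception classes (NOT the ones used by ParsePDB).
package Demo::Error;
sub new     { my ($class, %args) = @_; return bless { %args }, $class }
sub message { return $_[0]{message} }

package Demo::Error::IO;     our @ISA = ('Demo::Error');
package Demo::Error::Config; our @ISA = ('Demo::Error');
package Demo::Error::PDB;    our @ISA = ('Demo::Error');

package main;

# Hypothetical routine that fails in one of the three ways on request.
sub risky_parse {
    my ($kind) = @_;
    die Demo::Error::IO->new     (message => "cannot open file")    if $kind eq 'io';
    die Demo::Error::Config->new (message => "unknown parameter")   if $kind eq 'config';
    die Demo::Error::PDB->new    (message => "malformed ATOM line") if $kind eq 'pdb';
    return "ok";
}

# Run the routine and report which category of error occurred, if any.
sub classify {
    my ($kind) = @_;
    my $result = eval { risky_parse ($kind) };
    if (my $err = $@) {
        if (ref $err) {
            return "io: "     . $err->message if $err->isa ('Demo::Error::IO');
            return "config: " . $err->message if $err->isa ('Demo::Error::Config');
            return "pdb: "    . $err->message if $err->isa ('Demo::Error::PDB');
        }
        die $err;   # not one of the known categories, pass it on
    }
    return $result;
}

print classify ($_), "\n" for qw(none io config pdb);
```

Testing the most specific class first and re-throwing anything unknown mirrors the idea of separate catch blocks: each category gets its own handling, and unexpected failures are not silently swallowed.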
5. be given to the routine, which returns the formatted line that can be written to a PDB:

    @AtomIndex = $PDB->Get (Model => 0, Chain => 0, AtomIndex => 1);
    foreach $Atom (@AtomIndex) {
       # some fancy if condition here or changes to the atom hash
       $Line = $PDB->FormatLine (Atom => $Atom);
       print PDB $Line;
    }

14 Speed issues

There are a few things you should know if speed is an issue for your work. As long as you work with a single file of normal size, the way you use the parser will not make a big difference. As soon as you have to process a load of PDBs with tens of thousands of atoms, you might want to consider how the PDB is treated during parsing.

First of all, remove everything you do not need. If you do not process HETATM, SIGATM, ANISOU or SIGUIJ entries, remove them even before parsing with the respective switches of ->new or the respective ->SetVariable methods. This speeds up the parsing even if none of these atoms are present, since the regular expressions which determine an atom become significantly smaller. And RegExes are pigs when it comes to speed.

The use of external numbers is an absolute no-no in speed-critical programs. Each external parameter is converted to the respective internal one prior to processing the request. This is reasonably fast for models and chains, but comes down to a loop over the atom lines for a residue or an atom number until the correc
6. can be accessed either via an array (by looping over them) or via a hash using their atom types as keys.

    @Content = $PDB->Get (ResidueNumber => 12, ResidueIndex => 1);

    $Content[0] = {
       Atoms         => array reference with the parsed atom lines,
       AtomTypes     => hash reference with the parsed atom lines,
       InsResidue    => "",
       Phi           => 120.23,
       Psi           => 104.12,
       ResidueLabel  => "ALA",
       ResidueNumber => 12,
    }

Note that the phi and psi angles are not computed automatically. In order to have them in the ResidueIndex, ->GetAngles (without any parameters) has to be executed once. The angles are then calculated for the whole file and saved.

In the above example a single residue is retrieved via the external number 12, and therefore only one array element is returned. This can of course be used for a whole chain to conveniently loop over each residue and each atom within it. The atoms are available in the array Atoms, which comes in handy if every single one needs to be accessed one after another. If particular atoms are needed, for example only the C-alpha carbon, they can be accessed via the AtomTypes hash. The same can be achieved directly via ->Get by requesting a specific atom type in a residue; however, if this is needed for every residue in a protein, the approach via the ResidueIndex is orders of magnitude more efficient. The hash AtomTypes is only genera
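How one residue entry can expose the same atoms both as an ordered array and as a hash keyed by atom type can be illustrated in plain Perl. This is a sketch under assumptions, not the parser's own code: the atom records are made-up values and build_residue is a hypothetical helper; only the key names Atoms, AtomTypes, ResidueLabel and ResidueNumber are taken from the text above.

```perl
use strict;
use warnings;

# Made-up parsed atom records of one residue (field names follow the manual).
my @atoms = (
    { AtomNumber => 1, AtomType => 'N',  ResidueLabel => 'ALA', ResidueNumber => 12 },
    { AtomNumber => 2, AtomType => 'CA', ResidueLabel => 'ALA', ResidueNumber => 12 },
    { AtomNumber => 3, AtomType => 'C',  ResidueLabel => 'ALA', ResidueNumber => 12 },
);

# Hypothetical helper: build one residue entry in which the same atom
# hashes are reachable through an array and through a hash by atom type.
sub build_residue {
    my @residue_atoms = @_;
    my %by_type;
    foreach my $atom (@residue_atoms) {
        $by_type{ $atom->{AtomType} } = $atom;   # store a reference, not a copy
    }
    return {
        Atoms         => [ @residue_atoms ],
        AtomTypes     => \%by_type,
        ResidueLabel  => $residue_atoms[0]{ResidueLabel},
        ResidueNumber => $residue_atoms[0]{ResidueNumber},
    };
}

my $residue = build_residue (@atoms);

# Loop over every atom in file order ...
foreach my $atom ( @{ $residue->{Atoms} } ) {
    print "$atom->{AtomNumber} $atom->{AtomType}\n";
}

# ... or pick a single atom directly by its type.
print "CA has atom number $residue->{AtomTypes}{CA}{AtomNumber}\n";
```

Because both containers hold references to the same hashes, a change made through one view is visible through the other, and looking up one atom type costs a single hash access instead of a scan over the residue.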
7. => value
  Renumber the returned ATOM and HETATM lines sequentially, starting at value or 1 if no value is given. See also RenumberAtoms.

- KeepInsertions => 0|1
  If set to 0, the insertion codes of residues are considered during renumbering the residues. Please read "Filtering the data" for more information.

- SetChainLabel => A, B, ...
  Changes the chain ID of all returned ATOM, SIGATM, ANISOU, SIGUIJ and HETATM lines. By giving a blank, the chain label can be removed.
  CAUTION: This function should be used carefully. It does not care about multiple chains and will simply change every ID to the given value. To renumber the chains, use the ChainStart keyword instead.

- PDB2CHARMM => 1
  Formats the returned array for the use with CHARMM. Read further down for more information.

- CHARMM2PDB => 1
  For the use with CHARMM-created PDBs: replaces the atom types of CHARMM with the standard PDB labels. Read further down for more information.

- AtomLocations => First, All, None, A, B, etc.
  If alternative atom locations for an atom are available, return either only the first, all or none of them. The switches are not case sensitive. If First is requested, the filter returns all atoms with location A as well as the ones which have NO alternative location. If you enter A, ONLY atoms with altLoc A are retrieved. If you also want to get the ones with no altLoc, you have t
8. retrieve the respective PDB lines. Although ->Get can handle external identifiers, the access is much more efficient (faster) via the internal ones, since the former have to be translated before the content can be retrieved.

    @Chain2 = $PDB->Get (Model => 0, Chain => 0);

7.1 Parameters for ->Get

The following parameters can be used to request a specific part of the protein from ->Get, including several possibilities to change it to certain needs.

- Model => 0, 1, 2, ... (internal value)
  ModelNumber => 1, 2, 3, ... (external value from the PDB)
  The number of the model. The available model identifiers can be retrieved using ->IdentifyModels and ->IdentifyModelNumbers.

- Chain => 0, 1, 2, ... (internal value)
  ChainLabel => A, B, ... (external value)
  The number of the chain. The available chain identifiers can be retrieved using ->IdentifyChains. Using the latter method (ChainLabel) is only possible if the chain IDs have been checked successfully and no duplicate or missing IDs have been found. This means that if you want to access the chains via their ID, you have to add an if condition to check whether you can do so or not:

      $PDB->ChainLabelsValid
         returns true if the chain IDs are OK
         returns false if missing or multiple chain IDs have been detected

  To get the real chain IDs use ->IdentifyChainLabels; to get the ID for a particular chain use ->Get
9. ChainLabel (see further down).

- Residue => 1, 2, 3, ... (internal value)
  ResidueNumber => 1, 2, 3, ... (external value)
  Returns the ATOM lines of a particular residue.

- ResidueLabel => ALA, ...
  Returns only alanine residues.

- Atom => 1, 2, 3, ... (internal value)
  AtomNumber => 1, 2, 3, ... (external value)
  Returns the ATOM line of a particular atom.

- AtomType => CA, ...
  Returns only CA atoms. To distinguish between alpha carbons and calcium, enter Ca for the latter. To retrieve only carbons, use Race => ATOM, which filters out all HETATMs.

- Element => C, O, N, P, ...
  Return only carbons, oxygens, etc. This will also get CA or OXT. To refine the filter pattern you need to edit the filter variables at the very beginning of ParsePDB.pm.

- Header => 0|1
  Include the header (true/false). If a parsed content is requested via the AtomIndex keyword, the first six characters of the line are stored as Race (similar to atoms) and the remaining columns are stored in Rest. By default the header is NOT included.

- MinHeader => 0|1
  Include just a minimal header (true/false). This is false by default. If set true, only lines beginning with HEADER, TITLE or COMPND are returned. This command overrides the value of HeaderRemark, that is, no remark will be added to a minimal header. If the choice of lines needs to be changed, this can be done in ParsePDB.pm at the beginning of
10. ChainLabelsValid returns true before accessing the chains via letters; otherwise you can only use numbers until the IDs have been corrected with ->RenumberChains.

      $PDB->SetChainLabelAsLetter (0);

- ChainSuffix => "_c" (default "_c")
  Defines the suffix that is added to the base name by ->WriteChains.

      $PDB->SetChainSuffix ("_c");

- ModelSuffix => "_m" (default "_m")
  Defines the suffix that is added to the base name by ->WriteModels.

      $PDB->SetModelSuffix ("_m");

- HeaderRemark => 0|1 (default 1)
  If enabled, a remark is added to the header stating that the file has been changed by the parser and that the header and footer information might not be valid any more. This is done by ->Get and ->Write, but not if the header is requested using the method ->GetHeader. By default the remark lines are added after HEADER, COMPND or TITLE, depending on which line is found last. That is, the comment will be inserted as the first REMARK lines. If another position is needed, e.g. directly after the HEADER line, this has to be changed at the beginning of ParsePDB.pm under default values.

      $PDB->SetHeaderRemark (1);

- AtomLocations => First, All, None, A, B, etc. (default All)
  Tells ->Get and ->Write globally how atoms with alternative atom locations are to be handled. Please see ->Get for more information. If you do not need the alternative locations at all, you can use RemoveAtomLocations
11. Label}\n";

The TER is not counted as part of the residue, that is, it will not be included in the last residue of a chain.

The filter arguments are cumulative and can be freely combined, for instance to filter out all CA atoms in all alanine residues. They only work on the atom lines; all other lines like MODEL, TER, etc. are also returned. The ->Get routine returns the whole PDB with the MODEL/ENDMDL tags, and one single model without them but with the TER terminators of the chains.

If the Model or Chain parameter is omitted, the result depends on the combination:

- no Model, no Chain: returns the whole file without header and footer
- Model, but no Chain: the whole model with all chains is returned
- Chain, but no Model: Model is assumed as 0 (e.g. if no MODEL tags are there at all)

7.2 Internal Versus External Identifiers

As previously mentioned, the use of internal identifiers (see section 3) is preferable. First of all, it is easier to program, since with external identifiers the ResidueNumbers or AtomNumbers have to be retrieved first in order to loop over them and extract single objects. Secondly, for speed reasons, as it is quite laborious for the parser to interconvert external and internal identifiers.

Atoms and residues always have an external number; however, PDBs with invalid numbering schemes are found even in the Protein Data Bank. The worst case is duplicate atom numbers, in which case only the first atom is
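The cumulative behaviour of the filter arguments (every given keyword has to match) can be sketched in a few lines of plain Perl. This only illustrates the logical AND of several criteria on made-up atom records; filter_atoms is a hypothetical helper and says nothing about how ->Get is implemented internally.

```perl
use strict;
use warnings;

# Made-up atom records; the field names follow the manual's keywords.
my @atoms = (
    { AtomNumber => 1, AtomType => 'N',  ResidueLabel => 'ALA', Race => 'ATOM'   },
    { AtomNumber => 2, AtomType => 'CA', ResidueLabel => 'ALA', Race => 'ATOM'   },
    { AtomNumber => 3, AtomType => 'CA', ResidueLabel => 'GLY', Race => 'ATOM'   },
    { AtomNumber => 4, AtomType => 'CA', ResidueLabel => 'CA',  Race => 'HETATM' },
);

# Hypothetical helper: keep only atoms that satisfy EVERY criterion.
sub filter_atoms {
    my ($atoms, %criteria) = @_;
    my @kept;
    ATOM: foreach my $atom (@$atoms) {
        foreach my $field (keys %criteria) {
            # one failed criterion is enough to reject the atom
            next ATOM unless $atom->{$field} eq $criteria{$field};
        }
        push @kept, $atom;
    }
    return @kept;
}

# All CA atoms in all alanine residues
my @ca_in_ala = filter_atoms (\@atoms, AtomType => 'CA', ResidueLabel => 'ALA');
print join (",", map { $_->{AtomNumber} } @ca_in_ala), "\n";

# CA atoms of ATOM lines only, which drops the HETATM entry
my @ca_atoms = filter_atoms (\@atoms, AtomType => 'CA', Race => 'ATOM');
print join (",", map { $_->{AtomNumber} } @ca_atoms), "\n";
```

Adding a further keyword can only shrink the result, which is why the filters can be combined freely without worrying about their order.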
12. ParsePDB.pm

Package:         ParsePDB.pm
Author:          Benjamin Bulheller
Mail Address:    webmaster at bulheller.com
Website:         http://comp.chem.nottingham.ac.uk/parsepdb
                 http://www.bulheller.com
Research Group:  Prof. Jonathan D. Hirst, School of Physical Chemistry, University of Nottingham
Funded by:       EPSRC
Date:            November 2005 - November 2008
Acknowledgments: Special thanks to Dr. Daniel Barthel for many, many discussions and help whenever needed.

Licence: Copyright 2009 Benjamin Bulheller, www.bulheller.com

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Contents

1 Foreword
2 Installation
3 Nomenclature and Naming Conventions
4 Initialization of a PDB Object
   4.1 Parameters explicitly for ->new
   4.2 Changeable Parameters for ->new
5 Identify Subgroups of the Protein for the Use in Loops
13. as used during the development of the PDB parser was 0.15.

After installation, the parser can be used in a script by including the library with the use command:

    use ParsePDB;

The function use searches Perl's library path for the given package. If you do not want to install the package globally on your system, for example if you do not possess root permissions, you can also copy the .pm file to a folder in your home directory. If you, for instance, collect your packages in ~/bin/perllib, you can add this directory via

    use lib "$ENV{HOME}/bin/perllib";
    use ParsePDB;

This would also work for Error.pm. Another nice trick is to add the folder the executed script is actually living in. The package FindBin is usually included in standard Perl installations and sets the variable $Bin to the folder of the script, which can then be included via lib:

    use FindBin qw($Bin);
    use lib $Bin;

3 Nomenclature and Naming Conventions

For the access to the single elements of the PDB (models, chains, residues, atoms or even specific atoms of a certain type) there are some naming conventions which are followed throughout the parser. It is important to differentiate between two things:

- external values
  These values are read directly from the PDB. This means that the first item in a list (a residue or an atom) might not necessarily start at 1 or be sequential, and that the values change after renumbering the file. Chains might not be accessibl
14. ay is to check if there are any H-bond donors or acceptors close to specific atoms, e.g. ring nitrogens in histidine. There may also be comments in the PDB file.

12 Filtering the Data

12.1 Keywords for filtering actions

The returned data of ->Get and ->Write can be filtered using the keywords Model, ModelNumber, Chain, ChainLabel, Residue, ResidueNumber, ResidueLabel, Atom, AtomNumber, AtomType, Element and Race. Please see ->Get for more information.

    ATOM    554  CA  ILE B   4      55.013  57.563  15.473  6.00  5.58

    ATOM  Race
    554   AtomNumber (external), Atom (internal)
    CA    AtomType => CA, Element => C
    ILE   ResidueLabel => ILE
    B     ChainLabel (external), Chain (internal)
    4     ResidueNumber (external), Residue (internal)

12.2 Inserted Residues

If evolution has found it clever for some reason to insert some residues in a protein, we're left with a mutant that has the same amino acid sequence as its parent, except for a few residues somewhere in the middle. To maintain comparability, these additional residues often have the same residue number with an additional letter to distinguish between them.

When renumbering the residues with the parameter KeepInsertions => 0, the insertion codes will be removed and all residues numbered with a different number. KeepInsertions => 1 (default) wi
15. dard header line is added before a new chain begins, and a newline character is added once the sequence line with the one-letter codes of the amino acids is 80 characters in length. The method only processes a chain if there are ATOMs in it, that is, HETATM chains are ignored. If a residue label cannot be converted into the one-letter code, a warning is issued by AminoAcidConvert.

    @FASTA = $PDB->GetFASTA (Model => 0);

13.2 ->WriteFASTA

Writes a FASTA file of the protein. If a file name is provided, the extension .fasta is added. If the file name is omitted, the base name of the original PDB file is taken. The method returns the array with the FASTA lines in case they are needed for anything else, even if only for checking whether anything was written at all.

    $PDB->WriteFASTA (Model => 0, FileName => "4mbn");

13.3 ->AminoAcidConvert

Converts a 1-letter code into a 3-letter code and vice versa.

    $Code = $PDB->AminoAcidConvert ($Code);

13.4 ->FormatLine

Returns a string in PDB format from a given atom hash. For many problems, working with the AtomIndex is by far easier and faster than retrieving the complete PDB line, e.g. for more complex filtering actions than are possible via ->Get. However, if a PDB needs to be written, retrieving the formatted PDB line as well is cumbersome and time consuming, especially if it needs to be tweaked somehow. In such a case the filtered and/or changed atom hash from the atom index can
16. dels and chains ParsePDB has been coded with the intention to create a package that is powerful enough to handle PDBs with a fair amount of functions, but is still easy to handle. Keeping the complexity at a minimum, a protein can be read, parsed and its chains written into single files with just three commands, which are as easy as ->new, ->Parse and ->WriteChains. Given certain parameters, the atoms and residues can be counted, renumbered and filtered, i.e. just certain elements or residues can be extracted. Most of the command names are designed in such a way that they may take a little time to type, but are easy to remember and meaningful when being read.

The PDB parser is an integral part of the web interface DichroCalc, which can be freely used at http://comp.chem.nottingham.ac.uk/dichrocalc.

Benjamin Bulheller

2 Installation

The PDB parser itself is a Perl package, indicated by the extension .pm. To install the package globally on your system, you can use the provided makefile to copy it to your library path. To do this, log in as root and follow the standard routine:

    perl Makefile.PL
    make
    make install

Since the package uses Error.pm to handle exceptions, this package needs to be installed, too. If it is not already installed, make will issue a warning message about that. Error.pm can be found at http://search.cpan.org by searching for Error. It was written by Graham Barr and is maintained by Shlomi Fish. The version which w
17. e keyword. The available numbers can be retrieved via ->IdentifyResidues and ->IdentifyResidueNumbers, and the returned values can be fed into ->Get via Residue => value or ResidueNumber => value, respectively.

      $ResidueLabel = $PDB->GetResidueLabel (Residue => 2);
      $ResidueLabel = $PDB->GetResidueLabel (ResidueNumber => 2);

- ->GetAtom
  Returns the internal number of an external AtomNumber.

      $Atom = $PDB->GetAtom (AtomNumber => 17);

- ->GetAtomNumber
  Returns the external number of an internal atom number.

      $AtomNumber = $PDB->GetAtomNumber (Atom => 0);

- ->GetAtomType
  Returns the type of the atom with a given atom number. The available numbers can be retrieved via ->IdentifyAtoms, and the returned value can be fed into ->Get via AtomType => type.

      $AtomType = $PDB->GetAtomType (Atom => 2);         # internal
      $AtomType = $PDB->GetAtomType (AtomNumber => 3);   # external

- ->GetElement
  Returns the element of the atom with a given atom number. The available numbers can be retrieved via ->IdentifyAtoms, and the returned value can be given to ->Get via Element => element.

      $Element = $PDB->GetElement (Atom => 5);

  If you plan to use the ResidueIndex or AtomIndex later on and need the Element, you can call the routine without parameters, which will save the element information to the main hash, that is, it will then be available in the returned Re
18. e via their external ChainID in case no external ChainID is given, which is quite common.

  Overview of the numbers and labels of a PDB entry and their name in the parser:

    MODEL        1
    ATOM      1  N   LYS A   1      15.872   7.811  19.851  1.00 76.73
    ATOM      2  C   LYS A   1      15.332   7.443  18.561  1.00 99.86
    ATOM      3  CA  LYS A   1      14.650   6.096  18.757  1.00 72.69

    MODEL 1  ModelNumber (external), Model (internal)
    ATOM     Race
    3        AtomNumber (external), Atom (internal)
    CA       AtomType => CA, Element => C
    LYS      ResidueLabel => LYS
    A        ChainLabel (external), Chain (internal)
    1        ResidueNumber (external), Residue (internal)

- internal numbers
  Each item (model, chain, residue, atom) can be accessed via its sequential number in the domain, starting at 0. This number will never change for models or chains, although there is some logical ambiguity for atoms and residues. Residue 0 in the second chain is also Residue 20 if no chain is specified and the first chain contains 20 residues (taking into account that counting starts at 0). If a Residue is specified, the atom number is relative to that residue and can thus change for that very atom if no chain or no residue is given.

  Although it was at first thought to be a nice idea to let internal numbers start at 1, it turned out to be much more versatile to start at 0, like everything else in Perl does, too. That way for ins
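The relation between the two kinds of numbers can be made concrete with a small stand-alone sketch: internal numbers are positions in a list starting at 0, external numbers are whatever the file contains. The residue numbers below are made up, and to_internal and to_external are hypothetical helpers for illustration only; building a lookup table once is a design choice of this sketch, not a description of the parser's internals.

```perl
use strict;
use warnings;

# External residue numbers as they might appear in a file: they need not
# start at 1 or be sequential (made-up values).
my @external = (5, 6, 7, 10, 11);

# Internal numbers are simply the positions in the list, starting at 0.
# A lookup table built once avoids a linear search for every request.
my %internal_of;
$internal_of{ $external[$_] } = $_ for 0 .. $#external;

# Hypothetical helpers for converting in both directions.
sub to_internal { my ($number) = @_; return $internal_of{$number} }
sub to_external { my ($index)  = @_; return $external[$index] }

print "external 10 is internal ", to_internal (10), "\n";
print "internal 0 is external ",  to_external (0),  "\n";

# Looping with internal numbers needs no prior knowledge of the file.
foreach my $index (0 .. $#external) {
    print "internal $index -> external ", to_external ($index), "\n";
}
```

Starting at 0 means that the internal number can be used directly as an array index, which is the versatility the text refers to.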
19. esidues like metals or water to ensure the comparability of returned arrays, i.e. that for the same search parameters (like Model 0, Chain 0) a certain index will always belong to the same residue.

- ->IdentifyResidueNumbers
  Returns an array with all external residue numbers, including the inserted residue tag if present. Beware of duplicate numbers due to the restart of the numbering in every model or even in every chain, depending on how badly formatted the file is. To be on the safe side, always specify model and chain or use ->RenumberResidues prior to ->IdentifyResidueNumbers. If the model is omitted, 0 is taken as default value; if no chain is specified, all chains are processed.

      @ResidueNumbers = $PDB->IdentifyResidueNumbers (Chain => 0);

- ->IdentifyAtoms
  Returns an array with all internal atom numbers of the requested model, chain or even residue.

      @AllAtoms = $PDB->IdentifyAtoms (Model => 1, Chain => 0);

- ->IdentifyAtomNumbers
  Returns an array with all external atom numbers.

      @AllAtomNumbers = $PDB->IdentifyAtomNumbers (Chain => 0);

- ->IdentifyAtomTypes
  Returns an array with all available atom types (e.g. CA, CB, O) that can be used to filter the atoms with ->Get (AtomType => ...).

      @AllAtomTypes = $PDB->IdentifyAtomTypes (Chain => 0);

- ->IdentifyElements
  Returns an array with all available atom elements (e.g. C, N, O) that can be used to filter the atoms with ->Ge
20. eters for ->new are divided into two groups. The first one consists of parameters which can only be given to ->new directly, while the others can also be changed after the initialization of the object. The default values of all of the following switches can be altered in ParsePDB.pm at the very beginning of the code under "Default Values".

- FileName => "file.pdb"
  The PDB file including path. The extension .pdb can be omitted.

- NoHETATM => 0|1 (default 0)
  If set to 1, HETATM lines will be filtered out before parsing the file. This can be handy if you do not process HETATMs anyway and can save several checks whether a chain contains any ATOMs at all.

- NoANISIG => 0|1 (default 0)
  If set to 1, SIGATM, SIGUIJ and ANISOU lines will be filtered out before parsing the file. If you do not process these atoms, it saves processing time (each atom needs to be compared against two strings only instead of five) and avoids checks for you.

4.2 Changeable Parameters for ->new

All the following parameters can be given to ->new or, alternatively, changed later on in the program using one of the ->SetVariable methods. This is mainly useful to avoid a ->new command that needs three lines to be viewed entirely.

- ChainLabelAsLetter => 0|1 (default 0)
  Tells ->WriteChains whether the exported file names should be named with the number of the chain or the actual chain ID (letter). Check whether ->
21. he parameter FileName is mandatory; if it is omitted, the method throws a NoFile error.

Sometimes it is required to split a protein into its models or chains. Since that is a standard task, there are two methods for it:

    @AllModels = $PDB->WriteModels;   # extract all models in single files
    @AllChains = $PDB->WriteChains;   # extract all chains in single files

Most parameters of these two methods are identical with ->Get. Header and Footer are included by default. If the parameter FileName is omitted, the base name of the PDB is taken. The suffix defined via ->new (ModelSuffix or ChainSuffix) is added plus the chain or model number, for example file_c1.pdb, file_c2.pdb, file_c3.pdb. An array containing the names of the created files is returned.

->WriteChains can also write the chain IDs as letters instead of numbers. Numbers are the default; if letters are desired, ChainLabelAsLetter has to be set to 1 via ->new or ->SetChainLabelAsLetter. If ->ChainLabelsValid returns false, the setting is ignored and chains can only be accessed via their sequential numbers given by ->IdentifyChains.

If no Model is given to ->WriteChains, 0 is taken by default. The routine can only process one model at a time. To loop over all models, the information provided by the method ->IdentifyModels can be used. When all models are processed, something like FileName => "BaseName$Model" should be defined as fi
22. ith increasing age, sorry about that.

      @AllChains = $PDB->IdentifyChains (Model => 0);

- ->IdentifyChainLabels
  Returns an array with the chain IDs of the chains in a certain model, e.g. A, B, C if three chains are present. If no model is given, 0 is taken by default. Check whether ->ChainLabelsValid returns true before accessing the chains via letters; otherwise you can only use numbers until the IDs have been corrected with ->RenumberChains. It is strongly recommended to use the numbers given by ->IdentifyChains to access the chains rather than using letters.

      @AllChains = $PDB->IdentifyChainLabels (Model => 0);

- ->IdentifyResidues
  Returns an array with all internal residue sequence numbers.

      @AllResidues = $PDB->IdentifyResidues (Model => 0, Chain => 0);

- ->IdentifyResidueLabels
  Returns an array with all external residue labels, e.g. ALA, TYR.

      @ResidueLabels = $PDB->IdentifyResidueLabels (Model => 0, Chain => 0);

  This array represents the sequence of the amino acids in the requested chain. If one-letter codes are preferred, the parameter OneLetterCode may be set to 1:

      @ResidueLabels = $PDB->IdentifyResidueLabels (Model => 0, Chain => 0, OneLetterCode => 1);

  Mind that OneLetterCode only makes sense when no hetero atoms are in the PDB, that is, for example, NoHETATM is set to 1. The method will nevertheless return undef for unknown r
23. iven to ->Get, it is possible to access and use these readily processed entries directly; please see ->Get for more information. The line is cut into pieces according to the following scheme (taken from the Protein Data Bank Contents Guide, version 2.3):

    Columns   Field Name
    1-6       Race
    7-11      AtomNumber
    13-16     AtomType
    17        AltLoc
    18-20     ResidueLabel
    22        ChainLabel
    23-26     ResidueNumber
    27        InsResidue
    31-38     x
    39-46     y
    47-54     z
    55-60     Occupancy
    61-66     Temp
    67-80     Rest

  The field names of the above table can be given to the various methods to filter the contents, for example to retrieve only atoms with a certain AtomType.

- The chain IDs are checked as to whether every chain has an ID and no duplicate IDs are found within one model. Missing or duplicate chain IDs cause a warning message that this can be corrected using ->RenumberChains. In this case the chains can only be accessed via their internal number and the parser does not use or accept the real IDs at all. The chain IDs are processed case sensitively. Have a look at 1FNT to see that this is really necessary.

If the same PDB has to be read again (after changing something in the file, to return to the original version after having renumbered something, or for other reasons that require the object to be updated), this can be done using ->Reset; the object will then be re-parsed automatically.

    $PDB->Reset;

4.1 Parameters explicitly for ->new

The param
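The column scheme in the table above translates directly into fixed-width string slicing. The following is a minimal sketch in core Perl, not the parser's own code: split_atom_line and the @layout table are hypothetical names, and the sample line is assembled from pieces so that the column positions are easy to check against the table.

```perl
use strict;
use warnings;

# Field name, first column, last column (1-based, as in the table above).
my @layout = (
    [ Race          =>  1,  6 ], [ AtomNumber    =>  7, 11 ],
    [ AtomType      => 13, 16 ], [ AltLoc        => 17, 17 ],
    [ ResidueLabel  => 18, 20 ], [ ChainLabel    => 22, 22 ],
    [ ResidueNumber => 23, 26 ], [ InsResidue    => 27, 27 ],
    [ x             => 31, 38 ], [ y             => 39, 46 ],
    [ z             => 47, 54 ], [ Occupancy     => 55, 60 ],
    [ Temp          => 61, 66 ], [ Rest          => 67, 80 ],
);

# Hypothetical helper: cut one ATOM/HETATM line into a hash of fields.
sub split_atom_line {
    my ($line) = @_;
    $line = sprintf "%-80s", $line;   # pad short lines to the full 80 columns
    my %atom;
    foreach my $field (@layout) {
        my ($name, $first, $last) = @$field;
        my $value = substr ($line, $first - 1, $last - $first + 1);
        $value =~ s/^\s+|\s+$//g;      # strip the padding blanks
        $atom{$name} = $value;
    }
    return \%atom;
}

# Sample line built from its column pieces (columns given in the comments).
my $line = join '',
    'ATOM  ',     # 1-6   Race
    '    3',      # 7-11  AtomNumber
    ' ',          # 12    unused
    ' CA ',       # 13-16 AtomType
    ' ',          # 17    AltLoc
    'LYS',        # 18-20 ResidueLabel
    ' ',          # 21    unused
    'A',          # 22    ChainLabel
    '   1',       # 23-26 ResidueNumber
    ' ',          # 27    InsResidue
    '   ',        # 28-30 unused
    '  14.650',   # 31-38 x
    '   6.096',   # 39-46 y
    '  18.757',   # 47-54 z
    '  1.00',     # 55-60 Occupancy
    ' 72.69';     # 61-66 Temp

my $atom = split_atom_line ($line);
print "$atom->{AtomType} of $atom->{ResidueLabel} $atom->{ResidueNumber} ",
      "at $atom->{x} $atom->{y} $atom->{z}\n";
```

Slicing by position rather than splitting on whitespace matters because neighbouring fields may touch (for example a chain ID directly followed by a four-digit residue number) or be empty.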
24. le name.

The following additional parameters can be given to ->WriteModels and ->WriteChains:

- ModelSuffix => "model"
  To change the default suffix "_m".

- ChainSuffix => "chain"
  To change the default suffix "_c".

- PDB2CHARMM => 1
  To write CHARMM input files. Read further down for more information.

- CHARMM2PDB => 1
  To convert the CHARMM-generated PDB format to the PDB standard. Read further down for more information.

9 Retrieve Certain Information About the Protein

- ->GetModel
  Returns the internal number of an external ModelNumber.

      $Model = $PDB->GetModel (ModelNumber => 4);

- ->GetModelNumber
  Returns the external number of an internal Model.

      $ModelNumber = $PDB->GetModelNumber (Model => 1);

- ->GetChain
  Returns the internal number of an external ChainLabel.

      $Chain = $PDB->GetChain (Model => 1, ChainLabel => "A");

- ->GetChainLabel
  Returns the real chain ID for a particular chain (undef if no ChainLabel is set).

      $ChainLabel = $PDB->GetChainLabel (Model => 1, Chain => 1);

- ->GetResidue
  Returns the internal number of an external ResidueNumber.

      $Residue = $PDB->GetResidue (Model => 1, ResidueNumber => 5);

- ->GetResidueLabel
  Returns the label of the residue with a given residue number (usually the amino acid). The number can be either the external one, indicated by the keyword ResidueNumber, or the internal one by giving the Residu
25. ll keep the code and the same residue number.

    ATOM    157  N   ARG    36
    ATOM    158  CA  ARG    36
    ATOM    159  C   ARG    36
    ATOM    160  O   ARG    36
    ATOM    161  CB  ARG    36
    ATOM    162  CG  ARG    36
    ATOM    163  CD  ARG    36
    ATOM    164  NE  ARG    36
    ATOM    165  CZ  ARG    36
    ATOM    166  NH1 ARG    36
    ATOM    167  NH2 ARG    36
    ATOM    168  N   SER    36A
    ATOM    169  CA  SER    36A
    ATOM    170  C   SER    36A
    ATOM    171  O   SER    36A
    ATOM    172  CB  SER    36A
    ATOM    173  OG  SER    36A
    ATOM    174  N   GLY    36B
    ATOM    175  CA  GLY    36B
    ATOM    176  C   GLY    36B
    ATOM    177  O   GLY    36B
    ATOM    178  N   SER    36C
    ATOM    179  CA  SER    36C
    ATOM    180  C   SER    36C
    ATOM    181  O   SER    36C
    ATOM    182  CB  SER    36C
    ATOM    183  OG  SER    36C

    Inserted residues A, B, C (column 27)

A big problem which has to be taken into account is PDB files which use the inserted residue tag to mark alternative residues, that is, residues superimposed with others (like ALA and LEU at the same position f
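What removing the insertion codes amounts to for the residues in the listing above (36, 36A, 36B, 36C) can be sketched as follows. This is only an illustration of the idea in plain Perl, assuming residues are held as number/insertion-code pairs; renumber_without_insertions is a hypothetical helper, the trailing residue 37 is made up, and none of this is the parser's actual renumbering routine.

```perl
use strict;
use warnings;

# Residue identifiers as [number, insertion code] pairs, as in the listing
# above, followed by a made-up residue 37.
my @residues = ( [36, ''], [36, 'A'], [36, 'B'], [36, 'C'], [37, ''] );

# Hypothetical helper: give every residue its own sequential number and
# drop the insertion codes. $start is the number of the first residue.
sub renumber_without_insertions {
    my ($start, @list) = @_;
    my @renumbered;
    my $next = $start;
    foreach my $residue (@list) {
        push @renumbered, [ $next, '' ];   # new number, no insertion code
        $next++;
    }
    return @renumbered;
}

my @new = renumber_without_insertions (36, @residues);

# Print the old and the new identifier side by side.
foreach my $i (0 .. $#residues) {
    my $old = $residues[$i][0] . $residues[$i][1];
    my $now = $new[$i][0]      . $new[$i][1];
    print "$old -> $now\n";
}
```

Every residue after the insertion shifts by the number of inserted residues, which is why residue numbers taken from the original file should not be reused after such a renumbering.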
26. llowing (if the dihedral angles are needed):

    $PDB->GetAngles;
    @Models = $PDB->IdentifyModels;

    foreach $Model (@Models) {
        @Chains = $PDB->IdentifyChains (Model => $Model);

        foreach $Chain (@Chains) {
            # if the residues are not needed to be processed as a whole
            @AtomIndex = $PDB->Get (Model => $Model, Chain => $Chain, AtomIndex => 1);

            foreach $Atom (@AtomIndex) {
                # do some stuff with each atom
                print "$Atom->{x} $Atom->{y} $Atom->{z}\n";
            }

            # or alternatively, if the residues need to be processed one by one
            @ResidueIndex = $PDB->Get (Model => $Model, Chain => $Chain, ResidueIndex => 1);

            foreach $Residue (@ResidueIndex) {
                # do some stuff with each residue
                print "$Residue->{Phi} $Residue->{Psi}\n\n";

                foreach $Atom (@{$Residue->{Atoms}}) {
                    # do some stuff with each atom
                    print "$Atom->{x} $Atom->{y} $Atom->{z}\n";
                }
            }
        } # of foreach $Chain
    } # of foreach $Model

15 Error handling

The error handling of ParsePDB consists of two levels:

• warnings, which are reported and give hints for possible problems but do not cause the processing to be aborted
• errors, which are fatal and cause ParsePDB to abort the processing

If you are not familiar with the try/throw/catch methodology, have a look at the documentation of Error.pm. In case you do not make use of this, the program dies as any other program does, issuing the respective error message. However, if y
27. most methods if no model number is specified. Therefore, if no MODEL tags are given or just one MODEL is present, the model number may be omitted.

• Inside each MODEL ... ENDMDL domain (or the whole file if no MODEL is found): Every block of ATOM lines is regarded as one chain, either until a following TER or a change of the chain ID. If HETATMs are following the ATOMs, there are two possibilities:
  1. one chain with ATOMs and HETATMs, if the blocks have the same chain IDs, or both have no chain IDs and are not separated by a TER
  2. two chains, one with ATOMs and one with HETATMs, if the blocks have different chain IDs, or both blocks have no chain IDs and are separated by a TER
  Theoretically, there is the third possibility of the same defined chain ID and a TER in between. However, according to the PDB manual a TER marks the end of a chain, thus it is given the higher priority in that case. A TER within a chain is more or less a violation of the PDB format, and in terms of the speed of parsing it is much faster to rely on the fact that a TER can only be at the very end of a chain.
  All chains inside one model are numbered sequentially as chain 0, 1, etc. and can be requested with this number. If the following check of the chain IDs is successful, they may also be addressed using the actual letter.
• Each line of the protein section is split into its entries during parsing. With parameters like AtomIndex and ResidueIndex, g
28. o request A. Please see "Filtering the Data" for more information on the location indicators in the ATOM line.

• AtomIndex => 1
  This can be a very handy command, especially if the parser is only used to retrieve the chains one after another. AtomIndex tells ->Get to return not the original ATOM lines from the PDB but the fully parsed entries. That is, each line (atom) in this array is a hash in which the respective ATOM line is broken down into its parts (see also ->new for the used scheme).

    @Content = $PDB->Get (AtomNumber => 5);
    $Content[0] = "ATOM      5  CB AVAL A   1       1.224  33.077   8.946  1.00 13.10"

    @Content = $PDB->Get (AtomNumber => 5, AtomIndex => 1);
    $Content[0] = { Race          => "ATOM",
                    AtomNumber    => 5,
                    AtomType      => "CB",
                    AltLoc        => "A",
                    ResidueLabel  => "VAL",
                    ChainLabel    => "A",
                    InsResidue    => "",
                    ResidueNumber => 1,
                    x             => 1.224,
                    y             => 33.077,
                    z             => 8.946,
                    Occupancy     => 1.0,
                    Temp          => 13.10,
                    Rest          => "1TBE 113" }

• ResidueIndex => 1
  This switch is even more powerful than AtomIndex and returns the parsed information of the latter divided up into residues, while still providing the same information of AtomIndex on a per-residue basis. The parsed information for each individual residue is available and for each its atoms
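The breakdown of an ATOM line into its parts, as returned with AtomIndex, can be illustrated with a minimal sketch. It is not taken from ParsePDB.pm: the helper name parse_atom_line is hypothetical, the column ranges are the ones of the standard PDB ATOM record (the manual itself only confirms column 17 for the alternate location and column 27 for the inserted residue tag), and only the hash keys are borrowed from the example above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of ParsePDB.pm: cut one fixed-column
# ATOM/HETATM record into a hash keyed like the AtomIndex entries.
sub parse_atom_line {
    my ($line) = @_;
    chomp $line;
    $line = sprintf '%-80s', $line;    # pad so every substr stays in range

    my %atom = (
        Race          => substr($line,  0, 6),    # columns  1-6
        AtomNumber    => substr($line,  6, 5),    # columns  7-11
        AtomType      => substr($line, 12, 4),    # columns 13-16
        AltLoc        => substr($line, 16, 1),    # column  17
        ResidueLabel  => substr($line, 17, 3),    # columns 18-20
        ChainLabel    => substr($line, 21, 1),    # column  22
        ResidueNumber => substr($line, 22, 4),    # columns 23-26
        InsResidue    => substr($line, 26, 1),    # column  27
        x             => substr($line, 30, 8),    # columns 31-38
        y             => substr($line, 38, 8),    # columns 39-46
        z             => substr($line, 46, 8),    # columns 47-54
        Occupancy     => substr($line, 54, 6),    # columns 55-60
        Temp          => substr($line, 60, 6),    # columns 61-66
        Rest          => substr($line, 66),       # everything after column 66
    );
    s/^\s+|\s+$//g for values %atom;   # strip the column padding in place
    return \%atom;
}

# Build a sample record column by column instead of typing the spaces by hand.
my $sample = sprintf '%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f',
    'ATOM', 5, ' CB', 'A', 'VAL', 'A', 1, '', 1.224, 33.077, 8.946, 1.00, 13.10;

my $parsed = parse_atom_line($sample);
print "$parsed->{AtomType} of $parsed->{ResidueLabel} $parsed->{ResidueNumber}: ",
      "$parsed->{x} $parsed->{y} $parsed->{z}\n";
```

With the parser itself, this cutting is already done, so a sketch like this is only of interest for understanding what the hash keys correspond to in the file.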
29. ods can be called directly:

    $PDB->RenumberModels (ModelStart => 3);
    $PDB->RenumberChains (ChainStart => "G");
    $PDB->RenumberResidues (ResidueStart => 5);
    $PDB->RenumberAtoms (AtomStart => 5);

When renumbering chains, the letter is processed case-sensitively. If the letter Z is reached during renumbering, a-z will be used, and 0-9 after the non-capitals have been used up. Proteins larger than that require to start again with A-Z, which leads to duplicate chain labels, and some routines (e.g. the retrieval of chains using the chain label) will not work then. However, this is a shortcoming of the PDB standard with the chain label being restricted to one character. The parser will not return a chain based on its chain label if multiple possibilities are found.

If the Start parameters are omitted, 1 (and A for chains) is taken by default. Be aware that after renumbering some information in the header and footer does not fit to the atom numbers any more (e.g. SSBOND, HELIX, CONECT). The numeration of chains restarts in each model.

If the main content should not be altered, or for instance every retrieved chain must start at 1, the Start parameters can be given to ->Get, which then renumbers only the filtered content but not the main hash itself. That way one can extract all alpha carbon atoms and still have a sequential numbering.

    @Content = $PDB->Get (AtomType => "CA", AtomS
30. om number as the atom before.

    $PDB->RenumberAtoms (IgnoreTER => 1);

Please see also further down, chapter 12, Filtering the Data.

11 Generating CHARMM input files

Working with CHARMM usually is a pain somewhere where you don't want it. At least the parser is able to solve some of the problems. The main pain is that CHARMM can only process one chain at a time. Although it is possible to give the PDB2CHARMM => 1 parameter to ->Get and every related method, you will use ->WriteChains most of the time. If you use another method, please remember to work with only one chain to produce a valid input file.

    @CHARMMFiles = $PDB->WriteChains (PDB2CHARMM => 1);

To convert the CHARMM PDB output files to standard format, use the related method CHARMM2PDB equivalently with ->Write or ->Get.

11.1 What it does

• All chains are written into separate files.
• All residues are renumbered sequentially, starting at 1 in each chain.
• HIS is replaced with HSD, the last O is renamed OT1, OXT is renamed OT2.
• All atom types are replaced according to this list: http://www.bmrb.wisc.edu/ref_info/atom_nom.tbl

When more than one chain is detected, a warning will be issued that CHARMM is unable to process more than one chain at a time.

11.2 What it does not do

• Terminal acetyl groups
  Sometimes proteins have a terminal acetyl group on the amino end which needs to be patched to work with CHARMM
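The renaming rules listed under "What it does" can be sketched on already parsed atoms (as delivered by AtomIndex). This is only an illustration under assumptions, not the code ParsePDB.pm uses: the function name rename_for_charmm is hypothetical, "the last O" is taken to mean the O of the last residue in the chain, and the atom type replacements from the BMRB list are left out entirely.

```perl
use strict;
use warnings;

# Hypothetical sketch, not ParsePDB's own code: apply the renaming rules
# listed above to the parsed atoms of ONE chain.
sub rename_for_charmm {
    my @atoms = @_;
    return () unless @atoms;

    # A residue is identified by its number plus its inserted residue tag.
    my $key_of = sub {
        my ($atom) = @_;
        my $ins = defined $atom->{InsResidue} ? $atom->{InsResidue} : '';
        return "$atom->{ResidueNumber}|$ins";
    };
    my $last_key = $key_of->($atoms[-1]);

    my ($new_number, $previous_key) = (0, '');
    my @renamed;
    for my $atom (@atoms) {
        my %copy = %$atom;                        # leave the input untouched
        my $key  = $key_of->($atom);

        $new_number++ if $key ne $previous_key;   # next residue reached
        $previous_key = $key;
        $copy{ResidueNumber} = $new_number;       # sequential, starting at 1

        $copy{ResidueLabel} = 'HSD' if $copy{ResidueLabel} eq 'HIS';

        if ($key eq $last_key) {                  # last residue of the chain only
            if    ($copy{AtomType} eq 'O')   { $copy{AtomType} = 'OT1' }
            elsif ($copy{AtomType} eq 'OXT') { $copy{AtomType} = 'OT2' }
        }
        push @renamed, \%copy;
    }
    return @renamed;
}

my @chain = (
    { ResidueLabel => 'HIS', ResidueNumber => 7, AtomType => 'N'   },
    { ResidueLabel => 'HIS', ResidueNumber => 7, AtomType => 'O'   },
    { ResidueLabel => 'ALA', ResidueNumber => 8, AtomType => 'CA'  },
    { ResidueLabel => 'ALA', ResidueNumber => 8, AtomType => 'O'   },
    { ResidueLabel => 'ALA', ResidueNumber => 8, AtomType => 'OXT' },
);
print "$_->{ResidueLabel} $_->{ResidueNumber} $_->{AtomType}\n"
    for rename_for_charmm(@chain);
```

The sketch works on a single chain on purpose, mirroring the restriction that CHARMM can only process one chain at a time.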
31. or example. Without checking the atom distances or the header remarks, it is not possible to distinguish between inserted residues and alternative residues. If it is important that the chain has no such alternatives, they can be reliably removed using the method ->RemoveInsertedResidues. It loops over all residues with an inserted residue tag, checks the atom distances with the neighbouring groups and removes only residues if they are superpositions.

    $PDB->RemoveInsertedResidues;

If it is really important that there are no superpositions, and the PDB file might be crappy enough that the inserted residue tags are not reliable, the parameter Intensive can be set true, which will then cause a check of all residues in the protein.

    $PDB->RemoveInsertedResidues (Intensive => 1);

If superpositions are found, the residue with an InsResidue tag is removed; it does not matter whether the sequence is 36, 36A or the other way round. If none of them possesses an InsResidue marker, the second one is removed. In all cases a warning is issued, stating the ResidueNumbers (the external ones, for easy comparison in the PDB file) and which of them was removed.

12.3 Alternative Atom Locations

If some atoms showed a deviation during the structural elucidation of the protein via NMR or X-ray, the alternative locations are sometimes stated within the same model instead of in a second one. The alternate location indicator i
32. ou catch the error in the main program, it will survive and can handle the error appropriately, or simply close open file handles before dying off as well.

15.1 Which error can happen where

This listing can be helpful to decide for which error to check after several actions in your script. As you can see, this is merely necessary after ->new (whose only error is due to user incompetence) and ->Parse. After that, the PDB and its parser should work smoothly together.

• ->new
  NoFile: If no file name was given or the file has not been found
  FileNotFound: If the file given via FileName was not found
• ->Parse and ->Reset
  IOError: If the file could not be opened, e.g. due to a permission problem
  CorruptFile: If the file is empty or no ATOM lines have been found in it
• ->Get and ->Write and all methods using them
  UnknownElement: If a specific element was requested via Element => that has not yet been specified in ParsePDB.pm
  BadParameter: If an external identifier and the respective internal identifier were given at the same time, e.g. Chain and ChainLabel. It is no problem, of course, to mix for instance Residue and AtomNumber.
• ->Write
  NoFile: If no file name has been given to ->Write
  IOError: If the file could not be opened, e.g. due to a permission problem

15.2 Methods for Error Handling

• ->Warning
  Returns true if a warning has been issued. Returns false if no wa
33. returned if it is tried to access them via the AtomNumber. For residues, the numbers may also contain letters in case of inserted residues, like 34A. In most of the PDB files only one model exists and this is usually not explicitly named, thus it possesses no external number.

The biggest problem are chain IDs. Very often chains are not named at all (empty chain ID) or double IDs exist, like A, B, A, B. In both cases ChainLabel cannot be used to access the chains. Every used model is checked right after parsing for those errors. Each use of ChainLabel triggers a check whether the chain IDs of the current model have passed this check and are valid. If they are invalid, a warning is issued, the parser stops processing and returns undef. It is also possible for the user to determine whether external chain identifiers can be used for a model:

    if ($PDB->ChainLabelsValid (Model => 0)) { ... }
    else { ... }

As usual, the method defaults to model 0 if no parameters are given. If the chain labels are invalid, the chains need to be renumbered before they can be accessed via ChainLabel.

8 Write the Whole or Parts of the PDB

To write out specific parts of the PDB, the method ->Write is used. All parameters are identical with ->Get, so it is possible to write out, for example, only certain atom types or all alanine residues and so on.

    $PDB->Write (FileName => "file.pdb");

Header and Footer are included by default. T
34. rning has been issued.
    if ($PDB->Warning) { print "Uh oh\n" }
• ->GetWarnings
  Only returns the warning messages as array, no automatic output.
    if ($PDB->Warning) { @Warnings = $PDB->GetWarnings; print @Warnings; }
• ->PrintWarnings
  Prints out all warning messages and also returns an array.
    if ($PDB->Warning) { @Warning = $PDB->PrintWarnings }

To check for specific warnings, e.g. to tell the user explicitly why his crappy file is crap indeed, one can use the following methods, e.g.

    if ($PDB->Warning_NoChainLabel) { print "Watch out" }

• ->Warning_NoENDMDL
  If a MODEL without a corresponding ENDMDL has been found
• ->Warning_NoChainLabel
  If no chain ID is given at all
• ->Warning_MultipleChainLabel
  If a certain chain ID has been found more than once. This can lead to problems when using letters to read chains (->Get (Chain => "B")). If this warning has been reported, ->WriteChains ignores the setting of ->ChainLabelAsLetter and uses numbers.
• ->Warning_UnknownModel
  If the requested model is not defined
• ->Warning_UnknownChain
  If the requested chain is not defined
• ->Warning_UnknownChainLabel
  If the given chain ID could not be found in the PDB
• ->Warning_UnknownAminoAcid
  If a 1 or 3 letter code given to AminoAcidConvert has not been recognized
• ->Warning_InvalidAminoAcid
  If a code given to AminoAcidConvert has not 1 or 3 letters
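The difference between the last two warnings can be made concrete with a small stand-alone sketch of a 1/3 letter code conversion. It is not the AminoAcidConvert of ParsePDB.pm: the function convert_amino_acid, its return convention (result plus warning name) and the case-insensitive lookup are assumptions for illustration; only the two warning names and the standard amino acid codes are given facts.

```perl
use strict;
use warnings;

# Standard 3 letter => 1 letter codes of the 20 proteinogenic amino acids.
my %three_to_one = (
    ALA => 'A', ARG => 'R', ASN => 'N', ASP => 'D', CYS => 'C',
    GLN => 'Q', GLU => 'E', GLY => 'G', HIS => 'H', ILE => 'I',
    LEU => 'L', LYS => 'K', MET => 'M', PHE => 'F', PRO => 'P',
    SER => 'S', THR => 'T', TRP => 'W', TYR => 'Y', VAL => 'V',
);
my %one_to_three = reverse %three_to_one;

# Hypothetical stand-in for a code converter. Returns (result, warning),
# where exactly one of the two is defined.
sub convert_amino_acid {
    my ($code) = @_;
    $code = uc $code;                  # assumption: lookup ignores case
    my $length = length $code;

    # Neither 1 nor 3 letters: the code cannot be valid at all.
    return (undef, 'InvalidAminoAcid') unless $length == 1 || $length == 3;

    my $result = $length == 3 ? $three_to_one{$code} : $one_to_three{$code};

    # Right length, but not in the table.
    return (undef, 'UnknownAminoAcid') unless defined $result;
    return ($result, undef);
}

for my $code (qw(ALA w XYZ AL)) {
    my ($result, $warning) = convert_amino_acid($code);
    print defined $result ? "$code => $result\n" : "$code: $warning\n";
}
```

The length check comes first, so a malformed code is reported as invalid rather than merely unknown.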
35. s usually A or B, but has also been found as 1 and 2. The parser detects the used method automatically.

    ATOM    143  N    SER    18    7.902   9.621  14.878  1.00  8.15
    ATOM    144  CA   SER    18    6.436   9.552  14.567  1.00 10.68
    ATOM    145  C    SER    18    6.287   9.730  13.049  1.00 10.82
    ATOM    146  O    SER    18    7.124  10.246  12.306  1.00 11.90
    ATOM    147  CB   SER    18    5.687  10.570  15.337  1.00 14.98
    ATOM    148  OG  ASER    18    6.225  11.165  16.468  0.50 14.28
    ATOM    149  OG  BSER    18    6.181  11.830  15.086  0.50  9.85

    Alternative atom location A and B (column 17)

The alternative atom locations can be removed using the AtomLocations keyword via ->new or ->SetAtomLocations.

    $PDB->SetAtomLocations ("First");

This will enable the filtering of the additional atoms and return only the first one. If these atoms are not needed anyway, they can be removed entirely with ->RemoveAtomLocations and the same keyword, which denotes the atoms which are kept.

    $PDB->RemoveAtomLocations (AtomLocations => "First");

This deletes all atoms with multiple locations and keeps just the first one. See ->new for more information. If no parameter is given, the default is taken, which is set to All and therefore does not do anything.

13 Other methods

13.1 ->GetFASTA

Returns an array with the FASTA format lines. A model number has to be provided, otherwise model 0 is taken as default. To process only a single chain, Chain or ChainLabel can be specified. A stan
36. sidueIndex and AtomIndex, respectively.
    $PDB->GetElement
• ->GetCoordinates
  Returns a 2-dimensional array with the coordinates of the requested atoms. All parameters like Model, Chain and the filter commands are given to ->Get, and all lines not beginning with ATOM (e.g. TER) are ignored.
  CAUTION: All coordinates are returned, regardless of how many chains are retrieved from ->Get. So be careful to specify a particular one.
    @Coordinates = $PDB->GetCoordinates (ChainLabel => "A");
    print "First Atom: x $Coordinates[0]->{x}, y $Coordinates[0]->{y}, z $Coordinates[0]->{z}\n";
• ->GetAngles
  Returns a hash with the φ and ψ angles of a chain or a residue. Remember that for the first residue of a chain no φ angle is defined, whereas the last one does not have a ψ angle; the angles are then given as 360. The angles are returned in an array; if only one residue was processed, only the first element is filled. The routine calculates the bond distance between two residues. If it is too big, for example between two chains or if a residue has been removed for some reason, the respective angle is given as 360.
    @Angles = $PDB->GetAngles (Residue => 2);
    @Angles = $PDB->GetAngles (ResidueNumber => 2);
    print "Phi angle: $Angles[0]{Phi}\n";
    print "Psi angle: $Angles[0]{Psi}\n\n";
  If a certain model or chain is processed with the method, the calculated data is only returned bu
37. t Element =>. To refine the filter pattern, you need to edit the filter variables at the very beginning of ParsePDB.pm.
    @AllElements = $PDB->IdentifyElements (Model => 0, Chain => 0);

6 Count Number of Subgroups of the Protein

• ->CountModels
  Returns the number of models in the PDB.
    $ModelNumber = $PDB->CountModels;
• ->CountChains
  Returns the number of chains in a model. If no model is given, 0 is taken by default.
    $ChainNumber = $PDB->CountChains (Model => 2);
• ->CountAtoms
  Returns the number of atoms in the specified part of the protein. If no model is given, 0 is taken by default. ATOM and HETATM lines are treated equally; if you do not want to process HETATMs, filter them out via NoHETATM => 1 (see ->new).
    $AtomNumber = $PDB->CountAtoms (Model => 0, Chain => 2);
  If you need HETATMs but want to determine the number of ATOMs or HETATMs in one model or chain, you can use the parameter Race => "ATOM".
    $AtomNumber = $PDB->CountAtoms (Model => 0, Chain => 2, Race => "ATOM");
• ->CountResidues
  Returns the number of residues. If no model is given, 0 is taken by default.
    $ResidueNumber = $PDB->CountResidues (Model => 0, Chain => 1);

7 Retrieve a Part of the PDB with ->Get

The method ->Get is the universal tool to retrieve content from the parsed PDB. The information gathered via the Identify or the GetIdentifier methods can be fed into ->Get to
38. t not saved for later use. Sometimes it is more handy and faster to simply calculate all angles in one go and retrieve them later on with other information. This can be achieved by calling the method without any parameters (->GetAngles) to calculate and save all dihedral angles. The computed angles are then included in the ResidueIndex, which can be retrieved from ->Get.
• ->GetSection
  To fetch certain information from header or footer. All lines starting with the given pattern are returned.
    @Section = $PDB->GetSection ("CONECT");
• ->GetResolution
  Returns the resolution in Angstroms that was used for building the model. If no REMARK 2 field or no resolution is given, undef is returned.
    $Resolution = $PDB->GetResolution;
• ->GetHeader
  Returns the header (everything until the first MODEL or ATOM). The parameter MinHeader => 1 can be given to get only a minimal header.
    @Header = $PDB->GetHeader;
• ->GetMinHeader
  This is a shortcut to retrieve a minimal header.
    @MinHeader = $PDB->GetMinHeader;
• ->GetFooter
  Returns the footer (everything after the last ATOM, HETATM, TER or ENDMDL). The parameter MinFooter => 1 can be given to get only a minimal footer.
    @Footer = $PDB->GetFooter;
• ->GetMinFooter
  This is a shortcut to retrieve a minimal footer.
    @MinFooter = $PDB->GetMinFooter;

10 Renumbering Entries in the PDB

To renumber the protein globally, the following meth
39. t one is found. This could have been solved faster by a hash key for each external number, but this would raise lots of problems with inserted residues, alternative atom locations and crappy PDB files, and would extremely slow down many methods (like Renumber, for example), and the idea was therefore discarded.

The access of whole chains is the fastest of all, compared to models, residues and atoms. In other words, the parser is quite optimized for that. Thus, extracting each single atom one by one would be about an order of magnitude slower than extracting the whole chain and looping over the lines in it. If a residue is requested, the parser has to determine the chain, then looks up the atoms which belong to that residue, extracts and filters the lines and returns them. In any case it speeds up processing if the chain is specified, so rather than looping over the whole model, loop over each chain and process the residues or atoms.

Two of the most powerful switches of ->Get are AtomIndex and ResidueIndex. They provide the possibility to extract the readily parsed atom lines instead of the original strings and break a chain down into residues in just one parser query. Therefore, using these switches you can spare cutting the line yourself with substr or requesting lots of information like AtomType, AtomNumber and so on with the respective ->Get method.

As a ready-made code snippet, the fastest way to access each atom line in a file is the fo
40. tance, an array with AtomTypes can be extracted and, while looping over it, the current index of the array (e.g. $AtomTypes[5]) can be used to extract the respective atom, which possesses the internal number Atom => 5.

Using the internal numbers is much more favorable than using the external identifiers. It is easier to program: the numbers always start at 0 and are always sequential. In addition to that, one does not need to worry about missing ChainIDs and format errors like that. Furthermore, processing is much faster (something like 10 times and more if it comes to accessing residues or atoms), since the external identifiers have to be translated, and this translation may moreover be prone to bugs like finding only the first match if an identifier is used twice for some reason. Inserted residues have the same external number and can only be accessed via the internal number.

4 Initialization of a PDB Object

The parser can be invoked by the command ParsePDB. To create a new object, the method new is required.

    $PDB = ParsePDB->new (FileName => "File");
    $PDB->Parse;

->Parse reads the whole PDB and splits it up into its subgroups. The routine for this is as follows:

• Look for blocks divided by MODEL tags. Every MODEL tag causes the parser to regard the following block as a new model; a terminating ENDMDL is not desperately needed. If no MODEL tag is found, the protein is regarded as model 0. This is also the default value for
41. tart => 5);

Only the returned array is renumbered, leaving the internal data untouched.

10.1 Renumbering Inserted Residues

When inserted residues are contained in the PDB, they usually have the same residue number as the residue before, with an added inserted residue tag, e.g. residue 21 and 21A. If the insertion codes should be considered, that is, if you want to preserve this numbering scheme explicitly, you can set the parameter KeepInsertions to 1.

    $PDB->RenumberResidues (KeepInsertions => 1);

Since 1 is the default for KeepInsertions, the parameter is more likely needed to turn it off, if you want to remove the inserted residue tags and have each residue numbered with a different number. Be aware that inserted residues might be superpositions instead of insertions. ->RemoveInsertedResidues checks the distances of the atoms of each residue with an InsResidue tag and removes only the ones which are indeed superpositions, whereas real insertions are kept. That means after you have executed RemoveInsertedResidues, you can safely renumber the residues, discarding their InsResidue tag with

    $PDB->RenumberResidues (KeepInsertions => 0);

10.2 Ignoring the TER

According to the PDB manual, a TER line has its own atom number, i.e. it counts as an atom. If you for some reason need all atoms to be numbered sequentially without counting the TER, you can set IgnoreTER true. The TER line will then have the same at
42. ted if the atom types are unambiguous. That is, if one type is found more than once, the key AtomTypes of this residue will be removed, since the integrity of the atom type cannot be assured. If used in a script, the absence of this key points to problems with the atom types, and the array can be used instead to find out what is wrong in the PDB.

The following example shows the result of the query

    @Content = $PDB->Get (ResidueNumber => 1, ResidueIndex => 1);

for a residue containing two atoms with the atom types CA and O:

    $Content[0] = {
        AtomTypes => {
            CA => { AltLoc        => "A",
                    AtomNumber    => 3,
                    AtomType      => "CA",
                    ChainLabel    => "A",
                    InsResidue    => "",
                    Occupancy     => 0.0,
                    Race          => "ATOM",
                    ResidueLabel  => "ALA",
                    ResidueNumber => 1,
                    Rest          => "Go rt",
                    Temp          => 27.84,
                    x             => 10.309,
                    y             => 53.910,
                    z             => 25.295 },
            O  => { AltLoc        => "A",
                    AtomNumber    => 4,
                    AtomType      => "O",
                    ChainLabel    => "A",
                    InsResidue    => "",
                    Occupancy     => 0.0,
                    Race          => "ATOM",
                    ResidueLabel  => "ALA",
                    ResidueNumber => 1,
                    Rest          => "Or 2",
                    Temp          => 27.80,
                    x             => 9.414,
                    y             => 53.366,
                    z             => 24.654 },
        },
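The rule that the AtomTypes key only exists for unambiguous atom types can be sketched as follows. This is an illustration under assumptions, not the parser's own routine: the function name index_atom_types is hypothetical, and the residue is assumed to be a hash with an Atoms array of parsed atoms, as returned with ResidueIndex.

```perl
use strict;
use warnings;

# Hypothetical sketch: add an AtomTypes lookup to a residue hash, but only
# if every atom type occurs exactly once. Returns 1 on success, 0 otherwise.
sub index_atom_types {
    my ($residue) = @_;
    my %by_type;

    for my $atom (@{ $residue->{Atoms} }) {
        if (exists $by_type{ $atom->{AtomType} }) {
            # Duplicate type: the lookup would be ambiguous, so drop the key.
            delete $residue->{AtomTypes};
            return 0;
        }
        $by_type{ $atom->{AtomType} } = $atom;
    }

    $residue->{AtomTypes} = \%by_type;
    return 1;
}

my %residue = (
    ResidueLabel => 'ALA',
    Atoms        => [
        { AtomNumber => 3, AtomType => 'CA' },
        { AtomNumber => 4, AtomType => 'O'  },
    ],
);

if (index_atom_types(\%residue)) {
    print "CA is atom number $residue{AtomTypes}{CA}{AtomNumber}\n";
}
else {
    print "Atom types are ambiguous, use the Atoms array instead\n";
}
```

In a script, testing for the presence of the key before using it keeps the fallback to the array explicit.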
43. the code under "default variables". If MinHeader is given, the Header keyword can be omitted.
• Footer => 0|1
  Include the footer (true/false). By default the footer is NOT included.
• MinFooter => 0|1
  Include just a minimal footer (true/false). This is false by default. If set true, only the line beginning with END is returned. If additional other lines are needed, this can be changed in ParsePDB.pm at the beginning of the code under "default variables". If MinFooter is given, the Footer keyword can be omitted.
• ModelStart => value
  Renumber the Models in the returned content. This does not affect the main hash, that is, the content is renumbered after it was extracted. To renumber globally, see RenumberModels instead.
• ChainStart => letter
  Renumber the chain IDs of the returned ATOM and HETATM lines, starting with the given letter. The letter is processed case-sensitively. If the letter Z is reached during renumbering, the next chain will be a, continuing with non-capital letters. After reaching z, the numbers 0-9 are used. Proteins larger than that require to start again with A-Z. This does not affect the main hash, that is, the content is renumbered after it was extracted. To renumber globally, see RenumberChains instead.
• ResidueStart => value
  Renumber the residue numbers of the returned ATOM and HETATM lines sequentially, starting at value or 1 by default. See also RenumberResidues.
• AtomStart
44. to get rid of them.
    $PDB->SetAtomLocations ("All");
• Verbose => 0|1 (default: 1)
  Turn verbose mode off or on. If enabled, all warnings (i.e. wrong ChainLabel) are printed.
    $PDB->SetVerbose (1);

5 Identify Subgroups of the Protein for the Use in Loops

The Identify methods return the requested identifiers of a specific domain in the PDB. This domain (model, chain, residue) can be narrowed using the internal or external identifiers. The returned list can then be used in a loop to use it with ->Get.

• ->IdentifyModels
  Returns an array with the internal identifiers of the models, i.e. (0, 1, 2) if three models are present.
    @AllModels = $PDB->IdentifyModels;
• ->IdentifyModelNumbers
  Returns an array with the external identifiers of the models, i.e. (1, 2, 3) if three models are present. These numbers may change if the file is renumbered via ->RenumberModels.
    @AllModelNumbers = $PDB->IdentifyModelNumbers;
• ->IdentifyChains
  Returns an array with the internal identifiers of the chains in a certain model, i.e. (0, 1, 2) if three chains are present. If no model is given, 0 is taken by default. These numbers represent the chains in the order as they occur in the PDB. Accessing the chains via their sequential numbers is in any case more secure than using the chain IDs and works definitely with EVERY file, no matter how crappy its format turns out to be. I'm sorry if I keep on repeating myself, it comes w
