The Proteus software for computational protein design
else
   eval ($firstI = 1)
   eval ($lastI = 9999)
end if
! Positions J to process
if ($exist_positionJ = TRUE) then
   eval ($firstJ = decode($positionJ))
   eval ($lastJ = decode($positionJ))
   eval ($matrixfile = "matrix_IJ_" + $positionI + "_" + $positionJ + ".dat")
else
   eval ($firstJ = 1)
   eval ($lastJ = 9999)
end if
! Parameters for this job
@$MYLIB/parameters.str
! Debugging output
if ($debug = 1) then
   set echo = true end
else
   set echo = false end
end if
! Read topology and parameters
@$CPD/lib/toppar.str
! Read PSF
struct @setup.psf end
! Read coordinates
coor @setup.pdb
! Show unknown coordinates
write coor sele = (not known) end
! Store initial coordinates to ref arrays
vector do (refx = x) (known)
vector do (refy = y) (known)
vector do (refz = z) (known)
! Non-bonded energy options
@$CPD/lib/nb.str
! Read parameters for surface area energy term, store in fbeta
• phia.str: sets the atomic surface energy coefficients.
• refener.str: sets the reference (unfolded-state) energies for each amino acid type; these should be chosen consistently with the other energy parameters.
• oneletterLIGA.str: defines one-letter codes for any active or inactive ligands.
Other parameters are set in the $CPD/lib stream files, including nb.str, toppar.str and oneletter.str, but these do not usually need to be changed.

3.3 The energy matrix
A flowchart for the entire calculation is shown in Fig. 1.

System preparation for CPD. The initial build step above was a generic preparation for XPLOR. A second, more complex step starts from the generic build and prepares the system specifically for a CPD calculation. It can be run using a bash script, $PROJ/matrix/setup.sh, which uses a second XPLOR script, $CPD/inp/setup.inp. The user edits the file $MYLIB/sele.str to indicate which residues will be active (they mutate), inactive (they are flexible but don't mutate) or frozen. The main task performed by setup.inp is to modify the active residues by grafting ("patching") all possible sidechain types onto their backbone Cα. The resulting residues are referred to as "giant residues". For an active ligand, the corresponding operation is simply to read in preexisting PSF files for the different ligand types, so that they coexist in the system. The allowed residue types are listed in a file $MYLIB/mutation_space.dat and mig
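The Python sketch below shows what giant residues imply for the size of the energy matrix. It counts one diagonal element per (type, rotamer) state at a position, and one off-diagonal element per pair of states at two different positions. The position labels, residue types and rotamer counts are hypothetical. The count also ignores the distance filters that let the matrix scripts skip distant pairs.

```python
from itertools import combinations


def count_matrix_elements(positions):
    """Count diagonal and off-diagonal energy-matrix elements.

    positions: dict mapping a position label to {residue type: number of rotamers}
    (a hypothetical stand-in for per-position mutation spaces and rotamer libraries).
    Assumes every pair of positions is computed, i.e. no distance filtering.
    """
    # Number of (type, rotamer) states available at each position
    n_states = {pos: sum(rotamers.values()) for pos, rotamers in positions.items()}
    # One diagonal element per state
    diagonal = sum(n_states.values())
    # One off-diagonal element per pair of states at two different positions
    off_diagonal = sum(n_states[p] * n_states[q] for p, q in combinations(n_states, 2))
    return diagonal, off_diagonal


positions = {
    "5": {"ALA": 1, "LEU": 3},  # active position: several sidechain types grafted
    "7": {"SER": 3},            # inactive position: native type only
    "9": {"GLY": 1},
}
print(count_matrix_elements(positions))  # (8, 19)
```

The quadratic growth of the off-diagonal count with the number of states per position is one reason distance filters matter.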
eval ($rotafile1 = "local/Rota_" + encode($1) + "_" + $aa1 + "_" + encode($rot1) + ".pdb")
REMARK position $1 $type1 $bbaa1 rotamer $aa1 $rot1
if ($nativerot = 1) then
   if ($aa1 = $bbaa1) then
      if ($rot1 > $nbrotlib1) then
         REMARK position $1 $type1 $bbaa1 native rotamer $rot1
      end if
   end if
end if
write coor output = $rotafile1 sele = (resid $1 and resn $aa1 and not store9) end
! Restore initial coordinates for sidechain I
coor init sele = (resid $1 and resn $aa1 and not store9) end
vector do (x = refx) (resid $1 and resn $bbaa1 and not store9)
vector do (y = refy) (resid $1 and resn $bbaa1 and not store9)
vector do (z = refz) (resid $1 and resn $bbaa1 and not store9)
! Increment rotamer counter for residue I
eval ($rot1 = $rot1 + 1)
! End loop lrot1
end loop lrot1
! Restore initial backbone resname for residue I
vector do (resname = $bbaa1) (resid $1 and store9)
! End loop laa1
end loop laa1
! End loop l1
end loop l1
! Close matrix output file
close $matrixfile end
stop

7.3 matrixIJ.inp

remarks matrixIJ.inp
remarks Compute energy matrix for protein design
remarks by Anne Lopes, Marcel S
loop lrot2
   ! Read residue J sidechain rotamer
   @$CPD/lib/readJ.str
   ! Recreate store3 if ligand coordinates change
   vector ident (store3) (segid LIGA and tag and known)
   ! Second distance filter
   @$CPD/lib/second_distance_filter.str
   ! Test minimum distance between sidechains I and J
   if ($dmin < $secondfilter) then
      if ($dmin < $thirdfilter) then
         ! Minimize residue I, J sidechain rotamers
         @$CPD/lib/minIJ.str
      end if   ! end minimization
      ! Compute residue I, J sidechain interactions
      @$CPD/lib/interactionsIJ.str
   end if   ! end second filter
   ! Restore sidechain I coordinates prior to IJ minimization
   coor swap sele = (resid $1 and resn $aa1 and not store9) end
   ! Restore initial coordinates for sidechain J
   coor init sele = (resid $2 and resn $aa2 and not store9) end
   vector do (x = refx) (resid $2 and resn $bbaa2 and not store9)
   vector do (y = refy) (resid $2 and resn $bbaa2 and not store9)
   vector do (z = refz) (resid $2 and resn $bbaa2 and not store9)
   ! Increment rotamer counter for residue J
   eval ($rot2 = $rot2 + 1)
! End loop lrot2
end loop lrot2
! Restore initial backbone resname for residue J
vector do (resname = $bb
@$MYLIB/phia.str
! Read reference energies, store in harm
@$MYLIB/refener.str
! Read one-letter codes for amino acids and any ligands, store in string1
@$CPD/lib/oneletter.str
@$MYLIB/oneletterLIGA.str
! Define selections of interest
@$MYLIB/sele.str
!   store1: inactive residues
!   store2: active residues
!   store3: ligand center atom
!   store5: ligand fitting atoms
!   store6: protein fitting atoms
!   store7: ligand backbone
!   store8: protein backbone
!   store9: whole backbone
if ($gb = 1) then
   ! Read solvation radii for backbone
   @$CPD/lib/batom_read_bb.str
end if
! Begin loop l1 (modified to include ligand, Feb 2013)
for $i in id (resid $firstI:$lastI and name CA and (store1 or store2 or store3)) loop l1
   ! Get current resid
   vector show (resid) (id $i)
   eval ($1 = decode($result))
   ! Get current resname
   vector show (resname) (id $i)
   eval ($aa1 = $result)
   ! Save current backbone resname
   eval ($bbaa1 = $aa1)
   ! Get current segid
   vector show (segid) (id $i)
   eval ($seg1 = $result)
   ! Get current b factor
   vector show (b) (id $i)
   eval ($b1 = $result)
   ! Find type
   if ($b1 = 1) then
      eval ($type1 = "inactive")
   el
   eval ($firstI = decode($positionI))
   eval ($lastI = decode($positionI))
   eval ($matrixfile = "matrix_I_" + $positionI + ".dat")
else
   eval ($firstI = 1)
   eval ($lastI = 9999)
end if
! Parameters for this job
@$MYLIB/parameters.str
! Read topology and parameters
@$CPD/lib/toppar.str
! Read PSF
struct @setup.psf end
! Read coordinates
coor @setup.pdb
! Show unknown coordinates
write coor sele = (not known) end
! Store initial coordinates to ref arrays
vector do (refx = x) (known)
vector do (refy = y) (known)
vector do (refz = z) (known)
! Non-bonded energy options
@$CPD/lib/nb.str
! Read parameters for surface area energy term, store in fbeta
@$MYLIB/phia.str
! Read reference energies, store in harm
@$MYLIB/refener.str
! Read one-letter codes for amino acids and any ligands, store in string1
@$CPD/lib/oneletter.str
@$MYLIB/oneletterLIGA.str
! Define selections of interest
@$MYLIB/sele.str
!   store1: inactive residues
!   store2: active residues
!   store3: ligand center atom
!   store5: ligand fitting
"<LIGROTLIBDIR>")
   if ($nativerot = 1) then
      eval ($rotnatdir1 = "<LIGROTNATDIR>")
   end if
else
   eval ($rotlibdir1 = "<ROTLIBDIR>")
   if ($nativerot = 1) then
      eval ($rotnatdir1 = "<ROTNATDIR>")
   end if
end if
if ($sa = 1) then
   ! Initialize storage vector (modified to include ligand, Feb 2013)
   vector do (vx = 0.0) (all)
   ! Surface of position I local backbone, save in vx
   surf rh2o = $rh2o accu = $accu mode = access
      sele = ((((name CA and not segid LIGA) or store3) and resid $1) around $sabbcutoff and store9 and not name H) end
   vector do (vx = rmsd) ((((name CA and not segid LIGA) or store3) and resid $1) around $sabbcutoff and store9 and not name H)
end if
! Begin loop laa1
for $aa1 in ( @@$mutation_space1 ) loop laa1
   ! Get number of library rotamers
   eval ($nbrotlibfile1 = $rotlibdir1 + "/Nbrot_" + $aa1 + ".dat")
   @@$nbrotlibfile1
   eval ($nbrotlib1 = $nbrot)
   eval ($nbrot1 = $nbrotlib1)
   ! Write number of rotamers
   eval ($nbrotfile1 = "local/Nbrot_" + encode($1) + "_" + $aa1 + ".dat")
   set display = $nbrotfile1 end
   display eval ($nbrot = $nbrot1)
   close $nbrotfile1 end
   set display = OUTPUT end
   if ($nativerot = 1) then
      if ($aa1 = $bbaa1) then
         ! Handle residue I sidechain native rotamers
         @$CPD/lib/nativerotI.str
      end if
Polytechnique, 2005-2011
remarks
remarks matrix loop I precalculations
remarks optional argument: positionI
remarks this file: matrixI.inp
remarks Should be run in the user's matrix directory
! Updated by TS, SP and KD, June 2013:
!   added support for ligands
!   added support for acid/base activity
! Updated by TG, January 2011:
!   separate scripts for II loop and IJ loops
!   inactive, active and ligand positions in the same loop
!   mutation space stream file
!   native rotamer support
!   GB/ACE and HCT support
!   ff99SB force field support
!   changed rotamer library format (Pick, Rest, Nbrot, Chis, Rota)
! Updated by TG, November 2009:
!   increased modularization (stream files)
!   giant residues at all active positions, instead of moving giant residues 999 and 998 around
!   improved the surface calculation decomposition (3-body backbone-I-J effects were ignored)
! Updated by TS and MSAB, January 2009:
!   Cbeta moved into sidechain
!   took into account the desolvation of backbone by the sidechain of 999
!   applied burial correction to the sidechain-sidechain ASA interaction

! Set default matrix output file
eval ($matrixfile = "matrix_I.dat")
! Positions I to process
if ($exist_positionI = TRUE) then
In Eqs. (32) and (33), the sums run over the pairs p, each weighted by $w_p$, with $i \in A_p$ and $j \in B_p$. These equations generalize Eqs. (5) and (6), which correspond to a single pair $\mathcal{P} = M \times M$, M being the whole macromolecule. The factor 1/2 in Eq. (32) corrects for double counting of the (i, j) and (j, i) terms; $\delta$ is the Kronecker symbol.

8.2.4 Crystal symmetry
The GB model has been implemented for use with symmetry (crystallographic or otherwise); for details, see Moulinier et al. (2003).

8.3 Syntax

8.3.1 GB energy terms
The GB solvation energy is divided into a self-energy term and an interaction energy term, corresponding to the two terms on the right of Eq. (3):
$$E_{\rm GB}^{\rm solv} = E_{\rm GB}^{\rm self} + E_{\rm GB}^{\rm int}$$
They are available to the user through the variables GBSE and GBIN. They are activated by the flags statement in the usual way:

   flags include gbse gbin end

They are inactive by default.

8.3.2 Setting the GB options
All the parameters of the GB solvent model are under user control, with sensible defaults. The setup of the atomic volumes is described further on. The other GB parameters are set up with the nbonds subcommand:

   NBONDs <nbonds-statement> <gborn-nbonds-statement> END
      (applies to electrostatic, van der Waals and GB energy terms)

<gborn-nbonds-statement>:
   GBACE | GBHCT   Exclusive flags activating the GB/ACE or the GB/HCT model. Default: inactive.
   WEPS <real>     Solvent dielectric constant. Default: 1 if GB is
atoms
!   store6: protein fitting atoms
!   store7: ligand backbone
!   store8: protein backbone
!   store9: whole backbone
if ($gb = 1) then
   ! Read solvation radii for backbone
   @$CPD/lib/batom_read_bb.str
   ! Save native b values in vz
   vector do (vz = bsolv) (all)
end if
! Begin loop l1 (modified to include ligand, Feb 2013)
for $i in id (resid $firstI:$lastI and name CA and (store1 or store2 or store3)) loop l1
   ! Get current resid
   vector show (resid) (id $i)
   eval ($1 = decode($result))
   ! Get current resname
   vector show (resname) (id $i)
   eval ($aa1 = $result)
   ! Save current backbone resname
   eval ($bbaa1 = $aa1)
   ! Get current segid
   vector show (segid) (id $i)
   eval ($seg1 = $result)
   ! Get current b factor
   vector show (b) (id $i)
   eval ($b1 = $result)
   ! Find type
   if ($b1 = 1) then
      eval ($type1 = "inactive")
   elseif ($b1 = 2) then
      eval ($type1 = "active")
   end if
   if ($seg1 = "LIGA") then
      eval ($type1 = "ligand")
   end if
   ! Determine mutation space
   eval ($mutation_space1 = "local/Mut_" + encode($1) + "_" + $type1 + "_" + $bbaa1 + ".dat")
   ! Determine rotamer directories
   if ($type1 = "ligand") then
      eval ($rotlibdir1 =
... that control the calculation of an energy matrix for the system of interest [3]; (3) a C program, proteus (with a lowercase p), for exploring the space of sequences and conformations using various search algorithms, including a Monte Carlo method; (4) a collection of perl and shell scripts that automate various steps.

To use this manual, the reader should first carefully read the Proteus article (Simonson et al., J. Comp. Chem. 2013 [4]), which includes details on the theoretical methods and the energy function. In addition, the reader should have a copy of the XPLOR manual and, preferably, some familiarity with either XPLOR or a similar program such as CNS [2, 5] or Charmm. The XPLOR manual is currently available online at http://www.pasteur.fr/recherche/unites/Binfs/xplor/manual or http://www.csb.yale.edu/userguides/datamanip/xplor/xplorman/htmlman.html. A slightly older version, probably sufficient, is provided as part of the Proteus distribution. The manual can also be purchased in book form (A. Brunger, Yale University Press). We assume the reader is familiar with Unix. The distribution files will work best in a Linux environment with an Intel processor and an Intel compiler. On Intel-based machines compilation should not be necessary, and using Gnu compilers should not be difficult. Here, we first describe the directory structure in the Proteus distribution and the main files that are used in
or store8

parameters.str

remarks
remarks Define parameters
remarks
! Force field: toph19 or ff99SB
eval ($ff = "ff99SB")
! GB/HCT solvation model
eval ($gbhct = 1)
! GB/HCT offset
eval ($offset = 0.00)
! GB/HCT lambda
eval ($lambda = 1.0)
! GB/ACE solvation model
eval ($gbace = 0)
! GB/ACE smooth
eval ($smooth = 1.3)
! GB/HCT/ACE solvent dielectric constant
eval ($weps = 80.0)
! Non-bonded model (notice: NB parameters depend on the GB model)
! Dielectric constant
eval ($eps = 4.0)
! Inhibit distance
eval ($inhibit = 0.0)
if ($GB = 0) then
   ! Non-bonded cuton
   eval ($ctonnb = 10.0)
   ! Non-bonded cutoff
   eval ($ctofnb = 12.0)
   ! Non-bonded cutnb
   eval ($cutnb = 14.0)
   ! Non-bonded tolerance
   eval ($toler = 0.25)
end if
if ($GB = 1) then
   ! Non-bonded cuton
   eval ($ctonnb = 979.0)
   ! Non-bonded cutoff
   eval ($ctofnb = 989.0)
   ! Non-bonded cutnb
   eval ($cutnb = 999.0)
   ! Non-bonded tolerance
   eval ($toler = 999.0)
end if

! Distance filters
! First distance filter: CB-CB distance
eval ($firstfilter = 30.0)
! Second distance filter: minimum sidechain-sidechain distance for interaction
eval ($secondfilter = 12.01)
! Third distance filter: minimum sidechain-sidechain distance for minimization
eval ($thirdfilter = 3.0)
! Increased cutoff for second distance filter
eval ($increasedcutoff = 12.1)
! GB exclusion cutoff
eval ($dcut = 3.0)

! Surface Area Term
! Surface area term (zero: not included)
eval ($sa = 1)
! Surface backbone cutoff
eval ($sabbcutoff = 14.0)
! Surface IJ sidechain distance filter
eval ($saijfilter = 7.0)
! Surface correction factor for buried residues
eval ($buriedfactor = 1.0)
! Surface correction factor for exposed residues
eval ($exposedfactor = 1.0)
! Surface burial fraction threshold
eval ($threshold = 0.3)
! Surface calculations probe radius
eval ($rh2o = 1.5)
! Surface calculations accuracy
eval ($accu = 0.005)

! Restrain
The quantity in parentheses in Eq. (10) will be denoted $dE_i$ since, for a given conformation, it depends only on $i$. The last quantity on the right can be written as in Eq. (11), which separates the terms with $i = n$ from those with $i \neq n$. Grouping the second and third terms on the right of Eq. (9) and rearranging the first, we obtain the gradient expression (12), with the auxiliary quantities defined in Eqs. (13) and (14). Eq. (14) defines $b_i$ piecewise, according to whether $\Delta E_i^{\rm self}$ lies above $E_{\min}$. Computing the quantities $b_i$ and $dE_i$ requires only a loop over all solute atoms. In Eq. (12), the derivatives of $\Delta E^{\rm int}$ are the same for GB/ACE and GB/HCT. They can therefore be precalculated, and obtaining the force on atom $n$ then follows from Eqs. (15) and (16).

GB/ACE self-energy term. The self energy and the associated forces depend on the GB variant. With GB/ACE, they are given by Eq. (17). The parameters $\omega_{ik}$, $\sigma_{ik}$ and $\mu_{ik}$ are defined by Eqs. (18) to (21), which are expressed in terms of the quantity $Q_{ik}$ and $\arctan Q_{ik}$.
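The pair term that the force derivation differentiates can be checked numerically. The sketch below assumes the common Still-type distance function $f_{ij} = \sqrt{r_{ij}^2 + b_i b_j \exp(-r_{ij}^2/4b_ib_j)}$ and a pair energy $-\tau q_i q_j/f_{ij}$. It differentiates with respect to $r_{ij}$ at fixed solvation radii only. It therefore omits the contributions through $\partial b_i/\partial r$, which the $dE_i$ terms account for. The function names are illustrative and are not XPLOR routines.

```python
import math


def f_gb(r, bi, bj):
    """Still-type GB distance function f_ij for solvation radii bi, bj (assumed form)."""
    bb = bi * bj
    return math.sqrt(r * r + bb * math.exp(-r * r / (4.0 * bb)))


def pair_energy(r, qi, qj, bi, bj, tau):
    """GB interaction term for one unordered pair, assuming E_ij = -tau * qi * qj / f_ij."""
    return -tau * qi * qj / f_gb(r, bi, bj)


def pair_energy_dr(r, qi, qj, bi, bj, tau):
    """Analytic dE_ij/dr_ij with the solvation radii bi, bj held fixed."""
    bb = bi * bj
    f = f_gb(r, bi, bj)
    # d(f^2)/dr = 2r - (r/2) exp(-r^2/4bb), hence df/dr = r (1 - exp(-r^2/4bb)/4) / f
    dfdr = r * (1.0 - 0.25 * math.exp(-r * r / (4.0 * bb))) / f
    return tau * qi * qj * dfdr / (f * f)


# Compare the analytic derivative with a central finite difference
args = (0.4, -0.6, 1.7, 2.1, 1.0 - 1.0 / 80.0)  # qi, qj, bi, bj, tau
r, h = 3.2, 1e-6
numeric = (pair_energy(r + h, *args) - pair_energy(r - h, *args)) / (2 * h)
print(pair_energy_dr(r, *args), numeric)
```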
      eval ($nbrotlib1 = $nbrotlib)
      eval ($nbrotnat1 = $nbrotnat)
   end if
end if
! Determine rotamer space
eval ($rotamers1 = "local/EnrFltr_" + encode($1) + "_" + $aa1 + ".dat")
! Begin loop lrot1
for $rot1 in ( @@$rotamers1 ) loop lrot1
   ! Read residue I sidechain rotamer
   @$CPD/lib/readI.str
   ! Recreate store3 if ligand coordinates change
   vector ident (store3) (segid LIGA and tag and known)
   if ($sa = 1) then
      ! INDEPENDENT OF AA1, ROT1: COULD BE PUT HIGHER IN THE SCRIPT
      ! Initialize storage vector
      vector do (vx = 0.0) (all)
      ! Surface of position I local backbone, save in vx
      ! Modified to include ligand, Feb 2013
      ! COULD BE READ FROM LOCAL STORAGE
      surf rh2o = $rh2o accu = $accu mode = access
         sele = ((((name CA and not segid LIGA) or store3) and resid $1) around $sabbcutoff and store9 and not name H) end
      vector do (vx = rmsd) ((((name CA and not segid LIGA) or store3) and resid $1) around $sabbcutoff and store9 and not name H)
      ! Precompute surface areas
      @$CPD/lib/interactionsI_casa.str
   end if
   ! Begin loop l2 (modified to include ligand, Feb 2013)
   for $j in id (resid $firstJ:$lastJ and name CA and (store1 or store2 or store3)) loop l2
      ! Get current resid
      vector show (resid) (id $j)
      eval ($2 = decode($result))
   ! Initialize rotamer counter for residue I
   eval ($rot1 = 1)
   ! Begin loop lrot1
   while ($rot1 <= $nbrot1) loop lrot1
      ! Place residue I sidechain rotamer, unless this is a native rotamer OR a ligand rotamer
      eval ($doplaceI = 1)
      if ($nativerot = 1) then
         if ($aa1 = $bbaa1) then
            if ($rot1 > $nbrotlib1) then
               eval ($doplaceI = 0)
            end if
         end if
      end if
      if ($type1 = "ligand") then
         ! Read coordinates, do not fit
         eval ($doplaceI = 0)
         eval ($rotafile1 = $rotlibdir1 + "/Rota_" + $aa1 + "_" + encode($rot1) + ".pdb")
         coor sele = (segid LIGA and resname $aa1) @@$rotafile1
         vector ident (store3) (segid LIGA and tag and known)
      end if
      if ($doplaceI = 1) then
         @$CPD/lib/placeI.str
      end if
      if ($gb = 1) then
         ! Compute residue I sidechain solvation radii, which will be used during minimization
         @$CPD/lib/batom_scI.str
      end if
      ! Minimize residue I sidechain rotamer
      @$CPD/lib/minI.str
      if ($gb = 1) then
         ! Update residue I sidechain solvation radii
         @$CPD/lib/batom_scI.str
      end if
      ! Compute residue I sidechain interactions
      @$CPD/lib/interactionsI.str
      ! Write residue I sidechain rotamer coordinates
      ! Get current resname
      vector show (resname) (id $j)
      eval ($aa2 = $result)
      ! Save current backbone resname
      eval ($bbaa2 = $aa2)
      ! Get current segid
      vector show (segid) (id $j)
      eval ($seg2 = $result)
      ! Get current b factor
      vector show (b) (id $j)
      eval ($b2 = $result)
      ! Find type
      if ($b2 = 1) then
         eval ($type2 = "inactive")
      elseif ($b2 = 2) then
         eval ($type2 = "active")
      end if
      if ($seg2 = "LIGA") then
         eval ($type2 = "ligand")
      end if
      ! Determine mutation space
      eval ($mutation_space2 = "local/Mut_" + encode($2) + "_" + $type2 + "_" + $bbaa2 + ".dat")
      ! Determine dihedral (Pick), restraints and rotamer directories
      if ($type2 = "ligand") then
         eval ($rotlibdir2 = "<LIGROTLIBDIR>")
         eval ($rotnatdir2 = "<LIGROTNATDIR>")
      else
         eval ($rotlibdir2 = "<ROTLIBDIR>")
         eval ($rotnatdir2 = "<ROTNATDIR>")
      end if
      ! Initialize marker for a pair to be computed
      eval ($mark = 1)
      ! First distance filter
      @$CPD/lib/first_distance_filter.str
      ! Test marker for a pair to be computed
      if ($mark = 1) then
         if ($sa = 1) then
            ! Initialize storage vector
            vector do (vx = 0.0) (all)
            ! Surface of position J local backbone, save in vx
            ! Modified to include ligand
            ! COULD BE READ FROM LOCAL STORAGE
            surf rh2o = $rh2o accu = $accu mode = access
               sele = ((((name CA and not segid LIGA) or store3) and resid $2
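Equations (22) to (25) complete the GB/ACE definitions; Eq. (25) relates the atomic radius $R_k$ to the atomic volume.

GB/HCT self energy. With GB/HCT, the self-energy contribution $E_{ik}^{\rm self}$ of atom $k$ to atom $i$ is obtained from the descreening integral
$$I_{ik} = \frac{1}{2}\left[\frac{1}{L_{ik}} - \frac{1}{U_{ik}} + \frac{r_{ik}}{4}\left(\frac{1}{U_{ik}^2} - \frac{1}{L_{ik}^2}\right) + \frac{1}{2r_{ik}}\ln\frac{L_{ik}}{U_{ik}} + \frac{R_k^2}{4r_{ik}}\left(\frac{1}{L_{ik}^2} - \frac{1}{U_{ik}^2}\right)\right] \qquad (26)$$
which lowers the inverse solvation radius, $1/b_i = 1/R_i - \sum_{k \neq i} I_{ik}$, where
$$L_{ik} = \begin{cases} 1 & \text{if } r_{ik} + R_k \le R_i \\ R_i & \text{if } r_{ik} - R_k \le R_i < r_{ik} + R_k \\ r_{ik} - R_k & \text{if } R_i \le r_{ik} - R_k \end{cases} \qquad (27)$$
$$U_{ik} = \begin{cases} 1 & \text{if } r_{ik} + R_k \le R_i \\ r_{ik} + R_k & \text{if } R_i < r_{ik} + R_k \end{cases} \qquad (28)$$
The corresponding gradient is given by Eq. (29), with $L'_{ik} = \partial L_{ik}/\partial r_{ik}$ and $U'_{ik} = \partial U_{ik}/\partial r_{ik}$. The radii $R_k$ are calculated from the atomic volumes as in Eq. (25), then reduced by a scaling factor $S_k < 1$, which depends only on the chemical type of atom $k$. Reasonable values are given in Table 1 of [...].

This basic model was modified by Onufriev et al. to improve performance for proteins. The self energy in Eq. (6) is replaced by Eqs. (30) and (31). In other words, the atomic radius $R$ is reduced by a constant offset $\rho_0$, the self-energy contribution $E^{\rm self}$ is scaled by a constant factor $\lambda$, and the solvation radius $b$ is reduced by a constant offset $\delta$. The values $\lambda = 1.4$, $\rho_0 = 0.09$ Å and $\delta = 0.15$ Å were used in [...].

8.2.3 Pairs of interacting groups
In structure refinement, it is often necessary to use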
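Below is a minimal Python sketch of the HCT descreening integral of Eqs. (26) to (28). It assumes the standard HCT relation $1/b_i = 1/R_i - \sum_{k \neq i} I_{ik}$ and radii $R_k$ that have already been scaled by $S_k$. The Onufriev modifications ($\rho_0$, $\lambda$, $\delta$) are deliberately left out. The function names and inputs are hypothetical, and coincident atoms ($r_{ik} = 0$, $i \neq k$) are not handled.

```python
import math


def hct_integral(r, R_i, R_k):
    """Descreening integral I_ik of atom k on atom i (GB/HCT); R_k is already scaled by S_k."""
    # Atom k lies entirely inside radius R_i: L = U = 1, so the contribution vanishes
    if R_i >= r + R_k:
        return 0.0
    L = max(R_i, r - R_k)  # Eq. (27), second and third cases
    U = r + R_k            # Eq. (28), second case
    return 0.5 * (
        1.0 / L - 1.0 / U
        + (r / 4.0) * (1.0 / U**2 - 1.0 / L**2)
        + (1.0 / (2.0 * r)) * math.log(L / U)
        + (R_k**2 / (4.0 * r)) * (1.0 / L**2 - 1.0 / U**2)
    )


def hct_radius(i, coords, radii, scale):
    """Solvation radius b_i from 1/b_i = 1/R_i - sum_k I_ik, with R_k scaled by scale[k]."""
    inv_b = 1.0 / radii[i]
    for k in range(len(coords)):
        if k != i:
            r = math.dist(coords[i], coords[k])
            inv_b -= hct_integral(r, radii[i], scale[k] * radii[k])
    return 1.0 / inv_b


# Two identical atoms 3 Angstrom apart: descreening makes b_0 larger than R_0 = 1.5
print(hct_radius(0, [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)], [1.5, 1.5], [0.8, 0.8]))
```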
The Proteus software for computational protein design: a preliminary user's manual
Last updated August 30, 2013

Thomas Simonson
Department of Biology, Ecole Polytechnique, Palaiseau, France
thomas.simonson@polytechnique.fr

Proteus is available from http://biology.polytechnique.fr/biocomputing

Acknowledgements
Proteus is described in the following article:
Thomas Simonson, Thomas Gaillard, David Mignon, Marcel Schmidt am Busch, Anne Lopes, Najette Amara, Savvas Polydorides, Audrey Sedano, Karen Druart and Georgios Archontis (2013). Computational protein design: the Proteus software and selected applications. J. Comp. Chem., in press. doi:10.1002/jcc.23418.

An earlier version was described in:
M. Schmidt am Busch, A. Lopes, D. Mignon & T. Simonson (2008). Computational protein design: software implementation, parameter optimization, and performance of a simple model. J. Comp. Chem. 29, 1092-1102.
M. Schmidt am Busch, A. Lopes, D. Mignon, T. Gaillard & T. Simonson (2012). The Inverse Protein Folding Problem: Protein Design and Structure Prediction in the Genomic Era. In: Quantum Simulations of Materials and Biological Systems, J. Zeng, R. Q. Zhang, H. Treutlein (eds.), Springer Science, Dordrecht, pp. 121-140.

Thomas Gaillard and David Mignon contributed to this documentation. In addition to the authors listed above, contributions, helpful discussions or suggestions were made by Alfonso Jaramillo, Ch
Parameter              Definition
...                    ... Mean Field
Temperature            relaxation parameter for Mean Field or Monte Carlo
Rseed                  seed for the random number generator
Surf_Ener_Factor       energy parameter
Dielectric_Constant    energy parameter
Rot_Proba              probability to have a rotamer move at each MC step
Rot_Rot_Proba          probability to have a move with two rotamer changes
Mut_Proba              move probability
Mut_Mut_Proba          move probability
Mut_Rot_Proba          move probability
Neighbor_Threshold     energy threshold; defines positions that can move together
Seq_Output_File        output file
Energy_Output_File     output file

Figure 2: proteus command file for an MC run with the Crk SH3 domain and a bound 9-residue peptide.

<Mode> MONTECARLO </Mode>                          use MC for sequence/structure exploration
<Energy_Directory> matrix </Energy_Directory>      location of the energy matrix
<Temperature> 0.6 </Temperature>                   in kT units
<Trajectory_Number> 1 </Trajectory_Number>         number of MC runs
<Trajectory_Length> 100000000 </Trajectory_Length> number of MC steps per run (careful: the output file will be big)
<Seq_Output_File> prod.seq </Seq_Output_File>      output file for sequences, structures, energies
<Group_Definition>                                 define the peptide and protein by residue numbers,
pept 1-9                                           as two distinct groups called pept and prot
prot 134-190
</Group_Definition>
<Optimization_Configuration> us
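The command file above selects a Monte Carlo run at a temperature given in kT units. The sketch below is a rough illustration only: a generic Metropolis search over a pairwise energy matrix, using single-position moves. It is not the proteus implementation. It models only analogues of Temperature and Rseed; move probabilities, groups and output files are omitted. The energy values are invented for the example.

```python
import math
import random
from itertools import product


def total_energy(conf, e_diag, e_pair):
    """E = sum_i E_ii(s_i) + sum_{i<j} E_ij(s_i, s_j); absent pairs (filtered out) count as zero."""
    n = len(conf)
    e = sum(e_diag[i][conf[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            block = e_pair.get((i, j))
            if block is not None:
                e += block[conf[i]][conf[j]]
    return e


def metropolis(e_diag, e_pair, kT, n_steps, seed):
    """Metropolis search; returns the lowest-energy state visited and its energy."""
    rng = random.Random(seed)
    conf = [0] * len(e_diag)
    e = total_energy(conf, e_diag, e_pair)
    best_conf, best_e = list(conf), e
    for _ in range(n_steps):
        # Propose a new state at one randomly chosen position
        i = rng.randrange(len(conf))
        trial = list(conf)
        trial[i] = rng.randrange(len(e_diag[i]))
        e_trial = total_energy(trial, e_diag, e_pair)
        d = e_trial - e
        # Accept downhill moves always, uphill moves with probability exp(-d/kT)
        if d <= 0 or rng.random() < math.exp(-d / kT):
            conf, e = trial, e_trial
            if e < best_e:
                best_conf, best_e = list(conf), e
    return best_conf, best_e


# Two positions with two states each; the optimum is (1, 1) with E = -3
e_diag = [[0.0, -1.0], [0.0, -1.0]]
e_pair = {(0, 1): [[0.0, 0.0], [0.0, -1.0]]}
best_conf, best_e = metropolis(e_diag, e_pair, kT=0.6, n_steps=2000, seed=1)
brute = min(total_energy(list(c), e_diag, e_pair) for c in product(range(2), repeat=2))
print(best_conf, best_e, brute)
```

For clarity, the sketch recomputes the full energy at every step. A production code would update only the terms that involve the moved position.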
a model in which different parts of the macromolecule are artificially duplicated: for example, a protein side chain that is disordered and occupies multiple positions in a crystal structure. To allow for these situations, both X-PLOR [1] and CNS view the system formally as a set of pairs of interacting groups, $M \times M$, where M is the macromolecule and $\times$ indicates an interaction. (Usually there is only one such pair: the macromolecule interacting with itself.) In the case of a single disordered protein side chain thought to have two main conformations, one would normally consider a protein P with two copies of the side chain, S1 and S2. This leads to the following pairs of interacting groups:

   (P - S1 - S2) × (P - S1 - S2)
   (P - S1 - S2) × S1      weight 1/2
   (P - S1 - S2) × S2      weight 1/2

Here P - S1 - S2 represents the protein without the disordered side chain. The protein-S interactions are weighted by 1/2 because there are two copies of S. The two copies of S do not interact with each other. This formalism is implemented in X-PLOR through the constraints interaction statement; for an example, see the gbtests/testfirst.inp test case.

The same formalism applies to the GB energy terms. If the interacting groups are denoted $A_p$, $B_p$, with $p = 1, \ldots, N$, their pairs take the form $\mathcal{P}_p = A_p \times B_p = \{(i, j) : i \in A_p,\ j \in B_p\}$. There are N pairs of groups $\mathcal{P}_p$, and each has a weight $w_p$. The GB interaction and self energies then take the form of Eqs. (32) and (33), respectively.
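The weighting scheme can be sketched as a sum over weighted group pairs. The Python below makes three assumptions. Each pair is given as (A_p, B_p, w_p). A group is paired either with itself or with a disjoint group. A factor 1/2 removes the double counting of (i, j) and (j, i) when A_p = B_p. The pair_energy argument is a placeholder for whichever pairwise term is being summed, and the atom numbering is hypothetical.

```python
def weighted_group_energy(pairs, pair_energy):
    """Sum w_p * pair_energy(i, j) over i in A_p, j in B_p, for each (A_p, B_p, w_p).

    Assumes A_p and B_p are either identical or disjoint; for identical groups the
    ordered double loop visits (i, j) and (j, i), so a factor 1/2 is applied.
    """
    total = 0.0
    for a, b, w in pairs:
        factor = 0.5 if a == b else 1.0
        for i in a:
            for j in b:
                if i != j:
                    total += w * factor * pair_energy(i, j)
    return total


# Disordered side chain example: core protein P, two copies S1 and S2 of one side chain
P = frozenset({1, 2, 3})
S1 = frozenset({4, 5})
S2 = frozenset({6, 7})
pairs = [(P, P, 1.0), (P, S1, 0.5), (P, S2, 0.5)]  # S1 and S2 never interact
print(weighted_group_energy(pairs, lambda i, j: 1.0))  # 9.0
```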
aa2) (resid $2 and store9)
         ! End loop laa2
         end loop laa2
      ! End test marker for a pair to be computed
      end if
   ! End loop l2
   end loop l2
   ! Restore initial coordinates for sidechain I
   coor init sele = (resid $1 and resn $aa1 and not store9) end
   vector do (x = refx) (resid $1 and resn $bbaa1 and not store9)
   vector do (y = refy) (resid $1 and resn $bbaa1 and not store9)
   vector do (z = refz) (resid $1 and resn $bbaa1 and not store9)
   ! Increment rotamer counter for residue I
   eval ($rot1 = $rot1 + 1)
! End loop lrot1
end loop lrot1
! Restore initial backbone resname for residue I
vector do (resname = $bbaa1) (resid $1 and store9)
! End loop laa1
end loop laa1
! End loop l1
end loop l1
! Close matrix output file
close $matrixfile end
stop

8 Implementation of Generalized Born solvent models in XPLOR

8.1 Introduction
The Generalized Born (GB) model is an efficient and accurate implicit solvent model for biomolecular simulations and structure refinement. It describes the solvent around the biomolecule as a dielectric continuum. The numerical complexities of an inhomogeneous solute-solvent dielectric system are swept away and replaced by approximate, efficient analytical formulas. The model thus allows one to compute the electrostatic interactions between a macromolecule and its surrounding solvent without explicitly in
applications. Second, we describe the steps in a typical application: system preparation, energy matrix calculation, searching sequence/conformation space, and postprocessing and analysis. Third, we describe the logical structure of the rotamer library. Fourth, we describe and comment on a test case provided with the distribution: the Chignolin decapeptide. Fifth, we include a few of the main script files explicitly. Last, we include detailed documentation for the generalized Born (GB) model implemented in XPLOR.

2 Directory structure and files

2.1 XPLOR program
The organization of the XPLOR files is described in detail in the XPLOR manual. The version distributed with Proteus has the same structure, with local modifications to individual source code files. The top XPLOR directory could be something like /usr/local/xplor3.8 or /usr/local/Proteus/xplor3.8. It is normally defined by the environment variable XPLOR. The main XPLOR subdirectories are the following:
• $XPLOR/source: source code
• $XPLOR/toppar: topology and parameter files
• $XPLOR/intel64: machine- and compiler-specific source code, useful shell scripts
• $XPLOR/objects_intel64: object files and the xplor.exe binary
• $XPLOR/test3.1: example scripts
These subdirectories are normally defined as environment variables $SOURCE, $TOPPAR, $COMS, $OBJ and $TEST. With csh, these definitions can be set by sourcing the file $XPLOR/intel64/ulogin.com. With bash, eq
         ...ation radii
         calculate ii energy matrix element
         write coordinates of rotamer ri

Procedure to compute the ij (off-diagonal) matrix element:

foreach variable position i
   get mutation space for position i
   foreach amino acid type ti in mutation space(i)
      get rotamer space for type ti at position i
      foreach corresponding rotamer ri
         read coordinates of rotamer ri
         foreach variable position j < i
            if ij Cbeta distance below threshold then
               get mutation space for position j
               foreach amino acid type tj in mutation space(j)
                  get rotamer space for type tj at position j
                  foreach corresponding rotamer rj
                     read coordinates of rotamer rj
                     if min dist(sidei, sidej) < 12 Å then
                        if min dist(sidei, sidej) < 3 Å then
                           minimize rotamers ri, rj
                        end
                        calculate matrix element ij
                     end

The complete scripts matrixI.inp and matrixIJ.inp are listed further on.

4 Exploring sequence/rotamer space with proteus
With the matrix in place, the sequence/rotamer exploration is done with a C program called proteus (not to be confused with the entire Proteus package). A single command file controls the calculation, using a simple XML format and flexible commands. The main parameters the user can set are listed in Table 1: exploration method, number of steps, choice of the starting sequence/structure, restrictions on sequence/rotamer space, and so on. An example is shown in Fig. 2. Sequences are output in the form of lists of rotamers, along with their folding energi
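The filter logic of the off-diagonal procedure can be sketched in Python as a function that classifies one rotamer pair. The pair is skipped, computed, or minimized and then computed. The 12 Å and 3 Å defaults follow the pseudocode. The Cβ cutoff is left as a required argument because the text quotes 15 Å while parameters.str sets firstfilter to 30.0. The function names and coordinates are hypothetical.

```python
import math


def min_distance(atoms_a, atoms_b):
    """Minimum interatomic distance between two sidechains given as lists of (x, y, z)."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def plan_pair(cb_i, cb_j, side_i, side_j, cb_cutoff, interact_cutoff=12.0, minimize_cutoff=3.0):
    """Return 'skip', 'compute' or 'minimize+compute' for one rotamer pair (ri, rj)."""
    # Position-level filter on the Cbeta-Cbeta distance
    if math.dist(cb_i, cb_j) >= cb_cutoff:
        return "skip"
    d = min_distance(side_i, side_j)
    # Sidechains too far apart: no matrix element is computed
    if d >= interact_cutoff:
        return "skip"
    # Close contact: minimize both rotamers before computing the element
    if d < minimize_cutoff:
        return "minimize+compute"
    return "compute"


print(plan_pair((0, 0, 0), (6, 0, 0), [(0, 0, 0)], [(2, 0, 0)], cb_cutoff=15.0))  # minimize+compute
```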
chmidt am Busch, Thomas Gaillard and Thomas Simonson, Ecole Polytechnique, 2005-2011
remarks matrix loop I, loop J calculations
remarks optional arguments: positionI, positionJ
remarks this file: matrixIJ.inp
remarks Should be run in the user's matrix directory
! Updated by TS, SP and KD, June 2013:
!   added support for ligands
!   added support for acid/base activity
! Updated by TG, January 2011:
!   separate scripts for II loop and IJ loops
!   inactive, active and ligand positions in the same loop
!   mutation space stream file
!   native rotamer support
!   GB/ACE and HCT support
!   ff99SB force field support
!   changed rotamer library format (Pick, Rest, Nbrot, Chis, Rota)
! Updated by TG, November 2009:
!   increased modularization (stream files)
!   giant residues at all active positions, instead of moving giant residues 999 and 998 around
!   improved the surface calculation decomposition (3-body backbone-I-J effects were ignored)
! Updated by TS and MSAB, January 2009:
!   Cbeta moved into sidechain
!   took into account the desolvation of backbone by the sidechain of 999
!   applied burial correction to the sidechain-sidechain ASA interaction

! Set default matrix output file
eval ($matrixfile = "matrix_IJ.dat")
! Positions I to process
if ($exist_positionI = TRUE) then
   eval ($firstI = decode($positionI))
   eval ($lastI = decode($positionI))
   eval ($matrixfile = "matrix_IJ_" + $positionI + ".dat")
cluding individual solvent molecules in the calculation. It can be used either to determine the energy of a single structure or to generate multiple structures by molecular dynamics or simulated annealing. Several recent review articles describe the theoretical background, the performance and the ongoing progress of the GB model; see, e.g., [...]. Two GB variants have been implemented in X-PLOR [1] and CNS [5]. The first is termed GB/ACE, for Analytical Continuum Electrostatics (Schaefer & Karplus, J. Phys. Chem. 1996, 100, 1578). The second is termed GB/HCT, for Hawkins, Cramer & Truhlar (Chem. Phys. Lett. 1995, 246, 122).

We emphasize at the outset that the GB solvation model describes the solvent response to the charges and Coulomb potential of the solute. It is therefore meaningless to use GB in a simulation or structure refinement where the ordinary electrostatic energy term is turned off.

The Theory section below reviews the GB/ACE and GB/HCT models and gives expressions for the solvation energies and forces; readers already familiar with the model can skip it. The following section, Syntax, gives the necessary syntax and the default options for using GB in XPLOR. The last section, Installation and Testing, covers three topics: the source file organization, how to merge the GB source code with an existing XPLOR distribution, and how to run the test files.

8.2 Theory

8.2.1 GB energy
In the world of continuum el
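For orientation, the sketch below evaluates a GB solvation energy for fixed solvation radii, split into self and interaction parts. It uses the widely used Still-type function $f_{ij}$. The convention $\tau = 1/\epsilon_{\rm in} - 1/\epsilon_w$, the default dielectric constants and the function name are assumptions; they are not taken from the XPLOR source. Energies come out in e²/Å and must be multiplied by Coulomb's constant (about 332 kcal·Å/mol/e²) to obtain kcal/mol.

```python
import math


def gb_energy(coords, charges, born_radii, eps_in=1.0, eps_w=80.0):
    """Return (E_self, E_int) in e^2/Angstrom, assuming tau = 1/eps_in - 1/eps_w."""
    tau = 1.0 / eps_in - 1.0 / eps_w
    n = len(charges)
    # Self terms: Born-like energy of each charge, -tau q_i^2 / (2 b_i)
    e_self = sum(-0.5 * tau * charges[i] ** 2 / born_radii[i] for i in range(n))
    # Interaction terms over unordered pairs i < j, screened through f_ij
    e_int = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            bb = born_radii[i] * born_radii[j]
            f = math.sqrt(r2 + bb * math.exp(-r2 / (4.0 * bb)))
            e_int += -tau * charges[i] * charges[j] / f
    return e_self, e_int


# Single ion of charge +1 and solvation radius 2 Angstrom
print(gb_energy([(0.0, 0.0, 0.0)], [1.0], [2.0]))
```

At large separations $f_{ij}$ tends to $r_{ij}$, and at $r_{ij} = 0$ with $b_i = b_j$ it reduces to $b_i$. The same formula thus interpolates between screened Coulomb pairs and Born self terms.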
27. d over each chemical type using the parameter reduce statement:
coor @volumes.pdb                       ! read coordinate file with atomic volumes in wmain field
vector do (rmsd = wmain) (all)          ! copy into rmsd field
flags exclude * include gbse gbin end   ! activate GB energy terms, so GB parameters will be reduced
parameter
   reduce selection=(all) overwrite=true mode=average end   ! average volumes over each chemical type
end
flags include bonds angl dihe impr vdw elec end   ! reactivate the other terms
The atomic volumes, suitably averaged, are then available for GB calculations.

8.3.4 Examples
Minimization with GB-ACE:
coordinates @protein.pdb
parameter
   nbonds
      tolerance 0.25 atom cdie trunc nbxmod 5 vswitch e14fac 1
      cutnb 15 ctonnb 13 ctofnb 14
      EPS 1 WEPS 80 smooth 1.3 gbace   ! GB options
   end
end
flags include gbse gbin end
minimize powell nstep 50 end

8.3.5 Molecular dynamics with GB-HCT
remarks Asparagine MD with GB-HCT
remarks this file: dyna.inp
topology
   @GBXPLOR/gbtoppar/amber/topamber.inp   ! Amber topology file
   @GBXPLOR/gbtoppar/amber/patches.pro    ! N- and C-terminal patches for Amber force field
end
parameter
   @GBXPLOR/gbtoppar/amber/paramber.gb.inp   ! Amber parameter file, including GB parameters
end
segment nam
28. Figure 1. Flow chart for the energy matrix and sequence generation, including the GB implicit solvent term but omitting the dihedral restraints.

A surface energy contribution is also computed, as follows. We estimate a contact surface between sidechain i and its environment: the solvent accessible surface area of sidechain i alone, minus the area of sidechain i buried by the extended backbone, minus the area of the extended backbone buried by sidechain i. The atomic contributions to this contact area are multiplied by type-specific coefficients to yield a solvation energy term. An unfolded-state contribution is subtracted [4] to obtain the ii diagonal element of the energy matrix, which is written to a file matrix_I_<i>.dat. The unfolded-state contributions are read from the file $MYLIB/refener.str.

Off-diagonal elements. For the off-diagonal matrix elements ij, the calculations are controlled by a shell script, runIJ.sh, and an XPLOR command script, matrixIJ.inp. The XPLOR script loops over all active and inactive positions i, and all their types and rotamers, as did matrixI.inp above. For each one, we consider all positions j whose Cβ is less than 15 Å from that of i. A library rotamer is
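The surface term and the unfolded-state subtraction described above can be summarized in a short sketch. It is a minimal illustration under stated assumptions, not the XPLOR/Proteus code: the per-atom areas would come from XPLOR's surface calculation, the class and function names are hypothetical, the coefficients are made-up values rather than those in phia.str, and the assumption that the ii element is the molecular mechanics energy plus the surface energy minus the unfolded-state energy is ours.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AtomArea:
    """Hypothetical per-atom input; areas in A^2."""
    atom_type: str
    sasa_alone: float  # accessible area in the isolated sidechain (0 for backbone atoms)
    buried: float      # area of this atom buried by the partner group


def contact_surface_energy(sidechain: List[AtomArea], backbone: List[AtomArea],
                           coeff: Dict[str, float]) -> float:
    """Type-weighted contact area between sidechain i and the extended backbone.

    contact = SASA of sidechain i alone
              - area of sidechain i buried by the backbone
              - area of the backbone buried by sidechain i,
    accumulated atom by atom with a type-specific coefficient.
    """
    energy = 0.0
    for atom in sidechain:
        energy += coeff[atom.atom_type] * (atom.sasa_alone - atom.buried)
    for atom in backbone:
        energy -= coeff[atom.atom_type] * atom.buried
    return energy


def diagonal_element(e_mm: float, e_surf: float, e_unfolded: float) -> float:
    """Assumed form of the ii element: MM energy + surface energy - unfolded-state energy."""
    return e_mm + e_surf - e_unfolded


if __name__ == "__main__":
    side = [AtomArea("C", 40.0, 10.0), AtomArea("O", 30.0, 5.0)]
    back = [AtomArea("N", 0.0, 4.0)]
    coeff = {"C": 0.01, "O": -0.02, "N": -0.03}  # illustrative values only
    e_surf = contact_surface_energy(side, back, coeff)
    print(e_surf, diagonal_element(-5.0, e_surf, -2.0))
```

Splitting the backbone contribution out as a separate loop mirrors the three-part definition of the contact area in the text.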
29. e="ASN1"
   molecule name=ASN number=1 end
end
patch NASN reference=nil=(resid 1) end
patch CASN reference=nil=(resid 1) end
parameter
   nbonds
      atom cdie trunc e14fac 0.8333333   ! use this to reproduce Amber elec
      cutnb 500 ctonnb 480 ctofnb 490     ! essentially no cutoff
      tolerance 100                       ! only build the nonbonded list once
      nbxmod 5 vswitch wmin 1.0
   end
end
parameters
   nbonds EPS 1 WEPS 80 GBHCT offset 0.09 lambda 1.33 end   ! GB parameters
end
coor @volumes.pdb                     ! PDB with volumes in wmain
vector do (RMSD = wmain) (all)        ! copy into rmsd
vector do (rmsd = rmsd * 0.9) (all)   ! reduce volumes by 10%
flags include gbse gbin end
parameter reduce selection=(all) overwrite=true mode=average end end
coor @asn.pdb
! Now run constant energy dynamics, random initial velocities
vector do (vx = maxwell(250)) (all)
vector do (vy = maxwell(250)) (all)
vector do (vz = maxwell(250)) (all)
dynamics verlet
   nstep 500000 timestep 0.001   ! 500 ps
   iasvel current                ! current velocities
   nprint 250 iprfrq 250         ! statistics output
end
stop

References
[1] BRUNGER, A. T. X-PLOR version 3.1: A System for X-ray crystallography and NMR. Yale University Press, New Haven, 1992.
[2] BRUNGER, A. T., ADAMS, P. D., DELANO, W. L., GROS, P., GROSSE-KUNSTLEVE, R. W., JIANG, J., PANNU, N. S., READ, R. J., RICE, L. M., AND SIMONSON, T. The structure determination language of the Crystallography and NMR Syste
30. e all interactions (done by default anyway)
m  prot pept  prot pept  prot pept     represents the inter-group contribution
</Optimization_Configuration>
<Space_Constraints>
4 ALA        fix the peptide ligand's sequence
5 LEU
8 LYS        residues 1, 3, 6, 7 are frozen already (prolines)
9 LYS
</Space_Constraints>
<Seq_Input_File>
starting.seq     choose the initial sequence/structure, e.g., the endpoint of a previous run
</Seq_Input_File>
Figure 2. Proteus command file for an MC simulation.

5 Rotamer library organization
The protein rotamer libraries are stored in $CPD/rotamer. The library recommended for use with the Amber ff99SB force field is in $CPD/rotamer/ff99SB/Tuffery95_bindH. There are five subdirectories: Rota, Chis, Nbrot, Pick, and Rest. Rota contains 3D coordinates for each rotamer; the others contain rotamer information in the form of small XPLOR stream files. For example, the files corresponding to the serine (SER) sidechain are:

subdirectory   files for SER sidechains        content
Rota           SER_1.pdb to SER_9.pdb          3D coordinates for each rotamer
Chis           SER_1.dat to SER_9.dat          torsion angle values
Nbrot          SER.dat                         number of rotamers
Pick           SER.dat                         stream file to extract the sidechain torsion values for a current 3D structure
Rest           SER.dat                         stream file to apply dihedral restraints corresponding to a current rotamer

Specifically, the files l
31. ectrostatics, a biomolecular solute is viewed as a set of fractional atomic charges in a cavity delimited by the solute surface, embedded in a high-dielectric solvent medium. The electrostatic energy E^elec is the sum of the Coulomb interaction energies between all solute charges and a solvation term ΔE^solv; the latter includes the interaction energy of each solute charge with solvent (its self energy) and a solvent screening contribution to the interaction energies between solute charges:

$$E^{elec} = \sum_{i<j} \frac{q_i q_j}{r_{ij}} + \Delta E^{solv} \qquad (1)$$

$$\Delta E^{solv} = \sum_i \Delta E_i^{self} + \sum_{i<j} \Delta E_{ij}^{int} \qquad (2)$$

In the GB model, the solvent contribution ΔE_ij^int to the interaction energy between the charges q_i and q_j is approximated by

$$\Delta E_{ij}^{int} = -\tau \frac{q_i q_j}{\sqrt{r_{ij}^2 + b_i b_j \exp\left(-r_{ij}^2 / 4 b_i b_j\right)}} \qquad (3)$$

where r_ij is the distance between the charges, τ is given by

$$\tau = 1 - \frac{1}{\epsilon} \qquad (4)$$

ε is the solvent dielectric constant, and b_i is the solvation radius of charge i. By analogy to the case of a single charge in a spherical cavity, b_i is defined by

$$\Delta E_i^{self} = -\frac{\tau q_i^2}{2 b_i} \qquad (5)$$

where ΔE_i^self is the self energy of charge i. By partitioning the solute into atomic volumes (following Lee & Richards, for example), one can express the self energy ΔE_i^self as a sum over all the solute atoms:

$$\Delta E_i^{self} = -\frac{\tau q_i^2}{2 R_i} + \sum_{k \neq i} E_{ik}^{self} \qquad (6)$$

where R_i is a constant atomic radius, to be determined, close to the van der Waals radius, and E_ik^self is related to the integral of the electrostatic energy over
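To make Eqs. (3)-(5) concrete, the following sketch evaluates them in the same units as Eq. (1) (no Coulomb constant) for a solute dielectric of 1. It is a hedged illustration, not the X-PLOR implementation, and the function names are hypothetical. Inverting Eq. (5) also shows why a solvation radius can only be defined while the self energy is negative.

```python
import math


def gb_tau(eps: float) -> float:
    """tau = 1 - 1/eps (Eq. 4), eps being the solvent dielectric constant."""
    return 1.0 - 1.0 / eps


def gb_interaction(qi: float, qj: float, r: float, bi: float, bj: float,
                   eps: float = 80.0) -> float:
    """Solvent contribution to the i-j interaction energy (Eq. 3)."""
    bb = bi * bj
    f_gb = math.sqrt(r * r + bb * math.exp(-r * r / (4.0 * bb)))
    return -gb_tau(eps) * qi * qj / f_gb


def gb_self(qi: float, bi: float, eps: float = 80.0) -> float:
    """Self energy of charge i with solvation radius b_i (Eq. 5)."""
    return -gb_tau(eps) * qi * qi / (2.0 * bi)


def solvation_radius(qi: float, e_self: float, eps: float = 80.0) -> float:
    """Invert Eq. 5 for b_i; only defined while the self energy is negative."""
    if e_self >= 0.0:
        raise ValueError("self energy must be negative to define b_i via Eq. 5")
    return -gb_tau(eps) * qi * qi / (2.0 * e_self)


if __name__ == "__main__":
    # At large separation, Eq. 3 approaches -tau q_i q_j / r.
    print(gb_interaction(1.0, -1.0, 50.0, 2.0, 2.0), gb_tau(80.0) / 50.0)
    # At r = 0 with equal charges and radii, Eq. 3 equals twice the Eq. 5 self energy.
    print(gb_interaction(0.5, 0.5, 0.0, 1.5, 1.5), 2.0 * gb_self(0.5, 1.5))
```

The two printed pairs illustrate the limits that the exponential term in Eq. (3) interpolates between.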
32. er calculated using Voronoi polyhedra, with an external program [18], and read into X-PLOR, or assigned values from existing libraries. Note that the V_k are considered to be constants, independent of the solute conformation. This is essential to obtain tractable expressions for the GB forces (see below).

With the above self-energy approximations, ΔE_i^self can sometimes become positive, so that the necessarily positive solvation radius can no longer be defined by Eq. (5). Therefore, we use a definition proposed by Schaefer et al. [22], Eq. (8), in which a modified expression for b_i is used when ΔE_i^self > E_min. Here b_max is an upper limit for the solvation radius, which can be set, for example, to the largest linear dimension of the solute. This definition leads to continuous energies and forces.

8.2.2 Calculation of forces
Interaction energy term. We first consider the GB interaction term on the far right of Eq. (2) and its gradient ∇_n with respect to the position of solute particle n. Noting that the solvation radii b_i, b_j depend on all the atomic positions, and using the chain rule for differentiation, we have

$$\nabla_n \sum_{i<j} \Delta E_{ij}^{int} = \sum_{i<j} \frac{\partial \Delta E_{ij}^{int}}{\partial r_{ij}} \nabla_n r_{ij} + \sum_{i<j} \frac{\partial \Delta E_{ij}^{int}}{\partial b_i} \nabla_n b_i + \sum_{i<j} \frac{\partial \Delta E_{ij}^{int}}{\partial b_j} \nabla_n b_j \qquad (9)$$

Only terms with i = n or j = n contribute to the first sum on the right. The second sum can be written
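The partial derivatives of ΔE_ij^int with respect to r_ij and b_i that appear in Eq. (9) follow from differentiating Eq. (3). The sketch below writes them out and checks them against central finite differences; it uses the same units as Eq. (1), the function names are hypothetical, and the gradients ∇_n b_i, which depend on the GB variant, are not covered.

```python
import math


def _f_gb(r: float, bb: float) -> float:
    """Denominator of Eq. 3 for a given r_ij and product b_i * b_j."""
    return math.sqrt(r * r + bb * math.exp(-r * r / (4.0 * bb)))


def gb_interaction(qi, qj, r, bi, bj, tau):
    """Eq. 3: -tau q_i q_j / f_GB(r_ij, b_i, b_j)."""
    return -tau * qi * qj / _f_gb(r, bi * bj)


def d_interaction_dr(qi, qj, r, bi, bj, tau):
    """dE_ij/dr_ij at fixed radii (factor in the first sum of Eq. 9)."""
    bb = bi * bj
    e = math.exp(-r * r / (4.0 * bb))
    f = _f_gb(r, bb)
    df_dr = (r - 0.25 * r * e) / f
    return tau * qi * qj * df_dr / (f * f)


def d_interaction_dbi(qi, qj, r, bi, bj, tau):
    """dE_ij/db_i at fixed r_ij (factor in the second sum of Eq. 9)."""
    bb = bi * bj
    e = math.exp(-r * r / (4.0 * bb))
    f = _f_gb(r, bb)
    df_dbi = bj * e * (1.0 + r * r / (4.0 * bb)) / (2.0 * f)
    return tau * qi * qj * df_dbi / (f * f)


def central_difference(fun, x, h=1e-6):
    """Numerical derivative used only to check the analytic expressions."""
    return (fun(x + h) - fun(x - h)) / (2.0 * h)


if __name__ == "__main__":
    qi, qj, r, bi, bj, tau = 0.4, -0.7, 3.2, 1.8, 2.3, 1.0 - 1.0 / 80.0
    num_r = central_difference(lambda x: gb_interaction(qi, qj, x, bi, bj, tau), r)
    num_b = central_difference(lambda x: gb_interaction(qi, qj, r, x, bj, tau), bi)
    print(d_interaction_dr(qi, qj, r, bi, bj, tau) - num_r)
    print(d_interaction_dbi(qi, qj, r, bi, bj, tau) - num_b)
```

By symmetry of Eq. (3), the derivative with respect to b_j is obtained by swapping the roles of b_i and b_j.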
33. es. Rotamers are numbered using the internal proteus numbering, which identifies both amino acid type and rotamer. Conversion to a more verbose, human-readable format is done by proteus in a separate step. The verbose or "rich" format includes residue types, residue numbers, and rotamer numbers (with the numbering of the rotamer library). From each sequence in this format, a perl script can produce a PDB file with the wildtype backbone coordinates (extended to include any frozen residues or ligands) and with the rotamer numbers in the B-factor field. An XPLOR script, reconstruct.inp, can then produce the full PDB structure, sidechains included, with or without some additional overall energy minimization. A series of perl scripts is also available to compute sequence properties, such as similarity to a reference alignment.

Table 1. Possible commands in the proteus command file.
XML tag name                  Description
Mode                          heuristic, MC, mean field, or POSTPROCESS
Group_definition
Optimization_Configuration    definition of the energy function
Space_Constraints             restrict possible states, or force two residues to have the same type
Seq_Input_File                input file with the starting rotamers
Trajectory_Length             length of an MC trajectory
Trajectory_Number             number of MC trajectories
Cycle_Number                  number of heuristic cycles
Sequence_Pass_Number          maximum passes over the structure per heuristic cycle
(group interaction energies are the basic elements of the energy function)
Lambd
34. ets flags for the choice of force field, solvent model, dielectric constant, and so on. XPLOR is run using a script, $PROJ/build/build.inp, which reads parameters.str:
xplor < build.inp > build.out
The main result is a Protein Structure File, or PSF (say, allh_protein.psf), which describes the topology, or 2D chemical structure, of the protein: sequence, atom types, atomic charges, covalent structure.
If a single inactive ligand is to be used, it can be created in build.inp and written to allh_protein.psf. If several ligands are to be used, with mutations to exchange them, one (wildtype) ligand should be included in the build.inp step, and the others should be created separately by the user, each with its own PSF file. Each one must be compatible with the force field employed.

3.2 CPD setup files to edit
For the system build, only $MYLIB/parameters.str had to be edited. For the following steps, a series of files in $MYLIB should be carefully inspected and modified as needed:
• parameters.str sets the force field, solvent model, dielectric constant, and other parameters;
• sele.str defines the groups that are active (they can mutate), inactive (they can't mutate but are flexible), or frozen (their position is fixed);
• mutation_space.dat defines the possible amino acid types for active sidechains; additional restrictions can be applied later on a position-by-position basis (see below);
35. ht include all amino acid types, or a smaller set for some applications, such as pK calculations. This file can be edited, but restrictions on possible mutations are more readily applied at later stages. A secondary task performed by setup.inp is to analyze the solvent accessibility of each residue; this is needed later to apply a pairwise additivity correction to the surface energy term. Another secondary task, when a GB solvent is used, is to precompute the GB solvation radii for each backbone atom, using a Native Environment approximation for the rest of the system (sidechains, ligands) [4].

The output from this step includes a PSF file, setup.psf, written to the current directory (usually $PROJ/matrix), including giant residues and possibly multiple versions of one or more ligands. A PDB file, setup.pdb, is produced, where the giant residues have unassigned sidechain coordinates (except for the wildtype sidechain) and positions are tagged as active, inactive, or frozen. A second PDB file, bsolv.pdb, is produced, containing the GB solvation radii, and stored in $PROJ/matrix/local/Bsolv.

Diagonal matrix elements. The energy matrix is computed with XPLOR in two main steps, executed by two shell scripts. A flowchart for the calculations is shown in Fig. 1. The first step (shell script runI.sh, XPLOR script matrixI.inp) computes several quantities for each position in the system. The calculations are done as follows. For each active or inactive amino acid posi
36. igand (Feb 2013)
      around $sabbcutoff) and store9 and not name H) end
   vector do (vx = rmsd) (((((name CA and not segid LIGA) or store3) and resid $2) around $sabbcutoff) and store9 and not name H)
   ! Initialize storage vector
   vector do (vz = 0.0) (all)
   ! Surface of positions I and J local backbone, save in vz
   ! Modified to include ligand (Feb 2013)
   surf rh2o=$rh2o accu=$accu mode=access
      sele=(((((name CA and not segid LIGA) or store3) and (resid $1 or resid $2)) around $sabbcutoff) and store9 and not name H) end
   vector do (vz = rmsd) (((((name CA and not segid LIGA) or store3) and (resid $1 or resid $2)) around $sabbcutoff) and store9 and not name H)
end if
! Begin loop laa2
for $aa2 in (@@$mutation_space2) loop laa2
   ! Initialize previous rotamer number
   eval ($rot2prev = 9999)
   ! Get number of rotamers
   eval ($nbrotfile2 = $local + "/Nbrot/" + encode($2) + "_" + $aa2 + ".dat")
   @@$nbrotfile2
   eval ($nbrot2 = $nbrot)
   if ($nativerot = 1) then
      if ($aa2 = $bbaa2) then
         eval ($nbrotlib2 = $nbrotlib)
         eval ($nbrotnat2 = $nbrotnat)
      end if
   end if
   ! Determines rotamer space
   eval ($rotamers2 = $local + "/EnrFltr/" + encode($2) + "_" + $aa2 + ".dat")
   ! Begin loop lrot2
   for $rot2 in (@@$rotamers2) loop lrot2
37. inactive, 80 if GB is active.
SMOOTh <real>: determines the atomic widths in GB-ACE, denoted α in Eq. (23). Default: 1.
LAMBda <real>: scaling factor for solvation radii in GB-HCT, denoted λ in Eq. (31). Default: 1.
OFFSet <real>: offset for atomic radii in GB-HCT (Eq. 31). Default: 0.

8.3.3 Setting up atomic volumes for GB
Two approaches can be used.
Volume libraries. Two sets of standard atomic volumes are available for proteins, in two force field parameter files, param19.gb.pro and paramber.gb.inp, located in $GBXPLOR/gbtoppar (see File Organization below). These volumes are automatically read along with the other force field parameters. The first set was developed by Schaefer and coworkers, modified and tested for protein simulations by Calimet et al. [24], and is meant to be used with the CHARMM19 topology (toph19.pro) and parameter set. The second was developed and tested by Onufriev et al. [20] and is meant to be used with the Amber all-atom force field [25]. Other volume libraries are available in the literature and can be formatted for X-PLOR, for example nucleic acid libraries.
The syntax of the NONBonded subcommand is modified accordingly:
NONB <type> <real> <real> <real> <real> <real> <real>
reads the Lennard-Jones parameters for a specified chemical type, as before: the first pair of reals is ε, σ; the second pair is ε, σ for 1-4 nonbonded interac
38. ins 45 (2001), 144-158.
[25] CORNELL, W., CIEPLAK, P., BAYLY, C., GOULD, I., MERZ, K., FERGUSON, D., SPELLMEYER, D., FOX, T., CALDWELL, J., AND KOLLMAN, P. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J. Am. Chem. Soc. 117 (1995), 5179-5197.
[26] TSUI, V., AND CASE, D. A. Molecular dynamics simulations of nucleic acids with a Generalized Born model. J. Am. Chem. Soc. 122 (2000), 2489-2498.
[27] WAGNER, F., AND SIMONSON, T. Implicit solvent models: combining an analytical formulation of continuum electrostatics with simple models of the hydrophobic effect. J. Comput. Chem. 20 (1999), 322-335.
39. ion-specific information on allowed mutations, rotamers, and GB radii; run.sh does everything; xxx.conf files are proteus command files; xxx.seq are designed sequences in raw rotamer format; generate 3D structures from the rotamer information in proteus.seq; run.sh does everything, using $CPD/inp/reconstruct.inp; PDB files are in scr

7 XPLOR code listing for selected scripts
The scripts explicitly included here are sele.str, parameters.str, matrixI.inp, and matrixIJ.inp.

7.1 lib/sele.str and lib/parameters.str
remark
remark Define selections
remark
! Define inactive residues
vector ident (store1) (not (resn GLY or resn CYX or resn PRO))
! Define active residues
vector ident (store2) (not all)
! Ligand center atom
vector ident (store3) (segid LIGA and tag)
! Ligand fitting atoms
vector ident (store5) (segid LIGA)
! Protein fitting atoms
vector ident (store6) (not segid LIGA and (name N or name CA or name CB or name C))
! Ligand backbone
vector ident (store7) (segid LIGA)
! Protein backbone
vector ident (store8) (not segid LIGA and (resn GLY or resn CYX or resn PRO or name N or name H or name HN or name CA or name HA or name C or name O or name OXT or name OT or name H or name HT))
! Whole backbone
vector ident (store9) (not (store1 or store2 or segid LIGA) or store7
40. m. In International Tables for Crystallography, Volume F, M. Rossmann and E. Arnold, Eds. Kluwer Academic Publishers, Dordrecht, the Netherlands, 2001, pp. 710-720.
[3] DAHIYAT, B. I., AND MAYO, S. L. De novo protein design: fully automated sequence selection. Science 278 (1997), 82-87.
[4] SIMONSON, T., GAILLARD, T., MIGNON, D., SCHMIDT AM BUSCH, M., LOPES, A., AMARA, N., POLYDORIDES, S., SEDANO, A., DRUART, K., AND ARCHONTIS, G. Computational protein design: the Proteus software and selected applications. J. Comput. Chem. (2013), in press.
[5] BRÜNGER, A., ADAMS, P., CLORE, G., DELANO, W., GROS, P., GROSSE-KUNSTLEVE, R., JIANG, J., KUSZEWSKI, J., NILGES, M., PANNU, N., READ, R., RICE, L., SIMONSON, T., AND WARREN, G. Crystallography and NMR System: a new software suite for macromolecular structure determination. Acta Cryst. D54 (1998), 905-921.
[6] BROOKS, B., BROOKS III, C. L., MACKERELL JR., A. D., NILSSON, L., PETRELLA, R. J., ROUX, B., WON, Y., ARCHONTIS, G., BARTELS, C., BORESCH, S., CAFLISCH, A., CAVES, L., CUI, Q., DINNER, A. R., FEIG, M., FISCHER, S., GAO, J., HODOSCEK, M., IM, W., KUCZERA, K., LAZARIDIS, T., MA, J., OVCHINNIKOV, V., PACI, E., PASTOR, R. W., POST, C. B., PU, J. Z., SCHAEFER, M., TIDOR, B., VENABLE, R. M., WOODCOCK, H. L., WU, X., YANG, W., YORK, D. M., AND KARPLUS, M. CHARMM: The biomolecular simulation program. J. Comp. Chem. 30 (2009), 1545-1614.
[7] MOULINIER, L., CASE, D. A., AND SIMONSON, T. X-ray structure refinement of proteins
41. models and their growing pains. Curr. Opin. Struct. Biol. 11 (2001), 243-252.
[17] KIRKWOOD, J., AND WESTHEIMER, F. The electrostatic influence of substituents on the dissociation constant of organic acids. J. Chem. Phys. 6 (1938), 506-512.
[18] LEE, B., AND RICHARDS, F. The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 55 (1971), 379-400.
[19] SCHAEFER, M., AND FROEMMEL, C. A precise analytical method for calculating the electrostatic energy of macromolecules in aqueous solution. J. Mol. Biol. 216 (1990), 1045-1066.
[20] ONUFRIEV, A., BASHFORD, D., AND CASE, D. A. Modification of the generalized Born model suitable for macromolecules. J. Phys. Chem. B 104 (2000), 3712-3720.
[21] SRINIVASAN, J., TREVATHAN, M., BEROZA, P., AND CASE, D. A. Application of a pairwise Generalized Born model to proteins and nucleic acids: inclusion of salt effects. Theor. Chem. Acc. 101 (1999), 426-434.
[22] SCHAEFER, M., BARTELS, C., LECLERC, F., AND KARPLUS, M. Effective atom volumes for implicit solvent models: comparison between Voronoi volumes and minimum fluctuation volumes. J. Comput. Chem. 22 (2001), 1857-1879.
[23] SCHAEFER, M., BARTELS, C., AND KARPLUS, M. Solution conformations and thermodynamics of structured peptides: molecular dynamics simulation with an implicit solvation model. J. Mol. Biol. 284 (1998), 835-847.
[24] CALIMET, N., SCHAEFER, M., AND SIMONSON, T. Protein molecular dynamics with the Generalized Born/ACE solvent model. Prote
42. ook like this:
Chis/SER_1.dat:
   eval ($chi1 = 62.0)
   eval ($chi2 = 60.0)
Nbrot/SER.dat:
   eval ($nbrot = 9)
Pick/SER.inp:
   pick dihe (resid $resid and resn SER and name n) (resid $resid and resn SER and name ca)
             (resid $resid and resn SER and name cb) (resid $resid and resn SER and name og) geom
   eval ($chi1 = $result)
   pick dihe (resid $resid and resn SER and name ca) (resid $resid and resn SER and name cb)
             (resid $resid and resn SER and name og) (resid $resid and resn SER and name hg) geom
   eval ($chi2 = $result)
Rest/SER.dat:
   assign (resid $resid and resn SER and name n) (resid $resid and resn SER and name ca)
          (resid $resid and resn SER and name cb) (resid $resid and resn SER and name og)
          $dihecons $chi1 $diherange 2
   assign (resid $resid and resn SER and name ca) (resid $resid and resn SER and name cb)
          (resid $resid and resn SER and name og) (resid $resid and resn SER and name hg)
          $dihecons $chi2 $diherange 2

6 A test case: the chignolin decapeptide
This is a 10-residue peptide that forms a two-stranded β-sheet. Additional test cases will be added later. We refer to the test directory chignolin_ff99SB_gbsa as TEST. Model parameters are assigned in TEST/lib (also defined as MYLIB). The force field, solvent model, and many other parameters are set in parameters.str. We use the Amber ff99SB force field with a simple but well-optimized GB variant we call GB-HCT. In sele.str, we set residue 5 to be active and residue 3 to be inactive. The others are frozen: they do no
43. placed at position j by reading the local rotamer file created above. If the minimum distance between the atoms of sidechains i and j is more than 12 Å, we discard this j rotamer and move on to the next one. Otherwise, energy minimization of sidechain j beyond Cβ is done in two steps: first in the context of the extended backbone, as with sidechain i, then in the context of the extended backbone plus sidechain i. In this second step, both sidechains i and j (beyond Cβ) are allowed to move, with dihedral restraints. The interactions considered are those of sidechain i with itself and the extended backbone, sidechain j with itself and the extended backbone, and sidechain i with sidechain j. The final molecular mechanics energy and ij surface energy are computed, omitting the dihedral restraint energy, and written to the matrix file matrix_IJ_<i>_<j>.dat.

The procedures for the ii and ij matrix elements are schematized below.

Procedure to compute ii diagonal matrix element:
foreach variable position i
   get mutation space for position i
   if nativerot then handle position i native rotamers
   if gb then compute position i backbone solvation radii
   foreach amino acid type ti in mutation space i
      get number of rotamers for amino acid ti
      foreach corresponding rotamer ri
         position rotamer ri
         if gb then compute rotamer ri solvation radii
         minimize rotamer ri
         if gb then update rotamer ri solv
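The two distance filters used in the off-diagonal loop (a 15 Å Cβ cutoff to choose the positions j, and a 12 Å minimum sidechain-sidechain distance to discard j rotamers) can be sketched as below. Coordinates are plain tuples and the function names are hypothetical; in the procedure described here these tests are carried out by the XPLOR script matrixIJ.inp, not by Python code.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

CB_CUTOFF = 15.0         # A: position j is considered if its C-beta is closer than this to that of i
SIDECHAIN_CUTOFF = 12.0  # A: a j rotamer is discarded if the sidechains are farther apart than this


def neighbor_positions(i: int, cbeta: Dict[int, Vec3]) -> List[int]:
    """Positions j != i whose C-beta lies within CB_CUTOFF of the C-beta of i."""
    return sorted(j for j, xyz in cbeta.items()
                  if j != i and math.dist(cbeta[i], xyz) < CB_CUTOFF)


def min_distance(atoms_i: Sequence[Vec3], atoms_j: Sequence[Vec3]) -> float:
    """Smallest interatomic distance between two sidechains."""
    return min(math.dist(a, b) for a in atoms_i for b in atoms_j)


def keep_rotamer_pair(atoms_i: Sequence[Vec3], atoms_j: Sequence[Vec3]) -> bool:
    """False when the minimum sidechain-sidechain distance exceeds SIDECHAIN_CUTOFF."""
    return min_distance(atoms_i, atoms_j) <= SIDECHAIN_CUTOFF


if __name__ == "__main__":
    cb = {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0), 3: (20.0, 0.0, 0.0)}
    print(neighbor_positions(1, cb), neighbor_positions(2, cb))
    print(keep_rotamer_pair([(0.0, 0.0, 0.0)], [(13.0, 0.0, 0.0)]))
```

Cheap Cβ tests first, then the more expensive all-atom test per rotamer pair, is what keeps the number of minimizations manageable.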
44. ripts to run the calculation
$PROJ/matrix/dat: the actual matrix files will be written here
$PROJ/matrix/out: XPLOR output files from the calculation are written here
$PROJ/matrix/err: XPLOR error messages are collected here
$PROJ/matrix/local: intermediate files are stored here
$PROJ/matrix/local/Bsolv: atomic solvation radii (with GB solvent) are written here, in bsolv.pdb
$PROJ/matrix/local/Chis: files defining native rotamers (when used)
$PROJ/matrix/local/EnrFltr: files defining the rotamers that have passed an energy filter test
$PROJ/matrix/local/Mut: position-specific mutation spaces (can be edited manually if needed)
$PROJ/matrix/local/Nbrot: information on the number of rotamers at each position
$PROJ/matrix/local/Rota: the actual 3D sidechain structures for each rotamer, positioned on the protein backbone
$PROJ/proteus: directory for the Monte Carlo simulations
$PROJ/reconstruct: directory for 3D structure rebuilding and postprocessing
The $PROJ/lib subdirectory should be defined as the environment variable MYLIB.

3 Using Proteus for a typical protein or protein-ligand system
3.1 System preparation for XPLOR
Protein setup starts from a PDB file, usually edited so that the atom names conform to the conventions of the force field that will be employed. The main model parameters are set by editing a single file, $MYLIB/parameters.str, which is written in the XPLOR command language and where the user s
45. ristine Bathelt, Alexey Aleksandrov, Seydou Traoré, and Jialin Liu. The first version of the proteus C code was based on an earlier program by L. Wernisch.

Contents
3 Using Proteus for a typical protein or protein-ligand system
3.1 System preparation for XPLOR
3.2 CPD setup files to edit
3.3 The energy matrix
4 Exploring sequence/rotamer space with proteus
5 Rotamer library organization
6 A test case: the chignolin decapeptide
7 XPLOR code listing for selected scripts
7.1 lib/sele.str and lib/parameters.str
7.2 inp/matrixI.inp
7.3 inp/matrixIJ.inp
8 Implementation of Generalized Born solvent models in XPLOR
8.1 Introduction
8.2 Theory
8.2.1 GB energy
8.2.2 Calculation of forces
8.3 Syntax
8.3.2 Setting the GB options
8.3.3 Setting up atomic volumes for GB
8.3.4 Examples
8.3.5 Molecular dynamics with GB-HCT

1 Overview
Proteus has four components: (1) the molecular simulation program XPLOR [1], with local modifications; (2) a sophisticated set of scripts written in the XPLOR scripting language
46. seif ($b1 = 2) then
   eval ($type1 = "active")
end if
if ($seg1 = "LIGA") then
   eval ($type1 = "ligand")
end if
! Determines mutation space
eval ($mutation_space1 = $local + "/Mut/" + encode($1) + "_" + $type1 + "_" + $bbaa1 + ".dat")
! Determines rotamer directories
if ($type1 = "ligand") then
   eval ($rotlibdir1 = $LIGROTLIBDIR)
   if ($nativerot = 1) then
      eval ($rotnatdir1 = $LIGROTNATDIR)
   end if
else
   eval ($rotlibdir1 = $ROTLIBDIR)
   if ($nativerot = 1) then
      eval ($rotnatdir1 = $ROTNATDIR)
   end if
end if
! MOVED INSIDE ROT1 LOOP BECAUSE OF LACK OF VARIABLES
!if ($sa = 1) then
!   ! Initialize storage vector
!   vector do (vx = 0.0) (all)
!   ! Surface of position I local backbone, save in vx
!   ! COULD BE READ FROM LOCAL STORAGE
!   surf rh2o=$rh2o accu=$accu mode=access
!      sele=(((name CA and resid $1) around $sabbcutoff) and store9 and not name H) end
!   vector do (vx = rmsd) (((name CA and resid $1) around $sabbcutoff) and store9 and not name H)
!end if
! Begin loop laa1
for $aa1 in (@@$mutation_space1) loop laa1
   ! Get number of rotamers
   eval ($nbrotfile1 = $local + "/Nbrot/" + encode($1) + "_" + $aa1 + ".dat")
   @@$nbrotfile1
   eval ($nbrot1 = $nbrot)
   if ($nativerot = 1) then
      if ($aa1 = $bbaa1) then
47. t mutate or explore rotamers. The individual steps are listed below, with a few comments. In the Proteus distribution, most (but not all) of the output files have been left in place. The sequence/rotamer exploration is done by a few very short Monte Carlo runs at a fairly high temperature (kT = 1 kcal/mol), for illustration. This leads to just four different sequences, with the residue 5 sidechain types His, Thr, Cys, and Met (see the files TEST/proteus/proteus.rich and protein_sequences.dat). Some of the designed structures are in TEST/reconstruction/scr (100 structures for each of the four sequences). Some energy statistics information is temporarily disabled due to recent changes in the sequence file formats.

TEST subdirectory       main files                                      comments
build                   build.inp, model.pdb, run.sh, build.out         the chain termini are unpatched (for historical reasons), with dangling NH and CO
lib                     sele.str, parameters.str, phia.str,             XPLOR stream files, which set most of the model parameters;
                        refener.str, mutation_space.dat                 mutation space and reference energies are needed for active position 5
matrix                  setup.sh, runI.sh, runIJ.sh                     the matrix calculation is run from here
matrix/dat                                                              the matrix files are written in dat
matrix/local            subdirectories Bsolv, Chis, EnrFltr, Mut,       these contain the posit
                        Nbrot, Rota
proteus (exploration)   run.sh, proteus.seq, proteus.MONTECARLO.conf
reconstruction          run.sh, scr
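Sequence/rotamer exploration operates on the precomputed energy matrix. The sketch below assumes a pairwise-additive energy (sum of diagonal elements plus off-diagonal elements over i < j) and a standard Metropolis acceptance test at kT = 1 kcal/mol, as used for illustration in this test case. The data layout and function names are hypothetical, and the actual move set and bookkeeping of proteus are not described in this section.

```python
import math
import random
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical layout: one state label (type plus rotamer, e.g. "HIS_3") per position.
Diag = Dict[Tuple[int, str], float]
OffDiag = Dict[Tuple[int, str, int, str], float]


def total_energy(state: Sequence[str], diag: Diag, offdiag: OffDiag) -> float:
    """Sum of ii terms plus sum over i < j of ij terms.

    Pairs missing from offdiag (e.g. beyond the distance cutoffs) count as zero.
    """
    energy = sum(diag[(i, s)] for i, s in enumerate(state))
    for i in range(len(state)):
        for j in range(i + 1, len(state)):
            energy += offdiag.get((i, state[i], j, state[j]), 0.0)
    return energy


def metropolis_accept(delta_e: float, kt: float = 1.0, u: Optional[float] = None) -> bool:
    """Metropolis test; delta_e and kt in kcal/mol, u an optional uniform deviate in [0, 1)."""
    if delta_e <= 0.0:
        return True
    if u is None:
        u = random.random()
    return u < math.exp(-delta_e / kt)


if __name__ == "__main__":
    diag = {(0, "A"): -1.0, (1, "B"): -2.0, (1, "C"): -1.5}
    offdiag = {(0, "A", 1, "B"): -0.5, (0, "A", 1, "C"): -1.2}
    old, new = ("A", "B"), ("A", "C")
    d_e = total_energy(new, diag, offdiag) - total_energy(old, diag, offdiag)
    print(d_e, metropolis_accept(d_e))
```

Passing `u` explicitly makes the acceptance test reproducible, which is convenient when checking a trajectory by hand.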
48. the volume of atom k. Notice that the charges of the other atoms do not appear here: the effect of these atoms is merely to exclude solvent from the vicinity of atom i. The volume integral E_ik^self is approximated in two steps. The first step is to approximate the electric field by the Coulombic field of charge i. This is simply the unscreened field that would exist if q_i were in a vacuum: it radiates uniformly in all directions and falls off as 1/r² with distance, so the corresponding energy density is proportional to 1/r⁴. The next step is to calculate the integral of 1/r⁴ over the volume of atom k. The different GB variants do this in different ways. In GB-ACE, for example, Schaefer & Karplus assume that the density of each solute atom is a Gaussian centered at the atom's position. The integral E_ik^self then has a tractable form, which can be approximated by interpolating between a Gaussian form at short range and a 1/r⁴ form at long range, leading to the Ansatz of Eq. (7). The parameters of this Ansatz are simple functions of the atomic volume V_k, the atomic radius R_k = (3V_k/4π)^{1/3}, and an adjustable smoothing parameter α, which determines the width of the atomic Gaussian distributions (see below). The atomic charges are taken directly from the existing force field. The adjustable parameters of the model are then the volumes V_k, the radii R_i, and the smoothing parameter α. Ionic strength is not included, although methods to do so have been proposed. Volumes V_k can be eith
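The statement that the integral of 1/r⁴ over atom k reduces to a 1/r⁴ form at long range can be checked numerically. The sketch below treats atom k as a hard sphere of radius a at distance d from charge i (an assumption made only for this illustration; GB-ACE uses Gaussian atomic densities instead) and compares the exact volume integral with V_k/d⁴, omitting the prefactors of the energy density. The function names are hypothetical.

```python
import math


def inverse_r4_over_sphere(d: float, a: float, n: int = 2000) -> float:
    """Integral of r**-4 over a hard sphere of radius a centred at distance d > a from the charge.

    The part of a shell of radius r (centred on the charge) lying inside the sphere has area
    (pi * r / d) * (a**2 - (r - d)**2); this area times r**-4 is integrated over [d - a, d + a]
    with the composite Simpson rule (n must be even).
    """
    if d <= a:
        raise ValueError("charge must lie outside the atomic sphere (d > a)")
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    lo, hi = d - a, d + a
    h = (hi - lo) / n

    def g(r: float) -> float:
        return (math.pi * r / d) * (a * a - (r - d) ** 2) / r ** 4

    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(lo + k * h)
    return total * h / 3.0


def long_range_form(d: float, a: float) -> float:
    """V_k / d**4, the 1/r^4 limit with V_k = 4 pi a^3 / 3."""
    return (4.0 * math.pi * a ** 3 / 3.0) / d ** 4


if __name__ == "__main__":
    for d in (3.0, 5.0, 10.0):
        print(d, inverse_r4_over_sphere(d, 1.0) / long_range_form(d, 1.0))
```

The printed ratios approach 1 as d grows, which is the long-range behaviour the GB-ACE Ansatz is built to reproduce; at short range the deviation is what requires a different form.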
49. tion i, we loop over its possible types and rotamers. Each rotamer is placed by superimposing a library rotamer structure onto the protein backbone, based on the N, Cα, Cβ and C atoms; the rotamer coordinates replace the original ones for atoms beyond Cβ. Sidechain coordinates of position i beyond Cβ are then energy minimized for N_min = 15 steps using the conjugate gradient algorithm. Harmonic restraints are applied to the dihedrals of i, with a force constant of 200 kcal/mol/rad² and a tolerance range of 5° around the ideal rotamer angle. The rest of the protein is kept fixed. The only interactions considered during the minimization are those of sidechain i with itself and with the protein backbone (extended to include any frozen residues). At this point, we save the rotamer coordinates to a file in $PROJ/matrix/local/Rota. We also compute the solvation radii of the sidechain atoms and store them in bsolv.pdb. Finally, we compute the molecular mechanics energy with the considered interactions.
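The dihedral restraints used during these minimizations (force constant 200 kcal/mol/rad², tolerance 5°, matching dihecons and diherange in parameters.str) can be written as a flat-bottom potential. The sketch below assumes the usual XPLOR-style form, zero inside the tolerance window and harmonic in the excess deviation outside it; it is an illustration with hypothetical function names, not the XPLOR source.

```python
import math


def wrap_degrees(delta: float) -> float:
    """Map an angle difference in degrees onto [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0


def dihedral_restraint_energy(phi: float, phi0: float, k: float = 200.0,
                              half_width: float = 5.0, exponent: int = 2) -> float:
    """Flat-bottom restraint on a dihedral angle (degrees in, kcal/mol out).

    Zero within +/- half_width of phi0; outside, k * (excess in radians) ** exponent,
    with k in kcal/mol/rad^2 for exponent 2.
    """
    excess = max(0.0, abs(wrap_degrees(phi - phi0)) - half_width)
    return k * math.radians(excess) ** exponent


if __name__ == "__main__":
    for phi in (62.0, 70.0, -178.0):
        print(phi, dihedral_restraint_energy(phi, 60.0))
```

Wrapping the difference first matters near ±180°, where two angles that look far apart numerically are in fact close.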
50. tions. The last two reals are V, the atomic volume (Eqs. 6, 25), and S, the scaling parameter used for the HCT solvation radius (see text following Eq. 29). If the last two reals are omitted, V and S will both be set to 9999. Thus, for applications not using GB, there is backward compatibility with X-PLOR parameter files not set up for GB. But for applications using GB, V must be included in the parameter file for both GB-ACE and GB-HCT, and S must be included for GB-HCT.

Volumes calculated with an external program. In some cases, it may be desirable to calculate the atomic volumes corresponding to a particular family of conformations and/or proteins, instead of relying on standard values averaged over each chemical type. (The standard GB-ACE volumes were obtained from atomic Voronoi volumes calculated for a large set of protein structures, then reduced by a factor of 0.9 to account for systematic errors in the GB-ACE self-energy approximation.) Several programs can calculate Voronoi volumes for each individual atom of a particular protein, e.g., the VORONOI package of Fred Richards. If these are then stored in a particular field of a PDB coordinate file (for example, the field normally used for the temperature factors, WMAIN), this information can be read into X-PLOR using the coordinate statement, then made available to the GB routines internally. To do this, the volumes must be copied into the RMSD array, then average
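The volume preprocessing described here (per-atom volumes read from a PDB field, optionally reduced by a factor such as 0.9, then averaged over each chemical type) can be mimicked outside X-PLOR, for instance to inspect the numbers before a run. This is a hypothetical helper, not part of X-PLOR or Proteus, and the chemical type names in the example are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def average_volumes_by_type(atoms: Iterable[Tuple[str, float]],
                            scale: float = 0.9) -> Dict[str, float]:
    """Scale per-atom volumes, then replace them by the mean over each chemical type.

    atoms: (chemical_type, volume) pairs, e.g. Voronoi volumes read from a PDB field.
    scale: uniform reduction factor (0.9 reduces all volumes by 10%).
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for chem_type, volume in atoms:
        sums[chem_type] += scale * volume
        counts[chem_type] += 1
    return {t: sums[t] / counts[t] for t in sums}


if __name__ == "__main__":
    print(average_volumes_by_type([("CT", 10.0), ("CT", 12.0), ("OH", 8.0)]))
```

Scaling before averaging gives the same result as averaging before scaling, because both operations are linear; the order only matters if a nonlinear correction is added later.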
51. ts
! Dihedral restraints: force constant
eval ($dihecons = 200.0)
! Dihedral restraints: angle range
eval ($diherange = 5.0)
! Dihedral restraints: maximum number of assignments
eval ($nassign = 300)
! Dihedral restraints: scale
eval ($scale = 1.0)
! Minimization
! Minimization: number of steps for matrix I
eval ($nstepi = 15)
! Minimization: number of steps for matrix IJ
eval ($nstepij = 0)
! Minimization: number of steps for reconstruction
eval ($nstepReconstr = 0)
! Minimization: expected initial drop
eval ($drop = 10)
! Minimization: print frequency
eval ($nprint = 5)
! Native rotamers
eval ($nativerot = 0)
! Reference energy: CASA model
eval ($casa = 0)
! GB model eps=4 SA
eval ($gbe4sa = 1)
! Output
! Enriched output format
eval ($enrichedoutput = 1)
! Decompose surface terms by type (n, p, a, i)
eval ($decompsurf = 0)
! Debug
eval ($debug = 0)

7.2 inp/matrixI.inp
remarks Compute energy matrix for protein design
remarks
remarks
remarks by Anne Lopes, Marcel Schmidt am Busch, Thomas Gaillard and Thomas Simonson
remarks Ecole
52. uivalent commands can be used. Below, we use these environment variables as shorthands for the corresponding directories.

2.2 Source directories for Proteus

Alongside the top XPLOR directory, we define a top Proteus source directory, say $CPD. This could be something like /usr/local/Proteus. The main $CPD subdirectories are:
• $CPD/inp: XPLOR scripts for system setup and energy matrix calculation
• $CPD/lib: XPLOR macros or stream files for system setup and energy matrix calculation
• $CPD/bin: Perl and shell scripts
• $CPD/rotamers: files that define the protein rotamer libraries
• $CPD/rotamers_other: some non-protein rotamer definitions
• $CPD/doc: documentation files, including this manual
• $CPD/testcase: a simple testcase

The most important files are described in the next sections.

2.3 User directories for a Proteus application

For a user running a given application, we define a top project directory, say $PROJ. This could be something like /home/smith/crk (Crk is a small SH3 protein). The subdirectory setup is partly imposed by the software, especially by the matrix calculation. A typical setup would be the following:
• $PROJ/build: initial system setup for XPLOR
• $PROJ/lib (or $MYLIB): local copy of the XPLOR stream files that define the main parameters for the calculation; edit as needed, especially parameters.str and sele.str
• $PROJ/matrix: top directory for the energy matrix calculation; includes shell sc
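As a sketch of these conventions, the bash function below exports $PROJ, $CPD and $MYLIB and creates the build, lib and matrix subdirectories listed above. The function name is hypothetical and not part of Proteus, and the default /usr/local/Proteus is only the example path from the text. It deliberately does not copy any stream files into $MYLIB, since this excerpt does not say where the local copies come from.

```bash
# Hypothetical helper, not part of Proteus: define the directory shorthands
# and create the project skeleton described in sections 2.2 and 2.3.
# Usage: proteus_project_setup PROJECT_DIR [CPD_DIR]
proteus_project_setup() {
  export PROJ=$1                        # top project directory, e.g. /home/smith/crk
  export CPD=${2:-/usr/local/Proteus}   # top Proteus source directory (example path)
  export MYLIB=$PROJ/lib                # local copies of the XPLOR stream files
  mkdir -p "$PROJ/build" "$MYLIB" "$PROJ/matrix"
  # Warn if the source tree lacks one of the expected subdirectories.
  local d
  for d in inp lib bin rotamers rotamers_other doc testcase; do
    [ -d "$CPD/$d" ] || echo "warning: $CPD/$d not found" >&2
  done
}
```

Call it in the current shell, e.g. `proteus_project_setup /home/smith/crk`, so that the exported variables stay set for the commands that follow.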
53. ... with the generalized Born solvent model. Acta Cryst. D 59 (2003), 2094–2103.
LOPES, A., ALEKSANDROV, A., BATHELT, C., ARCHONTIS, G., AND SIMONSON, T. Computational sidechain placement and protein mutagenesis with implicit solvent models. Proteins 67 (2007), 853–867.
STILL, W. C., TEMPCZYK, A., HAWLEY, R., AND HENDRICKSON, T. Semianalytical treatment of solvation for molecular mechanics and dynamics. J. Am. Chem. Soc. 112 (1990), 6127–6129.
HAWKINS, G. D., CRAMER, C., AND TRUHLAR, D. Pairwise descreening of solute charges from a dielectric medium. Chem. Phys. Lett. 246 (1995), 122–129.
SCHAEFER, M., AND KARPLUS, M. A comprehensive analytical treatment of continuum electrostatics. J. Phys. Chem. 100 (1996), 1578–1599.
QIU, D., SHENKIN, P., HOLLINGER, F., AND STILL, W. The GB/SA continuum model for solvation. A fast analytical method for the calculation of approximate Born radii. J. Phys. Chem. A 101 (1997), 3005–3014.
BASHFORD, D., AND CASE, D. Generalized Born models of macromolecular solvation effects. Ann. Rev. Phys. Chem. 51 (2000), 129–152.
ROUX, B., AND SIMONSON, T. Implicit solvent models. Biophys. Chem. 78 (1999), 1–20.
CRAMER, C., AND TRUHLAR, D. Implicit solvent models: equilibria, structure, spectra, and dynamics. Chem. Rev. 99 (1999), 2161–2200.
SIMONSON, T. Macromolecular electrostatics: continuum