QuickStart guide

1. Optimum name wcsp out awk print 2 lowerbound ub 1b Sew upperbound
    bin/toulbar2 $name.wcsp -d: -a -s -ub=$ub > $name.wcsp.enum 2>&1

4. Solutions from $name.wcsp.enum are extracted, sorted and translated into osprey format using the following command line:
    scripts/simple_ana.sh $mat

VI. SCORE REFINEMENT AND STATISTICAL ANALYSIS OF TOP SCORE MODELS

First, from the enumerated sub-optimal sequence-conformation models, the unique sequences are extracted and their occurrences are counted. This task is performed by the simple_ana.sh script, and the result file is $name.wcsp.enum.res.ana. A FASTA-format file is also generated from these unique sequences, and WebLogo (Crooks et al., 2004) can be used to visualize the propensity of each amino acid type at each mutable position.

Second, in order to evaluate the effect of relaxing the side-chain and backbone degrees of freedom of the best conformation of each unique sequence on the energy ranking, energy minimization and rescoring steps are carried out, and the number of conformations accessible to each mutant within some threshold (0.2 kcal/mol here) is determined. The extraction of unique sequences (best conformation) and the structure building using osprey are performed by:
    scripts/GenStruct.sh

The tleap module of Amber 9 is used to produce the inpcrd file of the generated models, which are then subjected to energy minimization using the sander module of Amber 9. In
2. this example, 1000 steps of minimization are performed using the Generalized Born/Surface Area (GB/SA) implicit solvent model (Hawkins et al., 1996). Here is the command line for the generation of the Amber input files and the minimization of the unique-sequence structures:
    scripts/amberin.pl
    scripts/amber.postmin.sh

The amberin.pl script generates all input files required for the minimization of all selected conformations (best conformation per sequence), while the amber.postmin.sh script performs the minimization. The energy of the refined structure is then re-evaluated using the osprey computeEnergyMol command.

In order to assess the effect of the minimization on the conformational variability, a repacking optimization can be carried out on the minimized structures. This is accomplished by performing a matrix computation using osprey and a sub-optimal enumeration using the CFN-based approach with some initEw value (0.2 for example) for each of the unique sequences. These tasks are performed by run_post_ana.sh. The modifications made to the configuration files of each unique sequence are described here:
- System.cfg: the parameter pdbName points to the pdb file of the minimized structure of the considered sequence.
- KStar.cfg: unchanged.
- DEE.cfg: the initEw parameter is set to 0.2; all resAllowed lines are deleted; the minEnergyMatrixName, erefMatrixName and maxEnergyMatrixName p
3. QUICK START GUIDE TO COMPUTATIONAL PROTEIN DESIGN USING COST FUNCTION NETWORK (CFN)

This document has been produced as a companion to Traoré et al. (2013), in Bioinformatics, to get you started with using the CFN-based approach for CPD. It presents a detailed example of how to apply the approach to predict the optimal sequence, or a sub-optimal ensemble of sequences, for a protein design problem targeting stability enhancement. The goal of this example is to assist the user in setting up and applying this new CPD framework to their own protein design problems.

For reviewers, we also included a Very QuickStart (see section VII) that avoids all problem generation steps requiring extensive installations (Amber 9, osprey 2.0 and therefore also MPIJava). This is achieved by making available the energy matrices computed by osprey for all 35 CPD problems in the paper, as well as their translation to the CFN wcsp format. Together with the Python translation scripts, this should allow for a painless reproduction of the computational results.

Otherwise, the setup for performing this CPD approach can be divided into five main steps:
i) Parameterization of the molecular structural system
ii) Selection of the search sequence-conformation space
iii) Computation of pairwise energy terms
iv) Optimization of sequence-conformations
v) Score refinement and statistical analysis of top score models

The details of each step are described below.

I. ARCHIVE CONTENTS
4. REYHOME/src/RotamerSearch.java < patch/RotamerSearch.patch
    patch $OSPREYHOME/KSParser.java < patch/KSParser.patch

Then recompile osprey (the MPIJava library must be available too, see the osprey documentation):
    javac -cp mpiJava/lib/classes $OSPREYHOME/src/*.java
5. V. SEQUENCE-CONFORMATION OPTIMIZATION

The CFN-based optimization using toulbar2 is performed by scripts/CFN.sh:
    scripts/CFN.sh 1MJC.matrix.28p.17aa.usingEref.txt

This script involves the following steps:

1. The translation of the pairwise matrix to the CFN wcsp format:
    mat=1MJC.matrix.28p.17aa.usingEref.txt    # name of the matrix file
    scripts/mat2wcsp.pl -mat $mat -mwcsp -minself > make_wcsp.out
Here mwcsp is a flag for translating to the CFN wcsp format and minself specifies the use of the reference energy. The script creates the input for toulbar2: 1MJC.matrix.28p.17aa.usingEref_self_digit8.wcsp

2. The computation of the GMEC, followed by the extraction of the solution from the output and the translation of the costs into energy values:
    name=1MJC.matrix.28p.17aa.usingEref_self_digit8
    bin/toulbar2 $name.wcsp -l=3 -m -d: -s > $name.wcsp.out
    grep -A 1 "New solution" $name.wcsp.out | tail -1 > $name.wcsp.sol
    scripts/mat2wcsp.pl -mat $mat -minself -tb2sol $name.wcsp.sol > $name.wcsp.gmec.out
The file $name.wcsp.sol contains the solution found by toulbar2, and $name.wcsp.gmec.out contains the corresponding energies (translation of unary and binary costs into kcal/mol) and the corresponding total energy.

3. The computation of the sub-optimal ensemble: the cost of the GMEC is used to enumerate the sub-optimal solutions within some threshold of the GMEC energy (2 kcal/mol):
ew 2 10 8 2kcal mol lb egrep
6. The archive of the pipeline used to generate energetic models (based on a patched version of the open source solver osprey 2.0), the conversion to CFN models (based on Perl scripts) and CFN solving (based on the open source solver toulbar2) is available at the following address:
    http://genoweb.toulouse.inra.fr/~tschiex/CPD/SpeedUp.tgz

The example of protein design is in the example (1MJC) directory. It contains the following directories:
- osprey2: patched osprey 2.0 with MPIJava (sources and classes compiled with the Sun/Oracle Java 7 64-bit compiler)
- bin: contains the toulbar2 binary file and a binary file for sequence analysis
- conf_info: will contain sequence-conformation files
- dat: directory for interaction energy matrix files during computation
- dat.save: will contain the saved matrix file
- files: contains some intermediary files
- inp: amber, cplex and osprey input files
- patch: patch to be applied to the original osprey 2.0 sources (if required)
- pdbs: generated structures from selected results
- scripts: scripts to set up input files and perform some analyses
- postmin_ana: post-minimization and repacking directory

It is assumed that you have installed Java and Amber 9, and that our patched version of osprey 2.0 can be executed. All executions are assumed to run under a 64-bit Linux system with a Bourne shell. If you lack any of this software, intermediary files are also available in the archive for 1MJC, or the Very QuickStart can be tr
7. arameters should have a different name for each sequence; the parameters AddWTRots and AddWT are set to true.

In the directory postmin_ana, each sequence has its own subdirectory, because text matrices are written into the dat subdirectory. For each sequence, the matrix computation and the CFN optimization are identical to the process defined above, except that the reference energy is not used during the optimization step, since each sequence is optimized independently (the flag minself is not used). Also notice that if you just need the number of conformations within initEw, the flag -s can be omitted. The following command lines perform the matrix computation using osprey as well as the CFN-based repacking using toulbar2:
    scripts/run_post_ana.sh
    cd postmin_ana
    run_mutantMatrices.sh
    run_mutantPostCFN.sh

For the rescoring and the energy matrix computation, the script run_post_ana.sh generates two files for each mutant: computeMats.sh and PostMinCFN.sh. The first one re-evaluates the energy of the minimized structure and the second one carries out the repacking. The computation for all mutants can be performed directly using run_mutantMatrices.sh (applies all computeMats.sh) and run_mutantPostCFN.sh (applies all PostMinCFN.sh).

VII. VERY QUICK START

To be able to just reproduce the main results without major effort or software installation, please download the energy matrices produced by osprey 2.0 and their translated version to the w
8. e configuration files generated in the inp directory are:
- System.cfg: information about the protein system being redesigned
- KStar.cfg: force field parameters and rotamer file specification
- DEE.cfg: parameters for energy matrix and DEE/A* computations

IV. COMPUTATION OF PAIRWISE ENERGY TERMS

This stage consists of the computation of the pairwise energy terms and the generation of the corresponding matrix in text format, which is required to build CFN models. This is achieved using osprey 2.0 (Chen et al., 2009; Gainza et al., 2013). You should first try the patched and compiled version available in the Osprey2.0 directory, which should work under most 64-bit Linux systems with Java 6 or above installed. If it does not, please refer to the Appendix to patch and compile osprey 2.0 yourself. The command line for computing the pairwise energy matrices is:
    java -cp Osprey2.0/src:Osprey2.0/src/mpiJava/lib/classes -Xmx2G KStar -t 5 -c inp/KStar.cfg computeEmats inp/System.cfg inp/DEE.cfg > matrix.out 2>&1 < /dev/null

The single and pair interaction matrix files are saved into the dat directory. These generated text matrices have to be concatenated into a single text matrix called 1MJC.matrix.28p.17aa.usingEref.txt, which is then used to generate the input file for toulbar2. The following command line performs the concatenation and saves the combined text matrix into the dat.save directory:
    scripts/concat_pairwise_matrix.sh dat.save
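The role of the single and pair terms can be illustrated with a toy model. The following Python sketch is only an assumption-laden illustration: the positions, rotamer counts and integer costs are invented, the names (unary, pair, total_cost, enumerate_within) are hypothetical, and the exhaustive search stands in for what a CFN solver does far more efficiently. It shows how the cost of a full assignment is the sum of unary and binary terms, how the minimum-cost assignment (the GMEC) is found, and how all assignments within a threshold of it can be listed.

```python
from itertools import product

# Toy model (made-up integer costs): 3 positions with 2 candidate rotamers each.
unary = {
    0: [0, 15],
    1: [4, 0],
    2: [2, 3],
}
# pair[(i, j)][a][b]: interaction cost between rotamer a at position i
# and rotamer b at position j.
pair = {
    (0, 1): [[0, 20], [-10, 5]],
    (1, 2): [[3, 0], [0, -5]],
}


def total_cost(assignment):
    """Sum of the unary and binary terms for one rotamer per position."""
    cost = sum(unary[i][r] for i, r in enumerate(assignment))
    cost += sum(t[assignment[i]][assignment[j]] for (i, j), t in pair.items())
    return cost


def enumerate_within(threshold):
    """Return (cost, assignment) pairs within `threshold` of the minimum, sorted by cost."""
    domains = [range(len(unary[i])) for i in sorted(unary)]
    scored = sorted((total_cost(a), a) for a in product(*domains))
    best = scored[0][0]
    return [(c, a) for c, a in scored if c <= best + threshold]


# The first entry at threshold 0 is the minimum-cost assignment (the toy GMEC).
gmec_cost, gmec = enumerate_within(0)[0]
near_optimal = enumerate_within(5)
```

Integer costs are used so that the threshold comparison is exact; exhaustive enumeration is only practical for very small search spaces like this one.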
9. csp format the problems considered in the paper. Each file, compressed with the strong xz compressor (available under most Linux distributions), is available at:
    http://genoweb.toulouse.inra.fr/~tschiex/CPD

Download and extract the wcsp file of the problem of your choice (the 1MJC instance is used below) in the example directory (please be sure to have the required disk space available) and uncompress it:
    unxz 1MJC.matrix.28p.17aa.usingEref_self_digit8.wcsp.xz

This creates a (possibly large) wcsp file for toulbar2.
- You can identify the GMEC using toulbar2 directly on the wcsp files, as described in Section V, item 2.
- You can enumerate all solutions within 2 kcal/mol of the GMEC using toulbar2 directly on the wcsp files, once the GMEC has been identified and stored (above). Just follow Section V, item 3.

For testing the ILP approach, notice that IBM ILOG cplex is free for academics. You must contact the IBM academic initiative to be able to download and install the cplex software. Please proceed as described on the dedicated IBM academic initiative web site at:
    http://www-01.ibm.com/software/websphere/products/optimization/academic-initiative

You can then translate any of the wcsp files to the cplex lp format using the wcsp2cplex.py script:
    scripts/wcsp2cplex.py 1MJC.matrix.28p.17aa.usingEref_self_digit8.wcsp > 1MJC.lp

Under the cplex command line interface, you can identify the GMEC with the followi
10. ied (section VII).

II. PARAMETERIZATION OF MOLECULAR STRUCTURAL SYSTEMS

You need a protein structure to redesign. In the example described here, we choose to redesign the cold shock protein A from E. coli. The crystal structure of this protein is available in the Protein Data Bank (PDB id: 1MJC). After downloading the structure, the ATOM records are extracted from the 1MJC.pdb file and saved into the 1MJC_edited.pdb file. Missing heavy atoms in the crystal structure, as well as hydrogen atoms, are then added using the tleap module of Amber 9. Here is an example of a tleap input file that accomplishes this task for the edited 1MJC structure (1MJC_edited.pdb):

tleap.inp file:
    source leaprc.ff99SB                      # The force field parameters to be used
    X = loadpdb 1MJC_edited.pdb               # The edited pdb file
    set default PBradii mbondi2               # Radii set to be used (minimization step)
    check X
    saveamberparm X 1MJC.prmtop 1MJC.inpcrd
    quit

The command to run tleap is:
    $AMBERHOME/exe/tleap -f inp/tleap.in

Note that the environment variable AMBERHOME must be set to point to the home directory of the Amber package. For visual inspection of the structure, a pdb file (1MJC.hbuild.pdb) can be generated using the following command:
    $AMBERHOME/exe/ambpdb -pqr -p 1MJC.prmtop -aatm < 1MJC.inpcrd > 1MJC.hbuild.pdb

Finally, the molecular system is subjected to 500 steps of minimization with the sander module of Amber 9 (Case et al., 2006) using the Generalized B
11. itions depend on the burial of residues within the protein. Residues are classified as core, boundary or surface according to their solvation radius (see Methods in Traoré et al., Bioinformatics, 2013). For this purpose, the atomic radii set included in the last column of the structure file (model.pdb) is used. According to this stratification, residues 19 and 30 are classified into the core layer, while residue 17 is assigned to the boundary layer. Mutable residues in the core (residues 19 and 30) are allowed to mutate to hydrophobic amino acids (V, L, I, F, M, Y, W), boundary residues (residue 17) to hydrophilic amino acids (S, T, D, N, E, Q, H, K, R), and surface residues (no residue in this example) to both sets. In addition, alanine and the wild-type residue are considered at all mutable positions.

The other residues of the core and boundary regions (residues 4, 5, 7, 8, 10, 11, 20, 21, 23, 28, 29, 31, 32, 33, 36, 43, 44, 48, 50, 51, 52, 53, 54, 66 and 67) are allowed to repack (i.e. flexible residues) in order to allow structural rearrangements around the mutable residues.

Below is the command that accomplishes the above tasks, as well as the generation of the configuration files required to compute the pairwise energy matrix using osprey 2.0 (see the osprey 2.0 user manual for further details):
    scripts/Config_mutation_space.pl inp/model.pdb inp/plist

The file plist contains the list of mutable residues:
    17 LAYER
    19 LAYER
    30 LAYER

Th
12. ng commands:
    read 1MJC.lp
    read inp/cplex.prm
    optimize
    write 1MJC.sol

REFERENCES

Case, D.A. et al. (2006) AMBER 9. University of California, San Francisco.
Chen, C.-Y. et al. (2009) Computational structure-based redesign of enzyme activity. Proceedings of the National Academy of Sciences, 106, 3764-3769.
Crooks, G.E. et al. (2004) WebLogo: a sequence logo generator. Genome Res., 14, 1188-1190.
Gainza, P. et al. (2013) osprey: Protein Design with Ensembles, Flexibility, and Provable Algorithms. Meth. Enzymol., 523, 87-107.
Hawkins, G.D. et al. (1996) J. Phys. Chem., 100, 19824.
Schrödinger, L. (2010) The PyMOL Molecular Graphics System, Version 1.3r1.

(i) Although this is not described in the paper, for the curious, other Python scripts (wcsp2qp.py, wcsp2sat.py, wcsp2sat-support.py) are provided in the scripts directory to translate to the cplex Quadratic Programming format and to MaxSAT with two different encodings.

VIII. APPENDIX: INSTALLING AND PATCHING OSPREY 2.0

This is required only if the provided patched version in the Osprey2.0 directory does not work on your system. You can download the Java sources of osprey 2.0 from the following web site:
    http://www.cs.duke.edu/donaldlab/software/osprey/request-download.html

Once extracted, the files to patch are in the src directory of the osprey 2.0 installation directory (the path to this directory is assumed to be available in the OSPREYHOME environment variable):
    patch $OSP
13. orn/Surface Area (GB/SA) implicit solvent model (Hawkins et al., 1996). A harmonic restraint with a force constant of 1 kcal/mol is applied to heavy atoms during this step, in order to remain close to the starting conformation. Below is the input script used to perform this task with the sander module of Amber 9:

imin.inp file:
    Initial minimization
     &cntrl
      imin = 1, maxcyc = 500, ncyc = 250, ntb = 0,
      igb = …, rbornstat = …, gbsa = …, intdiel = …, cut = …,
      ntr = …, restraint_wt = …, restraintmask = …

The commands for running the minimization step, followed by the generation of the resulting structure, are:
    $AMBERHOME/exe/sander -O -i imin.inp -o 1MJC.min.out -c 1MJC.inpcrd -p 1MJC.prmtop -r 1MJC.restrt -ref 1MJC.inpcrd
    $AMBERHOME/exe/ambpdb -pqr -p 1MJC.prmtop -aatm < 1MJC.restrt > model.pdb

The minimized structure (model.pdb) can be visualized using PyMOL (Schrödinger, 2010), for example. After checking the minimized structural model, we can address the definition of the mutation space using a metric based on the residue depth in the molecule (its distance from the solvent).

III. SELECTION OF SEQUENCE-CONFORMATION SPACE

Next, it is necessary to determine which residues to mutate and which ones to repack, as well as the amino acid types to be considered at each mutable position. 3 mutable residues (residues 17, 19 and 30) are selected in this example (1MJC). Allowed mutations at these selected pos
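The layer rule used in this guide for choosing amino acid types (core positions mutate to hydrophobic types, boundary positions to hydrophilic types, surface positions to both, with alanine and the wild type always allowed) can be sketched in a few lines of Python. This is a minimal illustration, not the logic of Config_mutation_space.pl: the function name allowed_types, the layer labels and the wild-type letters in the usage lines are assumptions made for the example.

```python
# Amino acid sets as listed in the guide (one-letter codes).
HYDROPHOBIC = set("VLIFMYW")
HYDROPHILIC = set("STDNEQHKR")


def allowed_types(layer, wild_type):
    """Return the sorted amino acid types allowed at a mutable position.

    layer: "core", "boundary" or "surface" (labels are an assumption).
    wild_type: one-letter code of the native residue at that position.
    """
    if layer == "core":
        allowed = set(HYDROPHOBIC)
    elif layer == "boundary":
        allowed = set(HYDROPHILIC)
    elif layer == "surface":
        allowed = HYDROPHOBIC | HYDROPHILIC
    else:
        raise ValueError(f"unknown layer: {layer!r}")
    # Alanine and the wild-type residue are considered at every mutable position.
    allowed |= {"A", wild_type}
    return sorted(allowed)


# Hypothetical wild types, chosen only to exercise the function.
core_choice = allowed_types("core", "V")
boundary_choice = allowed_types("boundary", "K")
surface_choice = allowed_types("surface", "A")
```

With both sets plus alanine, a surface position can take at most 17 amino acid types, which is consistent with the "17aa" label in the matrix file names.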
