User Manual - Lewis Kay's group at the University of Toronto

Contents

1. a compaction system is included, which starts by keeping in the initial pool only the structures that have been selected at least once by ENSEMBLE in any of the runs performed so far. If this number is higher than the user-defined maximum size of the initial pool (flag POOLSIZE), only some of the selected structures are kept, chosen at random. On the other hand, if this number is lower than 1.5 times the size of the selected ensemble (flag NB_STRUCTURES), some other conformers are randomly selected from the initial pool. Conformers are selected from the initial soup and included in the initial pool at random. The initial soup is split into several selection groups. The program randomly chooses a selection group, then a random conformer from this group. All conformers of a group have the same probability of being chosen, and every time a conformer is selected it moves to the downstream selection group. Initially, each selection group is attributed an a priori probability of being chosen. A probability of 1 is given to pool 0, where structures have never been selected, and 0 to the last one, where structures have been selected the user-defined maximum number of times (flag MAX_NUMBER_SELECT). Decreasing probabilities are given to the intermediate groups. The probability for a group to be chosen depends on its a priori probability but also on the number of conformers it contains. The latter is taken into account to define the a posteriori
2. PARAMETERIZATION
   ENSEMBLE requires a single parameter file, which contains all the information necessary to run it properly and efficiently. The name of this parameter file must be given as an argument when launching ENSEMBLE. The parameter file is made of several sections, each dedicated to a specific aspect of the program. Here is a summary of all options and their meanings.
   1. ENSEMBLE environment
   The ENSEMBLE environment section contains the main information related to the system environment of ENSEMBLE.
   RUN, WORK: The RUN parameter corresponds to the directory where all results will be stored. These encompass the initial pools used along all ENSEMBLE runs and the selected ensembles, as well as their energies. The RUN directory also contains the Save directory, which keeps all the information necessary to re-run ENSEMBLE in the event of a crash, and the PDBlist directory, which contains all the initial pools the program used. Moreover, the Analysis directory is created upon launching the analysis of an ensemble. The WORK parameter is the directory where all calculations will be performed and where the temporary files crucial for the C core of the program are stored. This directory is like a trash bin, which is entirely deleted at the end of the calculations.
   STORE_DATA, SEQUENCE, CSP_FILE, FREQ_SAVE: The name of this parameter corresponds to the directory where the CDP files are stored. This directory is created in the directory o
3. Table 1, and only the modules with the highest priority (lowest rank) are activated. The first step of the loop consists of saving the user-defined data (parameters), the initial pools utilized, the results (selected conformers, energies) and the state of the different modules. Then the files required by the C core of ENSEMBLE are written. These encompass one main parameter file, a starting weighting file, three protocol files and one parameter file per activated module. For some modules, ENSEMBLE needs the structural files of the conformers of the initial pool. In this case, they are extracted from the CSP files and stored locally in the pdbs subdirectory of the WORK directory. The C core is then able to perform the calculations properly. The output provides some information about the weighted energies of the different activated modules, as well as a weighting file that contains the proportion of each conformer of the initial pool in the selected ensemble. All this information is retrieved by the wrapper. In the C core, the energy of each module is weighted by a faith factor. The wrapper removes this faith factor to obtain the real, or effective, energy of each module. This effective energy is compared to the target energy of the same module (Table 1). The module is considered to fit (i.e. the back-calculated values fit the experimental data) if its effective energy is lower than its target energy. The MAX_RELATIVE_WEIGHT flag of the parameter
4. as individual. The following table describes the way PRE restraints must be provided to ENSEMBLE. All labels are the same as for NOE data.
   Residue 1   Atom 1   Residue 2   Atom 2   Aver   Low   Up
   a T 6H 12 1 2 5 2 5 2 CD 8 H Laa Da PES 59 OD1 38 H 11 2 11 2 da 59 OD1 43 H 13 29 Zaid 292 5
   5. PRE ratios (rPRE)
   The ratios of R2 values yielded by PRE experiments are related to the ratios of distances (see reference 2 for more details). ENSEMBLE computes all possible ratios, which is why this approach can be rather time-demanding. Restraints are provided to the program in the same way as R2 restraints. The approach is quite useful when the tumbling time τ is unknown or cannot be determined.
   Residue   Atom   R2
   4         N      5.29162
   5         N      D220172
   6         N      5330 19
   7         N      9.53548
   8         N      6.46877
   6. R2
   Residue   Atom   R2
   4 N GEN NES 9 N 6 225072 6 N D02079
   7         N      9.53548
   8         N      6.46877
   7. J coupling
   Res1   Atom1   Res2   Atom2   Res3   Atom3   Res4   Atom4   J   Low   Up
   1 P N 2 CA 2 E Og 0 0 2 C 3 N 3 CA gt C Oe 2 0 0 3 e 4 N 4 CA 4 C pM 0 0 4 E 5 N 5 CA t E 5 14 0 0
   8. Solvent accessibility
   Res   Atom   Group   Access   Low   Up   Asym   Weight
   1 CA 1 1 178 118 118 1 0 1 0 2 N 1 0 000 0 00 0 00 1 20 1 0 e N AND 3 HN AND 3 CA AND 3 HA1 AND 3 HA2 2 14 021 1 40Z T2402 1 0 1 0 2 RES 2 74 021 T402 T402 1 0 D
   9. Hydrodynamic radius
   For the hydrodynamic radius, the restraint value has to be indicated di
5. back-calculate the data from a structural model, ENSEMBLE makes use of different predictors (see Table 2). The installation process indicates whether the installation of each of these predictors proceeded successfully.
   7. Installation of the main modules. These modules encompass the main wrapper that will have to be launched, as well as all the PERL libraries necessary to run it.
   8. Compilation of the C core of ENSEMBLE.
   9. Link to the wrapper. If you are the administrator, a link to the wrapper is created in the /usr/bin directory so that all users will be able to run ENSEMBLE. Otherwise, an environment variable is added to the .cshrc and/or .bashrc files of your home directory.
   [Table 2: legible row labels are J coupling and Solvent accessibility]
   Table 2. Predictors used within ENSEMBLE to back-calculate data. The star means that the back-calculation of data is internally computed by the C core of ENSEMBLE. For more details about the methods utilized, see references 1 and 2. http www embl em de contact php lang de amp Cat Contact
   When running ENSEMBLE for the first time, the RUN directory specified in the main parameter file is created, as well as the subdirectories Results (where all results, i.e. selected ensembles and related energies, are stored), PDBs (where all the initial pools used during the ENSEMBLE runs are stored) and Save (where the data necessary to restore the progression of the calculations in the event of a crash are stored). If ENS
6. file defines the final ratio between the faith factor of a module that would always fit and the faith factor of a module that would never fit over all performed trials. Before the first trial of each run, the faith factor of all modules is set to 1. After each trial, the modules that do not fit the experimental data have their faith factor increased by an increment that corresponds to a linear progression towards the value specified by MAX_RELATIVE_WEIGHT, while the faith factor of modules that fit remains the same. Then the faith factor of all activated modules is decreased so that the lowest one is always 1. Finally, for ENSEMBLE efficiency, the faith factors of all activated modules are scaled so that the sum of the weighted energies is equal to a pre-defined energy (flag TARGET_ENERGY).
   Upper table (Fit or not: A [15.0], B [7.5])
             Actual faith factor   ENSEMBLE energy     Effective energy    Fit or not   Updated faith factor
   Module    A        B            A         B         A         B                      A        B
   Trial 1   1.000    1.000        4.050     11.052    4.050     11.052    X            1.000    3.250
   Trial 2   2.502    8.131        12.516    86.906    5.002     10.688                 1.000    5.500
   Trial 3   1.568    8.622        21.162    66.017    13.496    7.657     X            1.000    7.750
   Trial 4   1.373    10.640       12.352    100.731   8.996     9.467     X            1.000    10.000
   Trial 5   0.965    9.647        11.558    64.642    11.977    6.701     X
   Lower table (Fit or not: A [12.5], B [3.5])
             Actual faith factor   ENSEMBLE energy     Effective energy    Fit or not   Updated faith factor
   Module    A        B            A         B         A         B                      A        B
   Trial 1   1.000    1.000        14.551    4.940     14.551    4.940                  1.000    1.000
   Trial 2
   Trial 3
   Trial 4   6.884    2.295        84.258    14.021    1
7. preparing all files required to properly run ENSEMBLE, and retrieving the crucial information (selected ensemble, energies). The wrapper has the additional function of adjusting the faith factor, attributing more weight to modules for which the back-calculated values do not yet fit the experimental data and less weight to modules for which they do. The following list summarizes the sequential steps of the wrapper:
   1. Compaction of the initial pool if too large (> 5000 conformers)
   2. Generation of new structures, added to the initial pool at regular steps
   3. Preparation of the files necessary to run ENSEMBLE
   4. Running ENSEMBLE (C core)
   5. Adjusting parameters (faith) or terminating the process if the back-calculated values fit the experimental data
   An interesting aspect of this wrapper concerns the integration of restraints during the selection process. Each module is attributed a rank value, and its restraints are integrated into the calculations only when the modules with lower rank (i.e. higher priority) have back-calculated values that fit the experimental data.
   4. Requirements
   To run properly, ENSEMBLE needs a main parameter file as well as one restraint file per selected module. The pathway of each restraint file is specified in the parameter file, which also contains the names of all the options related to the environment (results directory, number of structures in the selected ensemble) and the protocols: inclusion of
8. to the initial pool after each run. There are three ways to add structures to the initial pool:
   1. Random selection from the initial soup (flag NB_PICK_NEW).
   2. Generation with TraDES (flag NB_GEN_NEW). Note that TraDES-generated conformers are also inserted into pool 1 of the initial soup so that they can be selected again in future runs.
   3. Generation of an unfolded state starting from one random structure of the best ensemble found so far (flag NB_UNFOLDED). In this case, the process of unfolding the protein makes use of simulations that depend on the temperature of the system (flag UNFOLD_TEMP) and the timestep (flag UNFOLD_STEP). ENSEMBLE modifies the temperature to obtain even more unfolded (extended) structures by adding a random value to the specified value, so that the final temperature ranges between the specified temperature and twice its value. The same procedure is used for the timestep.
   If the generation of structures by TraDES is activated, the program first creates four trajectory files: one based on alpha helices, one on beta sheets, one on coils, and the last one on alpha helices, beta sheets and coils simultaneously. These files are crucial for generating conformers and are immediately stored in the Save directory. The trajectory file used to generate new conformers is chosen at random.
   Finally, before running the calculation loop, the target energy of the selected modules is estimated
9. Inside this manual you will find some light bulbs. The yellow bulb indicates critically important information that should be included in the first reading, while the purple light bulb highlights notes that are less critical and can be read at a later time.
   ENSEMBLE, a tool for describing disordered protein states. Laboratory of J. D. Forman-Kay, Hospital for Sick Children, University of Toronto.
   INTRODUCTION
   1. Historical development
   ENSEMBLE was created to describe the structural ensembles of disordered protein states, including unfolded and intrinsically disordered states. ENSEMBLE was developed to meet the need to incorporate multiple experimental data into computational calculations of these states, which are important for understanding protein stability and aggregation in the case of unfolded states, and for understanding biological function and protein recognition in the case of intrinsically disordered states. The ENSEMBLE concept first emerged in 2000 from the work of James Choy et al., who wrote an algorithm to optimize the weights for each conformer in a set of pre-generated structures in order to satisfy the available experimental data. This original version was later optimized by Chris Neale, who considerably increased the speed of the program by implementing new pseudo-energy minimization algorithms in the C language, and Jo
10. 2.240 6.109 X 1.000 1.000
    Trial 5: 5.450 5.450 78.291 20.108 14.365 3.690
    Table 3. Faith factor variation along one ENSEMBLE run made of 5 trials and restrained by 2 modules, A and B. MRW indicates the maximum relative weight. The target energy is specified between brackets in the "Fit or not" column. The effective energy corresponds to the ratio between the ENSEMBLE energy and the actual faith factor.
    Table 3 shows two examples of the evolution of the faith factor for two modules, A and B, along two ENSEMBLE runs, each made of 5 trials. The upper table helps explain the notion of maximum relative weight. While module A always fits the experimental data, module B never fits them. Hence, for the last trial, the weight of module B is MRW (10) times the weight of module A. The lower table was obtained with an MRW equal to 5, giving an increase of the faith factor of 1.0 per trial if a module does not fit. After trial 1, neither of the two modules fits. Hence their faith factor is increased by 1 and becomes 2. As the lowest faith factor is then equal to 2, all faith factors are decreased so that the lowest faith factor of all modules is 1. After trials 2 and 3, only module B fits, leading to an increase of 2 in the faith factor of module A. After trial 4, module A fits but not module B. The faith factor of module A is decreased and the faith factor of module B is increased, giving in both cases a faith factor of 2, which then becomes 1.
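The faith factor bookkeeping described for Table 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the wrapper's actual PERL code: the function names are hypothetical, the per-trial increment is taken to be (MRW - 1) / (number of trials - 1), which matches the 1.0 step quoted for MRW = 5 and the 3.250, 5.500, 7.750, 10.000 progression of the upper table, and the step that brings the lowest faith factor back to 1 is assumed to be a uniform shift. The final rescaling to TARGET_ENERGY is left out.

```python
def update_faith_factors(faith, fits, max_relative_weight, n_trials):
    """Return updated faith factors after one trial.

    faith: dict mapping module name -> current (unscaled) faith factor
    fits:  dict mapping module name -> True if the module fits its target energy
    """
    # Linear progression: a module that never fits goes from 1 to
    # max_relative_weight over the n_trials - 1 updates of a run.
    step = (max_relative_weight - 1.0) / (n_trials - 1)
    updated = {m: (f if fits[m] else f + step) for m, f in faith.items()}
    # Assumption: "decreased so that the lowest one is 1" is a uniform shift.
    shift = min(updated.values()) - 1.0
    return {m: f - shift for m, f in updated.items()}


def effective_energy(ensemble_energy, actual_faith):
    """Effective energy = ENSEMBLE (weighted) energy / actual faith factor."""
    return ensemble_energy / actual_faith


# Upper-table scenario: module A always fits, module B never fits, MRW = 10.
faith = {"A": 1.0, "B": 1.0}
history_b = []
for _ in range(4):  # four updates between the five trials
    faith = update_faith_factors(faith, {"A": True, "B": False}, 10.0, 5)
    history_b.append(faith["B"])
print(history_b)
```

With module B never fitting, its unscaled faith factor reaches the maximum relative weight on the last update while module A stays at 1, which is the behaviour the upper table is meant to illustrate.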
11. Cα-Cα distances of the ensemble.
    • caca_map.eps, caca_map_stdv.eps: Graphical representation of the information contained in the caca_map.txt file.
    Besides the main analysis program, analyzens, some tools have been designed to get simple information about an ensemble. The following list explains their role:
    • rg list_file -v
      This program displays the average radius of gyration and its standard deviation over all conformers of the list file specified as argument. The latter can be a structural file or an ENSEMBLE file. If the -v flag is specified, the radius of gyration of each conformer is also reported.
    • extract_csp -f CSP_file -p pathway -x prefix -l 1 2 3 8 10
      This program extracts conformers from a CSP file. The optional -p flag indicates the directory (absolute or relative pathway) into which the conformers will be extracted. The -x flag is optional and specifies the prefix used to generate the names of the extracted conformers. By default, conf_ is used. Finally, the optional -l flag restricts the conformers to be extracted by specifying a list. The first conformer of each CSP file is numbered 1. If a number higher than the number of conformers in the CSP file is specified, the program issues a warning before stopping.
    • make_csp -f list_file -o output.csp -h
      This program acts in the opposite way to extract_csp by generating a CSP file from a list file. The -o flag i
12. EMBLE is run again after a crash, it will first check the Save directory for the presence of the file ensemble.dat, which contains all user-defined parameters. Moreover, once the calculations are finished, a directory called Analysis is created upon launching the script analysis.pl found in the ENSEMBLE_TOOLS directory. This PERL script performs a couple of analyses on an ensemble that fits all experimental data.
    [Figure 1: legible labels are CSP files, List files, Selected Ensemble]
    Fig. 1. The ENSEMBLE machinery. This is a schematic representation of the way the ENSEMBLE wrapper works.
    The C core of ENSEMBLE performs the selection of an ensemble from all conformers present in the initial pool. These are randomly selected from the initial soup, which is fed by the structures provided before running ENSEMBLE (CSP or list files) and/or generated by the TraDES package. The selection process is iterated several times (the CHOOSE_NEW flag of the parameter file sets the number of iterations performed with the same initial pool before other conformers are selected to increase its size). Each iteration will be referred to as a trial, and a set of trials as a run. To prevent an initial pool from becoming too large
13. [Screen capture: Installation directory (Return = /home/mkrzemin/Softwares/ENSEMBLE). The specified directory does already exist. Do you want to overwrite it? (Y/N) y]
    warned and it is then proposed either to choose another directory or to overwrite the existing one. If the computer cannot create or overwrite the specified directory (permission issue), you will be invited to choose another one.
    2. The C compiler
    ENSEMBLE is written in C, and a compiler is required to install it. The compiler proposed by default is gcc, which has been tested and is installed on most Linux distributions. The installation program tests whether the proposed compiler provides all the basic C libraries required by ENSEMBLE. If the test succeeds, the installation process goes directly to the next step; otherwise another compiler is requested.
    [Screen capture: C Compiler (Return = gcc)]
    3. The PERL version
    The wrapper is written in PERL. Hence the installation process searches all pathways specified in the environment variable PATH for all PERL versions. If several of them are found, a choice is given. If only one version is found, no choice is given, and if no PERL version is found, the installation process is aborted.
    [Screen capture: Several PERL versions have been found on this computer. Please select the one you wish to use: 1. PERL 5.10.1, 2. PERL 5.8.8. Your choice: 2]
    The PERL version is tested, in particular the libraries crucial for the wra
14. Laboratory of Julie D. Forman-Kay
    User Manual
    Dr. Mickaël Krzeminski
    November 2012
    ENSEMBLE Version 2.1
    Copyright The Hospital for Sick Children, 2012.
    Distribution of substantively modified versions of any module of this software package is prohibited without the explicit permission of the copyright holder. Any use of this work or derivative works, in whole or in part, for any commercial purpose or for monetary gain is prohibited.
    NO WARRANTY: This software package is provided "as is", without warranty of any kind, expressed or implied.
    Introduction
       1. Historical development
       2. The main core of ENSEMBLE
       3. Running ENSEMBLE iteratively
       4. Requirements
    Installation
    Deeper into ENSEMBLE
    Parameterization
       1. ENSEMBLE environment
       2. ENSEMBLE protocol parameters
       3. Module parameters
    Module restraints
       1. Chemical Shift
       2. Residual Dipolar Coupling (RDC)
       3. Nuclear Overhauser effect (NOE)
       4. Paramagnetic Relaxation Enhancement (PRE)
       5. PRE ratios (rPRE)
       6. R2
       7. J coupling
       8. Solvent accessibility
       9. Hydrodynamic radius
       10. Small Angle X-ray Scattering
    Running ENSEMBLE
       1. Locally
       2. On a cluster
       3. After a crash
    Results analysis
    References
15. References
    1. Choy W.Y., Forman-Kay J.D. J Mol Biol 2001, 308, 1011-32.
    2. Marsh J.A., Neale C., Jack F.E., Choy W.Y., Lee A.Y., Crowhurst K.A., Forman-Kay J.D. J Mol Biol 2007, 367, 1494-510.
    3. Marsh J.A., Forman-Kay J.D. J Mol Biol 2009, 391, 359-74.
    4. Marsh J.A., Forman-Kay J.D. Proteins 2011.
    5. Mittag T., Marsh J., Grishaev A., Orlicky S., Lin H., Sicheri F., Tyers M., Forman-Kay J.D. Structure 2010, 18, 494-506.
    6. Marsh J.A., Dancheck B., Ragusa M.J., Allaire M., Forman-Kay J.D., Peti W. Structure 2010, 18, 1094-103.
    7. Pinheiro A.S., Marsh J.A., Forman-Kay J.D., Peti W. J Am Chem Soc, 133, 73-80.
    8. Neal S., Nip A.M., Zhang H., Wishart D.S. J Biomol NMR 2003, 26, 215-40.
    9. Marsh J.A., Baker J.M., Tollinger M., Forman-Kay J.D. J Am Chem Soc 2008, 130, 7804-5.
    10. Lee B., Richards F.M. J Mol Biol 1971, 55, 379-400.
    11. Garcia De La Torre J., Huertas M.L., Carrasco B. Biophys J 2000, 78, 719-30.
    12. Svergun D., Barberato C., Koch M.H.J. Journal of Applied Crystallography 1995, 28, 768-773.
    13. Feldman H.J., Hogue C.W. Proteins 2000, 39, 112-31.
16. alue displayed in Table 1.
    • The R2 energy is computed as $e = K(1 - r)$. In this equation, K is a user-defined constant and r is the correlation coefficient between the back-calculated data and the experimental data. Hence, for the target energy, the user can choose the factor c, which represents the minimum correlation above which a selected ensemble fits the experimental data.
    • For scalar coupling (J), we merely considered a reasonable error of 0.4 Hz. As the pseudo-energy function is harmonic, the target score has the value displayed in Table 1.
    • In the cases of hydrodynamic radius (Rh) and solvent accessibility, we arbitrarily define a target energy based on the specified error range.
    • Finally, for SAXS data, the tolerated value corresponds to 10 times the sum of all squared intensities.
    3. Running ENSEMBLE iteratively
    The C core of ENSEMBLE is more time-efficient when the number of structures in the initial pool is rather low (up to 5000). Nevertheless, ENSEMBLE has been designed specifically for disordered proteins, and obtaining a reasonable sampling of conformational space for these is not possible with such a low number of structures. Hence we developed an approach to run ENSEMBLE iteratively after modifying the initial pool by adding new structures and rejecting irrelevant ones, so that its size is kept reasonable. This is performed within what is called the wrapper, which consists of code for managing the initial pool
17. e Marsh, who significantly extended the capabilities of the program, including integrating an iterative PERL routine that leads to better conformational sampling. This latter version has recently been modified and enhanced by Mickaël Krzeminski and includes an easy user interface as well as new approaches to treat data and analyze results. The many experiments that yield information about disordered states of proteins, together with the computational algorithms for predicting the same type of information from a structural model, are utilized within ENSEMBLE to enable determination of an ensemble of conformations that represents the disordered state. ENSEMBLE has primarily been developed using the unfolded state of the N-terminal SH3 domain of Drk (downstream of receptor kinase) from Drosophila, for which significant experimental data have been obtained. It has more recently been successfully applied to intrinsically disordered proteins to gain insight into their structural propensities and make structure-function correlations, including the cyclin-dependent kinase inhibitor Sic1 (both free and in complex with Cdc4) and regulators of protein phosphatase 1 (PP1) 9. The current version of ENSEMBLE is maintained by Mickaël Krzeminski. Please report any bug to mickael.krzeminski@sickkids.ca.
    2. The main core of ENSEMBLE
    ENSEMBLE is a program written in the C language, which aims to choose, from a large set of conformations called the initial pool, an ensemble t
18. f the CSP or list file when permitted; otherwise it is created in the Save directory.
    SEQUENCE: This is the sequence, specified with the one-letter code. The sequence is used to generate initial trajectories with TraDES.
    CSP_FILE: It is possible to include pre-generated structures in the calculations. This is done via the CSP_FILE flags. A simple example of specifying the CSP and list files to include looks like the following:
       CSP_FILE_1 /home/user/project_X/file1.csp
       CSP_FILE_2 /home/user/project_X/file1.list
       CSP_FILE_3 /home/user/project_X/file2.csp
    These can be CSP or list files. The list files consist merely of a list of the absolute pathways of the structural files. The program first checks whether each structure exists and creates from this list a CSP file with the same name followed by .csp, which is stored in the same directory as the list file if you have write permission, and otherwise in the Save directory. Note that each CSP_FILE flag is followed by a number. It is important to keep these numbers associated with the same files in the event of a crash and restart of the program.
    FREQ_SAVE: At regular times or numbers of performed runs, ENSEMBLE saves the progression state of the calculations. This flag specifies the save frequency. If a time is required, the value must be followed by m (e.g. 120m for every two hours); otherwise the value is taken as a number of runs.
    2. ENSEMBLE protocol parameters
    After these first pa
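The save-frequency convention described above (a trailing m means minutes, a bare number means runs) can be illustrated with a short Python sketch. The function name parse_freq_save and the returned tuple are hypothetical and chosen only for this example; they are not part of ENSEMBLE, whose wrapper is written in PERL.

```python
def parse_freq_save(value):
    """Interpret a save-frequency value.

    Returns ("minutes", n) when the value ends with 'm', e.g. "120m",
    and ("runs", n) when it is a bare integer, e.g. "5".
    """
    value = value.strip()
    if value.endswith("m"):
        # Time-based saving: strip the unit and keep the number of minutes.
        return ("minutes", int(value[:-1]))
    # Otherwise the value is taken as a number of runs.
    return ("runs", int(value))


print(parse_freq_save("120m"))
print(parse_freq_save("5"))
```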
19. hat fits the available experimental data. To achieve this goal, the program makes use of a switching Monte Carlo process embedded in a simulated annealing protocol. From the initial pool, ENSEMBLE starts by randomly choosing a user-defined number of conformers, which constitute the first ensemble, Ω0. The agreement between the experimental data and those back-calculated for each conformer of Ω0 is reflected by a scoring function (the lower the better). Then one structure from Ω0 is swapped with one structure from the initial pool that does not belong to Ω0, giving a new ensemble, Ω1. ENSEMBLE compares Ω0 and Ω1 and accepts or rejects the newly defined ensemble according to a Metropolis criterion that depends on the current temperature of the system. The process is repeated a user-defined number of times, and the temperature is geometrically decreased at each step so that it becomes very close to zero at the end of the process. The scoring function is the sum of the weighted pseudo-energy terms attributed to each data type, which we will refer to as modules. The weight attached to each energy term is called the faith factor and allows more or less importance to be given to a given module.
    [Table 1: legible row labels are PRE ratio and Solvent Accessibility]
    Table 1. Modules utilized within ENSEMBLE and their target energy values. N is the number of data points; n is one specific data value. For chemical shifts, σ is the average standa
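As an illustration of the swap-and-accept scheme described above, here is a minimal Python sketch of a Metropolis swap loop with geometric cooling. It is a generic toy version, not ENSEMBLE's C core: the function name anneal, the parameters t_start and cooling, and the toy scoring function are all assumptions made for this example, and the ensemble size must be smaller than the pool size.

```python
import math
import random


def anneal(pool, size, score, n_swaps, t_start=1.0, cooling=0.999, seed=0):
    """Select `size` items from `pool` by Metropolis swaps with geometric cooling."""
    rng = random.Random(seed)
    indices = list(range(len(pool)))
    rng.shuffle(indices)
    inside, outside = indices[:size], indices[size:]   # first ensemble and the rest
    current = score([pool[i] for i in inside])
    temperature = t_start
    for _ in range(n_swaps):
        a = rng.randrange(len(inside))
        b = rng.randrange(len(outside))
        inside[a], outside[b] = outside[b], inside[a]  # trial swap
        trial = score([pool[i] for i in inside])
        delta = trial - current
        # Metropolis criterion: always accept improvements, otherwise
        # accept with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / max(temperature, 1e-12)):
            current = trial
        else:
            inside[a], outside[b] = outside[b], inside[a]  # reject: undo the swap
        temperature *= cooling                          # geometric cooling
    return [pool[i] for i in inside], current


# Toy usage: pick 10 numbers out of 0..99 whose mean is close to 20.
pool = [float(i) for i in range(100)]
toy_score = lambda ens: (sum(ens) / len(ens) - 20.0) ** 2
ensemble, final_score = anneal(pool, 10, toy_score, n_swaps=5000)
print(sorted(ensemble), final_score)
```

In ENSEMBLE the score would be the faith-weighted sum of the module pseudo-energies; the toy score here only stands in for it.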
20. iori probability of each selection group of the initial soup. If an arithmetic progression is chosen (A), the a priori probability of pool n is obtained by subtracting a certain value from the probability of pool n-1, given that the first pool has a probability of 1 and the last pool a probability of 0001. For instance, if MAX_NUMBER_SELECT is set to 4, then the probabilities will be 1.0, 0.75, 0.50, 0.25 and 0 for pools 0, 1, 2, 3 and 4, respectively. If a geometric progression is preferred (G), then the a priori probability of pool n is obtained by dividing the probability of pool n-1 by a certain value, given that pool 1 has a probability of 1 and the last pool a probability of 0001. The latter is finally set to 0 for consistency.
    MAX_RELATIVE_WEIGHT: Final ratio between the faith factor of a module that would always fit and the faith factor of a module that would never fit, once all trials have been performed.
    SW_ROUNDS: The number of attempted swaps in the minimization process.
    3. Module parameters
    Finally, the parameters related to the different modules are specified. In this section, you first have to choose which modules you want to include in the calculations, i.e. those for which you have experimental data. A value of 0 (zero) means no inclusion, while a value of 1 takes the module into account. Some options are common to all modules: reference file, rank, starting faith and the scale-up factor. Most of these modules have the following common option
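The two PROB_PROG progressions described above can be sketched in Python as follows. This is an illustration under assumptions, not ENSEMBLE's code: the function name a_priori is hypothetical, pools are numbered 0 to MAX_NUMBER_SELECT, the arithmetic case is taken to decrease linearly to 0 as in the worked example, and the small probability reached by the geometric case before being reset to 0 is exposed as the parameter last, whose default of 0.001 is only a placeholder.

```python
def a_priori(max_number_select, progression="A", last=0.001):
    """A priori probabilities for pools 0..max_number_select.

    progression "A": arithmetic (constant decrement from 1 down to 0).
    progression "G": geometric (constant ratio from 1 down to `last`),
                     with the last pool then set to 0.
    """
    n = max_number_select
    if progression == "A":
        return [1.0 - k / n for k in range(n + 1)]
    if progression == "G":
        ratio = last ** (1.0 / n)          # constant ratio between successive pools
        probs = [ratio ** k for k in range(n + 1)]
        probs[-1] = 0.0                    # last pool is finally set to 0
        return probs
    raise ValueError("progression must be 'A' or 'G'")


print(a_priori(4, "A"))
print(a_priori(4, "G"))
```

With MAX_NUMBER_SELECT set to 4, the arithmetic case gives the five values quoted in the example above.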
21. m cannot be solved, please address your problem to us and we will do our best to help you. Here we provide a simple method you might try and adapt to your needs. First of all, make sure that the ENSEMBLE directory is accessible from each node of the cluster. Then create a script that contains the important environment variables that were set when installing the program on your system. Two examples are given in the ENSEMBLE directory, one in C shell, the other in bash. The following lines display the C shell version:
       #!/bin/csh
       # Write and submit one job script per parameter file.
       set job_nb = 0
       foreach par_file (path_1/file_1.par path_2/file_2.par)
           @ job_nb += 1
           printf '#\!/bin/csh\n' > submit_${job_nb}.job
           printf 'setenv ENSEMBLE ENSEMBLE_DIR\n' >> submit_${job_nb}.job
           # $ENSEMBLE must stay literal so that it is expanded when the job runs.
           printf '$ENSEMBLE/ensemble %s\n' $par_file >> submit_${job_nb}.job
           SUBMIT_STATEMENT submit_${job_nb}.job
           sleep 1
       end
    After the installation, ENSEMBLE_DIR is automatically replaced by your ENSEMBLE directory. However, SUBMIT_STATEMENT depends on the queuing system of the cluster you use and has to be modified manually.
    3. After a crash
    In the event of a crash, simply re-run ensemble with the same parameter file you used the first time you ran ENSEMBLE. The program will be able to retrieve the exact state at the last time it was saved, since the parameter file you specify contains the pathway of the RUN directory where the Save directory is located. The program first check
22. modules, minimization. A template parameter file is provided in the installation directory of ENSEMBLE. Another example of a parameter file can be found in the example directory; this one has to be used with the provided example in order to verify the correct installation of the program. A full description of the parameter file and the restraint files is given later in this document.
    INSTALLATION
    The ensemble.tar file, which can be downloaded from http://pound.med.utoronto.ca/~JFKlab, contains all the installation processes and is compatible with all Unix-based platforms. First, untar the file with the following statement:
       tar xf ensemble.tar
    This command extracts two files in the current directory: ensemble.tar and ensemble_inst.csh. The latter is a C shell script that is run by simply typing ./ensemble_inst.csh at the prompt. The installation process will lead you through the following steps:
    1. The location in which the program will be installed
    By default, the location in which ENSEMBLE is installed is the home directory of the current user followed by Softwares/ENSEMBLE. This can easily be modified by entering another pathway. If a previous installation has already been done in the specified directory, you are
    [Screen capture: the installer is launched from /home/mkrzemin/Software and displays "Welcome to ENSEMBLE installation"]
23. pper to run properly. If one is missing, the user is warned and the installation process is aborted.
    4. Creation of the documentation directory. In this directory, the current user manual (PDF file) and the updates are found.
    5. Installation of the ENSEMBLE tools. Some tools have been designed for a specific use within ENSEMBLE. After the installation is successfully completed, the environment variable $ENSEMBLE_TOOLS is created for directly accessing all the useful scripts located in the tools directory. The tools include the following programs (more details are given at the end of this document):
       Managing CSP files
       • make_csp: Creates a CSP file from a list file.
       • extract_csp: Extracts some or all structures from a CSP file.
       • Combines two or more CSP files into one file.
       • Combines two or more CDP files into one file.
       • Provides some information for any file created by ENSEMBLE.
       • accessurf: Calculates the solvent accessibility of each atom of a protein.
       Analysis
       • Executable designed to fully analyze a selected ensemble.
       • Calculates all the Cα-Cα distances of an ensemble.
       • pdb2seq: Displays the sequence of a protein from the PDB.
       • Displays the evolution of the best ensembles found along ENSEMBLE runs.
       • Calculates the secondary structures in an ensemble using STRIDE.
    CSP stands for Concatenated Structure Pool. The CSP files contain the coordinates of structures in binary format.
    6. Installation of the predictors. To
24. probability as follows:

\[ S_i = \frac{n_i \, p_i}{\sum_{j=0}^{N-1} n_j \, p_j} \qquad \text{(Eq. 1)} \]

where S_i is the a posteriori probability of pool i, n_i is the number of conformers in pool i, p_i is the a priori probability of pool i, and N is the total number of pools.

Finally, a newly generated conformer is automatically inserted into the initial pool as well as into pool 1 of the initial soup.

If some list files are provided (flag TRAJECTORY_FILE_X), they are first transformed into CSP files. These are saved directly in the directory of the trajectory file if permissions allow it, otherwise in the Save directory, with the name of the trajectory file followed by the extension csp. From all CSP files, the program back-calculates the data for the selected modules and keeps them in memory. These data are also immediately saved as CDP (Concatenated Data Pool) files in the directory specified in the parameter file (flag STORE_DATA). If there is write permission in the directory where the CSP file is located, the STORE_DATA directory is created there. Otherwise it is created in the Save directory. The name of the CDP file is the name of the CSP file followed by the name of the module. For instance, from file1.csp, the CDP file that contains the back-calculated chemical shifts is named file1.csp followed by SHIFTX.

ENSEMBLE enables one to choose the number of structures in the initial pool for the first run (flags NB_PICK_START and NB_GEN_START) and the number of structures to add
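The weighting of selection groups described above can be sketched in a few lines of Python. This is a minimal illustration, not the ENSEMBLE implementation: it assumes the a posteriori probability of a group is its a priori probability multiplied by the number of conformers it holds, normalized over all groups. The function name and the example counts and priors are hypothetical.

```python
import random


def posterior_probabilities(counts, priors):
    """Return S_i = n_i * p_i / sum_j(n_j * p_j) for each selection group.

    counts[i] is the number of conformers in group i (n_i) and
    priors[i] is the a priori probability of group i (p_i).
    """
    if len(counts) != len(priors):
        raise ValueError("counts and priors must have the same length")
    weights = [n * p for n, p in zip(counts, priors)]
    total = sum(weights)
    if total == 0:
        raise ValueError("no group can be selected")
    return [w / total for w in weights]


# Hypothetical soup: group 0 holds never-selected conformers (prior 1),
# the last group holds conformers that reached the selection limit (prior 0).
counts = [100, 50, 10, 5]
priors = [1.0, 0.5, 0.25, 0.0]
probs = posterior_probabilities(counts, priors)

# Draw one group index according to the a posteriori probabilities.
rng = random.Random(0)
group = rng.choices(range(len(counts)), weights=probs, k=1)[0]
```

A group with an a priori probability of 0, or an empty group, gets an a posteriori probability of 0 and is therefore never drawn.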
25. rameters, the ENSEMBLE protocol parameters are specified.

• POOLSIZE: The maximum size of the initial pool.
• NB_STRUCTURES: The number of conformers in the final ensemble.
• CHOOSE_NEW: The number of times ENSEMBLE will attempt to find a set of structures from the same initial pool that can fit the experimental data.
• NB_PICK_NEW: The number of structures to randomly choose from the initial soup and put into the initial pool after each run.
• NB_PICK_START: The number of structures to choose from the initial soup before the first run.
• NB_GEN_NEW: The number of structures to generate with TraDES after each run. The newly produced conformers are inserted in pool 1 of the initial soup and put into the initial pool.
• NB_GEN_START: The number of structures to generate with TraDES before the first run. The newly produced conformers are inserted in pool 1 of the initial soup and put into the initial pool.
• NB_BEST_TO_UNFOLD: After each run, a conformer of the best ensemble found so far is randomly chosen. This structure is used by TraDES as the starting point of an unfolding process. This flag specifies the number of conformers to pick at random.
• NB_UNFOLD_PER_CONF: The number of unfolded conformers to generate from each randomly selected conformer (flag NB_BEST_TO_UNFOLD).
• MAX_NUMBER_SELECT: The number of times each conformer can be chosen from the initial soup and put into the initial pool.
• PROB_PROG: This parameter controls the a pr
26. rd deviation of the atom type n data retrieved from the Biological Magnetic Resonance Data Bank, and for R2, K is a user-defined constant and c is the target correction.

ENSEMBLE currently includes ten modules. For each of them a target energy is set, below which the back-calculated data are considered to agree with the experimental data. The target energy is always compared to the non-weighted energy calculated for the module. Table 1 describes the different modules as well as their target energies. The target energy is based on the equation that governs the energy calculation of the corresponding module.

• In the case of chemical shifts, one fourth of the average standard deviation for a given atom type is tolerated. The energy for this module is calculated with a harmonic equation, giving one sixteenth of the square of the average standard deviation.
• For RDC data, a value of a quarter is tolerated per restraint. As the equation is harmonic, the target energy is one sixteenth of the total number of restraints.
• In the case of NOE and PRE restraints, the equation that gives the penalty energy is also harmonic. The program tolerates up to a fourth of an Angstrom per restraint, giving a target energy of one sixteenth per restraint.
• For PRE ratio (rPRE) data, the number of data points is equal to N N 1 and the energy function is harmonic. Based on the NOE energy function we obtain the v
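The arithmetic behind these target energies is the same in each case: a tolerated deviation is squared because the energy function is harmonic. The short Python sketch below reproduces that arithmetic only. The function names are hypothetical, and any weighting or normalization that ENSEMBLE applies beyond what the text states is not modeled.

```python
def chemical_shift_target(average_sigma):
    """Target energy for one atom type: (sigma / 4) squared, i.e. sigma^2 / 16."""
    tolerance = average_sigma / 4.0
    return tolerance ** 2


def per_restraint_target(n_restraints, tolerance=0.25):
    """Target energy when a fixed deviation is tolerated for each restraint.

    With the tolerance of one quarter quoted for RDC, NOE and PRE data,
    each restraint contributes 0.25 ** 2 = 1/16.
    """
    return n_restraints * tolerance ** 2


# An average standard deviation of 1.0 gives a target of 1/16.
cs_target = chemical_shift_target(1.0)

# Forty restraints at one sixteenth each give a target of 2.5.
rdc_target = per_restraint_target(40)
```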
27. rectly in the parameter file (flag HYDRO_RH).

10. Small Angle X-ray Scattering

[Table: example SAXS restraint file with Intensity and Error columns; the numerical values are not legible in this copy]

1. Locally

Once the parameter file and the module restraint files are ready, ENSEMBLE can be run. The program is accessible from any location of the system, since the installation process either created the link /usr/bin/ensemble, if the user who installed it was the administrator, or added the path of the program to the PATH environment variable of the user. To run ENSEMBLE locally, type the following statement:

    ensemble parameter_file

The absolute or relative path of the parameter file can be specified. After each run, ENSEMBLE checks the remaining memory and will crash if there is not enough space.

2. On a cluster

ENSEMBLE has been designed to run on a single CPU. However, it is more efficient when launched on a cluster, as the generated structures will better sample space and the selection from the initial soup will produce different initial pools. Moreover, a computational cluster is more efficient for adjusting or testing different parameters. It is difficult to define a single procedure that would work on all possible clusters, as each cluster is governed by specific rules. Ask the administrator of the cluster in case you have problems. If the proble
28. s

FLAG

• RANK: The restraints can be included in the calculations in a non-simultaneous manner. It is thus possible to let the program try to fit some restraints before adding others. As previously explained, the rank reflects the order of inclusion of restraints: the lower the rank, the higher the priority. If the value of the flag is 0, the module is not considered.
• REFERENCE: This flag indicates the path of the file that contains the experimental data targeted by ENSEMBLE. Refer to MODULE RESTRAINTS in the next section for the format of restraints and how to prepare the restraint files.
• EXEC: Location of the executable that predicts the experimental observable from the structural models. This parameter is currently not used, as the program makes use of only one predictor per module. However, future versions of ENSEMBLE could allow a choice between different predictors.

ENSEMBLE: A tool for describing disordered protein states. Laboratory of J. D. Forman-Kay, Hospital for Sick Children, University of Toronto.

MODULE RESTRAINTS

ENSEMBLE can accommodate up to ten modules. The restraints attributed to each of them must follow a strict format readable by ENSEMBLE. A line starting with a comment marker is treated as a comment and ignored. In the following examples, the first line must be commented out if you want to include it in the restraint file.

1. Chemical Shift

Residue A
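The RANK rule described above (lower rank means earlier inclusion, rank 0 disables a module) can be illustrated with a small Python sketch. It is not taken from the ENSEMBLE source: the function name and the module names in the example are hypothetical, and grouping modules that share a rank into the same stage is an assumption.

```python
def inclusion_order(ranks):
    """Group active modules into stages, lowest rank first.

    ranks maps a module name to its RANK value. A rank of 0 means the
    module is not considered. Modules sharing a rank are assumed to be
    included at the same stage.
    """
    stages = {}
    for name, rank in ranks.items():
        if rank > 0:
            stages.setdefault(rank, []).append(name)
    return [sorted(stages[rank]) for rank in sorted(stages)]


# Hypothetical setup: chemical shifts and NOEs first, RDCs second, SAXS off.
example_ranks = {"CS": 1, "RDC": 2, "NOE": 1, "SAXS": 0}
stages = inclusion_order(example_ranks)
```

With these example ranks, the first stage holds CS and NOE, the second holds RDC, and SAXS is left out.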
29. s optional and allows specifying the name of the output CSP file. A relative or absolute path can be specified. By default, the name of the output file is the name of the list_file followed by the extension csp. The -h flag displays some help. All conformers specified in the list file must exist, and their atoms must be in the same order. Otherwise, the program issues a warning and stops.

• get_info ENSEMBLE_file
This program provides the characteristics of any file generated by ENSEMBLE. For instance, if a CSP file is given as argument, get_info yields the number of conformers and atoms that it contains.

• pdb2seq structural_file
This tool displays the one-letter sequence of the structural file given as argument.

• read_best best.dat
The best.dat file, found in the Results directory of the RUN directory, contains all the best ensembles found along the calculations. This tool reads best.dat and displays the evolution of the best ensembles.

• read_ens ENSEMBLE_file
This tool displays the conformer numbers found in the ENSEMBLE file given as argument, as well as its energy.

• ss_distr list_file
Using STRIDE, this tool displays the secondary structure distribution of a set of conformers. The input can be a PDB list file or an ENSEMBLE file.

Some other analysis tools will be provided in the future.
30. s whether the ensemble.dat file can be found. This file contains all user-defined parameters, and its presence means that the other CSP and CDP files can be rapidly read and recorded.

ANALYSIS

The software is provided with a number of programs to analyze the different ensembles calculated along the process, the most interesting being the ensembles that fit all the available experimental data. These tools are found in the tools directory of the ensemble directory. During the installation, an environment variable has been automatically set so that you can directly access this directory from anywhere with $ENSEMBLE_TOOLS.

An executable, analyzens, has been specifically designed to facilitate the full analysis of an ensemble. It takes as argument the name of an ENSEMBLE file that was saved in the Results directory. For instance, to analyze the best ensemble found so far, type:

    $ENSEMBLE_TOOLS/analyzens /home/user/RUN_directory/Results/best.ens

Following this statement, a directory named BEST is created in the same directory as the best.ens file, and all results are stored in it. The analyzens script outputs several files:

• analyze.txt: Contains information about the protein.
• radiusGyration.txt: Contains the radius of gyration of each conformer of the ensemble, as well as the average.
• ss_distr.txt: Contains the secondary structure distribution of the ensemble.
• caca_map.txt: Contains all averaged
31. tom Chemical Shift Error

[Example table; columns: Residue, Atom, Chemical Shift, Error. The rows list atoms of residues 3 and 4 (including CA, CB and H); most values are not legible in this copy.]

2. Residual Dipolar Coupling (RDC)

[Example table; columns: Residue 1, Atom 1, Residue 2, Atom 2, RDC. The rows list the H and N atoms of residues 4, 5, 6 and 8; the RDC values are not legible in this copy.]

3. Nuclear Overhauser effect (NOE)

The data extracted from a NOESY spectrum or from PRE experiments can be interpreted in terms of inter-proton distances. In the table below, Atom 1 of Residue 1 and Atom 2 of Residue 2 are the two atoms involved in the restraint. Aver indicates the mean distance, and Low and Up specify the allowed range below and above the average distance, respectively. The keyword OR denotes ambiguous restraints, and a wildcard can be used. Of note, ambiguous restraints using the keyword OR and wildcarded restraints are treated differently. With ambiguous restraints, the energy of each restraint is calculated and the lowest energy contribution is kept. When using a wildcard, the energy is calculated using the average position of all corresponding atoms.

[Example table; columns: Residue 1, Atom 1, Residue 2, Atom 2, Aver, Low, Up. The legible rows include the pairs 14 H / 17 3HD2, 29 H / 36 H and 29 H / 46 H, and an ambiguous restraint that joins the HD2 protons of residue 17 to 20 H with the keyword OR; the distance values are only partly legible in this copy.]

4. Paramagnetic Relaxation Enhancement (PRE)

PRE data are interpreted like NOE data, in terms of distance restraints. However, as they are issued from different experimental techniques, we keep these two modules
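The difference between ambiguous (OR) and wildcarded NOE restraints can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the exact energy function is not given in this section, so a flat-bottom harmonic penalty over the range from Aver minus Low to Aver plus Up is assumed, and all function names, coordinates and distances are hypothetical.

```python
import math


def restraint_energy(distance, aver, low, up):
    """Assumed flat-bottom harmonic penalty: zero inside [aver - low, aver + up]."""
    lower, upper = aver - low, aver + up
    if distance < lower:
        return (lower - distance) ** 2
    if distance > upper:
        return (distance - upper) ** 2
    return 0.0


def ambiguous_energy(distances, aver, low, up):
    """OR restraint: evaluate every alternative and keep the lowest energy."""
    return min(restraint_energy(d, aver, low, up) for d in distances)


def wildcard_energy(positions, partner, aver, low, up):
    """Wildcard restraint: use the average position of all matching atoms."""
    n = len(positions)
    centroid = tuple(sum(p[k] for p in positions) / n for k in range(3))
    return restraint_energy(math.dist(centroid, partner), aver, low, up)


# Two alternative proton pairs at 12 and 9 Angstrom, allowed range 0 to 8.
e_or = ambiguous_energy([12.0, 9.0], aver=8.0, low=8.0, up=0.0)

# Two wildcard-matched atoms whose average position lies 10 Angstrom away.
e_wild = wildcard_energy([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
                         (11.0, 0.0, 0.0), aver=8.0, low=8.0, up=0.0)
```

In this example the OR restraint keeps the 9 Angstrom alternative, which violates the upper bound by 1 Angstrom, while the wildcard restraint is evaluated once, at the distance from the averaged position.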
