User Manual

1. Figure 4: The main window with the Goutsias transcription concentration file.

2.3 Maximization algorithm tab

As explained in [14, 1], KInfer uses a Genetic Algorithm to maximize the probability density function calculated from the model and the concentration data provided by the user. Our choice of optimization algorithm is driven by the fact that our method could, in theory, handle a large number of parameters with possibly nonlinear relations between them; for that reason, traditional optimization algorithms could have trouble finding the maximum of the likelihood function.

Genetic Algorithms (GA) [3, 4] are population-based stochastic optimization techniques, i.e. methods that, starting from a set of initial guesses, determine the next set of candidate solutions to the optimization problem based on the results obtained from the preceding set. Methods of this kind were designed primarily to address problems that cannot be tackled by traditional optimization algorithms. Such problems are characterized by discontinuities, lack of derivative information, noisy function values and disjoint search spaces.

Very briefly, in the GA approach the evolution starts from a population of randomly generated individuals. Then, in each generation, the fitness of every individual in the population is evaluated and multiple individuals are stochastically selected from the current population ba
2. KInfer is distributed as a jar package, so to launch the program it is enough to double-click the jar file. To run it from the command line, type the following command:

java -jar KInfer.jar

The initial application window should look like the following:

Figure 2: Initial KInfer window.

2.1 The model tab

In the left part of Fig. 2 the user can input the set of chemical reactions describing the kinetics of the system, with the following syntax:

aA bB -> a'A' b'B' k α, β

On the left-hand side of the arrow, the reactants (A, B) and their stoichiometric coefficients (a, b) are indicated, separated by an empty space. On the right-hand side of the arrow, the products (A', B') and their stoichiometric coefficients (a', b') are indicated. After the list of products, the user has to specify the name of the rate constant (k in the example above) associated with the reaction, followed by a comma-separated list of the partial orders of reaction (α, β) for that reaction. The specification of
3. principled handling of the noise inherent in biological data, and it allows for a number of further extensions. For a detailed explanation of the mathematical formulation of the method we refer the reader to [14, 1].

The tool consists of four main blocks (Fig. 1): (1) the input interface, (2) the model generator, (3) the maximization algorithm, (4) the output interface.

Figure 1: Tool architecture. Input: time series of reactant concentrations and chemical reactions involved in the system. Model generator: constructor of rate equations; calculator of prediction intervals to initialize the rate constants; calculator of the probability density function of the experimental data values. Maximization algorithm: calculator of the model parameters which maximize the probability density function of the experimental observations. Output: rate constants, strength of noise.

In this manual we describe how to install and use KInfer, and we provide some tutorial examples to help users learn how to use the software.

2 Getting Started

KInfer is a parameter estimation software package written in Java. It requires the Java 2 Runtime Environment (JRE) version 5 or newer, or an equivalent JRE. KInfer relies upon the JAMA library, a cooperative product of The MathWorks and the National Institute of Standards and Technology (NIST) [2]; this library is installed within the KInfer directory when the user installs the software.
4. a species that is not in the model. A numerical implementation with typical parameters is given by the set of ordinary differential equations in Fig. 10. This system of equations has been used to create the 12 artificial time series of 51 data points between 0 and 5. Typical units might be mM for the concentrations and minutes for the times, but the example could as well run on an hourly scale and with variables of a different nature.

\dot{X}_1 = \theta_1 X_3^{-0.8} - \theta_2 X_1^{0.5}, \quad X_1(t_0) = 1.4
\dot{X}_2 = \theta_3 X_1^{0.5} - \theta_4 X_2^{0.75}, \quad X_2(t_0) = 2.7
\dot{X}_3 = \theta_5 X_2^{0.75} - \theta_6 X_3^{0.5} X_4^{0.2}, \quad X_3(t_0) = 1.2
\dot{X}_4 = \theta_7 X_1^{0.5} - \theta_8 X_4^{0.8}, \quad X_4(t_0) = 0.4

Figure 10: A didactic example of a biochemical network with four variables and the system of ordinary differential equations describing it. The parameter values are listed in Table 3.

The input model for KInfer is not a list of reactions, because the kinetic laws contain orders of reaction that differ from the default ones. In KInfer it is possible to input a manual model in the Manual model field of the graphical interface, and the model for this example is the following:

f_x1 = k1*(1)*[x3]^(-0.8) + k2*(-1)*[x1]^(0.5)
f_x2 = k3*(1)*[x1]^(0.5) + k4*(-1)*[x2]^(0.75)
f_x3 = k5*(1)*[x2]^(0.75) + k6*(-1)*[x3]^(0.5)*[x4]^(0.2)
f_x4 = k7*(1)*[x1]^(0.5) + k8*(-1)*[x4]^(0.8)

The user has to follow some simple rules:
• each row represents the rate function of a single species in the system;
• all the species in the s
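As an illustration of how a synthetic time series like the one described above could be produced, the following minimal sketch integrates the four equations of Fig. 10 with a classical fourth-order Runge-Kutta scheme and samples 51 points between 0 and 5. The integrator, the step size and all function names are assumptions made for this example; the manual does not say how the original series were generated. The parameter values are the "actual values" of Table 3.

```python
def rates(x, theta):
    """Right-hand side of the four-variable system of Fig. 10."""
    x1, x2, x3, x4 = x
    t1, t2, t3, t4, t5, t6, t7, t8 = theta
    return [
        t1 * x3 ** -0.8 - t2 * x1 ** 0.5,
        t3 * x1 ** 0.5 - t4 * x2 ** 0.75,
        t5 * x2 ** 0.75 - t6 * x3 ** 0.5 * x4 ** 0.2,
        t7 * x1 ** 0.5 - t8 * x4 ** 0.8,
    ]


def rk4_step(x, h, theta):
    """Advance the state by one Runge-Kutta step of size h."""
    k1 = rates(x, theta)
    k2 = rates([xi + 0.5 * h * ki for xi, ki in zip(x, k1)], theta)
    k3 = rates([xi + 0.5 * h * ki for xi, ki in zip(x, k2)], theta)
    k4 = rates([xi + h * ki for xi, ki in zip(x, k3)], theta)
    return [
        xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    ]


def simulate(x0, theta, t_end=5.0, n_points=51, substeps=10):
    """Return n_points rows (time, [x1, x2, x3, x4]) between 0 and t_end."""
    dt = t_end / (n_points - 1)
    h = dt / substeps  # hypothetical internal step, finer than the sampling step
    x = list(x0)
    rows = [(0.0, list(x))]
    for i in range(1, n_points):
        for _ in range(substeps):
            x = rk4_step(x, h, theta)
        rows.append((i * dt, list(x)))
    return rows


theta = [12.0, 10.0, 8.0, 3.0, 3.0, 5.0, 2.0, 6.0]
rows = simulate([1.4, 2.7, 1.2, 0.4], theta)
```

Each row can then be written as one line of a concentrations file, with the time in the first column.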
5. the reaction must end with a semicolon. While the user is typing, KInfer automatically translates the list of chemical reactions into the corresponding rate equations and shows the result in the upper right part of Fig. 2, possibly together with the typing errors that prevent the tool from building the correct mass-action model. The list of the partial orders of reaction is optional: if it is not specified, the user can choose the default value between 1 (the option Default ONE) and the stoichiometric coefficient associated with the species (the option Default STE).

To load a model definition file into KInfer, select the Load reactions item from the File menu. This shows a standard open dialog box that can be used to browse to the directory containing the textual reaction list file, usually provided with the .kin extension. After opening the Goutsias transcription model explained in section 3, the main window should look like Fig. 3.
6. Figure 7: The Results tab.

In the text area the user can find the list of all the parameters, with an inferred value for each of the runs of the GA that he/she decided to make. The user can select the results in the text area and simply copy and paste them as needed.

3 Examples

Here we provide some validation tests on both simple and complex biochemical networks. More examples are provided in [1]. For each case study we briefly describe the set of biochemical reactions and we report our estimates of the kinetic rate constants, compared with the estimates obtained by other studies and approaches. We did not include in the text of this tutorial the experimental and/or synthetic time series of the concentrations that we used as input to our procedure to infer the parameters. They are provided separately as additional files when the user downloads the KInfer software tool.

All the results provided here were obtained with a single run of the maximization algorithm with the default values of the GA parameters. The variability ranges of the inferred kinetic rates are the ones estimated by the Stineman algorithm, with an error on the concentration measurements chosen according to the numeric precision of the time series data.

3.1 Gene Transcription

In this test we co
7. 588 ± 0.249
σ           0.03           0.45           0.44

Table 3: Estimated parameter values for the network in Fig. 10.

3.4 Lotka Volterra

The set of coupled autocatalytic reactions of Lotka-Volterra [9] is given in Fig. 11. The first reaction describes how a certain prey species Y1 reproduces by feeding on a certain foodstuff X, which is assumed to be only insignificantly depleted thereby; the second reaction describes how a certain predator species Y2 reproduces by feeding on the prey species Y1; and the third reaction describes the eventual demise of Y2 through natural causes.

X + Y1 -> Y1 + Y1    k1 = 0.0001
Y1 + Y2 -> Y2 + Y2   k2 = 0.01
Y2 -> Z              k3 = 10

Figure 11: Lotka-Volterra example.

We generated a synthetic dataset of the time course of the amounts of X, Y1 and Y2 to use as experimental input data to KInfer, with the following values of the rate coefficients: k1 = 0.0001, k2 = 0.01, k3 = 10, and the following initial amounts: Y1 = Y2 = 10^3, X = 10^5 and Z = 0, as in [9]. We generated 100 data points at time steps of about 0.3, and the results that we obtained are listed in Table 4.

Parameter   Actual value   Initial guesses             Estimated value
k1          0.0001         [0.00009739, 0.0001035]     [illegible] ± 0.00001
k2          0.01           [0.00689, 0.00701]          0.007 ± 0
k3          10             [6.599, 6.753]              [illegible] ± 0.2
σ           0.01           0.15                        0.1

Table 4: Estimated parameter values for the network in Fig. 11.

4 Getting Help

If the KInfer program does not function in accordance with the descrip
8. The Microsoft Research - University of Trento Centre for Computational and Systems Biology

User Manual

Alida Palmisano, Paola Lecca, Corrado Priami
KInfer cosbi.eu

Contents
1 Introduction
2 Getting Started
  2.1 The model tab
  2.2 Time series concentration tab
  2.3 Maximization algorithm tab
  2.4 Initial values tab
  2.5 Results tab
3 Examples
  3.1 Gene Transcription
  3.2 Transcription Regulation
  3.3 Didactic example of biochemical network
  3.4 Lotka Volterra
4 Getting Help

1 Introduction

KInfer is a software prototype implementing the parameter estimation method proposed in [14, 1]. This is a new method for estimating rate coefficients from noisy observations of concentration levels at discrete time points. The method is an alternative to the traditional least-squares estimator, based on a probabilistic generative model of the variations in reactant concentrations. Our method returns the rate coefficients, the level of noise and an error range on the estimates of the rate constants. Its probabilistic formulation is key to a
9. e the usability of the loading concentrations part.

(Screenshot: the Time series Concentrations tab of the main window, listing the concentration values loaded for each species.)
10. ecification of the set of reactions involved in the system, KInfer requires the experimental time series data of the concentrations (or numbers of molecules) of the species present in the system. To load those data, the user has to select the Load concentrations item from the File menu. This shows a standard open dialog box that can be used to browse to the directory containing the concentrations file. This file has to be a Comma Separated Values (CSV) file whose first row contains the names of the columns, with the following convention: the first name is the keyword time, followed by a comma and the list of all the reactants sp_1, ..., sp_n contained in the model, always separated by commas. The following rows of the file contain the concentrations, expressed as real numbers, as follows:

t_0, [sp_1]_{t_0}, [sp_2]_{t_0}, \ldots, [sp_n]_{t_0}
t_1, [sp_1]_{t_1}, [sp_2]_{t_1}, \ldots, [sp_n]_{t_1}
\vdots
t_m, [sp_1]_{t_m}, [sp_2]_{t_m}, \ldots, [sp_n]_{t_m}

where t_i is the i-th time instant value and [sp_j]_{t_k} is the concentration value of the j-th species at time t_k.

After opening the Goutsias transcription concentration file and choosing the Time series concentration tab, the main window should look like Fig. 4. The user can see the list of all the concentration points for all the species of the system (Fig. 4). In this version of KInfer it is not possible to modify those data from the interface, but in the next version we plan to add this feature in order to improv
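The file layout described above can be produced and checked with a few lines of code. The following is a minimal sketch using only the Python standard library; the function names and the sample values are hypothetical and are not part of KInfer.

```python
import csv
import io


def write_concentrations(times, series, out):
    """Write the header 'time,<species...>' followed by one row per time point.

    `series` maps a species name to its list of concentrations,
    one value per entry of `times`.
    """
    species = list(series)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["time"] + species)
    for i, t in enumerate(times):
        writer.writerow([t] + [series[name][i] for name in species])


def read_concentrations(src):
    """Parse the same layout back into (times, {species: values})."""
    reader = csv.reader(src)
    header = next(reader)
    if header[0].strip() != "time":
        raise ValueError("the first column name must be the keyword 'time'")
    species = [name.strip() for name in header[1:]]
    times = []
    series = {name: [] for name in species}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        times.append(float(row[0]))
        for name, value in zip(species, row[1:]):
            series[name].append(float(value))
    return times, series


# Hypothetical sample data for two species named M and D.
buffer = io.StringIO()
write_concentrations([0.0, 0.5, 1.0], {"M": [2.0, 2.1, 2.3], "D": [4.0, 3.9, 3.7]}, buffer)
csv_text = buffer.getvalue()
times, series = read_concentrations(io.StringIO(csv_text))
```

Reading the file back before loading it into the tool is a cheap way to catch a missing time keyword or a non-numeric cell.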
11. entification from metabolic profiles. Bioinformatics 2004, 20, 1670-1681.
[12] Goutsias J. A hidden Markov model for transcriptional regulation in single cells. IEEE/ACM Trans Comput Biol Bioinform 2006, 3, 57-71.
[13] Reinker S, Altman RM, Timmer J. Parameter estimation in stochastic biochemical reactions. 2006, 153, 4.
[14] Lecca P, Sanguinetti G, Palmisano A, Priami C. A new method for inferring rate coefficients from experimental time consecutive measurement of reactant concentrations. International Conference on Systems Biology 2007 (ICSB 2007), Long Beach, California, USA.
12. ered.

References

[1] Lecca P, Palmisano A, Priami C, Sanguinetti G. Calibration of biochemical network models. 2008. Technical Report TR-13-2008. http://www.cosbi.eu/Rpty_Tech.php
[2] http://math.nist.gov/javanumerics/jama
[3] Goldberg DE. Genetic algorithms in search, optimization and machine learning. 1989. Addison-Wesley, Massachusetts.
[4] Mitchell M. An introduction to genetic algorithms. 1996. The MIT Press, Cambridge, MA.
[5] Golding I, Paulsson J, Zawilski SM. Real-time kinetics of gene activity in individual bacteria. 2005. Cell 123, 1025-1036.
[6] Sugimoto M, Kikuchi S, Tomita M. Reverse engineering of biochemical equations from time-course data by means of genetic programming. 2005. BioSystems 80, 155-164.
[7] Fuguitt R, Hawkins JE. Rate of thermal isomerization of α-pinene in the liquid phase. 1947. JACS 69, 461.
[8] Rodriguez-Fernandez M, Mendes P, Banga J. A hybrid approach for efficient and robust parameter estimation in biochemical pathways. 2006. BioSystems 83, 248-265.
[9] Gillespie DT. Exact stochastic simulation of coupled chemical reactions. J. of Physical Chemistry 1977, 81, 25.
[10] Chou IC, Martens H, Voit EO. Parameter estimation in biochemical systems models with alternating regression. Theoretical Biology and Medical Modelling 2006, 3, 25.
[11] Voit EO, Almeida JS. Decoupling dynamical system for pathways id
13. Figure 3: The main window with the Goutsias transcription model.

The user can save the list of reactions by selecting the Save reactions item from the File menu and choosing a path and a name for the new model file, or can just click the button in the lower left corner; this saves the model in the current directory with the name shown on the button label.

As an alternative to the automatically generated model, the user can enter a different model in the Manual Model part of the interface (Fig. 2). The user can enter an ordinary differential equation model without specifying the reactions in the standard chemical notation; the syntax of the ODE model has to be the same as that of the automatically generated model (some examples are presented in section 3). In particular, the manual model has to be used to specify generalized mass-action laws with real numbers as partial orders of reaction. In this version of the software it is not possible to specify a general kinetic law, but we plan to add this feature in the future.

2.2 Time series concentration tab

Along with the sp
14. he system. The dataset consists of 120 data points at a time resolution of 0.5 min. As initial values we used M = 2, D = 4, DNA = 2, mRNA = 0 and DNAD = 0.

mRNA -> mRNA + M      k1 = 0.043 s^-1
M -> ∅                k2 = 0.0007 s^-1
DNAD -> mRNA + DNAD   k3 = 0.0715 s^-1
mRNA -> ∅             k4 = 0.00395 s^-1
DNA + D -> DNAD       k5 = 0.02 s^-1
DNAD -> DNA + D       k6 = 0.4791 s^-1
M + M -> D            k7 = 0.083 s^-1
D -> M + M            k8 = 0.5 s^-1

Figure 9: Goutsias transcription regulation [12, 13].

Our estimates for the Goutsias transcription regulation model are summarized in Table 2.

Parameter   Actual value   Initial guesses           Estimated value
k1          0.043          [0.0432, 0.0432]          0.043290182 ± 0
k2          0.0007         [0.015116, 0.015117]      0.015117 ± 0.000000805
k3          0.0715         [0.06931, 0.06938]        0.069322284 ± 0.000069368
k4          0.00395        [1.87, 3.03] E-04         0.00021555029 ± 0.0001163084
k5          0.02           [0.01567, 0.01568]        0.015681929 ± 0.000008789
k6          0.4791         [0.3862, 0.3864]          0.38645026 ± 0.00019106
k7          0.083          [0.0838782, 0.0838783]    0.083878241 ± 0.000000073
k8          0.5            [0.515376, 0.515379]      0.51537892 ± 0.00000304
σ           0.07           0.5                       0.4

Table 2: Estimated parameter values for the network in Fig. 9.

3.3 Didactic example of biochemical network

The system depicted in Fig. 10 is representative of a small biochemical network of 4 interacting species [11, 10]. The network has two feedback loops: (1) the species X3 inhibits the production of species X1, and (2) the species X4 promotes a changing of X3 in
15. iable to be mutated. After the selection, the value of the variable is changed by selecting, again randomly, from the possible values it can take, excluding the current one. Finally, the innovation operator randomly selects new, never-tested solutions to be evaluated. Usually this operator is kept at a low rate (here 5%), trying to optimize the trade-off between exploration and exploitation. Once the new population of experiments has been derived by the algorithm, it is proposed as the new generation for the next iteration. The size of the population of solutions is kept constant across generations.

Using the Maximization options tab (Fig. 5), the user can modify the main parameters of the method, in particular:
• Multiplier: a real number that is used to perform subsequent runs of the algorithm on the same objective function, but with different initial value ranges for the unknown parameters. E.g., using a 0.5 multiplier means that for each complete run of the GA the range of the initial parameter search space is halved.
• NTrial: the number of different runs of the algorithm that are performed, applying each time the Multiplier parameter to the parameter bounds.
• NGenerations: the maximum number of generations of the GA.
• Nexp: the size of each population in a single step of the GA.
• Crossed: the number of population elements that are the target of the cross-over operation between two steps of the GA. This number has
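To make the interplay between Multiplier and NTrial concrete, the following minimal sketch lists the parameter ranges that successive runs would start from. It assumes that each range keeps its midpoint while its width is scaled by the multiplier; the manual only states that the range is halved for a 0.5 multiplier, so the centring rule, like the function name, is a hypothetical choice made for illustration.

```python
def shrink_ranges(bounds, multiplier, n_trial):
    """Yield the parameter bounds used by each of n_trial runs.

    Assumption: each range keeps its midpoint and its width is
    scaled by `multiplier` from one run to the next.
    """
    current = dict(bounds)
    for _ in range(n_trial):
        yield dict(current)
        shrunk = {}
        for name, (lower, upper) in current.items():
            mid = (lower + upper) / 2.0
            half = (upper - lower) * multiplier / 2.0
            shrunk[name] = (mid - half, mid + half)
        current = shrunk


# Three runs on a single hypothetical parameter k1 with initial range [0, 8].
runs = list(shrink_ranges({"k1": (0.0, 8.0)}, multiplier=0.5, n_trial=3))
```

With these inputs the three runs start from the ranges [0, 8], [2, 6] and [3, 5].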
16. nsider the transcription of a single gene as given by the model of Golding et al. in [5]. The DNA for the tagged mRNA is switched on and off by polymerase binding and unbinding, respectively. Only polymerase-bound DNA is transcribed into mRNA (Fig. 8).

DNA_OFF -> DNA_ON          k1 = 0.0270 min^-1
DNA_ON -> DNA_OFF          k2 = 0.1667 min^-1
DNA_ON -> DNA_ON + mRNA    k3 = 0.40 min^-1

Figure 8: Gene transcription example.

The system is defined by the following list of reactions, and our estimates are summarized in Table 1.

DNA_OFF -> DNA_ON k1;
DNA_ON -> DNA_OFF k2;
DNA_ON -> DNA_ON mRNA k3;

Parameter   Actual value   Initial guesses     Estimated value
k1          0.0270         [0.0226, 0.0302]    0.0248 ± 0.0076
k2          0.1667         [0.155, 0.163]      0.163 ± 0.008
k3          0.40           [0.374, 0.425]      0.376 ± 0.052
σ           0.03           0.45                0.44

Table 1: Estimated parameter values for the network in Fig. 8.

3.2 Transcription Regulation

Let us consider the transcription regulation example explained in [12, 13]. The mRNA is translated into a protein monomer M that can dimerise. The dimer D can in turn bind to its DNA and act as a transcription factor to auto-regulate its own mRNA production. Both mRNA and protein are degraded at constant rates. The set of reactions of this network is reported in Fig. 9. As in [13], we used this set of reactions to generate a synthetic dataset of the time series of the number of molecules for each component in t
17. sed on their fitness. The chosen individuals are modified (recombined and possibly randomly mutated) to form a new population. The new population is used in the next iteration of the algorithm. The algorithm terminates when either a maximum number of generations has been produced or a satisfactory fitness level has been reached for the population.

The selection operation involves the evaluation of each candidate solution with respect to the assigned target: the higher the log-likelihood value, the better the solution is considered. The next step is to select the solutions for the next generation in such a way that those with higher fitness have a higher probability of selection: each guessed solution is assigned a selection probability given by the ratio of its squared fitness to the sum of the squared fitness of all the solutions. The selected solutions are then subjected to the cross-over, mutation and innovation operators.

To realize cross-over, every two parents create two children in the following way: the algorithm randomly selects from the first parent how many and which variables will be kept in the first child. Then, from the second parent, the algorithm takes the complementary set of variables and uses these values to complete the first child. The second child is then built with the remaining variables of the two parents. The mutation operation, applied with a low probability (in our examples p = 0.1), randomly selects one var
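The selection rule and the cross-over step described above can be sketched as follows. This is an illustration only and not KInfer's code: the function names are hypothetical, the fitness values are assumed to be positive so that a larger squared value still means a better solution, and each child is assumed to keep at least one variable from each parent.

```python
import random


def selection_probabilities(fitness):
    """Probability of each solution: its squared fitness over the sum of squared fitness."""
    squared = [f * f for f in fitness]
    total = sum(squared)
    return [s / total for s in squared]


def crossover(parent_a, parent_b, rng):
    """Build two children that split the variables of two parents in complementary ways."""
    n = len(parent_a)
    keep_count = rng.randint(1, n - 1)            # how many variables child 1 keeps from parent A
    keep = set(rng.sample(range(n), keep_count))  # which variables those are
    child_1 = [parent_a[i] if i in keep else parent_b[i] for i in range(n)]
    child_2 = [parent_b[i] if i in keep else parent_a[i] for i in range(n)]
    return child_1, child_2


rng = random.Random(0)  # fixed seed so that the example is reproducible
fitness = [1.0, 2.0, 3.0]
probs = selection_probabilities(fitness)

# Hypothetical population of three candidate parameter vectors.
population = [[0.1, 0.2, 0.3], [1.1, 1.2, 1.3], [2.1, 2.2, 2.3]]
parents = rng.choices(population, weights=probs, k=2)
child_1, child_2 = crossover(parents[0], parents[1], rng)
```

For the fitness values 1, 2 and 3 the selection probabilities are 1/14, 4/14 and 9/14, so squaring favours the best solution more strongly than a selection proportional to the plain fitness would.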
18. ting function running through a set of points in the xy-plane), and we solve the following system of equations:

s_i(t_0) - f_i(X(t_0); \theta_{i1}, \theta_{i2}, \ldots, \theta_{in}) = 0
\vdots
s_i(t_m) - f_i(X(t_m); \theta_{i1}, \theta_{i2}, \ldots, \theta_{in}) = 0

where s_i(t) is the slope of the concentration curve of the i-th species at time t and the functions f_i are the rate equations, either generated automatically from the set of reactions or entered manually.

In general the system could be singular. In those cases we consider a different time re-sampling of the Stineman curve interpolating the experimental data. The new set of time points consists of those at which the interpolating curve has null slope. In this way the curve is under-sampled and the system has fewer equations. Nevertheless, this may not be enough to avoid singular systems. The number of equations is then cut down further until the system can be solved to find the approximate values of the unknown parameters, to be used as initial guesses for the optimization algorithm.

In an analogous way, we solve a system of equations for the propagation of the concentration errors onto the estimated parameters, in order to find a range of variability for the single values found by the procedure described above.

This whole process can be skipped by a user who already knows a possible range of variability for each parameter; in this case the user can simply load those data from a text file written as follows:

k1 -> lower 1.0E-8 upper 1.0
k2 -> lower 1.8 upper 10.0
k3 -> lower 0.27 upper 1.0
sigma -> lower 1.0E-8
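The idea of matching measured slopes to the rate equations, used above to obtain initial guesses, can be illustrated on the simplest possible case. The sketch below assumes a single species that decays with rate function f(x; k) = -k x, replaces the Stineman interpolation with plain central finite differences, and solves the resulting over-determined system in the least-squares sense. All names and values are hypothetical, and this is not the procedure implemented in KInfer.

```python
import math


def central_slopes(times, values):
    """Approximate the slope at the interior time points with central differences."""
    return [
        (values[i + 1] - values[i - 1]) / (times[i + 1] - times[i - 1])
        for i in range(1, len(times) - 1)
    ]


def initial_guess_decay(times, values):
    """Least-squares k for the system s(t_i) - (-k * x(t_i)) = 0."""
    slopes = central_slopes(times, values)
    interior = values[1:-1]
    numerator = -sum(s * x for s, x in zip(slopes, interior))
    denominator = sum(x * x for x in interior)
    return numerator / denominator


# Synthetic, noise-free data generated with a known rate constant.
true_k = 0.4
times = [0.1 * i for i in range(51)]
values = [2.0 * math.exp(-true_k * t) for t in times]
k_guess = initial_guess_decay(times, values)
```

On noise-free data the guess lands very close to the true value; with noisy measurements the slopes become much less reliable, which is one reason to treat the result only as a starting point for the optimization.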
19. tions in this manual, or if there are sections of this manual that are incorrect or unclear, the authors would like to hear about it, so that we can make improvements and fix bugs in the software. Furthermore, the authors would appreciate feedback regarding new features or improvements that would be useful to users of this software. Before e-mailing the authors, it is a good idea to check the KInfer application home page to see if a new version has been released in which the specific problem may have been fixed. The best way to contact the authors is to send e-mail to kinferATcosbi.eu.

When reporting a bug, or something that is suspected to be a bug, please provide as much information as possible about your specific installation of the KInfer program. In particular, please provide us with the version number of the KInfer program that is used, the type and version number of the Java Runtime Environment used (e.g. Sun JRE version 1.4.1), and the operating system type and version. Furthermore, if the problem is with a specific model definition file, please send us the model definition file and any model definition files or input data that it includes. If the problem generated a stack backtrace on the console or in a dialog box, please include the full stack backtrace text in the bug report. Providing this information will dramatically increase the likelihood that the authors will be able to quickly and successfully resolve the problem that has been encount
20. to be lower than Nexp and it has to be even.

Figure 5: The maximization options tab.

2.4 Initial values tab

In order to limit the search space of the algorithm, we developed a procedure that is able to select the parameter space in which the model is valid.

Figure 6: The Initial Values tab.

Using the Initial values tab (Fig. 6), the user can take advantage of this procedure by simply clicking the Calculate button in the upper part of the window. This calculates the Stineman value for each of the parameters indicated in the model, using the concentrations loaded before and considering that each value has an associated measurement error. For a detailed description of the procedure we refer the reader to [1]; in summary, we obtain the slopes at each time point from the experimental data by using the Stineman algorithm (this algorithm provides an interpolation procedure and returns the slope of the interpola
21. upper 1.0

The user can also dynamically modify each range value using the graphical interface (Fig. 6): it is enough to select the parameter to be changed in the central list and then type the new range in the two text fields in the right part of the window. To save those values for the current run of the inference algorithm, the user has to click the Update ranges initial parameters button. To save them for future runs, the user can click the Save initial values button on the left and then select a file in which the values are to be stored.

2.5 Results tab

As soon as the model, the concentration data, the maximization algorithm and the initial values are loaded and correctly set up, the inference algorithm can be started by selecting the option Infer from the Infer parameters menu. After the calculations, which can take some time depending on the number of parameters to be inferred and the choices made for the maximization algorithm, the results are listed in the Results tab (Fig. 7).
22. ystem need to have an associated rate function; if a species S is not changing over time, its rate function will be f_S = 0;
• each rate function has to start with the f_ string, followed by the name of the species (without spaces, and taking care of lower/upper case) and the symbol =;
• the rate function is a summation of terms, i.e. each term is separated from the following one by a + symbol;
• a term in the rate function has to be like

k6*(-1)*[x3]^(0.5)*[x4]^(0.2)

So it has to contain the name of the parameter (k6), followed by a multiplicative symbol and the coefficient (-1) between round brackets, followed by a list of product terms, i.e. terms separated by the symbol *, where each of those terms is a species name (x3) between square brackets, raised to the power of a real number (0.5) between round brackets.

In the next version of KInfer we plan to add an easier way of specifying the kinetic laws from the graphical interface.

Within the experimental uncertainties, the results shown in Table 3 are in agreement with the expected ones and with those in [10].

Parameter   Actual value   Initial guesses     Estimated value
θ1          12             [11.678, 11.760]    11.702 ± 0.083
θ2          10             [9.751, 9.786]      9.765 ± 0.034
θ3          8              [6.542, 6.783]      6.617 ± 0.241
θ4          3              [2.445, 2.551]      2.519 ± 0.106
θ5          3              [3.006, 3.011]      3.011 ± 0.005
θ6          5              [4.999, 5.022]      5.005 ± 0.024
θ7          2              [1.826, 1.950]      1.903 ± 0.124
θ8          6              [5.542, 5.790]      5
