Home
User Manual
Contents
1. oe amp o amp 3 8 la 1lat 2b4 3b 24 3 1 la lat 2b 3b 24 3 1 1b 1b 2a 3a 24 3 1 and 1b 1b4 2a 3a 2 3 1 First carry out the la 2b 2 and 1a 3b 3 operations then the 2 3 1 one 3 9 1a 1a 2b 3a 2b 14 1a 1a 2b 3a 2b 14 and 1b 1b 2a 3b 2a 1p First carry out the la la 3a operation see below then 2b 3a 1 one 3 10 la lat 3b 1at 3 3a 1b 1b 3a 1b 3 3b and 1b 1b 3a 1b 3 3b First carry out the la 3b 3 operation then the 1a 3 3a one 3 11 la la 3a and 1b 1b 3b Connect the extreme special nodes of two la paths a b b a a b a o 3 12 lat 2b 2 lat 2b 2 and 1b 2a 2 Cut an external edge in the la path and join the special with the extreme special node of the 2b path b a a b a DB a b a e o gt ore 3 13 la 3b 3 1b 3a 3 and 1b 3a 3 Cut an external edge in the 3b path and join the special node with the extreme special node of the 1a path 3 14 2a42b64 34 3 24 3 1 and 2a 2b 34 3 2 3 1 First carry out the 2a 2b 3 2 operation see below then the 2 3 1 one 3 15 3a4 3b4 24 2 34 2 1 and 3a 3b 24 2 34 2 1 First carry out the 3a 3b 2 3 operation see below then the 2 3 1 one 3 16 2a 34 3 1at 3 3a 2b 3 3 1b4 3 3b and 2b 3 3 1b 3 3b First carry out the 2a 3 la operation see below then the 1a 3 3a one 3 17 3b4 24 2 1b4 2 2b 3a4 242 1a 2 2a and 3a 2 2 1a 2 2a First carry out the 3b 2 1b operation
2. 1 2 Sa 2 1 78b 4 1 6a L SP 2 1 78b 4 1 569a 2 1 C 1 2 Cut of the node 1 2 and union of the special nodes 5a and 69a into one node 569a See the result below 2 1 2 L 569a 78b 3 4 s 4 e LJ J 2 1 78b 4 1 569a 2 1 C aD bD 2 1_a4 1_b2 1 C Deletion of the special nodes 569a n 78b See the result below Now graph is reduced to the final form algorithm is over Examples of task 1 solutions The archive chromo zip contains several examples of task 1 solutions Example the pair of structures from Fig 1 and the pair of structures from chromosome Salmonella enterica to chromosome Escherichia coli Input data can be found in artificial and Salmonella_Escherichia the joint graphs can be found in artificial and Salmonella Escherichia the shortest sequences are described in artificial and Salmonella_Escherichia To run the utility on these examples it is necessary to run the following scripts from the archive run_example_l bat u run example 2 bat 21 Also the program was tested using the following 8 sets of artificial data artificiall artificial2 artificial3 artificial4 artificial5 artificial6 artificial7 artificialS The joint graphs are described in artificiall artificial2 artificial3 artificial4 artificial artificial6 artificial7 artificialS The shortest sequences for three variants of weights are described in artificial artificial2 artificial3 artificial4 artificial5 artificial6 ar
3. The content of each chromosome should start from the new line Chromosomes are described as the ce 99 sequence of the names of the genes together with signs or The sign indicates the position of the gene on the complement minus strand Each chromosome has the mark C if chromosome is cyclic and L otherwise Example for artificial data File input tre contains one line 1 2 3 4 5 6 File input chrom contains initial chromosome structures for six leaves g2 g3 g4 g5 g6 C g7 C g8 29 C ok 3k 3K 3K ak 3k 3k 3k 3k 3k ak ak k k kk 1 3 4 95 26 C 27 C g8 g9 C SK 3K 3K 3k 3 3k 3k 3k 3k ak ak ak ak k k k g1 g2 g4 g5 g6 C g8 C g7 29 C ok 3K 3K 3k 3 ak 3k 3k 3k 3k ak 3k sk k k k g1 g2 g3 g5 g6 C g8 C g7 g9 C ok 3 3K 3K 3K 3k 3k 3k 3k ok 3k ak k k k k g1 g2 g3 g4 g6 C 29 C g7 g8 C ok 3k 3K 3k 3k 3k 3k 3k 3k ak 3k ak ak k k k g1 g2 g3 g4 g5 C 29 C g g8 C HHH 18 Here each structure contains three cycles formed by five one and two genes each The structures differ by the content of the genes and by the order of the genes in chromosomes Second forth and sixth leaf contain genes from complement strand ChromoGGL parameters Chromo parameters The utility requires several parameters for execution the meaning of these parameters is revealed below Also the file with string weights of single operations is req
4. see below then the 1b 2 2b one 3 18 2a 2b 3 2a 1b 2 and 2a 2b 3 2a 1b 2 First carry out the 2b 3 1b operation then the 1b 2a 2 one 3 19 3a 3b 2 3a 1b 3 and 3a 3b 2 3a 1b 3 First carry out the 3b 2 1b operation then the 1b 3a 3 one Step 4 If the weight of the double cut and paste operation is greater than the weight of the sesqui cut and paste the actions 4 1 4 24 are repeated as long as they are applicable otherwise the steps 4 1 4 24 are repeated analogously 4 1 Loop circle or path K with a b node circle or path of the same type as K correspondingly Join the loop node with the b node by double cut and paste if K is not an isolated special node or by sesqui cut and paste otherwise b b ex lt _e O gt e eo 4 1 Same as 4 1 4 2 Circle circle or path K with b and a nodes circle or path of the same type as K Insert the circle by double cut and paste combining two b nodes near the b node from K on the side of the a node cut out the resulting common edge a a b a b a 6 Q a gt b a b a o _ gt lt _e__ o o __ _e_ 4 2 Same as 4 2 4 3 2a 2b 2 1 Cut out two 2b path nodes the extreme special a node and the neighboring common node by sesqui cut and paste and join the resulting terminus with the extreme special b node of the 2a path b a b a b a b a b a a 4 3 2a 2b 2 1 4 4 3a 3b 3 Cut an
5. 1 1b 4 18 2a 1 2a 2b 14 2b and 2a 1 2a set c b Cut an external edge in the 14 path and join the special node with the extreme special node of the 2a path b a b b a b e_ e_9 o_o _e_ e _ b a b a b 4 18 2a 1 2a 2a 1 2a 2a 1 2a set c b 4 19 3a 14 3a 3b 1 3b and 3b 1 3b set c b Cut an external edge in the 3a path and join the special node with the extreme special node of the 14 path a b a a b a o __ _e_ _e xo __e_ _e_ eo a b a b a 4 19 3b 1 3b 3b4 1 3b 3b 1 3b set c b 4 20 2 14 2 2 1p 2 and 2 1 2 set c b Cut an external edge in the 14 path and join the special node with the extreme special node of the 2 path b a b a a b a _e_ _e_ e909 Tame _2e_ _e _ b a b a b a 4 20 2 14 2 2 1p 2 2 1 2 2 1 2 set c b 4 21 3 14 3 3 1 3 and 3 1 3 set c b Cut an external edge in the 3 path and join the special node with the extreme special node of the 14 path 14 4 21 3 1 3 4 22 latlc la 1b 1 1b 1a 1 1 a 2b 1 2b and 3a 1 3a In all cases set c a 4 22 Empty action 4 23 For the remaining paths of type 1c set c b and perform the 1 1 1 operation 4 23 Empy action 4 24 Paths that have hanging edge are being enclosed into circles by join operation for paths 2a 2b 3a 3b by sesqui cut and paste operations with joining of special nodes paths 14 1a Ic 2 or without joining paths la 1b 3 When enclosin
6. external edge in the 3a path and join the special node with the extreme common node of the 3b path 10 4 4 3a 3b 3 4 5 2a 3 1a and 2b 3 1b Cut an external b edge in the 3 path and join the special node with the extreme special node of the 2a path b a b b a b a b a e e oe o e g 6 E gt e oo G amp ee HH D 4 5 2a 3 la 4 6 3a 2 1a and 3b 2 1b Cut an external edge in the 3a path and join the special node with the extreme special node of the 2 path a b a a b a b 4 6 3a 2 1a 3b 2 1Db 4 7 2a 2a 2a and 2b 2b 2b Join the extreme special nodes of the two paths b a b b a b b a b a b e e o o o oe 4 7 2a 2a 2a 4 8 3at 3a 3a and 3b 3b 3b Connect two extreme common nodes of the paths by a common edge and then cut out this edge a b a a b a o _ __x__ _e _ eo V7 a b a b a b a Vv a 4 8 3b 3b 3b 4 9 1la 2a la and 1b 2b 1b Connect the extreme special nodes of the two paths 11 a b b a b a b a b 0 o 0n o c n 0 o 4 9 la 2a la 4 10 1la 3a la and 1b 3b 1b Connect two extreme common nodes of the paths by a common edge and then cut out this edge b a a b a OG s amp e U b a b a b a J a 4 10 1b 3b 1b 4 11 2a 2 2 and 2b 2 2 Connect the extreme special nodes of the two paths b a b b a b a __e_ e Gg s Q amp s amp s 8 WV b a b a b a 4 11 2a
7. genes if we have the cyclic region we call it the special loop We now have the undirected graph We apply the following analogs of defined operations to the joint graph 1 Delete two equally marked edges and reconnect free resulting vertexes by two new not incident edges 2 remove the edge with some mark say a and connect one of resulting vertexes with another vertex which is not incident to a edge 3 Delete any edge 4 Connect two vertexes not incident to a edge by the b edge and vise versa 5 Remove special vertex of 1 Double cut and paste DP Initial edges belong to one structure gt ae T 2 Sesqui cut and paste SP ik ik ik o xo Ji i P k Ik I k 6 or I J ji 3 a edge deletion or b edge insertion C and b edge deletion or a edge insertion J it ity i i X e special loop If special vertex has two regular neighboring vertexes they are being connected by the edge The final form of the joint graph a b is defined as the common graph that contains only isolated regular vertexes and final 2 cycles It is easy to prove that the initial problem is equal to the problem of transforming the graph a b to the final form The algorithm for transforming a joint graph into the final form in case of different operation weights and extra operations of deletion and insertion Step 1 Delete all special a loops Step 2 Cut out a common edge not included in a 2 circle and cl
8. 2 2 2a 2 2 2b 2 2 4 12 3a 3 3 3b 3 3 Connect two extreme common nodes of the paths by a common edge and then cut out this edge a b a a b o _ __x _ _e_ _ e o _ __e_ eo ib a b a b a b y a 4 12 3b 3 3 12 4 13 2 2 2 1 Perform the sesqui cut and paste operation with cutting out the two nodes of the 2 path the extreme special a node and the neighbor common node and joining the resulting terminus with the extreme special b node of the other 2 path b a b a b a b a _e _ _e _ xo 0 0 _ _ b a b a b a a Q e o e Q o e 8 4 13 2 42 2 1 4 14 3 3 3 Cut an external a edge in the 3 path and join the resulting terminus with the bterminus of the other 3 path b a b a oe e e e oe b a b a 4 14 Empty action 4 15 la la la 15 14 1p and 14 1 14 set c b Cut an external edge in the 14 path and join the special node with the extreme special node of the other 14 path 4 15 1 1 15 1 1 14 set c b 4 16 1a 1 1 a 1b 1 1b and la 1 la set c b Cut an external edge in the 14 path and join the special node with the extreme special node of the la path a b b a b a b a b amp 4 16 la 1 1a 4 17 1a 14 1 a 1b 1 1b and 1b 1 1b set c b Cut an external edge in the la path and join the special node with the extreme special node of the 14 path 13 4 17 1b
9. ChromoGGL user manual authors K Yu Gorbunov V A Lyubetsky developers R A Gershgorin K Yu Gorbunov 2015 Contents OVEN W ies cae ea cca area Sa Mon ES Dele ane ESS aa aii cats Feder sen benet st tue te Kon een a el Ce al al cen ets hon Sati eey Beer SE raden 2 The algorithm for computing the distance between two chromosome StrUCTUFES cccccceeeseeeeesseeneeeeees 3 The algorithm for transforming a joint graph into the final form in case of different operation weights and extra operations Of deletion ANC insertion ccsccceeecccesececeneceeeneeseeeeseeuecseeueceeeeeeeees 5 Algorithm of generation of the evolutionary tree of chromosome StFUCTULES cccccceeeeeeeeeeeeeeees 14 Algorithm gradient descent from multiple random initial points for task 3 cccccceeseceeeeeeeeeees 15 Installation and execution of ChHromoGGL utilities ON Windows cccccssssssseeeeecccceaeesseeecccsessaaaaeeeeeeees 15 CNFOMO SPEER NENS SEE a E geile ERE RENEE SEEDEDE NE FE ERNE E EET ESTERE ENE REE ENTER RES 15 Krom MECOMSEMUCTION sir as Eee eN REN ERE 15 APULU ere er nen en eNO Pr OOo ERR SENERE ADRESSE ee 15 CI OO MPU COI aeea er ne aa EE E NNS 15 Hrom reconstructiominputdatd amin crests ener er era a a nere 16 CHEOMOGGL Parameters re a LEA SN kiss kanen nar EN needs 18 Chromo bafam ele rs orian a a a aa a a 18 Command line Parameters sie el are bede AN a E AAA 18 FIG wthoperations COSUS icenen aa a E N eecc
10. ebele 18 hrom reconstruction parameter Saser a a a hat coy ben aval te 18 Output Fe ore oE E N a ee RE SEERE EET ER STER 18 Examples OF task 1 SOIMIEIONS aoruc E T 20 Exam pels Ort KZ SOTOS ore ae rr rl arrene RE es new ene 21 Mir OMMFECONSTEUCEIOMOUT DUE i ar rare a aen 21 Biological examples for task 3 agnes ae Reel aie 22 Recommended standart DFOSKAINS se sssiec tates Gratin tein sacs eas shes eden aca des beso eatees a eee 22 Reor el lt a Meron SUR OTOP RE a RC SOROS RENE OG OC Ce IC OOS RTE OR Go en ee Ee Cet OR One eee VEN 22 Overview ChromoGGL utilities are designed for solving the following three tasks 1 Computation of the distance between the two given chromosome structures and finding a sequence of operations with the minimum total weight distance that transforms one structure into another The most common definition according to available papers is considered as an arbitrary set of paths and cycles representing the linear and circular chromosomes as well as the definition of the set of allowed operations The structures include gene paralogs the sequence of operations permit alternate gene composition Arbitrary weights of single operations are allowed The task is to minimize the total weight of all operations from arbitrary sequence The sequence for which the minimum is reached 1s called the shortest The mean of the total weight of the sequence from one structure to another and vice versa is called the distance betwee
11. er with arrangement It turns out that the solution of this problem when using breakpoint distance between the pairs of structures at the ends of the edge is a good initial approximation for the solution of the task 3 with the distance described in section 1 We will refer to this distance as the biological distance From now on we will refer to the problem 3 with breakpoint distance as the task 3a The task 3a can be restated as the integer linear programming task so its complexity is close to linear The package of programs for Lomonosov supercomputer in Moscow State University contains the appropriate program but it is not available freely After solving the task 3a the task 3 was solved An effective algorithm of descend from different initial breakpoint positions has been applied see 1 Both tasks can be solved by means of this algorithm of descent from different random initial points Next section contains the description of the task This algorithm is the key element of the solution of task 2 The solution of task 3 1s described after the task 2 The algorithm for computing the distance between two chromosome structures The model of chromosome structure is described as the finite number of the oriented chains and cycles including loops This set can be thought as the oriented graph We call this graph as chromosome structure The edge of the graph is defined as the gene each single chain or cycle defined as chromosome or com
12. g path 1 we set c b When enclosing path 2 we choose variant when two b nodes are joined after that we delete a node from 1 see Fig a below We also cut out the common edge from circles that were produced by 3a or 3b closure then we apply step 4 2 to these circles again 4 24 Same as 4 24 Step 5 Remove isolated special nodes and loops Cut out 2 circles using double cut and paste from circles without common edges to combine two b nodes accordingly the a node is included into the 2 circle fig b Delete special nodes from the 2 circles The end of algorithm description Algorithm of generation of the evolutionary tree of chromosome structures The generation of the optimal evolutionary tree for matrix of pairwise distances between the given chromosome structures uses UPGMA algorithm http en wikipedia org wiki UPGMA 15 Algorithm gradient descent from multiple random initial points for task 3 On each step of the algorithm we iterate over all inner vertexes of the tree and try to transform the structures by means of all possible operations We then choose the pair vertex operation which delivers minimum to the total cost of structures arrangement and by replacing the structure in the optimal vertex with the new one we get next structures arrangement on the tree We repeat this step till we can t decrease the total cost of structures arrangement Note that this algorithm is fast and effective when the initia
13. h the composition of 9 genes g1 g2 g3 g4 g5 g6 C 9 98 97 C 22 node 2 with the composition of 9 genes g1 g2 g3 g4 g5 g6 C 89 g8 C 87 C node 3 with the composition of 9 genes g1 g2 g3 g4 g5 g6 C g9 g7 C g8 C node 4 with the composition of 9 genes g1 g2 g3 g4 g5 g6 C 29 C g8 g7 C Here the root is vertex 1 vertex 2 is ancestral for the leaves 1 and 2 vertex 2 is ancestral for the leaves 3 and 4 4 is ancestral for the leaves 5 and 6 Biological examples for task 3 The archive hrom_reconstruction zip contains two biological examples First example contains tree tree structures description chromosomes and optimal arrangement arrangement The second example contains tree tree structures description chromosomes chromosomes and optimal arrangement arrangement Recommended standart programs To work with output data it is recommended to used the following standart programs Microsoft Excel to work with matrixes They are printed using tab separated values format tsv The dot is used as the separator in numbers It is necessary to set the correct separator for numbers it can be done through the menu File gt Options gt Additional gt Use system separators TreeViewX to visualize trees in the Newick format Any text editor to work with the joint graphs and the shortest secuences Reference 1 K Yu Gorbunov R A Gershgorin V A Lyubetsky Rearrangement and Inference of C
14. hromosome Structures Molecular Biology 2015 Vol 49 No 3 pp 327 338
15. ie culture the number of chromosome included to the structure from 16 this specie that is the number of paths and cycles in the chromosome label L if chromosome is linear and C if cyclic the number of genes in the chromosome sequence of genes in the chromosome it consists of gene names If gene is located on the minus strand is written in front of its name it is possible to mark such genes with as well For example the file for thwo structures from Fig 1 is presented below otructure a Structure 6 o e Fig 1 File with structures shown on Fig 1 2 Structure a 3 L3 1454 2 3 34 64 4 L1 9 Structure 6 4 L2 74 2 L2 844 L1 1 L1 3 Joint graph of two given on the Fig 1 structures where the special nodes are represented as circles of bigger size other nodes are circles of smaller size I Sa 2 7b 2 l Ge OB e e 9a 3 6a 4 Sb 4 S e e Fig 2 hrom_reconstruction input data hrom_reconstruction utility requires two input files input tre and input chrom input tre contains one single line the tree in Newick format The leaves of the tree are being enumerated left to 17 right by numbers 1 2 3 input chromo contains chromosome structures assigned to the leaves of the tree The structures in this file should be listed in accordance with their number in the tree The descriptions of the structures are separated by the line containing multiple symbols
16. l arrangement is chosen correctly The problem of initialization was briefly described as task 3a Even approximate solution of task 3a let us to limit the set of possible initial points Installation and execution of ChromoGGL utilities on Windows Chromo Chromo command line utility is implemented using C 32 bit version of executable module is available for download Utility does not require installation Several steps are required to start use utility download archive chromo zip and unpack it to any directory for example f Chromo run console Windows Start gt Run gt cmd and enter the directory with archive content f cd f Chromo run command chromo h In case of successful execution short instruction should appear on the screen Otherwise the information about the error will be printed It is recommended to test utility with run example 1 bat on the short example hrom_reconstruction hrom_reconstruction utility is implemented as 32 bit executable module It does not require installation Several steps are required to start use utility download archive hrom_reconstruction zip and unpack it to any directory for example f hrom_reconstruction run console Windows Start gt Run gt cmd and enter the directory with archive content f cd f hrom_reconstruction Input data Chromo input data The input data consists of one file string with the following data the number of chromosome structures name of spec
17. n them Despite such a common task statement our original algorithm has linear time and memory complexity it has proved to be quick when performing computations on supercomputer 2 Computation of the matrix of pairwise distances for the given set of structures and generation of the tree that matches the matrix best These chromosome structures correspond to the leaves of the obtained tree The algorithm with linear complexity based on the algorithm for the first task is used 3 The reconstruction of the structures on the evolutionary tree defined by the structures at its leaves usually from the section 2 not necessarily binary on the ansestral nodes of the tree all structures can include paralogs The task contains the paralogs numeration which allows to build the correspondence between paralogs at the end of edge and them at the beginning of the edge and also to see which paralogs have been lost and which appeared at each node of the tree In other words the arrangement of chromosome structures with paralogs numeration for the tree with the only structures in leaves defined without numeration 1s being searched The task is to minimize the total price of arrangement as the sum of the distances between the pairs of chromosomes on the ends of any edge The difficulty of the task is that the distances are computed for the structures with fixed numerations that is numeration of all paralogs should be searched togeth
18. ng the special genes we join the ends of common genes which are neighbors of the removed region Otherwise if we insert the region of special genes inside of chromosome we firstly cut two ends We do not consider the cuts of special genes regions because such operation do not improve the solution of the task So we have six operations and the positive rational number is defined for each operation We define this numbers as costs of the operations Including the costs of operations into the model is important difference of this model from other models The task is to find the shortest sequence of this operations which transforms the structure a into the structure b By shortest sequence we mean the sequence with the minimal total cost The definition of the joint graph and its final form The vertexes of the joint graph a b are the ends of common genes and all maximal regions of the special genes Each end of gene is included only once We write the name of the gene with the index 1 or 2 which indicates the the beginning of the gene or its end The vertexes of the first type are defined as the regular vertexes the vertexes of the second type special vertexes We define the border edge with the special end as the hanging edge Edges are marked as a or b according to the structure in which this vertexes are connected Two vertexes can be connected by up to two edges The graph may contain isolated vertexes the regions of the special
19. ns preceding this step This trick ensures that the algorithm has no more than one branching which emerges when the type 1 is assigned a b b a a b a a e o e amp e o Q d e b a b b a D b a b i 3 2 2a 3b 12 2b 3a 1a 2b 3a la 2b 3a 1a and 2b 3a 1 Hereafter the algorithm execution is described for the first case only other cases are similar cut an external edge in the 3b path and join the special node with the extreme special node of the 2a path 3 3 2 3 1 Cut an external edge in the 3 path and join the special node with the extreme special node of the 2 path This results in a path of type 14 or Ip 1 e of type 1 depending on which of two external edges was cut a b a b a a b vy b b a b a a b a b a b a 3 4 1b 2at 3 24 3 1 1a 2b 3 2 3 1 and la 2b 3 2 3 1 First carry out the 1b 2a 2 operation see below and then the 2 3 1 one 3 5 la 3b 2 34 2 1 1b 3a 2 34 2 1 and 1b 3a 2 3 2 1 First carry out the la 3b 3 operation see below and then the 2 3 1 one 3 6 la 2 2a and 1b 2 2b Cut an external edge in the la path and join the special node with the extreme special node of the 2 path b a a b a b e e _e_ o Q e o it b a b a b _x_ __e_ _e_ _e __ 3 7 lat 3 3a and 1b 3 3b Cut an external b edge in the 3 path and join the special node with the extreme special node in the la path a b b a a b a o __ e_ ox _ o o E gt e
20. ose it into the final 2circle using a double internal edge or a sesqui cut and paste extreme edge or a join single edge operation Repeat the operation if possible If the double cut and paste weight does not exceed that of sesqui cut and paste all double operations are performed first otherwise all sesqui operations go first a a a b a aa a a E 66 0 o gt gt gt 6 e o a a b a li f a hk o _____ e _ gt e P ee c 2 gt gt Step 3 Let us start with definitions An odd even path is a path of an odd even length An a path is an odd path with extreme non hanging edges labeled as a b Path is defined in a similar way Types are assined to paths or circles remaining after steps 1 2 except the final 2circles and isolated common nodes 2 circles containing an a node but no b nodes are considered as a circles opposite structures as b circles A circle containing both a and b nodes belongs to the circle type Special b loops belong to the loop type a Paths are assigned to the following types Ja if the path has a single hanging edge 2a if it has two hanging edges 2a it is an isolated special b node 3a if it has no hanging edges but has both a and b nodes then its length is greater than 1 and 3a if there are neither hanging edges nor b nodes then its length equals 1 b Path types are defined in a similar way Note that we split paths into types with accent and withou
21. ponent We attach the name to each gene usually it is the number i of this gene the number can repeat in case of paralogs in this case this number becomes i j Such a model usually does not include the lengths of genes and intergenic regions as well as the content of genes and intergenic regions The direction of gene indicates to which chain this gene belongs The vertex of this graph defines the place of connection of the neighbouring genes Usually there are lots of chains and cycles in the structure this leads to interaction of these components That is why the case of multiple chromosomes differs from the case of the single chromosome The model includes the following standart operations for chromosome structures The double cut and paste the cut of two pairs of ends of genes and new reconnection of these ends sesqui cut and paste the cut of the pair of ends and the connection of one end with some new free end another end stays free cut of two connected ends join of two free ends If we have two chromosome structures a and b and the gene is present in both structures we will define this gene as the common gene otherwise special If the gene is present in the structure a and 1s absent in the structure b we will define this gene as a gene Otherwise we define the gene as b gene The model includes two additional operations the deletion of the region of special a genes and the insertion of the region of b genes When removi
22. sented as _ after it the name of structure to which this edge belongs is printed The edge to the special node is represented by the same symbol the name of the structure is not printed At the end of file the number of special a and b nodes in the joint graph is printed It is represented as spnodes If the gene partition is not well defined it is enclosed in the brackets Example of the first file for the joint graph from Fig 2 Joint_graph cycles 0 paths 7 1 1 2 2 3 1 4 2 9a 1 2 Sa 2 1 7b 3 2_6a_4 1_8b a spnodes 3 b spnodes 2 The second file contains the shortest sequence of operations transforming the joint graph to its final form Its first line contains the number of operations in the sequence its total weight Then operations are listed Each operation contains component to which one or several operations are being applied short names of operations result Example of the second file for the joint graph from Fig 2 5 5 4 1 2 Sa 2 1 7b L 3 2 6a 4 1 8b L C 1 2 Sa 2 1 78b 4 1 6a 3 2 L Here special nodes 7b and 8b from two paths are being united into special node 78b at one path Labels L and C mean path and cycle The result can be seen on the figure below L Sa 2 78b 4 6a 3 e a M 1 2 Sa 2 1 78b 4 1 6a 3 2 L 9a L SP 3 2 1 2_5a_2 1_78b_4 1_6a L Cut of the node 3 2 and union of special nodes 6a and 9a into one node 69a The result is on the figure below 20
23. t accent because we need to mark out paths without b nodes or a nodes Even paths are assigned to the following types if the path has a single hanging edge and a bnode I if it has one common node and one special a node incident to it 7 1f it has one common node and one special b node incident to it 2 if it has two hanging edges and non hanging edges 2 if it has only two hanging edges 3 if it has at least one edge and no hanging edges Type I is subdivided into types 14 and 7 if the path includes a hanging a node and b node respectively The algorithm performs the actions described below each action is applied to a set of paths from 2 to 4 of the corresponding types specified in the beginning of the paragraph and separated by the plus sign Each action is repeated as long as it is applicable For brevity we denote 2a n 2a as 2a 3b and 3b as 3b 1p and 1 as 1p 2 n 2 as 2 3 1 la 1b 1 Cut an extreme non hanging edge such edges are called external edges in one of two paths of types la and 1b and join the corresponding special node with the extreme special node of the other path sesqui cut and paste operation The two variants here paths are selected only at steps 4 15 4 23 of the algorithm This uncertainty is a characteristic of our algorithm Specifically an intermediate path type 1 corresponding to la or 14 is introduced At these steps c 1s set equal to either a or b for the whole chain of operatio
24. tificial7 artificialS The variants are separated from each other by empty line The task 2 solution contains two files distance matrix of pairwise distances between chromosome structures tree the evolutionary tree of chromosome structures that matches best to the matrix Exampels of task 2 solutions The first example of the set of chromosome structures contains 66 plastids from rhodophytic branch All chromosomes are circular The second example of the set of chromosome structures contains 18 mitochondrion of class Aconoidasida There are both circular and linear chromosomes in this set Input chromosome structures can be found in rhodophytic_branch and mitochondria matrixes of pairwise distances are in rhodophytic branch and mitochondria the evolutionary trees are described in rhodophytic_branch and mitochondria The archive also contains all this data suitable for Microsoft Excel To run utility on these examples it is necessary to run the following scripts from the archive run_example_3 bat and run example 4 bat hrom reconstruction output The inner vertexes of the tree in output are enumerated in order they are being visited by DFS algorithm which starts from the root of the tree and visits the leaves in accordance with their numbers The resulting structure is being printed for every inner vertex Example of output File output leaves 6 species 6 genes 9 assignment cost 9 assignment itself node wit
25. uired Command line parameters m execution mode it equals dist if task 1 is being solved tree if task 2 is being solved c path to the file with chromosome structures o path to the file with operations weights r path to the directory with the results of utility execution File with operations costs Weights are placed one by one in the single string all of them are represented by one positive double number DP Sedo i a E gs Se DD Sa Example of the file string with operations weights DP 1 2 SP 1 1 J 1 C 0 9 aD 0 8 bD 1 5 hrom_reconstruction parameters hrom_reconstruction does not require commandline parameters or additional files Output data In task 1 two files are being created joint_graph the description of the joint graph and shortest_sequence the description of the shortest sequence of transformations The first file has heading joint_graph heading cycles which includes the number of cycles in the joint graph then they are listed heading paths indicates the number of paths then they are listed Each cycle is represented as the sequence of its nodes common node is represented as a gene name followed by the number 1 if it is the start of gene and 2 if it is the end Special node consists of the names of special genes from the block it is followed by the name of structure to which this 19 edge belongs The edge between common nodes is repre
Download Pdf Manuals
Related Search
Related Contents
Samsung CTN364N003 manual de utilizador Manual SC Master 16 FEUILLE D`INSTRUCTION N° 04 Vãos Envidraçados: Otimização e Eficiência Energética em Edifícios Kopie - copie - Operator`s Manual igadgitz receptor y transmisor de Audio Bluetooth Lab manual as of 12/5/2012. The manual is only Bedienungsanleitung (Deutsch) Copyright © All rights reserved.
Failed to retrieve file