User Manual 1.0
Contents
1. Note that to use the methods based on Integer Linear Programming you need to install the Gurobi Optimizer (Gurobi Optimization Inc., 2013). Gurobi is free for academic use, and once it is installed on your computer PDA will call it automatically.

2.1 Binary release

1. Download the executable version of PDA for your operating system, if it is available (pda-XXX-OS.tar.gz or pda-XXX-OS.zip, where XXX is the current version number and OS the operating system), from http://www.cibiv.at/software/pda.
2. Extract the files, e.g. with tar xvzf pda-XXX-OS.tar.gz under Unix. This should create a directory pda-XXX-OS.
3. You will find the executable in pda-XXX-OS. Rename it to pda (or pda.exe on Windows systems) and copy it to a directory on your system search path so that your system can find it.

If you encounter problems, please ask your local administrator for help.

3 Command line options

If you run pda -h, PDA displays a usage screen. The options are largely self-explanatory; for explanations and possible usages see the subsequent sections.

Usage: pda OPTIONS <file_name> <output_file>
GENERAL OPTIONS
  -h                Print this help dialog
  <file_name>       Input tree in NEWICK/NEXUS format or network in NEXUS format
  <output_file>     Output file to store results, default is <file_name>.pda
  -k <num>          Find optimal set of size <num>
  -k <min> <max>    Find opti
2. Biol. Evol., 21, 255-265.
Faith, D., Reid, C., and Hunter, J. (2004). Integrating phylogenetic diversity, complementarity, and endemism for conservation assessment. Conserv. Biol., 18, 255-261.
Faith, D. P. (1992). Conservation evaluation and phylogenetic diversity. Biol. Conserv., 61, 1-10.
Faith, D. P. and Baker, A. M. (2006). Phylogenetic diversity (PD) and biodiversity conservation: Some bioinformatics challenges. Evolutionary Bioinformatics Online, 2, 70-77.
Gurobi Optimization Inc. (2013). Gurobi optimizer reference manual. http://www.gurobi.com.
Huson, D. H. and Bryant, D. (2006). Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol., 23, 254-267.
Lewis, L. A. and Lewis, P. O. (2005). Unearthing the molecular diversity of desert soil green algae. Syst. Biol., 54, 936-947.
Lewis, P. O. (2003). NCL: a C++ class library for interpreting data files in NEXUS format. Bioinformatics, 19, 2330-2331.
Maddison, D. R., Swofford, D. L., and Maddison, W. P. (1997). NEXUS: An extensible file format for systematic information. Syst. Biol., 46, 590-621.
Minh, B. Q., Klaere, S., and von Haeseler, A. (2006). Phylogenetic diversity within seconds. Syst. Biol., 55, 769-773.
Minh, B. Q., Pardi, F., Klaere, S., and von Haeseler, A. (2009). Budgeted phylogenetic diversity on circular split systems. IEEE/ACM Trans. Comput. Biol. Bioinform., 6, 22-29.
Pardi, F. and Goldman, N. (2005). Species choice for comparative genomics: B
3. both: <file_name>.<k>.greedy for the greedy algorithm and <file_name>.<k>.pruning for the pruning algorithm.
• Otherwise: <user_tree>.<k>.pdtree.

When viability constraints are used, PDA writes a food web restricted to the optimal subset of taxa to the file <food_web.subFoodWeb>.

NOTE:
• The two options -1out and -oldout are mutually exclusive. If you specify both, the option given later on the command line overrides the earlier one.
• For a split network, no resulting sub-network is written.
• If you choose the option to generate a random tree/network, it is written to <file_name>, which then acts as the output file instead of the input file.

5 Example usages

5.1 Example usages for trees

pda test.tree -k 4
Infer the maximal PD tree of 4 taxa from the tree in test.tree (in NEWICK format). The gPDA or pPDA algorithm (Minh et al., 2006) is selected automatically. The resulting tree is written to test.tree.4.pdtree and the resulting taxa set to test.tree.4.pdtaxa.

NOTE: The program automatically detects the type of the input file (NEWICK or NEXUS) and applies the appropriate PDA algorithm. Detection does not depend on the file name; whether it ends in .tree or .nex does not matter.

pda test.tree -k 4 -o c
Compute the rooted PD: the tree is rooted at taxon c, and c will be included in the final PD set.

pda test.tree -k 4 -g
Same as the first comma
4. details how to install the software. An easy-to-use web interface is now available at http www cibiv at software pda web pda. We suggest reading this documentation before using PDA for the first time. To find out what is new in the current version, please read the Version History (section 7).

1.1 Methods

The methods have been described in detail in:
• O. Chernomor, B. Q. Minh, F. Forest, S. Klaere, T. Ingram, M. Henzinger, and A. von Haeseler (2014) Split Diversity in Constrained Conservation Prioritization using Integer Programming. Submitted.
• B. Q. Minh, S. Klaere, and A. von Haeseler (2010) SDA: A Simple and Unifying Solution to Recent Bioinformatic Challenges for Conservation Genetics. Proceedings of the 2nd International Conference on Knowledge and Systems Engineering (KSE 2010), Hanoi, Vietnam, 33-37. IEEE Computer Society, Los Alamitos, CA, USA.
• B. Q. Minh, S. Klaere, and A. von Haeseler (2009) Taxon Selection under Split Diversity. Syst. Biol., 57, 586-594.
• B. Q. Minh, F. Pardi, S. Klaere, and A. von Haeseler (2009) Budgeted Phylogenetic Diversity on Circular Split Systems. IEEE/ACM Trans. Comput. Biol. Bioinform., 6, 22-29.
• B. Q. Minh, S. Klaere, and A. von Haeseler (2006) Phylogenetic diversity within seconds. Syst. Biol., 55, 769-773.

2 Installation

See below for information on how to install/build the different versions of the PDA software. Executables are intended for a number of operating systems
5. > The default limit is 100 if you do not specify one. This simply prevents PDA from running out of memory when millions of such optimal sets exist.

• -min option
This option tells PDA to find the minimal PD sets instead of the default maximal ones. Note that on trees the greedy algorithm no longer works for PD minimization. However, the dynamic programming algorithm presented in Minh et al. (2009) can easily be adapted to this case by negating all the branch lengths.

• -1out option
This tells PDA to write the list of taxa sets into the .pdtaxa file and the PD scores into the .score file.

• -oldout option
This is for compatibility only. By default, version 0.5 produces only the output file .pda, which contains more information than just the optimal sets, scores, and sub-trees. This option tells PDA to write the extra result files as output by version 0.3.

• -v option
This option tells PDA to print more intermediate information while running.

3.2 Options for budget constraints

From version 0.5, PDA is extended to cope with budget constraints. The extended problem is formulated as follows:

Given: a tree or split network, integer preservation costs $c_s$ for each taxon $s$, and a total integer budget $B$.
Find: a subset $S$ of taxa that maximizes $PD(S)$ such that the total cost does not exceed the given budget: $\sum_{s \in S} c_s \le B$.

The restriction to integer numbers is not a limitation, since budgets are normally expressed in integers. If not, it
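The budgeted problem stated above can be made concrete with a small sketch. This is not PDA's algorithm (the manual refers to dynamic programming); it is a plain exhaustive search that is only practical for a handful of taxa. The example tree, the costs, and the function names split_diversity and best_set_within_budget are assumptions made up for illustration. Each branch of the tree is stored as a split (the taxa on one side of the branch) together with its length.

```python
from itertools import combinations

# Hypothetical unrooted tree ((a:1,b:2):3,c:4,d:5), one split per branch.
SPLITS = [
    (frozenset("a"), 1.0),
    (frozenset("b"), 2.0),
    (frozenset("c"), 4.0),
    (frozenset("d"), 5.0),
    (frozenset("ab"), 3.0),  # internal branch separating {a,b} from {c,d}
]
# Hypothetical integer preservation costs c_s for each taxon s.
COSTS = {"a": 2, "b": 1, "c": 3, "d": 4}


def split_diversity(splits, all_taxa, subset):
    """Sum the weights of the splits that separate at least two taxa of subset."""
    subset = frozenset(subset)
    total = 0.0
    for side, weight in splits:
        if subset & side and subset & (all_taxa - side):
            total += weight
    return total


def best_set_within_budget(splits, costs, budget):
    """Try every subset whose total cost fits the budget and keep the best one."""
    all_taxa = frozenset(costs)
    best_score, best_set = 0.0, frozenset()
    for size in range(1, len(all_taxa) + 1):
        for combo in combinations(sorted(all_taxa), size):
            if sum(costs[t] for t in combo) > budget:
                continue  # violates sum of c_s <= B
            score = split_diversity(splits, all_taxa, combo)
            if score > best_score:
                best_score, best_set = score, frozenset(combo)
    return best_score, best_set


print(best_set_within_budget(SPLITS, COSTS, budget=5))
```

With a budget of 5, the pair {b, d} (cost 1 + 4) gives the largest PD in this toy example, because it spans both long terminal branches and the internal branch.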
6. you are sure that some taxon must be present in the final PD set (e.g. the taxon with a very long terminal split). Other basic options i e <file> should also work fine with split networks.

pda test.nex -k 4 -mk 2
Identify all optimal PD sets containing 2 to 4 taxa. The resulting PD sets are printed to test.nex.2.pdtaxa, test.nex.3.pdtaxa, and test.nex.4.pdtaxa. The PD scores are written to test.nex.score, which contains several lines, each of the form <sub_size> <corresponding score>, where <sub_size> runs from 2 to 4.

pda test.nex -k 3 -all
Find all multiple optimal PD sets: if there is more than one optimal 3-set, all of them are printed. The second optimal set will be in test.nex.3.pdtaxa.1, the third in test.nex.3.pdtaxa.2, etc.
NOTE: This option might lead to exponential computing time, as the time depends on the number of optimal PD sets.

pda test.nex -k 4 -mk 2 -all
Combine the features of the two previous commands.

6 Howto employ the method for networks

One way to use this new feature is together with the program SplitsTree 4 (Huson and Bryant, 2006), available on the website. First you reconstruct a circular network by e.g. Neighbor-Net (Bryant and Moulton, 2004). The resulting network is then saved to a NEXUS file, e.g. mynet.nex. Then you can feed mynet.nex directly to PDA. There can be several blocks in the mynet.nex input file; however, PDA only reads the TAXA and SPLITS blocks. All o
7. PD/SD
  -compl <areas>      Compute complementary PD/SD given the listed <areas>
OPTIONS FOR VIABILITY CONSTRAINTS
  -eco <food_web>     File containing food web matrix
  -diet <min_diet>    Minimum diet portion to be preserved for each predator
MISCELLANEOUS
  -dd <sample_size>   Compute PD distribution of random sets of size k

3.1 General options

• <file_name> option
The <file_name> is the input tree file in NEWICK format or the input split network in NEXUS format. The only exception is when you set r u <num_taxa>: the program then generates a random tree and writes it into the <file_name> file. More information on the NEWICK tree format can be found at http evolution genetics washington. More information on the NEXUS file format can be found in the article by Maddison et al. (1997) or at http awcmee massey ac nz spectronet nexus html.

• <output_file> option
This sets the output file name, replacing the default <file_name>.pda, where <file_name> is the input file name defined above.

• -k <num_taxa>, -k <min> <max>, and -k <min> <max> <step> option
With -k <num_taxa>, PDA computes the optimal PD sets of size <num_taxa>. With the new option -k <min> <max>, PDA computes the optimal PD sets for k from <min> to <max>, so you do not have to run PDA several times on the same tree or network for differe
8. PDA: Phylogenetic Diversity Analyzer
PDA Manual
Version 1.0, August 2014

Core developers:
Bui Quang Minh (minh.bui at mfpl.ac.at)
Olga Chernomor (olga.chernomor at mfpl.ac.at)

Support:
Arndt von Haeseler (arndt.von.haeseler at mfpl.ac.at)
Steffen Klaere (steffen.klaere at gmail.com)

Contact address:
Center for Integrative Bioinformatics Vienna (CIBIV)
Max F. Perutz Laboratories
University of Vienna, Medical University of Vienna
Dr. Bohr-Gasse 9, A-1030 Vienna, Austria

License Agreement

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Contents

1 Introduction
2 Installation
2.1 Binary release
5.1 Example usages for trees
5.2 Example usages for networks
6 Howto employ the method for networks

1 Introduction

Phylogenetic Diversity (PD), coined by Faith (1992), is a quantitative measure to assess the biodiversity of species based on a phylogen
9. can be easily transformed into integers. This problem cannot be solved by a greedy strategy, but it can be solved by a dynamic programming algorithm. A paper describing this is still in preparation.

• -u <file> option
This file is in the following format. The first line contains the total integer budget. Each subsequent line contains a taxon name and an associated integer cost, separated by blank(s). Note that any taxon that is not given a cost is automatically assigned a cost of ZERO.

• -b <budget> option
If you do not want to use the budget written in the file specified by the -u <file> option, use this option. The budget specified here will be used for the later analysis.

• -b <min> <max> <step> option
This has the same effect as described for the -k <min> <max> <step> option (see Section 3.1).

3.3 Options for area analysis

PDA is capable of computing and maximizing the PD/SD scores of areas. An area simply refers to a user-defined subset of taxa. PDA can also compute the exclusive, endemic, and complementary PD/SD of areas. In the following we describe only PD for the sake of simplicity, but all options work with SD as well.

• -ts <area_file> option
The list of areas is given in <area_file>. This file can be in one of two formats: simple text file or NEXUS format. PDA automatically detects the type of this file.
Simple text file: The first line is the number o
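The budget file layout described above (total budget on the first line, then one taxon name and integer cost per line, with unlisted taxa costing zero) can be read with a few lines of Python. This is only a sketch of the format as the manual states it, not PDA's own parser; the function name parse_budget_file and the example taxa are hypothetical.

```python
def parse_budget_file(text, all_taxa):
    """Return (budget, costs) from the text of a budget file."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    budget = int(rows[0][0])                   # first line: total integer budget
    costs = {taxon: 0 for taxon in all_taxa}   # taxa without a cost get ZERO
    for name, cost in rows[1:]:                # name and cost separated by blank(s)
        costs[name] = int(cost)
    return budget, costs


example = """10
A 3
B   4
"""
budget, costs = parse_budget_file(example, ["A", "B", "C"])
print(budget, costs)
```

Taxon C is not listed in the example file, so it ends up with a cost of zero, as the manual specifies.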
10. ea names separated by commas. Let $B$ be the union taxon set of all given areas. Then the PD complementarity of a particular area $A$ is
$PD(A \mid B) = PD(A \cup B) - PD(B)$.
See Faith et al. (2004) for more details and interpretation.

3.4 Options for viability constraints (NEW)

When the taxon sets of the tree/split network and of the food web are not equal, the analysis continues with the union set. Taxa not present in the food web are not constrained by viability. However, it is advisable to use a tree/split network and a food web that do not differ much in species composition.

General options which can be used in combination with viability-constrained analysis:
  <file_name>       User tree in NEWICK format or split network in NEXUS format
  <output_file>     Output file to store results, default is <file_name>.pda
  -k <num_taxa>     Find optimal set of size <num_taxa>
  -o <taxon>        Root name to compute rooted PD (default is unrooted)
  -if <file>        File containing taxa to be included into optimal set
  -v                Verbose mode

and the following are the additional options:

• -eco <food web file> option
This file contains the food web matrix. The first line specifies the number N of taxa in the food web. Each subsequent line starts with a taxon name followed by N matrix entries. Each matrix entry $w_{ij} > 0$ defines the portion of prey i in the diet of predator j. The matrix can also be defined just by 0 or 1, meani
11. eing greedy works. PLoS Genet., 1, 672-675.
Steel, M. (2005). Phylogenetic diversity and the greedy algorithm. Syst. Biol., 54, 527-529.
12. f taxa $n$ of the first area. The next $n$ lines are the names of the $n$ taxa in the first area. Then comes the number $n_2$, the number of taxa of the second area, and $n_2$ lines storing the names of the taxa in the second area. This repeats until the last area or until the end of the file is reached.
NEXUS format: PDA reads the SETS block of the NEXUS file. For simplicity, here is an example:

#nexus
begin sets;
    taxset a1 = a b c;
    taxset a2 = c d g n;
    taxset a3 = h i;
    taxset a4 = j k;
    taxset a5 = l m;
    taxset a6 = h i;
    taxset a7 = j k;
    taxset a8 = l m;
end; [sets]

• -k <num_area> option
This tells PDA to compute the maximal PD set of areas.

• -excl option
This tells PDA to compute the exclusive PD of all areas as well. In short, given $X$, the set of all taxa, and an area $A$, the exclusive PD of $A$ is simply
$ePD(A) = PD(X) - PD(X \setminus A)$.
See 2005 for the original description of this measure.

• -endem option
This tells PDA to compute the endemic PD of all areas as well. Given $X$, the set of all taxa, and $A_1, A_2, \ldots, A_m$, all areas, let $U$ be the union taxon set of all areas. Then the endemic PD of a particular area $A_i$ is
$PD(A_1 \cup \cdots \cup A_m) - PD(A_1 \cup \cdots \cup A_{i-1} \cup A_{i+1} \cup \cdots \cup A_m)$.
See Faith et al. (2004) for more details and interpretation.

• -compl <areas> option
This tells PDA to compute the PD complementarity of all areas given the list <areas>. The list can contain one area name or several ar
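The exclusive and endemic PD formulas above can be checked by hand on a toy example. The following is a minimal sketch, not PDA's code: the tree, the two areas, and the function names are assumptions for illustration, and PD is computed from the branches of an unrooted tree written as splits.

```python
# Hypothetical unrooted tree ((a:1,b:2):3,c:4,d:5), one split per branch.
ALL_TAXA = frozenset("abcd")
SPLITS = [
    (frozenset("a"), 1.0),
    (frozenset("b"), 2.0),
    (frozenset("c"), 4.0),
    (frozenset("d"), 5.0),
    (frozenset("ab"), 3.0),  # internal branch separating {a,b} from {c,d}
]


def pd(subset):
    """Unrooted PD: total length of branches separating at least two taxa of subset."""
    subset = frozenset(subset)
    return sum(w for side, w in SPLITS
               if subset & side and subset & (ALL_TAXA - side))


def exclusive_pd(area):
    """ePD(A) = PD(X) - PD(X \\ A), with X the set of all taxa."""
    return pd(ALL_TAXA) - pd(ALL_TAXA - frozenset(area))


def endemic_pd(areas, i):
    """PD of the union of all areas minus PD of the union without area i."""
    union_all = frozenset().union(*areas)
    others = [a for j, a in enumerate(areas) if j != i]
    union_others = frozenset().union(*others) if others else frozenset()
    return pd(union_all) - pd(union_others)


areas = [frozenset("ab"), frozenset("bc")]  # two hypothetical areas
print(exclusive_pd("ab"))     # PD(abcd) - PD(cd)
print(endemic_pd(areas, 0))   # PD(abc) - PD(bc)
```

In this example the exclusive PD of {a, b} is the length of the three branches that are lost when a and b are removed from the full tree, while the endemic PD of the first area only counts the branch leading to a, because b is shared with the second area.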
13. mal sets of size from <min> to <max>
  -k <min> <max> <step>   Find optimal sets of size min, min+step, min+2*step, ...
  -k <k_percent>          Find optimal set of size in percentage
  -o <taxon>              Root name to compute rooted PD (default: unrooted)
  -if <file>              File containing taxa to be included into optimal sets
  -e <file>               File containing branch/split scale and taxa weights
  -all                    Identify all multiple optimal sets
  -lim <max_limit>        The maximum number of optimal sets for each k if -a is specified
  -min                    Compute minimal sets (default: maximal)
  -1out                   Print taxa sets and scores to separate files
  -oldout                 Print output compatible with version 0.3
  -v                      Verbose mode
OPTIONS FOR PHYLOGENETIC DIVERSITY (PD)
  -root                   Make the tree ROOTED (default is unrooted)
                          NOTE: this option and -o <taxon> cannot be both specified
  -g                      Run greedy algorithm only (default: auto)
  -pr                     Run pruning algorithm only (default: auto)
OPTIONS FOR BUDGET CONSTRAINTS
  -u <file>               File containing total budget and taxa preservation costs
  -b <budget>             Total budget to conserve taxa
  -b <min> <max>          Find all sets with budget from <min> to <max>
  -b <min> <max> <step>   Find optimal sets with budget min, min+step, min+2*step, ...
OPTIONS FOR AREA ANALYSIS
  -ts <area_file>         Compute/maximize PD/SD of areas (combine with -k to maximize)
  -excl                   Compute exclusive PD/SD
  -endem                  Compute endemic
14. nch lengths/split weights will be multiplied by the scaling factor, and then all external branch lengths/trivial split weights will be increased by the specified taxon weights. This processed tree/network replaces the input tree/network for the analysis. More information on these additional parameters can be found in Steel (2005).

• -if <file> option
The file contains all taxon names that you always want to include in your final PD set, irrespective of any constraints. The format is simply a list of taxon names separated by blank(s) or new lines. An error is displayed if some taxon name does not appear in the input tree/network.
This option can be handy in comparative genomics, when you have already sequenced several species and have to decide which species to sequence next. The names of the species already sequenced can then be listed in this file. See Pardi and Goldman (2005) for a discussion.

• -a, -all option
This option allows you to identify multiple optimal PD sets for a specific k. This is useful when more than one optimal set has the same PD score. Note that if you specify -k <min> <max>, PDA handles each k correctly. The -a option can be used in conjunction with the -lim <max_limit> option.

• -lim <max_limit> option
When you set the -a option, the number of multiple optimal PD sets reported for each k is limited to at most <max_limit
15. nd, but only apply the gPDA algorithm.

pda test.tree -k 4 -b
Run both algorithms. The resulting trees are written to test.tree.4.greedy and test.tree.4.pruning.

pda test.tree -k 4 -e test.pam
Read the weight information from the file test.pam and integrate it into the tree in test.tree. Then run the program as in the first example command.

pda test.tree -k 4 -i test.taxa
Include the favourite taxa listed in test.taxa in the final PD set.

pda test.tree -k 4 -e test.pam -i test.taxa
Combine both features of the above two example commands.

pda 1000.tree -r 1000
Generate a 1000-taxon random tree under the Yule-Harding model and write the resulting tree to the file 1000.tree in NEWICK format.

5.2 Example usages for networks

pda test.nex -k 3
Find the maximal $PD_3$ set of the split network in test.nex (in NEXUS format, as produced by e.g. the SplitsTree 4 program; Huson and Bryant, 2006). PDA detects whether the input split system is circular. If it is, PDA applies the dynamic programming algorithm; otherwise it uses exhaustive search. The resulting taxa set is printed to test.nex.3.pdtaxa.

pda test.nex -k 3 -o 2
Compute the rooted PD: the split system is rooted at taxon 2, and 2 will be included in the final PD set.
NOTE: With this option the program normally runs much faster (the time complexity is reduced by a factor of n, where n is the number of taxa). So always specify -o <taxon_name> if
16. ng that taxon i is not a prey or is a prey of predator j, respectively.
Important: PDA supports only acyclic food webs. In case of cannibalism ($w_{ii} \neq 0$), PDA sets the entry to 0 and continues with the processed food web. However, if there are still cycles, PDA prints a message and stops.
Example of a food web file:
5 taxon_name_1 000 taxon_name_2 1 0 0 taxon_name_3 1 0 0 taxon_name_4 0 1 1 101 OoooooOo oooooOo taxon_name_5

• -diet <diet> option
This option specifies the minimum diet portion to be preserved for each predator. Skipping it or using 0 results in the naive viability constraint. For a diet greater than 0, the d-viability constraint is used.

4 Outputs

All outputs are written to <file_name>.pda by default, or to <output_file> if you specify it on the command line. If you specify -1out, all the taxa sets are additionally written to <file_name>.pdtaxa and the PD scores to <file_name>.score.

If you specify -oldout, additional files are written as in version 0.3, as follows. The resulting PD taxa set is written to <file_name>.<k>.pdtaxa. If the option -a or -all is specified and multiple optimal sets are found, subsequent optimal taxa sets are written to <file_name>.<k>.pdtaxa.1, <file_name>.<k>.pdtaxa.2, ... The score is printed to <file_name>.score. For trees, the resulting sub-trees are written to:
• If you specify -b or
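The food web rules described above (N taxa, one row of N entries per taxon, cannibalism entries reset to 0, cycles rejected) can be sketched as follows. This is an illustration under stated assumptions, not PDA's implementation: the function names and the three-taxon example are hypothetical, and the cycle check uses a standard topological sort (Kahn's algorithm), which the manual does not prescribe.

```python
def parse_food_web(text):
    """Read a food web: first line N, then N rows of 'name w_1 ... w_N'."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    n = int(rows[0][0])
    names, matrix = [], []
    for fields in rows[1:n + 1]:
        names.append(fields[0])
        matrix.append([float(x) for x in fields[1:n + 1]])
    for i in range(n):
        matrix[i][i] = 0.0  # drop cannibalism entries, as the manual describes
    return names, matrix


def is_acyclic(matrix):
    """Kahn's algorithm on edges prey i -> predator j wherever matrix[i][j] > 0."""
    n = len(matrix)
    indegree = [sum(1 for i in range(n) if matrix[i][j] > 0) for j in range(n)]
    ready = [j for j in range(n) if indegree[j] == 0]
    visited = 0
    while ready:
        i = ready.pop()
        visited += 1
        for j in range(n):
            if matrix[i][j] > 0:
                indegree[j] -= 1
                if indegree[j] == 0:
                    ready.append(j)
    return visited == n  # every taxon was ordered, so no cycle remains


chain = "3\nA 0 1 0\nB 0 0 1\nC 0 0 1\n"  # A eaten by B, B eaten by C, C cannibal
names, web = parse_food_web(chain)
print(names, is_acyclic(web))
```

After the diagonal entry for C is cleared, the example is a simple chain and passes the check; a web in which two taxa prey on each other would fail it.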
17. nt k, which saves a lot of computational time. It is even more convenient in some cases: with -k <min> <max> <step>, PDA reports only the optimal PD sets of size k from <min> to <max> with a jumping step of <step>. That is, k iterates through <min>, <min>+<step>, <min>+2*<step>, ... without exceeding <max>.

• -k <k_percent> option
This option is similar to -k <num>, but defines the subset size as a percentage of the total number of taxa or areas.

• -o <taxon> option
From version 0.3, one can distinguish between unrooted and rooted PD with this option. See 2006 for a discussion. If your tree/network has a specific root or outgroup, always specify it with this option. The root will then always be included in the final PD set.

• -e <file> option
The <file> must be in the following format:
1. The first line contains a scaling factor for every branch length/split weight in the tree/network.
2. From the second to the last line, each line contains a taxon name and its taxon weight (the importance of that taxon). Any taxa not listed here are assigned a taxon weight of ZERO.
If you prefer some taxa, you can give them a positive taxon weight. Specify a very high taxon weight for your favourite taxa if you want to always include them in your final optimal PD set.
Then the input tree/network is processed as follows: all bra
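The parameter file format above, together with the processing the manual describes for it (multiply every branch length or split weight by the scaling factor, then add each taxon weight to that taxon's external branch or trivial split), can be sketched in Python. The function names and the tiny example are hypothetical, and the sketch assumes that each external branch is stored as a split whose listed side contains exactly one taxon.

```python
def parse_weight_file(text, all_taxa):
    """Return (scale, weights): first line is the scale, then 'taxon weight' lines."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    scale = float(rows[0][0])
    weights = {taxon: 0.0 for taxon in all_taxa}  # unlisted taxa get weight ZERO
    for name, weight in rows[1:]:
        weights[name] = float(weight)
    return scale, weights


def apply_weights(splits, scale, weights):
    """Scale all split weights, then add taxon weights to trivial splits."""
    processed = []
    for side, length in splits:
        new_length = length * scale
        if len(side) == 1:            # external branch / trivial split
            (taxon,) = side
            new_length += weights.get(taxon, 0.0)
        processed.append((side, new_length))
    return processed


splits = [(frozenset("a"), 1.0), (frozenset("b"), 2.0), (frozenset("ab"), 3.0)]
scale, weights = parse_weight_file("2\na 0.5\n", ["a", "b", "c", "d"])
print(apply_weights(splits, scale, weights))
```

Only taxon a has a weight in the example, so its external branch becomes 1.0 * 2 + 0.5, while the other branches are simply doubled.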
18. ther will be ignored, including CHARACTERS, DISTANCES, NETWORKS, ST_ASSUMPTIONS, etc.
You can also prepare your own split network. A simple input file is inside the src folder under the name test.nex.
NOTE: The algorithm for circular networks is very fast, so always specify the CYCLE command inside the SPLITS block of the NEXUS file. Otherwise an exhaustive search is applied, which is very slow.

7 Version History

1.0: Extension to viability constrained analysis.
0.5.1: Some bugs fixed and code cleaned. A new user manual.
0.5: Extension to budgeted PD. Being able to compute PD and PD-related measures for areas.
0.3: Extension to split networks. Distinguish between unrooted and rooted PD. Print also the taxa set now.
0.21: Fix a minor bug with STL vector constructor while compiling.
0.2: Inclusion of -i <file> option.
0.1: Initial version.

8 Credits

The parser for the NEXUS file format is derived from the Nexus Class Library (Lewis, 2003).

Acknowledgement

The authors would like to thank Dan Faith for helpful suggestions on the subject and Heiko Schmidt and Tanja Gesell for fruitful discussions. We also thank Mike Steel for constructive comments. Financial support from the Wiener Wissenschafts-, Forschungs- and Technologiefonds (WWTF) and the University of Vienna is greatly appreciated.

References

Bryant, D. and Moulton, V. (2004). Neighbor-net: An agglomerative method for the construction of phylogenetic networks. Mol.
19. y. Given n taxa (or species) connected by a phylogenetic tree, the PD of any subset of taxa is defined as the sum of the branch lengths of the minimal subtree connecting these taxa. We have recently proposed Split Diversity (SD) as an extension of PD for split networks (or split systems). Given a split system of n taxa, the SD of any subset of taxa is equal to the sum of the weights of all the splits separating at least two taxa in the subset. This definition coincides with PD when the underlying split system corresponds to a tree.

Phylogenetic Diversity Analyzer (PDA) provides a wide range of biodiversity analyses using Phylogenetic Diversity (PD), Split Diversity (SD), and related measures based on both phylogenetic trees and networks. This provides conservation biologists with an objective decision-making process. The major features include:
• Maximizing PD and SD under various types of constraints, including budgetary, geographical, and ecological constraints.
• Minimizing the budget for a given diversity threshold.
• Evaluating predefined sets of taxa (e.g. in an area), including exclusive, endemic, and complementary PD/SD.

PDA was previously named Phylogenetic Diversity Algorithm. Since version 0.5 it has been called Phylogenetic Diversity Analyzer because of its extended functionality. PDA is available free of charge under the GNU GPL license from http://www.cibiv.at/software/pda. Please read the Installation section (2) for more
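The definitions of PD and SD given above can be tried out on a small example. The sketch below is not part of PDA; the four-taxon tree, its branch lengths, and the function name split_diversity are made up for illustration. Every branch of an unrooted tree is written as a split (the taxa on one side of the branch) with the branch length as its weight, so the same function yields PD on a tree and SD on a general split system.

```python
# Hypothetical unrooted tree ((a:1,b:2):3,c:4,d:5), one split per branch.
ALL_TAXA = frozenset("abcd")
TREE_SPLITS = [
    (frozenset("a"), 1.0),
    (frozenset("b"), 2.0),
    (frozenset("c"), 4.0),
    (frozenset("d"), 5.0),
    (frozenset("ab"), 3.0),  # internal branch separating {a,b} from {c,d}
]


def split_diversity(splits, all_taxa, subset):
    """Sum the weights of all splits that separate at least two taxa of subset."""
    subset = frozenset(subset)
    total = 0.0
    for side, weight in splits:
        other_side = all_taxa - side
        if subset & side and subset & other_side:
            total += weight
    return total


# The path from a to c runs over branches of length 1, 3 and 4.
print(split_diversity(TREE_SPLITS, ALL_TAXA, "ac"))
# All four taxa together span the whole tree.
print(split_diversity(TREE_SPLITS, ALL_TAXA, "abcd"))
```

A single taxon has a diversity of zero under this unrooted definition, since no split can separate two members of a one-element subset.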