User Manual for TreeCloud version 1.3
• liddell: $1 - \frac{O_{11}O_{22} - O_{12}O_{21}}{C_1 C_2}$
• dice: $1 - \frac{2 O_{11}}{R_1 + C_1}$
• hyperlex: $1 - \max\left(\frac{O_{11}}{R_1}, \frac{O_{11}}{C_1}\right)$
• poissonstirling: $-O_{11}\left(\log O_{11} - \log E_{11} - 1\right)$
• chisquared: $1000 - \frac{N (O_{11} - E_{11})^2}{E_{11} E_{22}}$
• zscore: $1 - \frac{O_{11} - E_{11}}{\sqrt{E_{11}}}$
• ms: $1 - \min\left(\frac{O_{11}}{R_1}, \frac{O_{11}}{C_1}\right)$
• oddsratio: $1 - \log\frac{O_{11} O_{22}}{O_{12} O_{21}}$
• loglikelihood: $1 - 2\left(O_{11}\log\frac{O_{11}}{E_{11}} + O_{12}\log\frac{O_{12}}{E_{12}} + O_{21}\log\frac{O_{21}}{E_{21}} + O_{22}\log\frac{O_{22}}{E_{22}}\right)$
• gmean: $1 - \frac{O_{11}}{\sqrt{R_1 C_1}} = 1 - \frac{O_{11}}{\sqrt{N E_{11}}}$
• mi (mutual information): $1 - \log\frac{O_{11}}{E_{11}}$
• ngd (normalized Google distance): $\frac{\max(\log R_1, \log C_1) - \log O_{11}}{\log N - \min(\log R_1, \log C_1)}$

³ In fact, the cooccurrence distance is only a dissimilarity, because the triangle inequality may not be satisfied.

Ugly and dirty implementation tricks⁴ are used to avoid obtaining infinite numbers. The dissimilarity matrix is then normalized, i.e. all its values are shrunk to the interval [0, 1]: linearly if they are all positive, affinely otherwise.

5 License
5.1 How to cite
Although TreeCloud is free software licensed under the GPL, we would appreciate it if you would link to the website www.treecloud.org or cite the following publications when you use it:
• Philippe Gambette and Jean Véronis. Visualising a Text with a Tree Cloud. IFCS'09, 2009, software freely available from www.treecloud.org [2].
• Daniel H. Huson. SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics 14(1):68-73, 1998, software fr
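The formulas above can be sketched in Python as follows. This is a minimal illustration reconstructed from the formulas in this section, not the code of distanceFromCooccurrence in TreecloudFunctions.py. The helper names dissimilarity, normalize, safe and slog are hypothetical. The guard max(x, 0.00000000001) follows the description in footnote 4, but TreeCloud may not apply it at exactly the same places. Keeping the diagonal at 0 during normalization is our own assumption. The sketch is written for Python 3, whereas TreeCloud itself targets Python 2.x, where / between two integers truncates.

```python
import math

EPS = 0.00000000001  # footnote 4: x is replaced by max(x, EPS) where x could be zero


def safe(x):
    """Avoid division by zero and log(0), as described in footnote 4."""
    return max(x, EPS)


def slog(x):
    """Logarithm protected against zero arguments."""
    return math.log(safe(x))


def dissimilarity(formula, o11, o12, o21, o22):
    """Dissimilarity between words A and B from their 2x2 contingency table."""
    r1, r2 = o11 + o12, o21 + o22  # windows with A / without A
    c1, c2 = o11 + o21, o12 + o22  # windows with B / without B
    n = safe(r1 + r2)
    e11, e12, e21, e22 = r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n
    if formula == "jaccard":
        return 1 - o11 / safe(o11 + o12 + o21)
    if formula == "liddell":
        return 1 - (o11 * o22 - o12 * o21) / safe(c1 * c2)
    if formula == "dice":
        return 1 - 2 * o11 / safe(r1 + c1)
    if formula == "hyperlex":
        return 1 - max(o11 / safe(r1), o11 / safe(c1))
    if formula == "ms":
        return 1 - min(o11 / safe(r1), o11 / safe(c1))
    if formula == "poissonstirling":
        return -o11 * (slog(o11) - slog(e11) - 1)
    if formula == "chisquared":
        return 1000 - n * (o11 - e11) ** 2 / safe(e11 * e22)
    if formula == "zscore":
        return 1 - (o11 - e11) / math.sqrt(safe(e11))
    if formula == "oddsratio":
        return 1 - slog(o11 * o22 / safe(o12 * o21))
    if formula == "loglikelihood":
        pairs = [(o11, e11), (o12, e12), (o21, e21), (o22, e22)]
        return 1 - 2 * sum(o * slog(o / safe(e)) for o, e in pairs)
    if formula == "gmean":
        return 1 - o11 / math.sqrt(safe(r1 * c1))
    if formula == "mi":
        return 1 - slog(o11 / safe(e11))
    if formula == "ngd":
        lr, lc = slog(r1), slog(c1)
        return (max(lr, lc) - slog(o11)) / safe(math.log(n) - min(lr, lc))
    raise ValueError("unknown formula: " + formula)


def normalize(matrix):
    """Shrink off-diagonal values to [0, 1]: linearly if none is negative, affinely otherwise."""
    size = len(matrix)
    values = [matrix[i][j] for i in range(size) for j in range(size) if i != j]
    low, high = min(values), max(values)
    if low >= 0:
        scale = lambda v: v / safe(high)  # linear: v / max
    else:
        scale = lambda v: (v - low) / safe(high - low)  # affine: (v - min) / (max - min)
    return [[0.0 if i == j else scale(matrix[i][j]) for j in range(size)]
            for i in range(size)]
```

For example, dissimilarity("jaccard", 2, 1, 1, 6) gives 1 - 2/4 = 0.5. normalize then takes the full word-by-word matrix of these values and maps it to [0, 1].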
(see next section). If a puzzling error appears, please contact me. If you have writing rights on the folder containing TreeCloud, any change of configuration is recorded in Treecloud.ini, so the same parameters appear the next time you launch the program.

[Screenshot for Figure 2: fields Distance (liddell, gmean, jaccard, dice, zscore, hyperlex, chisquared, poissonstirling, loglikelihood, oddsratio, ngd, mi), Python path, SplitsTree path, Text to visualize, Sliding window (width, step), Words in the tree cloud, Word sizes, Word colors, Separator, Stoplist, Colors]
to thank all the people who gave feedback on early versions of the program and on the resulting tree clouds. Please refer to the acknowledgement section of [2] for more information on who helped develop the more scientific and theoretical aspects of the program.

References
[1] Stefan Evert. The Statistics of Word Cooccurrences: Word Pairs and Collocations. PhD thesis, University of Stuttgart, 2005.
[2] Philippe Gambette and Jean Véronis. Visualising a text with a tree cloud. In IFCS'09, 2009. Software freely available from www.treecloud.org.
[3] Daniel H. Huson. SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics, 14(1):68-73, 1998. Software freely available from www.splitstree.org.
Stefan Evert [1]. However, this thesis provides many similarity formulas, so they are transformed into dissimilarities as described below. The stability of the following cooccurrence distance formulas (i.e. how much the tree changes after small changes in the input text, depending on the chosen formula) was compared on the Obama speech corpus [2]. Following these tests, we advise against poissonstirling, oddsratio, ngd and mi.

Given two words A and B, and:
• $O_{11}$: observed number of sliding windows containing both A and B,
• $O_{12}$: observed number of sliding windows containing A but not B,
• $O_{21}$: observed number of sliding windows containing B but not A,
• $O_{22}$: observed number of sliding windows containing neither A nor B,
the following variables are defined:
• $R_1 = O_{11} + O_{12}$: number of sliding windows containing A,
• $R_2 = O_{21} + O_{22}$: number of sliding windows not containing A,
• $C_1 = O_{11} + O_{21}$: number of sliding windows containing B,
• $C_2 = O_{12} + O_{22}$: number of sliding windows not containing B,
• $N = R_1 + R_2 = C_1 + C_2$: number of sliding windows,
• $E_{11} = R_1 C_1 / N$: expected number of sliding windows containing both A and B,
• $E_{12} = R_1 C_2 / N$: expected number of sliding windows containing A but not B,
• $E_{21} = R_2 C_1 / N$: expected number of sliding windows containing B but not A,
• $E_{22} = R_2 C_2 / N$: expected number of sliding windows containing neither A nor B.

The cooccurrence formulas are defined as follows:
• jaccard: $1 - \frac{O_{11}}{O_{11} + O_{12} + O_{21}}$
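The counts $O_{11}, O_{12}, O_{21}, O_{22}$ can be obtained by sliding a window over the text, as in the following minimal Python 3 sketch. The functions window_starts, contingency and expected are hypothetical helpers. The sketch assumes the text is already split into tokens. Its handling of a text shorter than one window, and its use of full windows only, are our assumptions and are not taken from TreeCloud.

```python
def window_starts(n_tokens, window=30, step=1):
    """Start positions of full sliding windows; step == window gives disjoint windows."""
    # assumption: a text shorter than one window is treated as a single window
    if n_tokens <= window:
        return [0]
    return list(range(0, n_tokens - window + 1, step))


def contingency(tokens, a, b, window=30, step=1):
    """Return (O11, O12, O21, O22) for words a and b over the sliding windows."""
    o11 = o12 = o21 = o22 = 0
    for start in window_starts(len(tokens), window, step):
        seen = set(tokens[start:start + window])
        has_a, has_b = a in seen, b in seen
        if has_a and has_b:
            o11 += 1
        elif has_a:
            o12 += 1
        elif has_b:
            o21 += 1
        else:
            o22 += 1
    return o11, o12, o21, o22


def expected(o11, o12, o21, o22):
    """Marginals R1, R2, C1, C2, N and expected counts E11..E22 as defined above."""
    r1, r2, c1, c2 = o11 + o12, o21 + o22, o11 + o21, o12 + o22
    n = r1 + r2
    return {"R1": r1, "R2": r2, "C1": c1, "C2": c2, "N": n,
            "E11": r1 * c1 / n, "E12": r1 * c2 / n,
            "E21": r2 * c1 / n, "E22": r2 * c2 / n}
```

With tokens = "a b c a d b".split(), contingency(tokens, "a", "b", window=2, step=1) returns (1, 2, 2, 0). With step=2 the windows are disjoint, as with window=30 and step=30, and the result is (1, 1, 1, 0).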
TreeCloud version 1.3 (13/12/2009)
Philippe Gambette
May 6, 2010

Contents
1 Introduction
2 Obtaining and Installing the Program
3 Using the Program
4 Parameters
5 License
6 Version History
7 Acknowledgements
References

1 Introduction
TreeCloud builds a tree cloud visualization of a text. A tree cloud looks like a tag cloud, but its tags are displayed around a tree that reflects the semantic distance between the words in the text. TreeCloud is free software licensed under the GPL. However, we would greatly appreciate it if you followed the instructions of section 5 on how to cite the program when you use it. If you have any problem using TreeCloud, you can contact me.
by a special character (suggested by D. Barrowcliff); correction of a Linux line-ending bug (thanks Nicolas); correction of negative colors
• 2009/04/24 (1.1): new distance formula ngd (Normalized Google Distance); new text alteration method: random block deletion; new word selection: load a custom word list; possibility to load a distance matrix; correction of the RF distance (trivial splits removed from the computation)
• 2009/03/30 (1.0): user manual; licensed under the GPL; new color method: dispersion; new text alteration method: random word deletion (for bootstrap)
• 2009/03/09 (0.3): parameters normat and splitstreepath; distance matrix normalization; new color method: chronology; split extraction from the tree; Robinson-Foulds distance between trees
• 2009/03/04 (0.2): parameters unit and color; new color set: berry; distance between two distance matrices
• 2009/02/20 (0.1): computes the tree cloud of a text and displays it in SplitsTree; computes the arboricity of the distance matrix; parameters nbwords, minnb, window, distance

7 Acknowledgements
I thank Jean Véronis, who originated the project and helped with many parts of the code. TreeCloud could not exist without the SplitsTree software started by Daniel Huson. He also wrote the user manual for Dendroscope, which was used as the basis of this manual. Jean-Charles Bontemps and Nicolas Moreau found some bugs in the program and helped correct them. I finally want
ds.txt. Then use the parameter -words=C:\TreeCloud\KeptWords.txt when you call TreeCloud. This option has priority over the parameters minnb and nbwords, which are ignored if you use the parameter words. Note also that the word list is loaded after the stop list: if some words of your word list appear in the stop list, they will not be in the tree cloud. Modify the stop list if you really need them.

3.5 Modifying the code and using the functions
Don't hesitate to use TreecloudFunctions.py as a library of functions for your own scripts; the source code is well commented. For example, the few lines of code below transform a CSV file containing a distance matrix into a Nexus file. SplitsTree then loads this file to compute a tree:

```python
# Convert a CSV distance matrix into a Nexus file for SplitsTree (Python 2.x, like TreeCloud)
from TreecloudFunctions import openMatrix, exportToNexus, nexusOrders  # TreecloudFunctions.py must be importable

filepath = r"C:\TreeCloud\Matrix.csv"  # hypothetical path of your CSV distance matrix
matrix = openMatrix(filepath)
distance = matrix[1]   # the distance values
keptWords = matrix[0]  # the corresponding words
exportToNexus(distance, keptWords, filepath + ".Nexus", 1)
nexusOrders(distance, keptWords, filepath + ".Nexus", 0)
```

3.6 Created Files
Suppose we build a tree cloud of the file <filename> with the distance formula <formula>. The list of words and their frequencies is written in <filename>.freqs.txt. TagCloudBuilder can use this file to build a word cloud. The distance matrix is written in <filename>.<formula>.csv, and in the Nexus format in <filename>.<formula>.nexus. A set of SplitsTree commands is saved in the Nexus format in <filename>.<formula
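The following minimal Python 3 sketch shows what a distance matrix looks like in the Nexus format. It writes a TAXA block and a square DISTANCES block. It is not TreeCloud's exportToNexus function, and the helpers nexus_label and nexus_distances are hypothetical. The FORMAT line follows the NEXUS convention for distance matrices. Whether SplitsTree reads exactly this layout is an assumption, so check it against the SplitsTree documentation. Labels are single-quoted so that words containing apostrophes remain valid taxon names.

```python
def nexus_label(word):
    """Quote a taxon label NEXUS-style: single quotes, inner quotes doubled."""
    return "'" + word.replace("'", "''") + "'"


def nexus_distances(words, matrix, digits=6):
    """Return NEXUS text with a TAXA block and a full square DISTANCES block."""
    labels = [nexus_label(w) for w in words]
    lines = ["#NEXUS", "", "BEGIN TAXA;",
             "DIMENSIONS NTAX=%d;" % len(words), "TAXLABELS"]
    lines += ["  " + label for label in labels]
    lines += [";", "END;", "", "BEGIN DISTANCES;",
              "DIMENSIONS NTAX=%d;" % len(words),
              "FORMAT TRIANGLE=BOTH DIAGONAL LABELS;", "MATRIX"]
    for label, row in zip(labels, matrix):
        # one row per word: label followed by its distances to every word
        lines.append("  " + label + " " + " ".join("%.*f" % (digits, d) for d in row))
    lines += [";", "END;"]
    return "\n".join(lines) + "\n"
```

The returned string can be written to a file with a .nexus extension and opened in SplitsTree.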
eely available from www.splitstree.org [3].

⁴ See the source code for details (function distanceFromCooccurrence in TreecloudFunctions.py). In most cases, a variable x that may cause problems when equal to zero is replaced by max(x, 0.00000000001).

5.2 License
TreeCloud v1.3 (13/12/2009), http://www.treecloud.org
Copyright 2009-2010 Philippe Gambette

TreeCloud is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
TreeCloud is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with TreeCloud. If not, see http://www.gnu.org/licenses/.

6 Version History
• 2009/12/13 (1.3): word-focused coloring of the tree cloud (thanks J.-M. Viprey); possibility to use a custom word list to display in the tree cloud; possibility to use custom color and size lists for the words in the tree cloud; possibility to view the tree cloud in Dendroscope, if the words contain no special characters (thanks Daniel); possibility to use whitespaces in file names (thanks Sébastien)
• 2009/06/09 (1.2): cooccurrence computation in windows separated
>.nexorders. SplitsTree then executes this file, builds the tree and saves the result in <filename>.<formula>.nocol.nexorders. TreeCloud opens this file, colors the tree, and finally executes SplitsTree to display it. You can then use SplitsTree to modify the appearance of the tree cloud (rotate it, move some edges, etc.). A bug in the current version prevents the colors from being saved in the picture of the tree cloud. To save the picture, press PrintScreen and paste the screenshot into an image editor.

3.7 Using file cooccurrence instead of sliding window cooccurrence
By default, the distance between words in the tree cloud is computed from word-word cooccurrence in a window sliding from the beginning to the end of the text (see parameters window and step in section 4). You can also use an alternative cooccurrence computation, where two words cooccur if they appear between the same two separating characters (see parameter sepchar in section 4). To compute distances based on word-word cooccurrence across a set of documents, join the documents with a separator that does not appear in them. The separator is a special string in lower case, surrounded with whitespaces, like aaaaaaa, or a character which is not a punctuation mark. Then run TreeCloud on this file with the parameter -sepchar=aaaaaaa. If you use this method, there should be a sufficient number of windows separated by sepa
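The following Python 3 sketch illustrates separator-based windows. It assumes that tokens are separated by whitespace and that the text is lower-cased. join_documents, separator_windows and file_contingency are hypothetical helpers illustrating the idea, not TreeCloud functions. Each block between two occurrences of the separator counts as one window, so the resulting counts feed the same formulas as sliding windows do.

```python
SEPCHAR = "aaaaaaa"  # the example separator string of this section


def join_documents(documents, sepchar=SEPCHAR):
    """Join several documents, the separator surrounded with whitespace."""
    return (" %s " % sepchar).join(documents)


def separator_windows(text, sepchar=SEPCHAR):
    """Split a joined text into windows: one set of words per block between separators."""
    windows = [[]]
    for token in text.lower().split():  # assumption: whitespace tokens, lower case
        if token == sepchar:
            windows.append([])
        else:
            windows[-1].append(token)
    return [set(w) for w in windows if w]  # ignore empty blocks


def file_contingency(windows, a, b):
    """(O11, O12, O21, O22) when each separated block counts as one window."""
    o11 = sum(1 for w in windows if a in w and b in w)
    o12 = sum(1 for w in windows if a in w and b not in w)
    o21 = sum(1 for w in windows if a not in w and b in w)
    return o11, o12, o21, len(windows) - o11 - o12 - o21
```

For three short documents, separator_windows(join_documents(docs)) returns three windows, whatever their lengths. This is why the manual recommends many windows of similar sizes.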
Figure 1: Tree cloud of a corpus of Obama's speeches for his 2008 presidential campaign. TreeCloud was used with the English stoplist, an NJ tree, and the parameters color=chronology, nbwords=150, window=30, distance=oddsratio.

Contact: gambette@lirmm.fr or http://www.lirmm.fr/~gambette/PersoContactENG.php

2 Obtaining and Installing the Program
Python version 2.x should be installed on your system. If you have OpenOffice, you may find Python among the OpenOffice files, for example in C:\Program Files\OpenOffice.org 2.4\program\python.bat. Then download and install SplitsTree from www.splitstree.org; you will also need Java to run this program. Please install SplitsTree in a folder whose path contains no space. Otherwise, create a link to SplitsTree whose path contains no space, for example C:\TreeCloud\SplitsTree.lnk. Finally, visit www.treecloud.org to download the archive Treecloud.zip and extract it to a folder where you have writing rights. The path of this folder should not contain spaces. The two main files of the program are Treecloud.py and TreecloudFunctions.py. The Windows program Treecloud.exe is a graphical user interface which calls Treecloud.py with the appropria
nglish.txt C:\TreeCloud\Example.txt

Then you can add some options to customize your tree cloud (the exhaustive list is given in Section 4):
"C:\Program Files\OpenOffice.org 2.4\program\python.bat" C:\TreeCloud\Treecloud.py -splitstreepath=C:\TreeCloud\SplitsTree.lnk -stoplist=C:\TreeCloud\StoplistEnglish.txt C:\TreeCloud\Example.txt -distance=hyperlex -nbwords=40 -window=100 -unit=1 -color=chronology

This command line builds a tree cloud of the 40 most frequent words with the Hyperlex distance, where cooccurrence is computed on sliding windows 100 words wide. The tree will have edges of length 1. The colors will reflect the average position of the words in the text: red at the beginning, blue at the end. Instead of specifying the number of words of the tree cloud with the parameter nbwords, you can use -minnb=3 to build the tree cloud of the words appearing 3 times or more.

3.3 Directly with the command line on Linux
Apply the procedure described in the previous section, except for the part where you launch the command line from the Start menu (but I guess you know how to open a terminal on Linux).

3.4 Forcing the list of words which appear in the tree cloud
With the command-line version, you can use a custom word list to define which words appear in the tree cloud, instead of the 30 most frequent words used by default. Save your list as a text file with one word on each line, say in C:\TreeCloud\KeptWor
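To script many runs, for instance to compare distance formulas, you can assemble a command line like the ones above in Python 3. This sketch assumes the -name=value option syntax used in this section and the example paths of this manual. treecloud_command and batch_lines are hypothetical helpers. The standard-library function subprocess.list2cmdline (undocumented but long available) adds the quotes needed around the python.bat path, which contains spaces.

```python
import subprocess

PYTHON = r"C:\Program Files\OpenOffice.org 2.4\program\python.bat"
SCRIPT = r"C:\TreeCloud\Treecloud.py"


def treecloud_command(text_file, python=PYTHON, script=SCRIPT, **options):
    """Argument list for one TreeCloud run; each keyword becomes -name=value."""
    args = [python, script]
    args += ["-%s=%s" % (name, value) for name, value in options.items()]
    args.append(text_file)  # options before the text file, as in Figure 2
    return args


def batch_lines(text_file, formulas, **options):
    """One Windows command line per distance formula, e.g. to write into a .bat file."""
    return [subprocess.list2cmdline(treecloud_command(text_file, distance=f, **options))
            for f in formulas]
```

You can write each line returned by batch_lines to a .bat file, much like the TreecloudCommand.bat file created by the graphical interface.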
rating characters, many more than the number of words in the tree cloud. Furthermore, those windows should preferably contain a similar number of words. Otherwise, the distance formulas available in TreeCloud may not be suitable, and an intertextual distance formula (not yet implemented) may better suit your data.

TagCloudBuilder: http://www.freecorp.org/FRA/programmesdivers.htm

4 Parameters
4.1 Summary
Here is a summary of all parameters of the program:
• -stoplist=<filename>: <filename> is used as a stoplist, i.e. each line of <filename> contains a word which will be ignored during the rest of the analysis
• -words=<filename>: only the words present in <filename> (one word per line) are kept for the analysis
• -minnb=<n>: the tree cloud contains the words appearing at least n times
• -nbwords=<n>: the tree cloud contains at most n words (default: n=30)
• -window=<n>: width of the sliding window for the cooccurrence distance (default: n=30)
• -step=<n>: sliding step of the sliding window for the cooccurrence distance. With n=30 and -window=30, only disjoint windows are used for the cooccurrence computation (default: n=1, the most accurate)
• -sepchar=<string>: separator delimiting the cooccurrence windows, used instead of sliding windows of constant width (default: not used; see parameter window; use -sepchar=aaaaaaa for example; s
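The interplay between stoplist, words, minnb and nbwords can be summarized with the following Python 3 sketch. select_words is a hypothetical helper. It applies the stoplist first and lets a custom word list override minnb and nbwords, as stated in section 3.4. It also assumes that minnb replaces nbwords when both are given, which the manual does not specify. Ties between equally frequent words are kept in order of first occurrence, which may differ from TreeCloud.

```python
from collections import Counter


def select_words(tokens, stoplist=(), words=None, minnb=None, nbwords=30):
    """Pick the words of the tree cloud from a list of tokens."""
    stop = set(stoplist)
    counts = Counter(t for t in tokens if t not in stop)  # the stoplist is applied first
    if words is not None:
        # -words has priority over minnb and nbwords; stop words stay excluded
        return [w for w in words if w in counts]
    if minnb is not None:
        # assumption: when minnb is given, it replaces nbwords
        return [w for w, c in counts.most_common() if c >= minnb]
    return [w for w, _ in counts.most_common(nbwords)]
```

For example, with a custom list containing a stop word, that word is silently dropped. This matches the warning of section 3.4.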
te parameters. On this website, you can also find stoplists for English, French and German, to remove useless words from the tree cloud and of the

3 Using the Program
3.1 With the graphical interface on Windows
When you execute the program Treecloud.exe, the window illustrated in Figure 2 appears. If the color red appears somewhere in the window, there is a problem you should solve: Python or SplitsTree was not found, their file names contain whitespace, or the stoplist was not found. Correct the problem using the appropriate buttons. Once you have set all the desired parameters, load a text using the button labeled "Open a text file" (the file name should not contain the symbol), or paste a text in the area just below. Choose the stoplist matching the language of the text, and click on "Compute the tree cloud with TreeCloud". The command line shown just above this button is then saved into an MS-DOS command file, TreecloudCommand.bat, in the same folder as TreeCloud, and executed silently. Thus nothing seems to happen until the tree cloud is computed and appears in SplitsTree. This computation takes about 40 seconds for the 100-word tree cloud of a 100 KB text on a 2008 Dell laptop. If nothing happens at all, you can try to identify the problem: copy the command just above the button "Compute the tree cloud with TreeCloud" and paste it into the command line
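The checks behind the red warnings can be approximated with a few lines of Python 3. path_problems is a hypothetical helper, not part of Treecloud.exe. It only reports missing files and paths containing whitespace, following the recommendations of sections 2 and 3.1 for SplitsTree, the TreeCloud folder and the text files.

```python
import os


def path_problems(**paths):
    """Report missing files and paths containing whitespace, one message per problem."""
    problems = []
    for name, path in paths.items():
        if any(ch.isspace() for ch in path):
            problems.append("%s: path contains whitespace: %s" % (name, path))
        if not os.path.exists(path):
            problems.append("%s: not found: %s" % (name, path))
    return problems
```

For example, path_problems(splitstree=r"C:\TreeCloud\SplitsTree.lnk", stoplist=r"C:\TreeCloud\StoplistEnglish.txt") returns an empty list when both files exist.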
Figure 2: Graphical user interface for TreeCloud on Windows.

3.2 Directly with the command line on Windows
First, open the command-line window (Start > Run, type cmd, then press Enter). Then execute the Treecloud.py file with Python on the text file whose tree cloud you want to build. Once again, the path of this file should not contain the symbol. In the following, we want to create the tree cloud of the file C:\TreeCloud\Example.txt. Recall that Python should be installed on your system, say at C:\Program Files\OpenOffice.org 2.4\program\python.bat, and that the file names you use should not contain any space. Then you can use the following command to build the tree cloud with default options:
"C:\Program Files\OpenOffice.org 2.4\program\python.bat" C:\TreeCloud\Treecloud.py -splitstreepath=C:\TreeCloud\SplitsTree.lnk -stoplist=C:\TreeCloud\StoplistE
tring in lower case only; do not use a punctuation mark)
• -distance=<formula>: formula used to compute the cooccurrence distance. Possible values for <formula> (see Evert's PhD thesis): chisquared, mi, liddell, dice, jaccard, gmean, hyperlex, ms, oddsratio, zscore, loglikelihood, poissonstirling, ngd
• -normat=<string>: normalization method used to transform the distance matrix into a [0, 1] matrix: affine, linear, log or auto (default: auto)
• -splitstreepath=<path>: path of the program SplitsTree (splitstree.org), used to draw the tree clouds. Please avoid spaces in the path (default: C:\TreeCloud\SplitsTree.lnk)
• -dendropath=<path>: path of the program Dendroscope (dendroscope.org), used to draw the tree clouds instead of SplitsTree. Please avoid spaces in the path
• -unit=<b>: tree edges with unit length (default: b=1; otherwise set b=0)
• -color=<string>: name of the color set: chronology, dispersion, chronodisp, berry or yahoo (default: chronology)
• -customcolor=<path>: path of a CSV file containing words in the first column and 3 integers in the next 3 columns
• -customsize=<path>: path of a CSV file containing words in the first column and font references in the second one, for example Arial PLAIN 14

4.2 Cooccurrence distance formula
TreeCloud provides many formulas to compute the cooccurrence distance between two words. Details on how to use them are given in the PhD thesis by
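The files expected by customsize and customcolor can be generated with Python's csv module, as in this sketch. write_custom_lists is a hypothetical helper. Three points are assumptions based on the descriptions above and on the screenshot of Figure 2: the comma delimiter, the reading of the 3 integers as RGB components between 0 and 255, and the font reference format (family, style and size, as in Arial PLAIN 18).

```python
import csv
import io


def write_custom_lists(styles, size_file, color_file):
    """Write "word,font" rows for -customsize and "word,R,G,B" rows for -customcolor."""
    sizes, colors = csv.writer(size_file), csv.writer(color_file)
    for word, (font, rgb) in styles.items():
        # assumption: three integers per word, read as RGB components in 0..255
        if len(rgb) != 3 or not all(0 <= c <= 255 for c in rgb):
            raise ValueError("expected three integers in 0..255 for " + word)
        sizes.writerow([word, font])         # e.g. Cinna,Arial PLAIN 18
        colors.writerow([word] + list(rgb))  # e.g. Cinna,255,51,0


if __name__ == "__main__":
    size_buf, color_buf = io.StringIO(), io.StringIO()
    write_custom_lists({"Cinna": ("Arial PLAIN 18", (255, 51, 0))}, size_buf, color_buf)
    print(size_buf.getvalue() + color_buf.getvalue())
```

When writing to real files in Python 3, open them with newline="", as the csv module documentation recommends.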