TXM Reference Manual 0.5_EN

Contents

1. Multext (nlp): a European standard morphosyntactic tagset.
NLP (soft): for "Natural Language Processing"; software processing human language information in texts.
occurrence (met): the appearance of an event in a corpus, like a word occurrence.
operator (exp): a special character expressing a particular constraint in a pattern in the search query language.
page (mod): a segment of text rendering, usually corresponding to a page of a reference paper edition.
part (mod): an element of a corpus partition.
partition (mod): a decomposition of a corpus into several parts. The sum of all the parts of a partition is always the whole corpus. A partition is used to analyze contrasts between parts, for example between dates of speeches, authors of texts, sections of a text, etc.
pos (mod): for "part of speech"; the main grammatical information of a word.
preference (int): all TXM commands have default parameters affecting their behavior. Some of those parameters can be edited in the Preferences panel.
property (mod): a piece of information about a lexical unit or a structural unit.
query (com): the expression, in characters, of a pattern of word sequences combined with a pattern of word properties.
reference (int): a piece of information displayed at the beginning of concordance lines, coming from unit properties.
Score (met): a numerical value indicating a statistical tendency.
script (soft): a file containing the description of a sequence of
2. [Screenshot residue: values selector with a field for the name of the merge result.]
Illustration 41: Columns selection window
This window allows you to select several columns. Use the search field to filter or select columns by a regular expression, or select one or several column values directly: ">" adds a value, "<" removes a value. Then check "merge" or "delete" to select the operation to apply; you can give a name to the merge result.
- merge or delete lines: click on the "merge or delete lines" button; a dialog box similar to the window above allows you to edit the number of lines to keep. Alternatively, select lines directly in the table and right-click to delete or merge the selected lines;
- click on "Ok" to refresh the table;
- export the table from the contextual menu;
- sort columns by clicking on their headers.
4.12 TXM settings
[Screenshot residue: Preferences window with the entries TXM, Advanced, NLP, TreeTagger, Search Engine, Statistical Engine, User, Concordances, Cooccurrences, Correspondence Analysis, Description, Export, Language, Lexical Table, General, Help, SVG, and the buttons "Restore Defaults" and "Apply".]
Illustration 42: TXM settings window
TXM Settings:
- Advanced: advanced settings of the TXM platform
  o NLP: software settings for the Natural Language Processing tools
    - TreeTagger: morphosyntactic tagger used by TXM
  o Search Engine: parameters of the CWB server integrated into TXM
  o Stat
3. [Screenshot residue: concordance window of the DISCOURS corpus with its sort keys, a left-context column whose references are text_loc values (de Gaulle, Pompidou), and keywords such as "je voyais", "je voulais", "je voudrais", "je vois", "je vis", "je veux".]
4. [Screenshot residue: concordance window (thresholds with Fmin, sort keys, Left context / Keyword / Right context columns), an edition page of a de Gaulle speech from the DISCOURS corpus, and the console output "Starting up... Search Engine launched & connected. Statistical Engine launched & connected. Ready."] Messages Illustrati
5. [Screenshot residue: Edition navigation and Edition page.]
Illustration 10: The results
All the results of the commands are displayed by default in the right-hand results zone. For each command, the results are displayed in a new window with a name related to the command and its parameters, and a new icon is added in the corpus view. The name of the window is displayed in the tab of the window and in the legend of the icon. That tab is an important control widget for managing the display of the window, as will be seen in the window manager section. If a window is closed during the session, it can be reopened by double-clicking on the corresponding icon in the corpus view.
3.2.1.6 The Messages
[Screenshot residue: the commands comments area (engine start-up messages) above the status messages line.]
Illustration 11: The Messages zone
The Status line displays simple messages like the number of results. The commands comments area displays more information related to commands; it can be scrolled, selected, copied and pasted. It can also display critical messages.
3.2.2 The Window Manager
With the window manager, one can maximize, minimize, collapse, reopen, move and resize any window of the interface efficiently with the mouse. The window manipulations are the following: temporarily maximize the window to ful
6. corresponding to the metadata of the line. The metadata will be injected at the level of each transcription, if present.
Parameters
That module uses a parameter file called import.properties that comes with the transcription files. With it one can set three different parameters:
- removeInterviewer: can be true or false, depending on whether the module should ignore the content of the speech of each interviewer in the import process;
- metadataList: the list of metadata to be considered. Metadata are separated by a delimiter character;
- csvHeaderNumber: the number of header lines in the metadata CSV file:
  - 1: there are only metadata identifiers;
  - 2: there is one line of identifiers and one line of long identifiers;
  - 3: there is one line of identifiers, one line of long identifiers and one line of metadata types.
[Footnotes] See http://trans.sourceforge.net/en/presentation.php. The field character surrounds column data containing commas, spaces, etc. That last value is not used in that version of the software.
7.5.2 output
The structure of Transcriber files is reproduced:
- each Transcriber section corresponds to a div structure;
- a speech turn corresponds to a sp structure;
- a speech utterance corresponds to a u structure.
The two kinds of Transcriber events are managed: 1. milestones (comments, short noise); 2. word segments (pronunciation, uncertainty). Milestone events (comments) are encoded in
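The three import.properties parameters described above can be illustrated with a minimal Python sketch. It is not the module's own code (the import modules are Groovy scripts): it assumes a Java-style "key=value" syntax, and the "|" separator for metadataList is a placeholder, since the actual delimiter is not legible in this extract. The function name and the sample values are hypothetical.

```python
def parse_import_properties(text, list_separator="|"):
    """Parse key=value lines into the three settings named in the manual.

    Assumptions (not confirmed by the manual): Java-style key=value lines,
    '#' comments, and a '|' separator for metadataList.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return {
        # true/false: ignore the interviewers' speech during import
        "removeInterviewer": props.get("removeInterviewer", "false").lower() == "true",
        # metadata identifiers to take into account
        "metadataList": [m for m in props.get("metadataList", "").split(list_separator) if m],
        # 1, 2 or 3 header lines in the metadata CSV file
        "csvHeaderNumber": int(props.get("csvHeaderNumber", "1")),
    }


# Hypothetical file content, for illustration only.
sample = """\
removeInterviewer=true
metadataList=age|sex|city
csvHeaderNumber=2
"""
settings = parse_import_properties(sample)
print(settings)
```

Keeping the separator as a function argument avoids hard-coding a character that this extract does not show.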
7. "je" followed by a verb, possibly with one or two words in between, write: "je" []{0,2} [pos="V.*"]
(the {} modifier adds the capacity to count how many elements must match)
- to search for the word "je" followed by a verb at any distance, but not crossing sentence boundaries, write: "je" []* [pos="V.*"] within s
[Footnote 1] The "within" clause expresses a constraint on the boundaries of all structural units.
[Footnote 2] Please note that the first "*" operator, counting from the left, does not have the same semantics as the second one, which is the same as the ones introduced before, that is "repeat". The first means "repeat the expression before", which is a "[]" word occurrence expression and not a character occurrence expression. Summary: outside double quotes, "*" repeats the word expression on its left; inside double quotes, "*" repeats the character expression on its left.
- to search for the word "je" followed by the verb "aimer" at any distance, but not crossing paragraph boundaries, write: "je" []* [lem="aimer"] within p
To understand all the levels of CQL queries, you can read the Reference manual of CQL expressions: http://weblex.ens-lsh.fr/doc/weblex/refregexpcqp.html
Please see the CQP User's Manual for a complete description at http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQPUserManual/HTML/
6 Driving the TXM plat
8. [Index residue: illegible entries followed by page numbers; the only legible entry is "Vocabulary".]
9. To do so, use the File view (see "The File view and text editors" in section 3.2.1.1.2) to find, open and change the script, for example by changing the word searched for and the name of the backup file, and then execute it through the contextual menu of the text editor, accessed by a right click of the mouse.
[Footnotes]
- In the same way as you extend MS Word by a Visual Basic macro.
- Dierk König et al., Groovy in Action, Greenwich, Manning, 2007. Kenneth A. Barclay and W. J. Savage, Groovy Programming: An Introduction for Java Developers, Morgan Kaufmann Publishers, 2007. Venkat Subramaniam, Programming Groovy: Dynamic Productivity for the Java Developer, Pragmatic Bookshelf, Raleigh, Daniel H. Steinberg (ed.), 2008.
- Please note that no security policy has been enforced on Groovy scripts in the TXM platform for the moment, so be vigilant with script code whose provenance you don't know.
- You can also read that script online at http://textometrie.svn.sourceforge.net/viewvc/textometrie/trunk/Toolbox/0.4.7/org.textometrie.toolbox/src/groovy/org/textometrie/test/conc.groovy?revision=1080&view=markup
The best reference documentation for all the available TXM commands and their parameters is the Java documentation of the TXM platform at http://textometrie.sourceforge.net/javadoc/index.html. For example, the parameters of the Concordance class constructor are desc
10. [Screenshot residue, end of a Windows Security Alert dialog, translated from French: "...this program or its publisher, you can unblock it. When should I unblock a program?"]
Click on "Unblock" ("Débloquer").
b. In the following dialog box:
[Screenshot residue, Windows Security Alert dialog, translated from French: "To help protect your computer, Windows Firewall has blocked some features of this program. Do you want to keep blocking this program? Name: Rserve. Publisher: Unknown." Buttons: "Keep Blocking", "Unblock", "Ask Me Later". "For more security, Windows Firewall currently blocks this program from accepting Internet or network connections. If you trust this program or its publisher, you can unblock it."]
Click on "Unblock" ("Débloquer").
3.1.2 On Linux
1. Through the Applications / Sciences / TXM menu item of your system menu
2. Or call in a shell: TXM &
3. Or with the ALT-F2 shortcut followed by "TXM"
[Footnotes] The cqpserver process is the textual database engine, which needs to communicate with the TXM platform through a network protocol. The Rserve process is the statistics engine, which needs to communicate with the TXM platform through a network protocol.
3.2 Using Windows, Menus, Toolbars, and Shortcut Keys
3.2.1 General Graphical User Interface
[Screenshot residue: general interface showing the DISCOURS corpus, a query on the lemma "je" and the word properties.]
11. Import [rest of the entry illegible]
Illustration 13: DISCOURS Description
Illustration 14: DISCOURS Editions
Illustration 15: Navigation window between the parts editions
Illustration 16: Simple sub-corpus selection: build the sub-corpus of all the speeches of the [illegible] presidents
Illustration 17: Assisted sub-corpus selection: build a sub-corpus of the texts of the 12th century
Illustration 18: Advanced sub-corpus selection: build the sub-corpus of all the speeches of the Pompidou president made in [illegible]
Illustration 19: Simple partition building: build a partition on every date of speech
Illustration 20: Building a partition on the DISCOURS corpus with the text date values
Illustration 21: Build a partition on every president for the year 1970
Illustration 22: Concordance Initial Search Form
Illustration 23: Building a query for the word "je" followed by a verb
Illustration 24: Concordance of the "je" word followed by a verb in the DISCOURS corpus
Illustration 25: Reference Pattern Dialog Box
Illustration 26: Cooccurrents of the words beginning by "j"
Illustration 27: Lexicon dialog box
Illustration 28: Word forms frequency list of the DISCOURS corpus sorted alphabetically
Illustration 29: Index initi
12. character literally in a query, use the "\" operator immediately before it;
- to search for a word ending by "a", write: ".*a"
- to search for a word ending by "a", possibly with an "s" after, write: ".*as?"
  ("?" means "possibly the last expression", which is "s" here)
- to search for a word beginning by "l" and ending by "a", write: "l.*a"
- to search for a word containing the letter "l", write: ".*l.*"
- to search for a word containing a blank, write: ".* .*"
  (blanks have no meaning in CQP expressions, except in double quotes)
- to search for a word beginning with "l" or "j", write: "[lj].*"
  (the "[...]" construction means "one of the following characters can match, and just one")
- to search for a word beginning with a lowercase letter, write: "[a-z].*"
  (the "-" in the "a-z" construction means "a value of character between the a character and the z character can match", that is, "any lowercase character, and just one")
- to search for two adjacent words, write the two quoted words separated by a blank, for example: "[illegible]" "jour"
  (please note that the blank character in the middle of the query is part of the CQP query language and is not a literal blank. It can, for example, be repeated without changing the meaning of the query)
- to s
[Footnote] CQP stands for Corpus Query Processor, from the IMS Open Corpus Workbench technology: http://cwb.sourceforge.net
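The character-level patterns described above (a word ending with a given letter, an optional final letter, a character class, a character range) can be tried outside TXM with a minimal Python sketch. This is only an approximation: Python's `re` module is not the CQP engine, `re.fullmatch` is used here on the assumption that a quoted CQP expression must match the whole word form, and the word list and letters are hypothetical examples rather than the manual's own.

```python
import re

# Hypothetical word forms, for illustration only.
words = ["casa", "casas", "lama", "la", "Le", "jour", "Tour", "a b"]

# One regular expression per kind of search described in the manual.
patterns = {
    "ends with a": r".*a",
    "ends with a, optional s": r".*as?",       # '?' = possibly the last expression
    "starts with l, ends with a": r"l.*a",
    "contains l": r".*l.*",
    "contains a blank": r".* .*",
    "starts with L or T": r"[LT].*",           # one of the listed characters
    "starts with lowercase": r"[a-z].*",       # any character from a to z
}


def matching(pattern):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]


for label, pattern in patterns.items():
    print(f"{label:28} {pattern:10} {matching(pattern)}")
```

Matching the whole form is what makes ".*" necessary before or after a literal letter: the pattern "a" alone would only match the one-letter word "a".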
13. corpus sources. Those building elements can be in a single file or in several files, in different formats. The import process of a corpus, from the sources into the search engine indexes, editions, etc., is implemented by a Groovy script. Any Groovy script, like any import loader, can be plugged into the TXM platform at run time. The input parameter of a loader is the root directory of the source corpus. The output of a loader is loader dependent, but at least:
- a new root object for the corpus is added to the Corpus view, to be able to apply any TXM command on it;
- a new directory is created to hold the binary version of the corpus, at $HOME/TXM/corpora/<name of the corpus>.
See the "Import Environment 0.4.7 FR" document for an introduction to all the available concepts.
You can see below the import setup window.
[Screenshot residue: import window with a "Source directory" field and the options "Sources characters encoding" (System encoding: Cp1252 / Guess / Select), "Main language" (System lang: en / Guess / Select) and "Tokenizer parameter file".]
Illustration 12: Import window
The source directory parameter is mandatory. For the encoding of your source, you can check "System encoding" (the default), check "Guess" and press the button, or select the encoding directly. You can do the same for the main language.
3.4.5 Example of loader: the CNR+CSV Importer
The
14. cru fr wiki txm users public retours de bugs logicielZsynthese des retours de bugs
- The Text Encoding Initiative consortium: http://www.tei-c.org
7.2.2 annotation
A morphosyntactic description is added to each word by TreeTagger, using the old French linguistic model rgaqcj.par. The tagset used by this model is CATTEX2009, see http://bfm.ens-lyon.fr/article.php3?id_article=176
7.2.3 edition
The text edition type is close to the one produced in the Queste del Saint Graal project, see http://textometrie.risc.cnrs.fr/txm. However, that component of the module will later be replaced by the XSLT+CSS stylesheets of Alexei Lavrentiev, to get similar and maintained results.
7.3 XML-TXM module
7.3.1 input
That module imports the files of the source directory encoded in the XML-TXM UTF-8 format (extension .xml). It doesn't do any tokenization of words, because the XML-TXM format already encodes them with <w> tags. One interest of that format is that it requires little work to be imported into TXM. Although not finalized yet, it is always compatible with the TEI encoding scheme. There is one text per XML file.
Example:
  <?xml version="1.0" encoding="UTF-8"?>
  <TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:txm="http://textometrie.org/1.0">
  <teiHeader type="text">
  <fileDesc>
  <titlestmt>
  <title>Grec essai</title>
  <respstmt>
15. [Screenshot residue: text editor showing a corpus registry file (descriptive name of the corpus, path to the binary data files, optional info file, corpus properties, ATTRIBUTE declarations).]
Illustration 4: The File view
The File view displays a classical hierarchical icon view of the folders and files in the file system. It allows you to edit all those files from inside TXM (TXT or XML source files, Groovy or R script files, etc.), should you need to correct an input file or a script, for example.
Browsing
- The button opens the parent directory of the current directory.
- The text field displays the current directory; you can change it and press Enter or click on the "Ok" button to refresh the view.
- The "TXM" button brings you back to the TXM user's directory.
- A dou
16. from the File menu.
The Corpus view is organized hierarchically. Each root object is an independent Corpus. That corpus is related to the Base from which the texts were imported. All the children icons are objects resulting from TXM commands:
- Subcorpora ("C" icon, same as the root corpus): from "Create sub-corpus"
- Partitions ("P" icon): from "Create partition"
- Lexicon
- Index
- Concordance
- Cooccurrences
- Specificities
- Correspondence analysis
- Lexical table
A branch in the tree results from new objects being created as results of commands applied to the parent object. A specific logical set of commands can be applied to each object type:
- any command can be applied to a Corpus object;
- the same commands as for the corpus, plus the Specificities command, can be applied to a Sub-Corpus object;
- only the Specificities, Factorial analysis and Lexical table commands can be applied to a Partition object.
Double-clicking on a result object reopens its results window when it has been closed.
The File view and text editors
[Screenshot residue: text editor with its contextual menu, including the entry used to run the script, open on the registry entry of a corpus.]
a long
17.
  <resp id="ucl">initial tagging</resp>
  </respstmt>
  </titlestmt>
  </fileDesc>
  <encodingDesc>
  <classDecl>
  <taxonomy id="lemma"><bibl type="tagset"/></taxonomy>
  <taxonomy id="pos"><bibl type="tagset"/></taxonomy>
  <taxonomy id="intext"><bibl type="tagset"/></taxonomy>
  </classDecl>
  </encodingDesc>
  </teiHeader>
  <text id="grec_try_1">
  <w id="w_1">
  <txm:form>mot</txm:form>
  <interp resp="resp" type="lemma">lemme</interp>
  <interp resp="resp" type="pos">pos</interp>
  <interp resp="resp" type="autre">autre</interp>
  </w>
  ...
  </text>
  </TEI>
[Footnote] The XML-TXM format is defined as an XML-TEI P5 extension specifically for the TXM platform.
7.3.2 output
Each XML tag level generates one structural level. The properties of words are imported from the content of the <interp> sub-elements of each <w> element.
7.3.3 annotation
No annotation is added by this module.
7.3.4 edition
Each text is edited by taking care of spaces and punctuation marks between words, and is paginated by blocks of n words.
7.4 XML/w module
7.4.1 input
That module imports the XML files found in the source directory. The <text> tag is reserved for this m
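To make the XML-TXM word encoding of section 7.3 more concrete, here is a minimal Python sketch that reads the form and the <interp> properties of each <w> element, as section 7.3.2 describes. It is an illustration, not the import module itself (which is a Groovy script): the namespace URIs are those read from the example header, while the sample document, the attribute values and the function name are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"   # default namespace of the example
TXM = "http://textometrie.org/1.0"    # namespace bound to the txm: prefix

# Hypothetical, reduced document modelled on the manual's example.
sample = f"""<TEI xmlns="{TEI}" xmlns:txm="{TXM}">
  <text id="t1">
    <w id="w_1">
      <txm:form>mot</txm:form>
      <interp type="lemma">lemme</interp>
      <interp type="pos">pos</interp>
    </w>
  </text>
</TEI>"""


def read_words(xml_text):
    """Return one dict per <w>: its id, its form and one key per <interp> type."""
    root = ET.fromstring(xml_text)
    result = []
    for w in root.iter(f"{{{TEI}}}w"):
        form = w.find(f"{{{TXM}}}form")
        # Each <interp> contributes one word property, named by its type.
        props = {i.get("type"): (i.text or "") for i in w.findall(f"{{{TEI}}}interp")}
        result.append({"id": w.get("id"),
                       "form": form.text if form is not None else "",
                       **props})
    return result


words = read_words(sample)
print(words)
```

Because the words are already delimited by <w> tags, no tokenization step appears in the sketch, which mirrors the remark made in section 7.3.1.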
18. municipale, Palais des arts 77.
In that text, each word is tagged with a morphosyntactic tag of the CATTEX2009 tagset for old French: http://bfm.ens-lyon.fr/article.php3?id_article=176
The importation of the corpus into the TXM platform encoded the following objects:
- Structural units: p (paragraph), q (direct speech), s (sentence)
  o p and s units have an n property encoding their number
- each lexical unit has the following properties:
  o word: the graphical form
  o pos: the morphosyntactic tag
  o col: the column number
  o line: the line number
4 Using TXM commands
4.1 Describe corpus
For the selected corpus, that command displays a complete diagnostic of all the structural elements and their properties, and of all the lexical units and their properties:
- number of words: the total number of lexical units of the corpus
- number of word properties: the number of available annotations for each word
  o for each annotation type: the name of the annotation, the total number of different values for this annotation, and some values
- number of structural units: the number of different structural units of the corpus
  o for each structural unit type: the name of the structure and the list of its attributes with their different values
    - for each structural unit attribute: the first elements of the list of values
Illustration 13 shows an example of corpus informati
19. of that edition presents all the metadata of the text. In that edition, one can navigate:
- to the next (">") or previous ("<") page;
- to the end or the beginning of the edition;
- to the next (">>") or previous ("<<") text edition, in the corpus order.
A double click on a line of a concordance (see below) opens, or navigates directly to, the page of an edition, while highlighting in red the selected keywords and in light red the other keywords of the concordance, if they occur in the same page.
Illustration 14 presents the first page of the edition of the first text of the DISCOURS corpus:
- in that example, the metadata are: id, file, loc, type, date
  o loc: speaker name
  o type: type of speech
  o date
- each word has a flyover displaying its properties: pos, func, lemma
  o in that example, the mouse being over the word "équilibre", the flyover displays: pos: Ncms (common noun, masculine, singular, Multext tagset); func: none; lemma: équilibre
[Screenshot residue: first page of the edition of 01_DeGaulle.xml, showing the metadata (id, file, loc: de Gaulle, type: Allocution radiotélévisée, date: 05/02/1962) and the opening of the speech.]
20. the event property of the following word. For events surrounding several words, the event descriptions are concatenated in the event property of the words transcribed between the begin and the end Transcriber events. To help sub-corpus building, some metadata are copied at the word level (spk) and others at some structural levels: u (spkattrs), text (<metadata>), div (topic, endtime, starttime, type), sp (speaker, endtime, starttime, overlap), event (type, desc).
7.5.3 annotation
Morphosyntactic description and lemma properties are added to each word by the TreeTagger software.
7.5.4 edition
The edition reproduces the HTML edition of Transcriber. The table of metadata values is edited at the beginning of each transcription. Each transcription is paginated every n words, after a speech turn. Events and comments are enclosed in parentheses. Synchronization information is edited between brackets.
7.6 Hyperbase module
7.6.1 input
That module imports the files of the source directory encoded in the old Hyperbase format, that is, with the following text-delimiting line:
  &&& Long text name TextName ShortTextName &&&
Page-break lines are interpreted and encoded as p structures.
[Footnotes] div, sp and u elements are loosely adapted from the TEI standard. Mind that TreeTagger linguistic models are built from written text corpora: tagging results on orthographic transcriptions must be
21. them to analyze a corpus. The way to extend the platform with the scripting environment is then introduced. The document ends with reference appendices, a glossary of notions and an index (TO BE DONE).
1.3 Related Readings
The official Textométrie project web site publishes all the documentation related to the TXM platform (http://textometrie.ens-lyon.fr/spip.php?article98&lang=en): screencast tutorials, textometry methodology fundamental documents, textual encoding related documents, search engine and statistical engine related documents, and reference documents for the scripting engine. It is also the reference site for all scientific publications related to the project: http://textometrie.ens-lyon.fr/spip.php?article82&lang=en
The TXM wikis are the best place to share knowledge about the platform usage with other users and with developers:
- EN: The international English-language wiki is at http://textometrie.sourceforge.net; please subscribe to Sourceforge (http://sourceforge.net/account/registration) and ask for permission to be able to edit the wiki.
- FR: The French-language wiki is at https://listes.cru.fr/wiki/txm-users/en/startup; please subscribe to the txm-users mailing list to be able to edit the wiki.
The French-language wiki currently has the following structure:
- bug reports on the RCP version, from mails and meetings
- bug reports on the GWT version n
22. working sessions with TXM: until the corpus is deleted, you don't have to import the corpus again for each working session. The corpus is added to the Corpus view. The next section will introduce you to all the available loaders in this release.
[Footnote] Please read the sample corpora section for their full description.
- Export
To transmit a corpus already imported into the platform to another TXM installation, say on another machine, you can copy the directory corresponding to the binary corpus built by TXM. That directory is located at $HOME/TXM/corpora/<name of the corpus>. As a byproduct of the import process, several intermediary source files encoded in the TEI-TXM format have been produced in the $HOME/TXM/corpora/<name of the corpus>/txm directory. Those files can be used as an XML interchange format with other tools.
- Load
If you have copied a corpus directory from another TXM installation, you can load it directly into TXM with the File / Load command. That command is faster than the Import command. You only need to call it once for a TXM installation.
3.4.3 Simple Import Commands
3.4.3.1 Raw Text Loaders
The "From Clipboard" and "Directory" entries of the File / Import menu import simple raw texts, without any XML tags in them. Each word is tokenized and annotated with a part-of-speech property and a lemma property.
From cli
23. [Screenshot residue: lexical table with one row per value of the word property (part-of-speech tags such as Npfs, Af, Np, Dd, Pp, Vm, Cs) and one column per part of the Date partition.]
Illustration 40: Lexical table on the Date partition of the DISCOURS corpus
In the above illustration, the lexical table is created from the Date partition. One can define the total number of lines and a minimal frequency threshold. The "Keep" button applies the parameters to the current table.
- merge or delete columns: clicking on the "merge or delete columns" button opens a values selector (see illustration 41).
[Screenshot residue: values selector ("Sélecteur de valeurs") listing the dates of the partition, with the "merge" ("fusionner") and "delete" ("supprimer") options.]
24. [Screenshot residue: list of the text date values of the DISCOURS corpus.]
Illustration 20: Building a partition on the DISCOURS corpus with the text date values
4.4.3 Advanced partition building
Illustration 21 presents the "Advanced" tab of the partition builder form. In that form, one has to:
- enter the name of the new partition (the name displayed in the corpus view);
- write a CQP query which selects all the lexical units composing each part:
  o use the corresponding button to add a new part;
  o use the corresponding button to remove a part.
The new partition will be composed of all the parts defined, each one containing the lexical units returned by its query. It is the responsibility of the user to make sure that all the parts sum to the whole corpus.
[Screenshot residue: "Create Partition" dialog, Name: PresidentsIn1970, tabs Simple / Assisted / Advanced, with Part 1 and Part 2 each defined by a query on the text_loc property (Pompidou, de Gaulle).]
Illustration 21: Build a partition on every president for the year 1970
4.5 Build Concordanc
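Since the advanced form leaves it to the user to make the parts sum to the whole corpus, a small check of that property can help when preparing the queries. The Python sketch below is purely illustrative and does not use any TXM function: it assumes that each part is available as a set of token positions (for instance exported from the results of its query), and it also reports positions shared by two parts, on the assumption that a partition is meant as a decomposition without overlap. All names and numbers are hypothetical.

```python
def check_partition(corpus_size, parts):
    """Compare the union of the parts with the positions 0..corpus_size-1.

    parts maps a part name to the set of token positions it contains.
    Returns the positions covered by no part and those covered more than once.
    """
    seen = set()
    overlapping = set()
    for positions in parts.values():
        overlapping |= seen & positions   # already claimed by an earlier part
        seen |= positions
    missing = set(range(corpus_size)) - seen
    return {"missing": sorted(missing), "overlapping": sorted(overlapping)}


# Hypothetical 10-token corpus split into two parts.
parts = {
    "Pompidou": {0, 1, 2, 3},
    "de Gaulle": {4, 5, 6, 8},
}
report = check_partition(10, parts)
print(report)
```

An empty "missing" list means that the parts cover the whole corpus; any position listed there belongs to no part and points to a query that is too narrow.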
25. A. and W. J. Savage. Groovy Programming: An Introduction for Java Developers. Morgan Kaufmann Publishers, 2007.
Benzécri, Jean-Paul et al. L'analyse des correspondances. Paris: Dunod, 1973.
König, Dierk, Andrew Glover, Paul King, Guillaume Laforge et al. Groovy in Action. Greenwich: Manning, 2007.
Lafon, P. "Sur la variabilité de la fréquence des formes dans un corpus". Mots, no. 1, 1980, 127-165.
Venkat Subramaniam. Programming Groovy: Dynamic Productivity for the Java Developer. Pragmatic Bookshelf, Raleigh, Daniel H. Steinberg (ed.), 2008.
11 Index
Illustrations Index
Illustration 1: The general interface of TXM
Illustration 2: The Objects [rest of the caption illegible]
Illustration 3: The Corpus view
Illustration 4: The File view
Illustration 5: [caption illegible]
Illustration 6: [caption illegible]
Illustration 7: The Corpus menu, with on the left the corpus commands and on the right the partitions commands
Illustration 8: The Tools menu for the corpus and the partition objects
Illustration 9: The Corpus Contextual Menu
Illustration 10: The results
Illustration 11: The Messages zone
Illustration 12
26. [Index residue: entries followed by page numbers; the legible entries include Index, Lexical table, Occurrence, Partition and Property, the remainder is illegible.]
27. CNR CSV reads a source corpus in the following format:
- each document unit is in a single file;
- the format of the document unit file is CNR, the output format of the commercial software Cordial, which is a French tagger and lemmatizer. That format is like the CSV format, the column separator being the tabulation character;
- all the metadata are stored in a single Excel table; the import process uses a CSV export of that table. Each metadata is defined in one column. All the metadata of a single document unit are on the same line;
- the only structural unit recognized and encoded is the sentence level, which comes from the Cordial tagger output;
- lexical units have the properties encoded in the CNR output columns: word form (word), lemma (lem) and part of speech (pos).
That loader can be applied to the sources of the sample DISCOURS corpus found in the distribution. (This function is not available in this version.)
The results of the loader are:
- a new root corpus is added to the Corpus view, giving access to any TXM commands on it;
- two different HTML editions are produced for each text: one paginated every 200 words and one in a single file. In those editions, each word has a flyover displaying its properties;
- search engine indexes have been compiled.
In the next section you will find a synthetic description of the loaders and the recommended inform
28. (keys, continued) Q, Ctrl+F10, Ctrl+N, Ctrl+S, Ctrl+W, Ctrl+F4, Ctrl+Shift+W, Ctrl+P, Alt+Enter, F5, Alt

8.2 Graphics
Output:
- Pan: Shift + Left Mouse drag
- Zoom in & out: Shift + Right Mouse drag
- Zoom to selection: Ctrl + Left Mouse drag
- Rotate: Ctrl + Right Mouse drag
- Reset the view: F5

8.3 Windows
Editor Windows:
- Next Editor: Ctrl+F6
- Previous Editor: Ctrl+Shift+F6
- Quick Switch Editor: Ctrl+E
- Switch to Editor: Ctrl+Shift+E
- Show System Menu: Alt+-
View:
- Maximize Active View or Editor: Ctrl+M
- Next View: Ctrl+F7
- Previous View: Ctrl+Shift+F7
- Show View Menu: Ctrl+F10
- Show Key Assist: Ctrl+Shift+L
- Show View: Alt+Shift+Q Q
- Show View (View: Console): Alt+Shift+Q C

9 TXM Glossary
Categories:
- com: Command
- mod: Data Model
- fmt: File Format
- int: Interface
- nlp: Natural Language Processing
- exp: Search Query expression
- soft: Software
- met: Textometry Methodology
Entries (entry, category, description):
- AFR (nlp): the standard code for the Old French language.
- Alceste (soft): a commercial textometry software.
- annotation (mod): a unit property (lexical or structural) from a logical point of view.
- CATTEX2009 (nlp): a morphosyntactic tagset for the Old French language.
- character (mod): the elementary component of word forms.
- clipboard (mod): a component of the operating system where a selection of text can be sto
29. 5 The Search Engine syntax
5.1 Quick introduction
All the queries you write in the Query fields of the Concordance and Index commands to express their focus are given to the internal TXM search engine for resolution. Those queries must obey the CQP language syntax and semantics. Here is an elementary introduction to it:
- to search for a simple word, just cite it literally: la (a wrapper will finalize the query to "la", which is the right query for you);
- to make the search not case sensitive, add the %c modifier: "la"%c (modifiers are always written outside double quotes);
- to make the search not diacritic sensitive, add the %d modifier: "la"%d (you can combine the %c and %d modifiers together in %cd);
- to search for a compound word, put it in double quotes: "parce que" (the fact that word tokens contain blanks depends on the tokenizer used to import the corpus into TXM; see below to look for all the words containing blanks);
- to search for a word beginning with l, write l.* where "." means any character and "*" means possibly repeat the last expression (which is any character here). The result is thus any sequence of characters, including none.
Those special meaning characters are called operators or wildcards ("jokers"). They can appear anywhere in a query, but with a specific syntax. If you want to express a particular operator
30. [Dialog box about the file TXM\R\bin\R.dll: "Click Abort to stop the installation, Retry to try again, or Ignore to skip this file".]
This means that an Rserve process is still running on the machine and that the install process cannot modify its binary file. You must then:
1. quit TXM, or kill the Rserve.exe process running, from the Process Explorer, and
2. click on Recommencer (Restart) to resume the install process.
e. In the next dialog box (Setup Completed), click on Close.
f. Installation is now completed.

2.3 Linux
2.3.1 Rapid installation
1. Download the file txm_0.5.deb at the address https://sourceforge.net/projects/textometrie/files/software/0.5
2. Launch the txm_0.5.deb file to start th
31. TXM Reference Manual, version 0.5
Copyright © ANR Textométrie, http://textometrie.ens-lyon.fr/?lang=en
This creation is distributed under a BY-NC-SA Creative Commons license.

Document Revision Table
- 13/03/10, Serge Heiden (SH): Creation
- 02/07/10, Matthieu Decorde: Update for release 0.4.7
- 15-29/07/10, SH: Rewrite for 0.4.7
- 27/08/10, SH: Section titles, numbering, reorganized plan
- 08/10/10, Lauranne Bertrand: Update for release 0.5
- 19/01/11, SH: Corrections
- 11/03/11, SH: New section on import modules

Edition no. 626. Content: 92 pp., 18578 occ., 78 ill., 9 tab. Edition time: 03/11/11 09:41:46 PM

Table of Contents (only the entries legible in the source are listed)
1.1 Who Should Use This Document
1.2 How This Document Is Organized
1.4 Accessing TXM Documentation On-line
Using Windows, Menus, Toolbars and Shortcut Keys
The File view and text
3.2.1.2 Commands
The Win
32. TXM actions to execute.
- search query (com): the expression by characters of a pattern of word sequences combined with a pattern of word properties.
- selection (met): a list of sequences of words. The search engine returns a selection.
- sentence (nlp): an orthographically delimited sequence of words, generally computed by tokenizers.
- source (mod): the original representation of a corpus in a specific format, possibly in several files and directories. For example, the format can be TXT (raw text), XML or TEI.
- specificity (com): the action of listing the most specific word forms (or other word properties) for each part of a partition, according to the specificity quantitative model.
- status line (gui): TXM displays temporary comments on operations in a line at the bottom left of the interface.
- structural unit (mod): an element of the logical structure of a text. In TXM, all structural units are organized hierarchically: every unit is nested in an upper unit, up to the text unit. The lower and smaller structural units are above the lexical units.
- T (met): the total number of occurrences in a corpus.
- tag (mod): the representation of element limits and their properties in the XML format.
- tag (nlp): the morphosyntactic property of words.
- tagger (soft): an independent software component able to tokenize, grammatically tag and possibly lemma
33. al dialog box
Illustration 30: Index word properties
Illustration 31: Index of the combination of the form then pos word properties for all the occurrences of the "pouvoir" lemma in the DISCOURS corpus
Illustration 32: Specificity for a partition dialog box
Illustration 33: Specificity of j word forms in the discourse type partition of the DISCOURS corpus
Illustration 34: Specificity graphic of the "je", "jeune" word forms between discourse genres in the DISCOURS corpus
Illustration 35: Specificity scores of the word forms of the "Allocution radiotélévisée" discourse genre in the DISCOURS corpus
Illustration 36: Progression processing parameters for the "France" and "Algérie" words in the DISCOURS corpus
Illustration 37: Progression graphic on the "France" and "Algérie" words in the DISCOURS corpus
Illustration 38: Graphics obtained from a lexical table with the Date property on the DISCOURS corpus
Illustration 39: Lexical table property selection
Illustration 40: Lexical table on the Date partition of the DISCOURS corpus
Illustration 41: Columns selection window
Illustration 42: TXM settings wi
34. ation (or double click on the property in the left panel);
- <: remove the property from the combination (or double click on the property in the right panel);
- display that property before in the combination (the top property in the right panel will be displayed first in the combination);
- v: display that property after in the combination.
Footnote 8: In the example below, the "word" property name stands for the graphical form of words.

4.7.2.2 Queries
You can write any CQP expression, like in the concordance dialog box, or use the Query Assistant.
[Illustration 31: Index of the combination of the form then pos word properties for all the occurrences of the "pouvoir" lemma in the DISCOURS corpus. The screenshot shows a query on the lemma "pouvoir" with the properties word and pos.]

4.7.2.3 Thresholds
You can limit the number of results with:
- Fmin: the minimum frequency necessary to be included in the list;
- Fmax: the maximum frequency allowed to be included in the list;
- Vmax: the maximum number of res
35. ation to write in the dialog box.

3.4.6 Other Loaders
The TXM platform can already import several other corpus formats through different loaders. The table columns are: Document Unit, Main Format, Metadata, Structure, Lexical, Recommended information. The cell values legible in the source are listed for each loader:
- CNR CSV: in metadata.csv file; system encoding.
- Hyperbase: several texts in a single file; Hyperbase; none; system encoding.
- Alceste: Alceste; analytic; system encoding.
- Transcriber: Transcriber XML (TRS); in CSV metadata.csv file. Footnote 15: Metadata is associated to one transcription and to only one of its speakers.
- XML TXM: single; XML TXM; encoded inside the source.
- TXT CSV: single; TXT (raw text); metadata.csv file.
- XML w: XML; should already be encoded inside the source.

3.4.7 Saving & Exporting results
Each result of a TXM command (lists, tables, graphics) can be exported to a file. That file is at least in the CSV format for tables and in the SVG format for graphics. The export command can be accessed in the contextual menu of the result icon in the Corpus view, or through the Export button in the toolbar when the result object is selected.

3.4.8 Sample corpora
The TXM platform is released with several sample corpora encoded in representative formats that the platform can process. They are releas
36. ble click on a directory expands its content. A double click on a file icon opens it in a new text editor window. The same result is obtained through the Open File command in the File menu.
(The default path of that view is the user's TXM home directory, that is $HOME/TXM.)

Editing a text
In a text editor, the text can be modified, saved, etc. with the usual commands: select, copy, paste, search & replace, save, etc. Please see the section 6 "Text Editor Shortcuts" for the list of available editing commands.
If the text is a Groovy script, it can be executed directly with the "Execute a Groovy script" command in the context menu (right click on the text). You can also execute only a selection of the text with the "Execute the selection as a Groovy script" command in the context menu. See the section 5 "Scripting the TXM platform" for more information on the scripting environment embedded in TXM.
If the text is an R script, it can be executed directly with the "Execute an R script" command in the context menu. You can also execute only a selection of the text with the "Execute the selection as an R script" command in the context menu.

3.2.1.2 Commands
In TXM, main commands are expressed through three different but equivalent ways:
1. when an object icon is selected in the objects zone, the user can execute a command on that object by clicking on the corresponding command button
37. ch for the "je" word followed by a verb in the DISCOURS corpus, you can search for the query "je" [pos="V.*"]
That query can be decomposed as:
- the "je" part expresses the need for the "je" word to be there in the result;
- the [pos="V.*"] part expresses the need for a verb to be on the right of "je" (next to the right):
  o the [ ] brackets express the occurrence of just one lexical unit to be the next on the right of "je";
  o the pos="V.*" part expresses the constraint for that occurrence to have its pos property match the V.* regular expression. In the DISCOURS corpus, which has been tagged by the Cordial tagger in the Multext tagset, this matches the pos property of all verbs (in that corpus, all the verbs have their pos property starting with V).
An assistant is available to write queries. Click on the Query Assistant icon and the following window will pop up.
[Illustration 23: Building a query for the word "je" followed by a verb. The Query Assistant looks for 2 words: word no. 1 with its property word equal to "je", followed by word no. 2 with its property pos starting with "V".]
The "Add a word" button adds a new word pattern to the query. The first menu selects a word property. The second menu defines the size of the search field. The last fiel
38. checked.
7.6.2 annotation
Morphosyntactic description and lemma properties are added to each word by the TreeTagger software.
7.6.3 edition
Each text is edited by taking care of spaces and punctuation marks between words and is paginated by blocks of n words.

7.7 Alceste module
7.7.1 input
That module imports text encoded in the Alceste software format, which is nearly raw text with some escape characters. There are two ways to delimit a text:
1. a line of the form: 0001 &Attr1 Val1 &Attr2 Val2 &AttrN ValN
2. a line of the form: &Attr1 Val1 &Attr2 Val2 &AttrN ValN
To encode a compound word, one can replace the spaces between words by a joining character. For example, "l'assemblée nationale" can be segmented into two words: "l'" and "assemblée nationale".
The Alceste format also allows one to encode speech turns, but that module doesn't manage that encoding.
7.7.2 output
As output, a text structure (text) encloses words segmented by separator characters.
7.7.3 annotation
Morphosyntactic description and lemma properties are added to each word by the TreeTagger software.
7.7.4 edition
Each text is edited by taking care of spaces and punctuation marks between words and is paginated by blocks of n words.

7.8 CNR CSV module
7.8.1 input
Text body
That module imports files encoded in the CNR format from the sourc
39. concordances of the matches. From any line in a concordance, you can get to the edition page containing the corresponding keyword;
- it computes cooccurrents around complex lexical patterns;
- it computes the specificity model of occurring words or tags inside a partition or a sub-corpus;
- it computes the factorial correspondence analysis of word properties inside a partition.
The software is composed of four components:
- a full text search engine;
- a statistics engine;
- an import environment;
- a scripting engine.
This manual will introduce you to each component through the various commands available in the platform.

3.1 Starting TXM
3.1.1 On Windows
1. In the menu Start > All Programs > TXM, select TXM.
2. For the first start, depending on the level of security of your Windows operating system, you may have to answer some security alerts in the following way:
a. In the following dialog box:
[Windows Security Alert dialog, shown in French in the screenshot: the Windows Firewall has blocked some features of the program named cqpserver (publisher unknown) and asks whether to keep blocking it. The choices are Keep Blocking, Unblock, and Ask Me Later.]
40. context, Keyword and Right Context by clicking on their head line. You can change the sort order by clicking a second time. Default sort is on the word forms, but this can be changed in the contextual menu. You can also do more complex sorts, like a sort on the right context, then on the keyword. Select "Multiple sort" in the contextual menu to see the available sorts.

4.5.5 Word properties displayed
You can choose which word properties, and in what order, will be displayed in each column. There are two ways to do it:
- the current properties displayed for the keyword column are set under the query field. Press the Edit button to change the properties;
- in the contextual menu of the concordance, select the entry "Select view Properties".

4.5.6 References displayed
You can choose which information will be displayed in the Reference column on the left side of each concordance line. Selecting the "Define references pattern" entry in the contextual menu (right click on the concordance) opens the dialog box of illustration 25. (Footnote 15: This should be completely redesigned in the next release.)
[Illustration 25: Reference Pattern Dialog Box. The screenshot lists the properties text date, text loc, text file, text type, text base, text project.]
The left panel lists all the properties of structural units and of lexical units. For example, text loc if the property loc of the structu
41. d allows you to type letters or a word. The menu between two words allows you to express whether the words are consecutive or not. If you validate with Ok, the query will appear in the query field.
The query is searched for by a click on the Search button. Before drawing the concordance results, the console and the status line will notify you with the total number of matches. The result is presented in illustration 24:
- there are 206 matches;
- it is the second page of the concordance which is displayed, from occurrences 22 to 41;
- the keyword column is composed of two consecutive words, because of the query asking for the word "je" followed by a verb;
- the concordance is sorted alphabetically on the keyword column;
- the localization reference has been chosen to be the name of the speaker of the discourse in which the words occur;
- the contextual menu was opened by a right click on the concordance:
  o Define references pattern: to choose the information displayed in the Reference column;
  o Set Sort Property: to choose on which word property the sort will be done;
  o Multiple Sort: to select a sort on several keys;
  o Set contexts size: to choose the number of words displayed in the contexts;
  o Select View properties: to choose which word properties will be displayed in each column.
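The concordance layout described above (a keyword column made of "je" followed by a verb, left and right contexts of a chosen size, alphabetical sort on the keyword) can be sketched in a few lines of Python. This is only an illustration of the idea, not the way TXM or CQP implement it: the function name, the token list and its tags are hypothetical sample data.

```python
# Minimal KWIC sketch. Assumption: the corpus is a flat list of
# (word, pos) pairs; the words and tags below are made-up sample data.
SAMPLE_TOKENS = [
    ("je", "Pp1"), ("veux", "Vmip1s"), ("dire", "Vmn"), ("que", "Cs"),
    ("je", "Pp1"), ("suis", "Vmip1s"), ("ici", "Rg"),
]

def concordance(tokens, context_size=3):
    """Return (left context, keyword, right context) lines for every
    occurrence of the word "je" immediately followed by a token whose
    pos starts with "V", sorted alphabetically on the keyword column."""
    lines = []
    for i in range(len(tokens) - 1):
        word, _ = tokens[i]
        _, next_pos = tokens[i + 1]
        if word == "je" and next_pos.startswith("V"):
            left = [w for w, _ in tokens[max(0, i - context_size):i]]
            keyword = [w for w, _ in tokens[i:i + 2]]
            right = [w for w, _ in tokens[i + 2:i + 2 + context_size]]
            lines.append((" ".join(left), " ".join(keyword), " ".join(right)))
    # Default sort: alphabetical on the keyword column
    lines.sort(key=lambda line: line[1].lower())
    return lines

for left, keyword, right in concordance(SAMPLE_TOKENS):
    print(f"{left:>20} | {keyword} | {right}")
```

Changing context_size plays the role of the "Set contexts size" entry, and replacing the sort key with the right context would mimic a sort on another column.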
42. dow Manager
Table of Contents, continued (only the entries legible in the source are listed):
3.4.2 The complete story: Import, Export, Load corpora
3.4.4 The Advanced Import
3.4.5 Example of loader: the CNR CSV
3.4.6 Other Loaders
3.4.7 Saving & Exporting results

1 Preface
1.1 Who Should Use This Document
If you want to use the TXM platform, this document will introduce you step by step to the different concepts of the software and to the different tools available to analyze various textual corpora. If you want to adapt the TXM platform to specific corpora, this document will also introduce you to the scripting environment available to customize the import system.
1.2 How This Document Is Organized
This document first describes how to install the software on various platforms and how to start it. Then it describes how the user interface is organized and how to import a new corpus into the platform. The next section describes the available tools and how to use
43. dule imports raw text files found in the source directory (extension .txt). The lb property is added to each word to encode the line number.
Text metadata
Text metadata are imported from a file encoded in the CSV format, called metadata.csv and found in the same directory as the sources. The column separator is the comma, the field character is the double quote. The first (header) line names each metadata column. The first column must be named id; the following ones can be named freely, but without using any accented or special characters. The first column must contain the name of the source file (without the extension) corresponding to the metadata of the line.
7.9.2 output
As output, each textual unit (text) is built with properties imported from the metadata file and encloses words segmented by separator characters.
7.9.3 annotation
Morphosyntactic description and lemma properties are added to each word by the TreeTagger software.
7.9.4 edition
The text is edited by taking care of spaces and punctuation marks between words and is paginated by blocks of n words. The table of metadata values is edited at the beginning of the first page.

8 Keyboard Shortcuts
8.1 Text Editor
Commands:
- Help: Show Key Assist
- Selection: Select All, Select Line Start, Select Line End, Select Next Word, Select Previous Word
- Edit: Copy, Pa
44. e
That command builds a kwic concordance of the search results of a specific CQP query expression on the selected corpus or sub-corpus.
(Footnote: The actual queries are: region text a a text_loc Pompidou & a text_date 1970; region text a a text_loc de Gaulle & a text_date 1970.)
The initial search form is composed of:
- the CQP query input field;
- a button to access the history of queries;
- a button to access the lexical unit properties editor, to select which properties will be displayed in the keyword column;
- the search button.
[Illustration 22: Concordance Initial Search Form. Labelled areas: CQP Query Field, Lexical units properties to display, CQP Query Assistant, Query History, Sort keys, Search launch button. Columns: Reference, Left context, Keyword, Right context.]

4.5.1 Queries
The search engine allows you to express your queries in the CQP query language (see below, section 5 "The Search Engine syntax"). TXM defines a simplified syntax over the standard CQP queries to ease the writing of simple queries. For example, to just search for the "je" word ("I" in French), you only need to write je, that is the two letters j followed by e, in the Query field. For more elaborate queries, you have to conform to the CQP syntax. For example to sear
45. e directory. The CNR format is produced by the Cordial software and corresponds to a TSV format, with the tabulation character as column separator and no field character. The CNR columns define respectively:
- para: the paragraph number;
- sent: the sentence number;
- form: the graphical form of a lexical unit;
- lem: the lemma;
- pos: the part of speech or morphosyntactic description;
- func: the syntactic function.
Text metadata
Text metadata are imported from a file encoded in the CSV format, called metadata.csv and found in the same directory as the sources. The column separator is the comma, the field character is the double quote. The first (header) line names each metadata column. The first column must be named id; the following ones can be named freely, but without using any accented or special characters. The first column must contain the name of the source file (without the extension) corresponding to the metadata of the line.
7.8.2 output
As output, texts are structured by text (text), paragraphs (p) and sentences (s). Word properties are directly imported from the CNR column values.
7.8.3 annotation
No annotation is added by this module.
7.8.4 edition
Each text is edited by taking care of spaces and punctuation marks between words and is paginated by blocks of n words. The table of metadata values is edited at the beginning of the first page.

7.9 TXT CSV module
7.9.1 input
Text body
That mo
46. e installation process with the gdebi package manager.
3. Launch TXM through the Applications > Sciences > TXM menu item of your system menu.

2.3.2 Classic installation
1. Download the file txm-0.5-linux.tar.gz at the address https://sourceforge.net/projects/textometrie/files/software/0.5 (click on txm-0.5beta-linux.tar.gz).
2. Extract the content of the archive in a directory. You can use the command line: tar xvf txm-0.5beta-linux.tar.gz
3. Go to that directory.
4. Run: bash install.sh <path to the directory where TXM is installed> (the INSTALL file contains more information on the Linux install process).
5. Run TXM with the command TXM & or with the ALT+F2 shortcut followed by TXM.

3 Getting to Know TXM
The current TXM platform prototype helps you to build and analyze tagged and structured corpora:
- it helps you to import your textual resources to build a corpus, from various formats or directly from any text copied in the clipboard;
- it builds subcorpora from various specifications of textual unit properties;
- it builds partitions from specifications of properties;
- it builds an HTML edition for each textual unit of a corpus;
- it computes the whole vocabulary of a corpus or lists various combinations of word property values;
- it builds lexical tables from partitions or indexes;
- it searches complex lexical patterns based on lexical unit properties and builds kwic
47. e vector representing the point and the axis. A cos close to 1 indicates a well represented point on the axis; a cos close to 0 indicates that the projection of the point on the axis is highly distorted: the point coordinate on that axis should not be considered to compare the point position with other points. A point with a small cos on both axes of a specific plane has a misleading position in the representation: its apparent proximity to other points should not be interpreted in this plane.
Point coordinates

4.11 Lexical table
The lexical table displays the frequency of the lexical units of a partition. This table can be created from a partition or from an index of a partition. Once the partition is selected, choose a word property for the table to create, like in illustration 39.
[Illustration 39: Lexical table property selection]
Here is the description of the table content: one entry per line, one part per column.
This table can be edited: lines and columns can be merged or deleted. It is also possible to filter the number of lines or to choose the lines to keep by a minimal frequency threshold. The CA and Specificities commands automatically create a lexical table.
[Screenshot of a lexical table on the Date property. Labelled areas: information on results (total frequency, number of lines); number of lines and minimal frequency edition.]
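The lexical table described above (one entry per line, one part per column, lines filtered by a minimal frequency threshold, lines merged) can be illustrated with a small Python sketch. It is not TXM's implementation: the function names, the fmin parameter name and the tiny partition below are assumptions made for the example.

```python
from collections import Counter

# Hypothetical partition: part name -> list of word forms in that part.
SAMPLE_PARTITION = {
    "1958": ["france", "algerie", "france"],
    "1970": ["france", "europe"],
}

def lexical_table(partition, fmin=1):
    """Return (parts, table), where table maps each entry to its list of
    frequencies, one per part. Entries whose total frequency is below
    fmin are dropped."""
    parts = list(partition)
    counts = {part: Counter(words) for part, words in partition.items()}
    entries = sorted({w for c in counts.values() for w in c})
    table = {}
    for entry in entries:
        row = [counts[part][entry] for part in parts]
        if sum(row) >= fmin:
            table[entry] = row
    return parts, table

def merge_lines(table, entries, name):
    """Merge the given lines into a single line called name,
    summing their frequencies column by column."""
    merged = [sum(col) for col in zip(*(table[e] for e in entries))]
    result = {e: row for e, row in table.items() if e not in entries}
    result[name] = merged
    return result

parts, table = lexical_table(SAMPLE_PARTITION)
print(parts, table)
print(merge_lines(table, ["algerie", "europe"], "other"))
```

Keeping the table as a plain mapping from entry to frequency row makes the merge and threshold operations one-liners; a real tool would more likely use a matrix type.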
48. earch for three adjacent words, write le TOUT o, etc.;
- to search for a word which is a verb, that is whose part of speech property (called pos in the sample corpora) value begins with V, write [pos="V.*"] (see footnotes 1 and 2);
- to search for a verb in the imperfect tense, write a constraint on the pos property whose value starts with V (only true for the Multext tagset of the sample corpora);
- to search for a verb followed by a noun, write [pos="V.*"] [pos="N.*"];
- to search for the word "je" ("I") followed by a verb, write "je" [pos="V.*"] (in fact, this query is equivalent to [word="je"] [pos="V.*"]);
- to search for the word "je" followed by a verb with one word in between, write "je" [] [pos="V.*"] (here the [] expression means "a word without any constraint", that is any word);
- to search for the word
Footnote 1: This is true for the sample corpora of TXM. Values of properties of words depend on the annotations that have been performed on the corpus in the import process into TXM. Morphosyntactic taggers produce different tagsets, so you have to read their documentation to craft the right query for a specific tagset.
Footnote 2: Please note that the square brackets in that query are not the same, and don't have the same meaning, as the previous ones. The previous ones were implicitly enclosed in double quotes. Here they mean that the expression inside the square brackets concerns exactly and only one word.
49. ed under a BY-NC-SA Creative Commons license.

3.4.8.1 DISCOURS corpus
The DISCOURS corpus has been released by Damon Mayaffre from the BCL CNRS laboratory in Nice, France. It is composed of 29 discourses produced by two French presidents, Pompidou (5 discourses) and de Gaulle (24), between 1958 and 1971, of types:
- Allocution radiotélévisée (speech on TV): 14;
- Entretien radiotélévisé (speech on radio): 3;
- Conférence de presse (press interview): 11.
Each discourse has been tagged with the Cordial tagger with the usual Hyperbase software parameters. The tagset is the Multext tagset described in the Weblex manual at http://weblex.ens-lsh.fr/doc/weblex/cordialtagset.html
The importation of the corpus into the TXM platform encoded the following objects:
- structural units: discours, s (for sentence);
  o each discours unit has the following properties encoded: date, loc (the name of the president), type;
  o each lexical unit has the following properties: word (the graphical form), pos (the Cordial part of speech tag), lem (the Cordial lemma), func (the Cordial syntactic function code), sent (the sentence number).

3.4.8.2 QUETE corpus
The QUETE corpus has been released by Christiane Marchello-Nizia and Alexei Lavrentiev. It is based on their critical edition of the Queste del saint Graal from the Ms K manuscript (Lyon, Biblioth que
50. elected then copied from common desktop applications (Firefox, Thunderbird, Writer, etc.);
- the File > Import > Directory command allows you to analyze a set of raw texts found in a single directory;
- the File > Import > XML Structure command allows you to analyze a set of XML encoded texts found in a single directory;
- other entries from the File > Import menu allow you to analyze corpora in various specialized formats, like the Hyperbase format or the XML TEI P5 format.
The platform is released with two ready-to-use sample corpora:
- DISCOURS: a set of transcriptions of French Presidents' speeches;
- QUETE: an edition of the Ms K manuscript of the Holy Grail text, written in the Old French language.
The next section presents in detail all the available commands to import corpora in the TXM platform.

3.4.2 The complete story: Import, Export, Load corpora
The TXM platform can work on corpora of various formats, from simple raw text files to densely XML TEI P5 encoded ones.
Import
To be able to work on a specific corpus, it has to be imported into the TXM platform with one of the commands of the File > Import menu. Each command analyzes specific corpus sources to build all the necessary elements for TXM to work on them. It can take some time, depending on the size of the corpus and the complexity of the loader. When that process has been done, the corpus will be available instantaneously for all the next
52. ew features asked for the RCP version. That wiki also allows you to participate in writing the documentation or in translations.

If you want to add or modify core functionalities of the software, that is, to change its sources, you should also read the TXM Developers Guides referenced by the developer wiki (https://sourceforge.net/apps/mediawiki/textometrie), along with the Javadoc and the R module documentation.

1.4 Accessing TXM Documentation On-line

This document and its translations are always available at the address http://sourceforge.net/projects/textometrie/files/documentation

1.5 Typographic Conventions

In this documentation, some specific items are distinguished by a different typography:
- sample literal strings are rendered in Courier: directory paths, file names, sample query strings or links;
- Arial rendering is reserved for section titles;
- Arial rendering: commands.

2 https://listes.cru.fr/sympa/subscribe/txm-users

2 Installing TXM

2.1 Requirements

This version of the software is compatible with Windows and Linux. The following resources are recommended:
- 170 MB of disk space for installation;
- 350 MB of memory for execution.

2.2 Windows

1. First download the file txm 0 5 win exe at the address https://sourceforge.net/projects/textometrie/files/software/0.5
2. Execute the file by double-clicking on it.
a. Depending on the secu
53. f textometry
import (mod): the process of integrating a corpus into the platform from its source files
index (com): the action of listing word property combinations, with their frequency, for the occurrences of a search query
index (soft): file built by TXM to accelerate search query answers
Java (soft): the main programming language used to program TXM
keyword (com): the central column of a concordance, which displays all the occurrences of the search query aligned vertically
language (mod): the main natural language in which a text or a corpus is written
lem (mod): see lemma
lemma (mod): the dictionary entry of a word
lemmatizer (soft): a software component giving the dictionary entry of every word of a text
lexicon (com): the action of listing all the possible word forms, or other word properties, in a corpus and their frequency
literal (exp): a character taken as it is in a search query
loader (com): a software component implementing a process to import a corpus into the platform from its source
localization (int): the interface of TXM can be read in different languages, determined by the localization preference
match (met): an occurrence of a search query in a corpus
metadata (mod): the properties of a whole text or document. Each metadata has a name, a type and a value
modifier (exp): a special character used to express a different meaning of a search query, for example ignore caps
54. form with scripts

6.1 Running Groovy scripts and commands

The ability to script the TXM platform gives the end user the opportunity to:
- automatically call any TXM command: search a CQP expression with the search engine, compute a statistical model score with the statistics engine, export and save results in a file, etc.;
- use different parameter values for those commands;
- record and reproduce a set of commands for a regular analysis.

It is also a way for the end user to extend the platform with new commands.

Scripts are written in the Groovy scripting language (http://groovy.codehaus.org). You will find a short introduction to the language at http://onjava.com/pub/a/onjava/2004/09/29/groovy.html. At least three books will also introduce you to the language:
- Groovy in Action;
- Groovy Programming: An Introduction for Java Developers;
- Programming Groovy: Dynamic Productivity for the Java Developer.

The text of the scripts to execute can be stored in a file or simply selected and copied from an editor window (see the Text Editor section).

The best way to start writing your own Groovy script is first to modify the sample scripts released with TXM in the C:\Documents and Settings\<your login name>\TXM\scripts directory. For example, the conc.groovy script computes a concordance of the word "je" in the DISCOURS corpus, then exports and saves it in the conc.txt text file.
55. he graphic to be displayed inside the TXM user interface windows.

7 Import modules

The import modules available in the RCP version of TXM are stored in the scripts/import subdirectory of the TXM home directory (TXM). Currently, only the main launch script of each module is available to the user: the files named xxxLoader.groovy.

7.1 Clipboard module

7.1.1 input

That module imports the raw text copied into the system clipboard. The lb property is added to each word to encode the line number.

7.1.2 output

As output, a unique text structure (text) encloses words segmented by separator characters.

7.1.3 annotation

Morphosyntactic description and lemma properties are added to each word by the TreeTagger software.

7.1.4 edition

The text is edited by taking care of spaces and punctuation marks between words, and is paginated by blocks of n words.

7.2 XML-TEI BFM module

7.2.1 input

That module imports the files of the source directory encoded in the XML-TEI P5 BFM format. The input format is defined by the encoding manual of the Medieval French Base project (Base de Français Médiéval, BFM). It is based on the XML-TEI P5 format to encode the text body and metadata. For further information, please see the BFM XML-TEI encoding manual: http://bfm.ens-lyon.fr/article.php3?id_article=158 (in French).

Because of an unresolved bug see https listes
56. [Illustration 26 callouts: context size; total frequency; specificity score; list of cooccurrents; attraction of cooccurrents; distance between the pivot and the cooccurrent]

Illustration 26: Cooccurrents of the words beginning with "j"

In this window, one can:
- write a CQP query in the query field or use the Query Assistant;
- edit the cooccurrent word properties;
- modify the frequency thresholds to cut the results list. The cofrequency is the number of encounters of the cooccurrent and the CQP query occurrences in the corpus;
- choose a context size; if a structure is selected, the right and left contexts can be set;
- sort the search results by clicking on the column heads.
To launch the search for the cooccurrents, click on Compute.

4.7 Lexicon and Index

The list of types, or of any word property values, can be processed by two commands:
- Lexicon computes the frequency list of all the values of a specific word property;
- Index computes the frequency list of all the combinations of values of a specific list of properties, for the result set of a specific CQP search query expression.

4.7.1 Lexicon

The Lexicon command computes
57. ilt as a sub-corpus for each value of the selected property of the selected structural unit type. Parts cannot be accessed individually: they can only be accessed as a whole, through the partition object and contrastive commands like Specificity or Factorial Correspondence Analysis.

The complete expression is: /region[text,a]:: a.text_loc="Pompidou" & a.text_date="1970"

Illustration 19: Simple partition building: build a partition on every date of speech

4.4.2 Partition building Assistant

Illustration 20 shows the assisted partition building window. Here one can:
- enter the partition name, which will be displayed in the Corpus view;
- select a structural unit and its properties;
- select the values that will compose a part of the partition;
- click on "new part" to create another part:
o enter the part title in the field;
o click on "Assign" to assign the selected values to the part;
o click on "Remove" to remove one or several values;
o click on the cross to delete the part;
- click on "Rmv all the parts" to delete all the parts with a single click;
- click on "OK" to create the partition.
59. in the Toolbar

Illustration 5: The Toolbar

2. when an object icon is selected in the explorer, the user can execute a command on that object by selecting the corresponding command in the upper File, Corpus or Tools menus:

a. The File menu and its Export command

Illustration 6: The File menu

b. The Corpus menu and its description and corpus manipulation commands

Illustration 7: The Corpus menu with, on the left, the corpus commands and, on the right, the partition commands

The menu configuration changes with the type of the selected icon: for the first menu, a corpus is selected; for the second one, a partition.

c. The Tools menu gives access to the textometric tools

Illustration 8: The Tools menu fo
60. istical Engine: parameters of the statistical engine (R) integrated into TXM.

User: default settings for all TXM commands
o Concordances: number of lines per page, context size.
o Cooccurrences: minimal frequency, maximum number of cooccurrents, minimal score.
o CA: show the individuals or the variables in graphics; change the column format, which follows the specifications of the Java class DecimalFormat. For more information, see http://download.oracle.com/javase/1.4.2/docs/api/java/text/DecimalFormat.html
o Description: the number of property values to display.
o Export: encoding format of the results export.
o Language: English or French interface language.

4.13 Commands relationship

COMMANDS FROM TO USED BY CA Partition Lexical Table Concordances Corpus Cooccurrences Cooccurrences Corpus Concordances Corpus Corpus Cooccurrences Concordances Corpus Description Index Lexicon Partition Progression Text Edition Description Corpus Index Corpus Concordances Lexical Table of a Partition Progression Partition Lexical Table Partition CA Partition index Specificities Lexicon Corpus Concordances Progression Partition Corpus CA Specificities Lexical Table Text Edition Progression Corpus Specificities Partition Lexical Table Sub corpus Corpus Corpus commands Specificities Text Edition Corpus Sub corpus Partition
61. Illustration 14: DISCOURS Edition

4.2.2 Partition

For partitions, the Text edition command allows you to navigate through the parts of the partition selected in the explorer (see illustration 15). The navigation system is similar to the one described above.

Illustration 15: Navigation window between the parts editions

4.3 Build Sub-corpus

That command is used to build a sub-corpus of the selected corpus. The new corpus is created as a child node in the corpus view. That function opens a dialog box entitled "Create a sub-corpus". It is composed of three tabs: one for simple sub-corpus building, one for assisted sub-corpus building and one for advanced sub-corpus building.

4.3.1 Simple sub-corpus building

Illustration 16 presents the simple tab form of the sub-corpus builder. In that form, one has to:
- enter the name of the new corpus (the name displayed in the corpus view);
- select a structural unit type;
- se
62. l screen: double-click on the window tab;
- put the window back to its original size: double-click on the window tab;
- move and resize the window depending on the place where it is dropped: drag the tab of the window to the place it should go. Before you release the mouse, when the pointer reaches the center of an outer border of the underlying window (called a hot spot), a ghost window frame is drawn to show the size and place the window will have if you release the mouse there. Each middle border of the underlying window has a hot spot to choose:
o left: split vertically and put the window on the left side;
o right: split vertically and put the window on the right side;
o up: split horizontally and put the window on the top side;
o bottom: split horizontally and put the window on the bottom side;
- minimize the window: click on the Minimize icon of the window.

Each interface zone (objects and results) manages its windows in a coherent way. The current window layout is always saved automatically by TXM.

3.3 Getting Help

The text of this manual will be embedded into the TXM platform as an implicit corpus with its own edition (TO BE DONE).

3.4 Working with Corpora

3.4.1 Quick introduction

With TXM you can analyze textual data coming from various sources:
- the File > Import > From Clipboard command allows you to use the TXM platform commands on any text you have s
63. lect a property of that unit and its value.

Illustration 16: Simple sub-corpus selection: build the sub-corpus of all the speeches of President de Gaulle

The new corpus will contain all the lexical units found in all the structural units of the given type whose given property is set to the given value.

4.3.2 Assisted sub-corpus building

Illustration 17 presents the assisted tab form of the sub-corpus builder. In that form, one can:
- enter the name of the sub-corpus;
- check "all criteria" to require every criterion of the search to match, or "some criteria" to require only some of them;
- select the structure of the sub-corpus;
- write the selection criteria:
o add a criterion with the button provided;
o delete a criterion with the button provided;
o choose the property tested by the criterion and whether it contains or does not contain a given property value;
o refresh the query of the sub-corpus;
- click on "OK" to create the sub-corpus.
64. lines in the results table with the mouse, then, in the contextual menu (right-click), choose Graphic.

Illustration 34: Specificity graphic of the "je" and "jeune" word forms between discourse genres in the DISCOURS corpus

In the graphic:
- each part is represented by a set of contiguous bars, in the same order as in the table;
- for each selected word property value (the word form, in the example), the score is represented by a bar of the same color in each part;
- the legend in the upper right of the graphic gives the color key for each value.

Shift + click selects several contiguous lines. Ctrl + click selects several non-contiguous lines.

4.8.1.3 Browsing the graphic

To ease the reading of the graphic, you can:
- pan: Shift + left mouse button drag;
- zoom in: Shift + right mouse button drag;
- zoom to selection: Ctrl + left mouse button drag;
- rotate: Ctrl + right mouse button drag;
- reset the view: F5.

4.8.2 Sub-corpus specificities

For a sub-corpus, that command allows you to choose the word property on which to compute the specificity scores. Thus, the command opens the same dialog box as the Lexicon command in illustration 27. The command then displays, after the columns of the word property values and their global frequency, two lists of scores: one list for the score in the complement of the sub corpu
66. odule. Any <text> tag found in the source will be renamed <textunit> by the module. If some words are delimited by <w> tags, they will be taken as such, with their properties imported from the tag attributes. Care must be taken that all <w> elements have the same number and names of attributes.

7.4.2 output

Each XML tag level generates one structural level.

7.4.3 edition

Each text is edited by taking care of spaces and punctuation marks between words, and is paginated by blocks of n words.

7.5 Transcriber CSV module

7.5.1 input

Body of text

That module imports the transcription files encoded in the XML-TRS format (extension .trs) found in the source directory. That format is generated by the Transcriber software. The files must come with the trans-14.dtd file to be valid. Each transcription will be associated with one textual unit, or text.

Text metadata

Text metadata are imported from a file encoded in the CSV format, called metadata.csv and found in the same directory as the sources. The column separator is the comma; the field delimiter is the double quote. The first (header) line names each metadata column. The first column must be named id; the following ones can be named freely, but without using any accented or special characters. The first column must contain the name of the source file without the extension.
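The metadata.csv layout described above can be illustrated with a short sketch. This is not part of TXM: it is a small Python check, written under assumptions, that reads a file in the described layout (comma separator, double-quote delimiter, first column named id). The sample rows, the column names speaker and year, the function name read_metadata, and the reading of "no accented or special characters" as "ASCII letters and digits only" are all hypothetical choices made for the illustration.

```python
import csv
import io
import re

# Hypothetical sample for two transcription files, interview1.trs and interview2.trs.
SAMPLE = '''"id","speaker","year"
"interview1","Dupont","2009"
"interview2","Martin","2010"
'''


def read_metadata(handle):
    """Parse a metadata.csv stream into {source file name: {column: value}}."""
    reader = csv.reader(handle, delimiter=",", quotechar='"')
    header = next(reader)
    if header[0] != "id":
        raise ValueError("the first column must be named 'id'")
    for name in header[1:]:
        # Assumption: "no accented or special characters" means ASCII letters and digits.
        if not re.fullmatch(r"[A-Za-z0-9]+", name):
            raise ValueError(f"invalid column name: {name!r}")
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # row[0] is the source file name without its extension.
        table[row[0]] = dict(zip(header[1:], row[1:]))
    return table


metadata = read_metadata(io.StringIO(SAMPLE))
print(metadata["interview1"]["speaker"])
```

Running such a check before an import may help catch a misnamed first column or an accented header early; whether the import module itself reports these problems is not stated in this manual.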
67. on 1: The general interface of TXM

The user interface of TXM is divided into four main zones, depicted in illustration 1:
- the explorer: root corpora, results of commands, scripts, icons; in fact, all the objects managed by TXM and produced by commands;
- the commands: where actions on objects are expressed;
- the results: the output windows;
- the messages: the comments from command execution.

All the zones are managed by a single window manager. We will first present the main zones, then how to organize the interface with the window manager.

3.2.1.1 The explorer

Illustration 2: The explorer

The explorer is the main place where the user selects the objects on which to apply the commands of TXM and reaches the results of the commands. The explorer is organized in two different views:
- the Corpus view, related to the corpora available for analysis;
- the File view, related to the files found on the file system, to edit.
Each view is accessed by its specific tab.

The Corpus view

Illustration 3: The Corpus view

The Corpus view displays all the corpora available for analysis within TXM and all the icons of the objects built by TXM during a work session. The corpora have been created by the Import command
68. ons processed for the DISCOURS corpus: C:\Documents and Settings\txm\TXM\informations\Infos_class org.textometrie.searchengine.cqp.corpus.MainCorpusDISCOURS.html

[Illustration 13 content: Description of the corpus DISCOURS. General statistics: number of words 105191; number of word properties 4; number of structural units 4. Lexical properties and their values (20 max): func (31), lemma (4996), pos (291), word (8965). Structural unit properties and values (20 max): text date (29).]

Illustration 13: DISCOURS Description

4.2 Read Edition

4.2.1 Corpus

For the selected corpus, that command displays the first page of the HTML edition of the first text of the corpus. The preamble
69. Illustration 17: Assisted sub-corpus selection: build a sub-corpus of the 12th-century texts in verse

4.3.3 Advanced sub-corpus building

Illustration 18 presents the advanced tab form of the sub-corpus builder. In that form, one has to:
- enter the name of the new corpus (the name displayed in the corpus view);
- write a CQP query which selects all the lexical units composing the sub-corpus.

The new corpus will contain all the lexical units returned by the query.

4.4 Build Partition

That command is used to build a partition of the selected corpus. The new partition is created as a child node in the corpus view. That function opens a dialog box entitled "Create Partition". It is composed of three tabs: one for simple partition building, one for assisted partition building and one for advanced partition building.

4.4.1 Simple partition building

Illustration 19 presents the simple tab form of the partition builder. In that form, one has to:
- enter the name of the new partition (the name displayed in the corpus view);
- select a structural unit type;
- select a property of that unit.

The parts of the new partition will be bu
70. ow in which the text can be modified, like a source text file or a script file
encoding (mod): the way in which an information is represented in a source corpus
export (com): the action of saving in a file the results of a TXM command, for external processing or editing
factorial (com): the action of reducing the dimensionality of a parts x words correspondence matrix according to the correspondence analysis algorithm. The new dimensions of the analysis are represented by eigenvectors called factors. The parts and the words of the original matrix can be displayed in the resulting factorial planes
file (mod): an elementary container of information on the user file system, like a text or a corpus source. A file can be designated with a path
flyover (int): a small popup window displayed while the mouse moves over an object in the interface, for example a word in an edition
focus (int): a way to concentrate a command on a specific word event, for example through a search query
form (mod): the graphical form of a word, generally computed by tokenizers
frequency (met): the total number of occurrences of an event (a word occurrence, a sequence of words occurrence, etc.) in a corpus
Groovy (soft): the computer language in which the TXM platform scripts are written
HTML (fmt): the data format of web pages
Hyperbase (soft): an academic software o
71. pboard loader usage
1. Select then copy some text from an application (OpenOffice Writer, Thunderbird, Firefox, etc.).
2. Select the command File > Import > From Clipboard.
3. A corpus named ClipboardN is added to the corpus view, where N is the number of the current clipboard import in the current session.

Directory loader usage
1. Select the command File > Import > Directory.
2. In the popup form, select the directory containing the raw text source files. Note: each source file will be imported as a textual unit; it must have the TXT extension to be considered by the import process. This command imports all the files contained in the selected folder tree (folders and sub-folders).
3. A corpus with the same name as the directory will be created in the corpus view.

3.4.3.2 Raw XML Loaders

The XML structure entry of the File > Import menu imports a single valid XML file into TXM. Each tag will be interpreted as a structural unit, with properties coming from the tag attributes. Each word is tokenized and annotated with a part-of-speech property and a lemma property. The text tag is not interpreted by the tokenizer, so be sure to remove it before the import process.

XML structure loader usage
1. Select the command File > Import > XML Structure

The TEI TXM format is an extension of the XML-TEI P5 format. Its schema is not publicly released yet.
By default TreeTagger is used to tag words with the F
72. Illustration 36: Progression processing parameters for the "France" and "Algérie" words in the DISCOURS corpus

Clicking on OK displays a progression graphic such as the one in illustration 37. In this graphic, the speaker name is displayed at the beginning of each discourse. The curves represent the progression of the "France" and "Algérie" words.

Illustration 37: Progression graphic on the "France" and "Algérie" words in the DISCOURS corpus

The graphic can be exported with the Export button in the toolbar.

4.10 Correspondence Analysis

The CA command computes the correspondence factor analysis algorithm on a partition, based on the frequency, in each part, of the values of one of its word properties (word forms, lemmas, pos).

Jean-Paul Benzécri et al., L'analyse des correspondances, Paris, Dunod, 1973.
Computed by the CA R package.

Applied on a partition of at least four parts, or on a lexical table, the command first allows you to choose the word property on which to compute the algorithm. Thus, the command opens the same dialog box as the Lexicon command in illust
73. ports the text from the clipboard.
- Load: loads a new corpus from its binaries directory.
- Open: opens a file in a new text editor.
- Browse: opens a file in the integrated web browser.
- Restart: restarts the TXM search and statistics engines.
- Change language: shows a window to change the interface language of TXM, as set in the preferences menu (Preferences > TXM > User > Language).
- Preferences: sets various parameters of TXM, such as calculation thresholds (minimal frequency, etc.).
- Exit: quits the application.

Corpus Menu
- Open edition: displays the first page of the edition.
- Description: displays the structures and the word properties available.
- Delete: deletes the selected object.
- Create sub-corpus: builds a new sub-corpus.
- Create partition: builds a new partition.
- Lexical table: creates a lexical table from a partition or a partition index.

Tools Menu
- Lexicon: lists all the different values of a specific property of words, with their overall frequency.
- Index: lists the different combinations of values of different word properties, with their overall frequency, from the results of a specific CQP query.
- Concordance: searches for patterns of a CQP query expression and displays the results as KWIC concordances.
- Progression: displays the evolution of one or more patterns throughout a corpus.
- Cooccurrences: computes cooccurrents from a CQP query.
- Specificities: lists the positive and negative specificity scores of a specific property of words fo
74. property value in the specific part; t is the total number of words in the part.

Illustration 33 presents the Specificity scores of all the word forms matching "j.*", that is, starting with a "j" character, in the partition by discourse type of the DISCOURS corpus. The table is sorted in decreasing order on the score of the part for the "Allocution radiotélévisée" type.

[Illustration 33 table columns: Units; Frequency (T = 105191); Allocution radiotélévisée (t = 49868); Conférence de presse (t = 41834); Entretien radiotélévisé (t = 13489)]

Illustration 33: Specificity of "j" word forms in the discourse type partition of the DISCOURS corpus

4.8.1.1 Sorting

You can sort the table by column by clicking on the column head. You can change the sort order by clicking a second time. When a score column is sorted in decreasing order, the top words are considered overused in the corresponding part with respect to the whole corpus, the last words are considered underused, and the middle words, around the zero score, are considered commonplace (the score is useless for them).

4.8.1.2 Graphics

The scores can be visualized graphically. Select some
75. r each part of a partition.
- Correspondence Analysis: draws texts and word properties on the factorial map of the first two factors obtained by factorial correspondence analysis on a partition.
- Settings: opens the parameters page of the TXM tools. In this version, this is the same as the File Settings menu.

Help Menu
- Key Assist: displays all the available keyboard shortcuts.
- Report a bug: opens the "report a bug" web page.
- Ask for enhancement: opens the "ask for feature" web page.
- Submit to txm-users list: opens the submit page of the txm-users mailing list.
- Check for update: opens the Sourceforge TXM download page.
- Install TreeTagger: opens the TreeTagger install tutorial page.
- About: displays the TXM version number and license information.

3.2.1.5 The Results

Concordance results
76. r the corpus and the partition objects

3. the user can open the Contextual Menu by right-clicking on the object to which the command should apply.

Illustration 9: The Corpus Contextual Menu

The commands are described in detail in section 4, Using TXM Commands. All results windows can also give access to commands, depending on the types of the objects contained in the result.

3.2.1.3 Icons

Here is the list of all the icons used in the TXM graphical interface.

Objects icons: Corpus, Partition, Open edition, Progression, Lexical table.

Commands icons: CA, Concordance, Cooccurrences, Create Partition, Create Subcorpus, Delete, Description, Export, Index, Lexicon, Query assistant, Search, Settings, Specificities.

3.2.1.4 The Main Menus

Here is the description of all the main menus available in TXM, in the upper left part of the interface.

File Menu
- Export: exports a result, at least as raw text.
- Import: imports a new corpus from its sources with one of the available import loaders:
o From clipboard: im
77. ration 27. The results are then presented in two different windows: the first one displays the graphic of the first factorial plane; the second one displays the factorial analysis parameters used to interpret the graphic. It is divided into four tabs:
o singular values
o lines information
o columns information
o barplot graph of the singular values
[Figure: first factorial plane (factors 1 and 2) of the speech dates, with axis 1 and axis 2 labelled with their inertia; tabs: Singular values, Rows infos, Cols infos]
Illustration 38: Graphic obtained from a lexical table with the Date property on the DISCOURS corpus
The CA window can display the individuals, the variables, or both: to do so, check or uncheck individ
78. re text. To choose a property, select it, then click on the right (>) button to move it to the right panel, which lists the properties that will be displayed in the reference column. To unselect a property, select it in the right panel, then click on the left (<) button to move it back to the left panel. To change the order of the properties in the right panel, use the up and down buttons.
4.5.7 Export
Concordances can be exported in the CSV format: select the concordance object in the corpus view and use the Export icon in the toolbar or the Export entry in the contextual menu.
4.6 Cooccurrences
This command builds the table of the cooccurrents around a CQP query. The cooccurrence score allows you to sort cooccurrents according to their a priori encounter probability. The higher the score, the more surprisingly high the number of observed encounters of the cooccurrent and the expression in the corpus. The command opens a parameter window like the one in illustration 26.
Footnote: P. Lafon, "Sur la variabilité de la fréquence des formes dans un corpus", Mots, no. 1, 1980, pp. 127-165.
[Screenshot: cooccurrence parameter window, with labels translated from French: CQP query field, Frequency thresholds, Assistant, Cooccurrent properties, Run the cooccurrence]
79. red by the Copy command.
ClipN [int]: all the corpora created from the clipboard are automatically named Clip followed by a number.
CNR [fmt]: the data format of the output file of Cordial.
command [com]: an elementary action available in TXM.
concordance [com]: a way to present the results of the search engine where every hit is displayed on its own line with some contextual words around it.
console [int]: TXM displays various messages while executing commands in a special window called the console.
Cordial [nlp]: a commercial tagger.
corpus [mod]: a compilation of word sequences. Sequences come from texts, in whole or in part. Root corpora are built from a selection of texts.
CQL [exp]: for "Corpus Query Language", the query language managed by CQP and applied to a corpus.
CQP [soft]: for "Corpus Query Processor", the software component processing the search queries to build indexes, concordances, etc.
CSV [fmt]: for "Comma Separated Values", a textual file format where each record is separated by a newline and where each property or value is separated by a chosen character, such as a comma.
Ctrl [int]: the Ctrl or Control key on the keyboard.
directory [mod]: a file containing other files or directories on the file system of the user. A directory can be designated by a path.
document [mod]: a text from a logical point of view.
editor [com]: a textual wind
80. rench model, but you have to install it yourself because of the TreeTagger licensing conditions (see the tutorial displayed by TXM if it is not installed yet). This bug will be removed in a future release.
2. In the popup form, select the directory containing the raw XML source files. Note: each source file will be imported as a textual unit; it must have the XML extension to be considered by the import process.
3. A corpus with the same name as the directory will be created in the corpus view.
3.4.4 The Advanced Import Framework
The TXM platform is designed to import various kinds of source corpora. To ease the design of specific corpus loaders, several key concepts have been defined to specify the import process:
• a document unit represents a body of textual data for which all the metadata are the same;
• the metadata of a document unit is a simple set of properties having simple values (title, date, author's name, domain, type);
• a document unit is organized as a tree of structural units; each node can have any number of properties having simple values;
• the leaf nodes of a document unit are the lexical units (words);
• an NLP tool, such as a tagger, can be applied to any source file during the import process;
• each document unit can have one or several editions built.
An import process, or loader, creates these key concepts in the TXM platform from the information found in
81. ribed in the java.org.textometrie.functions.concordances package documentation for the Concordance class, which is at http://textometrie.sourceforge.net/javadoc/index.html?java/org/textometrie/functions/concordances/Concordance.html. All classes and methods described in that documentation are available to a Groovy script.
6.2 Running R scripts and commands
The TXM platform uses the R statistical environment to implement some statistical models. To this end, it loads specific packages, processes results and displays them in its user interface. For example, it displays the specificity barplot graphics computed by R in a new window. This version of TXM also allows you to edit and run your own R scripts from within its user interface. The text of the scripts to execute can be stored in a file or simply selected and copied from an editor window (see the Text Editor section). The best way to start writing your own R script is to first modify the sample scripts released with TXM in the C:\Documents and Settings\<your login name>\TXM\scripts directory. For example:
• The sample.R script generates a vector of points following a normal distribution, then displays them.
• The HelloWorldR.groovy script shows how to embed an R script inside a Groovy script and then call it.
• For scripts generating graphics, the executeRscript.groovy script shows how to call the plot100.R R script from Groovy while allowing t
82. rity level of your Windows operating system, the following dialog box may pop up (in your language):
[Screenshot: Windows dialog box, translated from French: "Open File - Security Warning. The publisher could not be verified. Are you sure you want to run this software? Name: Setup.exe. Publisher: Unknown publisher. Type: Application." Buttons: Exécuter (Run), Annuler (Cancel). "Always ask before opening this file. This file does not have a valid digital signature that verifies its publisher. You should only run software from publishers you trust."]
In that case, please click on the left button, Exécuter (Execute).
Note: only Windows XP, Vista and Seven and Linux Ubuntu have been tested for this release.
In the next dialog box:
[Screenshot: Setup window: "Setup will install ... in the following folder. To install in a different folder, click Browse and select another folder. Click Install to start the installation." Destination Folder: C:\Program Files\TXM. Space required: 88.3 MB. Space available: 22.4 GB]
Click on Install (you may choose another install directory first). The install process takes about a minute. If, during the installation process, you get the following message:
[Screenshot: TXM 0.4.6 Setup: "Extract: Copie de R.exe... 100%. Error opening file for writing: C:\Program Files
83. s with respect to the parent corpus, named <corpus name> <part name>; one list for the score in the sub-corpus with respect to the parent corpus, named <part name>.
[Screenshot: table of specificity scores in decreasing order, for word forms such as Algérie, aient, je, autodétermination, construire, moi, Sahara and bombes]
Illustration 35: Specificity scores of the word forms of the "Allocution radiotélévisée" discourse genre in the DISCOURS corpus
4.9 Progression
A progression graphically displays the evolution of one or more patterns throughout the corpus. This command is launched on a corpus. It draws a cumulative or density graphic and adds the limits of the selected structure in the corpus. When the Progression command is launched, a parameters window opens, like the one in illustration 36. Then you can:
• Choose the progression display type: cumulative or by density.
• Choose the structural unit displayed (each vertical bar corresponds to a unit limit) and one of its properties to display.
• Filter property values with a regular expression.
• Add one or more CQP queries to display (possibly with the Query Assistant) with the add button. You can also remove a query with the delete (x) button.
[Screenshot: Repartition Parameters window]
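The cumulative display described above can be pictured with a small Python sketch. It is only an illustration, not TXM code: the function name `cumulative_progression`, the sampling `step` and the representation of hits as token offsets are all assumptions made for the example.

```python
from bisect import bisect_right

def cumulative_progression(match_positions, corpus_size, step):
    """Sample the cumulative number of matches every `step` tokens.

    Returns (position, matches_so_far) pairs. Positions are 0-based
    token offsets; a match at offset p is counted from p onwards.
    """
    if corpus_size <= 0 or step <= 0:
        return []
    positions = sorted(match_positions)
    samples = [
        (pos, bisect_right(positions, pos))
        for pos in range(0, corpus_size, step)
    ]
    # Always close the curve on the last token of the corpus.
    last = corpus_size - 1
    if samples[-1][0] != last:
        samples.append((last, bisect_right(positions, last)))
    return samples

# Hypothetical example: three hits in a ten-token corpus.
CURVE = cumulative_progression([2, 3, 9], corpus_size=10, step=5)
```

A steep segment of such a curve means that the pattern is frequent in that stretch of the corpus; a flat segment means that it does not occur there.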
84.
Command                              Shortcut
…                                    Ctrl+Shift+L
…                                    Ctrl+A
…                                    Shift+Home
…                                    Shift+End
…                                    Ctrl+Shift+Right
…                                    Ctrl+Shift+Left
…                                    Ctrl+C, Ctrl+Insert
…ste                                 Ctrl+V, Shift+Insert
Cut                                  Ctrl+X, Shift+Delete
Delete                               Delete
Undo                                 Ctrl+Z
Redo                                 Ctrl+Y
To Upper Case                        Ctrl+Shift+X
To Lower Case                        Ctrl+Shift+Y
Find
Find and Replace                     Ctrl+F
Find Next                            Ctrl+K
Find Previous                        Ctrl+Shift+K
Incremental Find                     Ctrl+J
Incremental Find Reverse             Ctrl+Shift+J
Move
Text Start                           Ctrl+Home
Text End                             Ctrl+End
Line Start                           Home
Line End                             End
Next Word                            Ctrl+Right
Previous Word                        Ctrl+Left
Go to Line                           Ctrl+L
Last Edit Location                   Ctrl+Q
Delete
Delete Line                          Ctrl+D
Delete to End of Line                Ctrl+Shift+Delete
Delete Next Word                     Ctrl+Delete
Delete Previous Word                 Ctrl+Backspace
Move line
Move Lines Up                        Alt+Up
Move Lines Down                      Alt+Down
Insert line
Insert Line Above Current Line       Ctrl+Shift+Enter
Insert Line Below Current Line       Shift+Enter
Other
Join Lines                           Ctrl+Alt+J
Scroll Line Up                       Ctrl+Up
Scroll Line Down                     Ctrl+Down
Duplicate Lines                      Ctrl+Alt+Up
Copy Lines                           Ctrl+Alt+Down
Toggle Folding                       Ctrl+Numpad_Divide
Mode
Toggle Insert Mode                   Ctrl+Shift+Insert
Toggle Overwrite                     Insert
Toggle Block Selection               Alt+Shift+A
Quick Diff Toggle                    Ctrl+Shift
Show Ruler Context Menu
File
New
Save
Close
Close All
Print
Properties
Refresh
Misc
Word Completion
85. the complete frequency list of all the word forms (or pos tags, or word lemmas, etc.) of a selected corpus or sub-corpus. First choose the word property for which to build the list:
[Screenshot: Property selection field]
Illustration 27: Lexicon dialog box
The result is a sortable and exportable table:
[Screenshot: lexicon table with the Properties, Thresholds (Fmin, Fmax, Vmax) and Page size fields; status line: t 105191, v 8965, fmin 1, fmax 8603]
Illustration 28: Word forms frequency list of the DISCOURS corpus, sorted alphabetically
You can sort each column by clicking on its header. Another click toggles the sort order. You can export this table to the CSV format.
4.7.2 Index
The Index command computes the frequency list of the result set of a specific CQP search query expression for a selected corpus or partition.
[Screenshot: Index dialog box with the Query, Properties, Thresholds (Fmin, Fmax, Vmax) and Page size fields]
Illustration 29: Index initial dialog box
4.7.2.1 Properties combination
First select the combination of properties with the Properties Edit button.
[Screenshot: Select view properties window]
Illustration 30: Index word properties editor
Select each property to combine in the left panel, then use the arrows to move it to the right panel or to remove it: > add the property to the combin
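To make the idea of a frequency list with thresholds concrete, here is a minimal Python sketch. It is not TXM code: the function name `frequency_list`, the toy token list and the threshold semantics (Fmin and Fmax as inclusive frequency bounds, Vmax as a cap on the number of rows kept) are assumptions made for the example.

```python
from collections import Counter

def frequency_list(tokens, fmin=1, fmax=None, vmax=None):
    """Return (form, frequency) rows sorted by decreasing frequency.

    Assumed semantics (not taken from TXM): fmin and fmax are inclusive
    frequency bounds, vmax caps the number of rows returned.
    """
    counts = Counter(tokens)
    rows = [
        (form, freq)
        for form, freq in counts.items()
        if freq >= fmin and (fmax is None or freq <= fmax)
    ]
    # Decreasing frequency first, then alphabetical order for ties.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows if vmax is None else rows[:vmax]

# Hypothetical toy corpus, already tokenized.
TOKENS = "je vous dis que je vous ai compris".split()
FREQUENT = frequency_list(TOKENS, fmin=2)
```

Building the list for another word property (lemma, pos) would only change what is put in `tokens`; the counting itself stays the same.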
86. tize texts from their sources.
tagset [mod]: the set of all the possible values for the morphosyntactic property of words.
TEI [fmt]: for "Text Encoding Initiative", the standard way of encoding texts. See http://www.tei-c.org. The TEI format is expressed in XML.
text [mod]: a possibly structured, homogeneous sequence of words, possibly described by properties. A text can be described by its metadata.
textometrie [met]: the general methodology underlying TXM. See http://textometrie.ens-lyon.fr.
tokenizer [soft]: a software component that computes word boundaries in source files from character properties.
TreeTagger [soft]: an academic tagger.
TXT [fmt]: the data format of raw text files without annotations.
unit [mod]: a leaf unit (or lexical unit) or a structural unit of a text.
V [met]: the total number of different graphical forms of a corpus.
vocabulary [com]: the action of processing a lexicon or an index.
Weblex [soft]: an academic textometry software.
window manager [int]: a software component helping to organize the interface windows.
word [mod]: a lexical unit identified by its graphical form and its position in word sequences, generally computed by tokenizers.
workspace [int]: the set of all the objects available to the user in TXM (corpus, sub-corpus).
XML [fmt]: the main data format for corpus source
10 Bibliography
Barclay, Kenneth
87. ty the command will be applied on.
Footnote 12: Shift-click selects several contiguous lines; Ctrl-click selects several non-contiguous lines.
Footnote: Ibid., http://www.persee.fr/web/revues/home/prescript/article/mots_0243-6450_1980_num_1_1_1008, originally presented at the Association for Literary and Linguistic Computing conference at Oxford, 4-5 April 1976.
• Property filter: you can filter the values of the word property (the lines) that will be processed with a regular expression (this is not a CQP expression yet).
o you can add as many filters as you need with the corresponding button;
o if you don't specify any filter, all the values will be processed. For example, for the word form property, all the word forms will be processed.
• Part filter: you can filter the values of the structural unit property (the columns) that will be displayed. You can use the v button to access the available values.
o you can add as many filters as you need with the corresponding button;
o if you don't specify any filter, all the parts will be considered by the command.
The result is a table with:
• lines: the word property values appearing in all parts;
• columns: the values of the structural unit property taken into account (the parts);
o the first column gives the total frequency of the word property value in the corpus (T is the total number of words);
o the other columns give the logarithm of the specificity score of the word
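As an illustration of what a logarithmic specificity score can look like, the following Python sketch uses the hypergeometric model commonly associated with Lafon's specificities. That TXM computes exactly this quantity is an assumption, as are the function name `specificity_score`, the sign convention and the use of base-10 logarithms. The exact integer arithmetic used here is only practical for small examples.

```python
from math import comb, log10

def specificity_score(f, F, t, T):
    """Signed log10 specificity of a word under a hypergeometric model.

    f: frequency of the word in the part,  F: its frequency in the corpus,
    t: size of the part in words,          T: size of the corpus in words.
    Positive score: over-use in the part; negative score: under-use.
    """
    if not (0 <= f <= F <= T and f <= t <= T and t - f <= T - F):
        raise ValueError("inconsistent frequencies")
    total = comb(T, t)  # number of ways to draw the part from the corpus

    def ways(k):
        # Number of draws of the part containing exactly k occurrences.
        return comb(F, k) * comb(T - F, t - k)

    expected = F * t / T
    if f >= expected:
        # Over-use: probability of observing f occurrences or more.
        tail = sum(ways(k) for k in range(f, min(F, t) + 1))
        return log10(total) - log10(tail)
    # Under-use: probability of observing f occurrences or fewer.
    tail = sum(ways(k) for k in range(max(0, t - (T - F)), f + 1))
    return log10(tail) - log10(total)

# Hypothetical toy case: a word occurring 4 times in a 10-word corpus,
# all 4 occurrences falling in a 5-word part.
OVERUSED = specificity_score(f=4, F=4, t=5, T=10)
```

With this convention, a score of 2 means that a frequency at least as high as the observed one had about one chance in a hundred of arising by chance under the model.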
88. uals and variables, then update the view by clicking on refresh. The graphic can be resized by clicking on Resize (see also the graphics shortcuts in section 6.2 for zooming, pan, rotation, etc.).
By default, the correspondence analysis plot shows only the parts in the plane. You can change this in the CA preference page:
• Show individuals in graph: display word property values.
• Show variables in graph: display parts.
In the right pane, detailed information is available for reading the variables, the individuals and the singular values.
For each singular value, the table displays the value number, the singular value and the percentage of the singular value.
Display of the lines and columns tables:
• Quality of the plane: for each plane, the quality of the representation of the point is computed as the sum of the point's cos² values on the two axes defining the plane. The closer the quality is to 1, the less the position of the point is distorted by its projection onto the plane.
• Relative weight: frequency divided by the sum of the frequencies of the other words (lines).
• Distance of the point from the origin, that is, the center of the representation or the center of the cloud of points.
• Contribution of the point to the building of the axis. Contributions sum to 100%, and the points with the highest contributions are used to interpret the axis.
• cos² of the point along each axis: a measure of the angle between th
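The quality of representation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not TXM code: the function names `cos2` and `plane_quality` are hypothetical, and `coords` is assumed to hold the coordinates of one point on all the factorial axes, so that the sum of their squares is the squared distance of the point from the origin.

```python
def cos2(coords):
    """Squared cosines of one point on every factorial axis."""
    dist2 = sum(c * c for c in coords)  # squared distance to the origin
    if dist2 == 0:
        # A point located at the origin has no direction.
        return [0.0 for _ in coords]
    return [c * c / dist2 for c in coords]

def plane_quality(coords, axis_a=0, axis_b=1):
    """Quality of representation on a plane: sum of the two cos2 values."""
    values = cos2(coords)
    return values[axis_a] + values[axis_b]

# Hypothetical point lying entirely in the first factorial plane.
IN_PLANE = plane_quality([3.0, 4.0, 0.0])
# Hypothetical point spread evenly over four axes.
SPREAD = plane_quality([1.0, 1.0, 1.0, 1.0])
```

The first point is perfectly represented on the plane of axes 1 and 2, whereas only half of the squared distance of the second point is visible there, so its position on the map should be read with caution.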
89. ults to list.
• Page size: the number of results per page.
4.7.2.4 Browsing
The Index first displays the first page of results. You can navigate through the results with the four top buttons, which go to the first page of results, to the previous page (<), to the next page (>) and to the last page.
4.7.2.5 Hypertext
The Index command is linked to the Concordance command. Select some lines in the Index results with the mouse, then choose "Send to concordance" in the contextual menu (Ctrl-click): a corresponding query will be generated to build a new concordance.
4.8 Specificities
The Specificity command uses a probabilistic model to compute the overused and the underused word properties (word forms, lemmas, pos):
• of each part of a partition (for example, of each text or of each century of the corpus);
• or of a sub-corpus with respect to its parent corpus.
4.8.1 Partition specificities
The partition is associated with a structural unit and with one of its properties, each possible value of which is associated with a part of the partition. The Specificity command opens the following parameters dialog box:
[Screenshot: Computing specificities dialog box, with the Property and Focus fields]
Illustration 32: Specificity for a partition dialog box
The parameters are the following:
• Word property: you can choose the word proper
90. [Screenshot: concordance lines, showing right contexts in French]
Illustration 24: Concordance of the word je followed by a verb in the DISCOURS corpus
4.5.2 Browsing
The concordance first displays the first page of results. You can navigate through the results with the four buttons of the upper left panel, which go to the first page, to the previous page, to the next page and to the last page.
The number of lines per page can be changed in the File > Settings menu (TXM > User > Concordances preferences panel).
4.5.3 Returning to text
It is always possible to go back to the page of the edition containing the keyword occurrence by double-clicking on the corresponding line in the concordance. The words composing the keyword are highlighted in red in the page, and keywords from other lines of the concordance occurring in the same page are highlighted in light red.
4.5.4 Sorting
You can sort the concordance by each column: References, Left
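The layout of a concordance and the effect of sorting on a column can be illustrated with a minimal keyword-in-context sketch in Python. It is not how TXM or CQP builds concordances: the function names `kwic` and `sort_by_right_context`, the exact-match search and the toy sentence are assumptions made for the example.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) rows for each hit."""
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows

def sort_by_right_context(rows):
    """Sort concordance lines on the right context column."""
    return sorted(rows, key=lambda row: row[2])

# Hypothetical toy text, already tokenized.
TEXT = "je vous ai compris et je sais ce que vous voulez".split()
LINES = kwic(TEXT, "je", width=2)
SORTED_LINES = sort_by_right_context(LINES)
```

Sorting on the right context groups together the lines in which the keyword is followed by the same words, which is what makes recurring constructions visible.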
