TERMINAE User Manual - V14-1
Contents
1. CHAPTER 12 ANNEX
[End of the 12.3 EnsLexUnit DTD: declarations for SENTENCE, START_POSITION, SYNTACTIC_CATEGORY, TERM_CANDIDATE, TERM_EXTRACTION_RESULTS, Texte, Types, type, Variant (EMPTY) and comment. The content models refer to ID, LEMMA, FORM, List_Variants, NUMBER_OCCURRENCES, LIST_OCCURRENCES, MORPHOSYNTACTIC_FEATURES, NAMED_ENTITY, LIST_TERM_CANDIDATES and LIST_EN.]
12.4 DTD export set of forms to SKOS
[Declarations for rdf:RDF, rdf:Description, rdf:type, skos:prefLabel, skos:altLabel, skos:hiddenLabel, skos:note, skos:editorialNote and skos:related.]
2. 6.4.2 Term management submenu
7 Terminological level step 2 perspective
7.1 Perspective overview
7.3 Terminological actions menu
7.3.1 Form management submenu
7.3.2 Feature management submenu
7.3.3 Termino-concept management submenu
8 Terminae TerminoConceptual level perspective
8.1 Perspective overview
8.2 Data: termino-conceptual forms
8.3 TerminoConceptual actions menu
8.3.2 Termino-concept management submenu
9 Neon toolkit Conceptual level (OWL) perspective
9.1 Perspective overview
9.2 Terminae links menu
10 Annotator perspective
10.1 Input files
10.2 How to proceed
10.3 Some caveats
11 Collaboration perspective
12.6 TreeTagger English Tagset
12.7 TreeTagger French Tagset
12.8 Use ANNIE to extract named entities
3. 12.9 GATE named entity type file
Chapter 1 Introduction
This document describes the functionalities of the TERMINAE platform, which is an Eclipse application. Chapter 2 gives a very short insight into the methodology. Chapter 3 gives the technical characteristics and the installation instructions. Chapter 4 presents the main menu, and the following chapters (chapters 5 to 10) introduce the 6 perspectives of the platform and the related functionalities.
Chapter 2 The Terminae method
TERMINAE is a tool that is supported by a method, and a few short forewords on the method can help in using the tool. The task is to build a domain termino-ontological resource (thesaurus or ontology). This is an expert task, since it requires deciding which concepts are really important for the domain and how they are related. Experience has shown that linguistic tools relying on domain-specific texts can help the expert. They do not do the work in his/her place, but they propose a good starting point to improve the coverage of the domain, and some of the ambiguities they raise reveal real and previously unseen ambiguities of the domain vocabulary.
The TERMINAE method starts from the linguistic results produced by a term extractor. It then has three steps:
• At the linguistic level, th
4. [Figure 6.2: Visualisation of term extractor results]
CHAPTER 6 TERMINOLOGICAL LEVEL STEP 1 PERSPECTIVE 15
The window is composed of two views: the Lexical units view on the left and the Occurrences view on the right. The terminological units (either terms or named entities) are listed in the left view. By clicking on the column headers, you can sort the list alphabetically (Term), by frequency (Frequency) or by type, that is terms vs named entities and named entity type (Named entity). The last column of the Lexical units view allows you to write comments: if you click on a cell of the comment column, a text field appears and you can add a comment to the corresponding terminological unit. The comments are saved with the terminological results and can be reloaded upon request when the term extractor results are loaded. The occurrences of the selected terminological unit in the working corpus appear in the right view.
6.4 Linguistic actions menu
The action menu associated with the Terminological level step 1 perspective is the Linguistic actions menu. It proposes 3 submenus and 2 actions, which are also contextually accessible from the right click of the mo
5. Browse, in the four fields of the left pane, for the files which have been prepared. Then run the annotation by clicking on the button with a triangle at the bottom of this pane. The annotated text appears in the right pane. You can check it and, if satisfied, save it: two buttons at the top of the right pane allow you to save either a project which SemEx can use, or only two files describing the annotations according to two different formats.
CHAPTER 10 ANNOTATOR PERSPECTIVE 36
Then, if you continue annotating more files for SemEx, you can store the new results in the same project or create a fresh one.
10.3 Some caveats
The document must be in text format, so PDF and other rich formats have to be converted. The same encoding must be used in the three files where non-ASCII characters may appear: text, POS and SKOS. UTF-8 is proposed by default, but other encodings can work too. Because operating systems and source files vary, encoding may need some care. When debugging anomalies, keep in mind that a text file and a POS file that are not homogeneous result in scope errors and missed annotations, and that a SKOS file and a POS file that are not homogeneous result in missed annotations. Sentence splitting and word splitting are provided by the POS tagger. Depending on the tagger, sentence borders may occasionally be incorrect, e.g. because titles have no final period. But the output exactly preserves the appearance of the input (white space, line length, blank lines). Some typography may be ambiguous with respect to word splitting, e.g. the
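A mismatch between the text, POS and SKOS files is a common source of missed annotations, so a quick pre-check can help. The sketch below is not part of TERMINAE. It assumes Python 3 and only reports which files fail to decode with the expected encoding (UTF-8 by default). It cannot catch every mismatch: a single-byte encoding such as Latin-1 decodes any byte sequence, so a file that passes is not guaranteed to match the other files.

```python
from pathlib import Path


def find_undecodable_files(paths, encoding="utf-8"):
    """Return the paths whose bytes cannot be decoded with `encoding`."""
    undecodable = []
    for path in paths:
        try:
            Path(path).read_bytes().decode(encoding)
        except UnicodeDecodeError:
            undecodable.append(str(path))
    return undecodable


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python check_encoding.py corpus.txt corpus.tt thesaurus.skos
    for name in find_undecodable_files(sys.argv[1:]):
        print(f"not valid UTF-8: {name}")
```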
6. in which you have to define a URI that is added to the name of the SKOS concepts to guarantee that they are uniquely identified, for instance http://www.lipn.univ-paris13.fr/terminae. Note that in the current version of the TERMINAE platform, the termino-conceptual relations are defined in the exported file, but only through their value and their type.
• Export SKOS (RDF/XML format): to export a thesaurus in RDF/XML format. A dialog window opens in which you have to define a URI, as for the SKOS format. The termino-conceptual relations are not defined in the exported file.
• ExcelToSkos: specific processing (not described).
• ExcelToSkos2: specific processing (not described).
8.3.2 Termino-concept management submenu
• Select terminoConcept: to select a terminoConcept using part of its name.
• Visualize all terminoConcepts: to visualize all terminoConcepts.
• Create termino-concept: to create a new termino-concept. You have to type in the name of the termino-concept if it is not created directly from a terminological unit.
• Remove termino-concept: to remove the selected termino-concept. You have to confirm the removal.
• Rename termino-concept: to change the name of the selected termino-concept.
• To terminological form: to go to the corresponding terminological form.
• Add kindof link (ALT K): to give a father to the selected termino-concept. A dialog window opens in which you have to give the name of the father termino-concept. This can be done by drag and drop command D
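Here is one way the base URI could be combined with a concept name. This is a hypothetical illustration, not the TERMINAE code. The manual only says that the URI is added to the concept name, so the `#` separator, the replacement of spaces with underscores and the percent-encoding of other characters are all assumptions.

```python
from urllib.parse import quote


def concept_uri(base_uri: str, concept_name: str) -> str:
    """Build a unique URI for a SKOS concept (hypothetical scheme).

    Spaces become underscores and any other unsafe character is
    percent-encoded as UTF-8, so every name yields a valid URI.
    """
    local_name = quote(concept_name.replace(" ", "_"), safe="")
    return base_uri.rstrip("/#") + "#" + local_name


print(concept_uri("http://example.org/terminae", "seat anchorage"))
# http://example.org/terminae#seat_anchorage
```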
7. <!ELEMENT EnsTerminoConcepts (name, TerminoConcept)>
<!ELEMENT ID (#PCDATA)>
<!ELEMENT NL_Definition (#PCDATA)>
<!ELEMENT OCCURRENCE (ID, DOC, SENTENCE, START_POSITION, END_POSITION, Texte)>
<!ELEMENT PrefLabel (#PCDATA)>
<!ELEMENT RelationRTC (name, domain, range, Skos_type)>
<!ELEMENT SENTENCE (#PCDATA)>
<!ELEMENT START_POSITION (#PCDATA)>
<!ELEMENT See_also (#PCDATA)>
<!ELEMENT SetRTC (RelationRTC)>
<!ELEMENT Skos_type (#PCDATA)>
<!ELEMENT Synonym (#PCDATA)>
<!ELEMENT TerminoConcept (ID, NL_Definition, OCCURRENCE, PrefLabel, See_also, SetRTC, Synonym, children, fathers)*>
<!ELEMENT Texte (#PCDATA)>
<!ELEMENT child (#PCDATA)>
<!ELEMENT children (child*)>
<!ELEMENT domain (#PCDATA)>
<!ELEMENT father (#PCDATA)>
<!ELEMENT fathers (father)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT range (#PCDATA)>
12.6 TreeTagger English Tagset
CC    Coordinating conjunction
CD    Cardinal number
DT    Determiner
EX    Existential there
FW    Foreign word
IN    Preposition or subordinating conjunction
JJ    Adjective
JJR   Adjective, comparative
JJS   Adjective, superlative
LS    List item marker
MD    Modal
NN    Noun, singular or mass
NNS   Noun, plural
NP    Proper noun, singular
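You can check the kind-of hierarchy of an exported thesaurus outside TERMINAE by reading back the `fathers`/`father` elements of the DTD above with Python's standard `xml.etree.ElementTree`. This is a minimal sketch. It assumes that each `father` element contains the prefLabel of the father termino-concept. The DTD does not say whether it holds a label or an ID, so adapt the lookup if your files use identifiers. The sample is simplified and omits several elements the DTD declares.

```python
import xml.etree.ElementTree as ET


def kind_of_tree(thesaurus_xml: str) -> dict:
    """Map each father label to the sorted labels of its children."""
    root = ET.fromstring(thesaurus_xml)
    tree = {}
    for concept in root.iter("TerminoConcept"):
        label = (concept.findtext("PrefLabel") or "").strip()
        # Assumption: <father> holds the prefLabel of the father concept.
        for father in concept.iterfind("fathers/father"):
            tree.setdefault((father.text or "").strip(), []).append(label)
    return {father: sorted(children) for father, children in tree.items()}


SAMPLE = """<EnsTerminoConcepts><name>demo</name>
  <TerminoConcept><ID>1</ID><PrefLabel>anchorage</PrefLabel><fathers/></TerminoConcept>
  <TerminoConcept><ID>2</ID><PrefLabel>seat anchorage</PrefLabel>
    <fathers><father>anchorage</father></fathers></TerminoConcept>
  <TerminoConcept><ID>3</ID><PrefLabel>belt anchorage</PrefLabel>
    <fathers><father>anchorage</father></fathers></TerminoConcept>
</EnsTerminoConcepts>"""

print(kind_of_tree(SAMPLE))  # {'anchorage': ['belt anchorage', 'seat anchorage']}
```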
8. 2 forms. These forms have to be selected before this item is used. A window opens (see Figure 7.2). The upper part of the window presents the two forms side by side.
[Figure 7.2: Merging 2 forms, comparing window]
Each form field is presented and compared to the equivalent field of the other form. The upper right buttons may be used to see all differences. The lower part of the window is divided into two parts. The left part contains the fields of the merged form. These fields have to be filled with fields from the initial forms using copy/paste. The right part contains three fields:
• Variants: this field is filled with the identical variants from the two forms. The user may add other variants from each form using copy/paste.
• Occurrences: this field is filled with the identical occurrences from the two forms. The user may add other occurrences from each form using copy/paste. He/she may add all occurrences using the paste all occurrences item from the contextual menu.
• Comment: this field is filled by the user.
CHAPTER 7 TERMINOLOGICAL LEVEL STEP 2 PERSPECTIVE 24
When all fields are filled, the user has to click on the Commit button. The resulting form is added to the list of for
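The pre-filling of the Variants and Occurrences fields of the merged form amounts to an intersection that keeps the order of the first form. The dictionary layout (`variants`, `occurrences`, `comment` keys) and the function name `merge_forms` are hypothetical. They only illustrate the rule described above; manual copy/paste additions are left to the user.

```python
def merge_forms(form_a: dict, form_b: dict) -> dict:
    """Pre-fill a merged form from two terminological forms (hypothetical layout).

    Variants and occurrences present in both forms are kept, in the order of
    form_a; the comment field is left empty for the user to fill in.
    """
    shared_variants = [v for v in form_a["variants"] if v in form_b["variants"]]
    shared_occurrences = [o for o in form_a["occurrences"] if o in form_b["occurrences"]]
    return {"variants": shared_variants, "occurrences": shared_occurrences, "comment": ""}
```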
9. characters.
• Cluster terms: to cluster several lexical units. You first have to select the various units you want to cluster, then click on the Cluster terms action and choose the canonical form you want to keep. The alternative forms are removed from the term list and all their occurrences are attached to the canonical form, whose frequency count is updated. For each alternative form, you are offered the option of adding it as a variant of the canonical form. If it is a variant, you have to choose its type: abbreviation, acronym or lexical variant.
• Add a term: to add a new term to the term list.
• Remove a term (Ctrl R): to remove the selected term from the list.
• Undo remove: to undo the last remove action. This may also undo a cleaning action (see Section 6.4.3).
• View occurrence context: to visualise the sentences surrounding an occurrence. You have to select the occurrence identifier (see Figure 6.4) and set the size of the expected context, expressed as a number of sentences.
• Add occurrence for a term: to enter a new occurrence for a term. You have to select a term and fill in the form (see Figure 6.5).
• Remove occurrence(s) for a term: to remove occurrence(s) of a term. You have to select the identifiers of the occurrences you want to remove.
• Select terms by documents: this action is used when the corpus has several documents. You search for candidate terms which are present in one or many documents. A dialog window opens in which you have to define the number
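Here is a sketch of what Cluster terms does to the term list. The `LexicalUnit` class and the `cluster_terms` function are hypothetical. Deriving the frequency from the number of attached occurrences is an assumption about how the count is updated.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalUnit:
    term: str
    occurrences: list = field(default_factory=list)  # occurrence identifiers
    variants: list = field(default_factory=list)     # (form, variant type) pairs

    @property
    def frequency(self) -> int:
        # Assumption: the frequency is the number of attached occurrences.
        return len(self.occurrences)


def cluster_terms(units: dict, canonical: str, alternatives: list, variant_types: dict) -> LexicalUnit:
    """Merge the alternative forms into the canonical one.

    variant_types maps an alternative form to 'abbreviation', 'acronym' or
    'lexical variant' when the user chooses to keep it as a variant.
    """
    target = units[canonical]
    for alt in alternatives:
        removed = units.pop(alt)                        # removed from the term list
        target.occurrences.extend(removed.occurrences)  # occurrences reattached
        if alt in variant_types:
            target.variants.append((alt, variant_types[alt]))
    return target
```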
10. created new form(s) can be visualized in the Terminological level step 2 perspective, which is automatically opened, and the lexical unit(s) whose form(s) has/have been created is/are displayed in blue characters in the Lexical units view (Terminological level step 1 perspective). If the number of occurrences is greater than 100, a dialog window opens to ask whether all the occurrences have to be kept. If the answer is no, a dialog window opens to define the number of occurrences to keep.
• To terminological form: allows you to visualise the terminological form of the selected terminological unit, if it has one. This action automatically switches from the Terminological level step 1 perspective to the Terminological level step 2 perspective.
6.5 Occurrence view popup menu
A popup menu is associated with the occurrence view. The actions are accessible from the right click of the mouse:
• Add occurrence for a term: to enter a new occurrence for a term. You have to select a term and fill in the form (see Figure 6.5).
• View occurrence context: to visualise the sentences surrounding an occurrence. You have to select the occurrence identifier (see Figure 6.4) and set the size of the expected context, expressed as a number of sentences.
• Remove occurrence(s) for a term: to remove occurrence(s) of a term. You have to select the identifiers of the occurrences you want to remove.
• Find (ALT F): to search some group of words in the occu
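The context window of View occurrence context can be expressed as a slice of the sentence list. The manual does not say whether the requested size counts sentences on each side or in total. This sketch assumes `size` sentences before and after the sentence containing the occurrence, clipped at the corpus boundaries. The function name is hypothetical.

```python
def occurrence_context(sentences: list, sentence_index: int, size: int) -> list:
    """Return the sentence containing the occurrence plus `size` sentences
    before and after it, clipped at the start and end of the corpus."""
    start = max(0, sentence_index - size)
    end = min(len(sentences), sentence_index + size + 1)
    return sentences[start:end]
```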
11. file format).
• Load named entities: to load the named entities from an XML backup.
• Load all lexical units: to load the terms and named entities from a single XML backup.
• Save all lexical units: to make an XML backup of all entities, terms and named entities (see the Annex for details on the file format).
• Load new term extractor result: to load a new term extractor result, if you want to load a new version of a term extractor result or another term extractor result.
• Compare lexical unit list with another: to compare two versions of a lexical unit list. The first one is the list of the current project. The second one is the lexical unit list of another project. The lexical unit list is defined in the ensLexUnit file, which is in the repExtractTerm directory of the other project. This is the first functionality for collaborative projects.
If everything works properly when all types of terminological data are loaded, the window of Figure 6.3 appears.
[Figure 6.3: Terminological level step 1 perspective for the project DemoFinale, with the Lexical units view (columns Term, Frequency, Named entity, Named entity type, comments) and the Occurrence view]
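A comparison of two lexical unit lists reduces to set operations on two mappings from lexical unit to frequency, built for instance from the ensLexUnit files of the two projects. This is an illustrative sketch, not the implementation of the comparison view. The report keys are hypothetical.

```python
def compare_lexical_unit_lists(current: dict, other: dict) -> dict:
    """Compare two {lexical unit: frequency} mappings."""
    common = current.keys() & other.keys()
    return {
        "only_in_current": sorted(current.keys() - other.keys()),
        "only_in_other": sorted(other.keys() - current.keys()),
        "frequency_changed": sorted(
            (unit, current[unit], other[unit]) for unit in common if current[unit] != other[unit]
        ),
    }
```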
12. form using a defined corpus.
• Create all terminological forms: to create all terminological forms from a preexisting thesaurus. This functionality is useful when you want to add terminological information and occurrences to an existing thesaurus. You start from an existing thesaurus and create a terminological form for each termino-concept using a defined corpus.
• Expand tree: to expand all the branches of the tree.
• Close tree: to reduce the tree to its roots.
8.3.3 Feature management submenu
This submenu proposes various actions related to the detailed information provided for a given termino-concept and recorded in its termino-conceptual form.
• Add a synonym: to add a synonym to the selected termino-concept. A dialog window opens for entering the new synonym. If the corresponding terminological unit has been found by YaTeA or ANNIE, its occurrences are automatically clustered with those of the current termino-concept.
• Remove a synonym: to remove a synonym. You have to confirm whether you also want to remove the related occurrences.
• Add a link: to add a link type, the link and its value.
• Remove a link: to remove a link type, the link and its value.
• Modify link type: to modify the link type.
• Modify link: to modify the link.
• Modify link value: to modify the link value.
• Permute term and synonym: to permute the term and the selected synonym.
CHAPTER 8 TERMINAE TERMINOCONCEPTUAL LEVEL PERSPECTIVE 30
8.3.4 Neon ontology submenu
This menu is
13. in the repExtractTerm directory of the other project. This is the first functionality for collaborative projects.
• Open comparing view (step 2): to compare two versions of the terminological form table. This file is in the terminoFormDir directory and is named tableTermeFiches.xml.
Chapter 12 Annex
This annex lists the DTDs used by Terminae.
12.1 XML backup DTD for terms
The DTD of the XML file which contains the terms and their occurrences, as visualized in the Terminological level step 1 perspective:
<!ELEMENT DOC (#PCDATA)>
<!ELEMENT END_POSITION (#PCDATA)>
<!ELEMENT FORM (#PCDATA)>
<!ELEMENT ID (#PCDATA)>
<!ELEMENT LEMMA (#PCDATA)>
<!ELEMENT LIST_OCCURRENCES (OCCURRENCE)>
<!ELEMENT LIST_TERM_CANDIDATES (TERM_CANDIDATE)>
<!ELEMENT List_Variants (Variant)>
<!ELEMENT MORPHOSYNTACTIC_FEATURES (SYNTACTIC_CATEGORY)>
<!ELEMENT NUMBER_OCCURRENCES (#PCDATA)>
<!ELEMENT OCCURRENCE (ID, DOC, SENTENCE, START_POSITION, END_POSITION, Texte)>
<!ELEMENT SENTENCE (#PCDATA)>
<!ELEMENT START_POSITION (#PCDATA)>
<!ELEMENT SYNTACTIC_CATEGORY (#PCDATA)>
<!ELEMENT TERM_CANDIDAT
14. line.
• Select the tagged corpus from which the terms have been extracted (tt file, tt.fr). It is supposed to be located in the corpora subdirectory of your project.
• Select the corpus file (.txt). It is supposed to be located in the corpora subdirectory of your project.
• Specify the corpus language: English (en) or French (fr).
When the terminological data is loaded, TERMINAE creates one additional file in the corpora directory:
• fTempCorpus2XML.xml, which is an XML version of the corpus.
If you have several documents, each one must be processed by TreeTagger and the results must be concatenated in a single file, where the various initial documents are separated by a document tag, as shown below:
Text_n TAB Document TAB n
where TAB is the tabulation character and n varies between 0 and x-1, x being the total number of documents.
6.3 Perspective overview
If everything works properly when loading the terminological data, the window of Figure appears when the Terminological level step 1 perspective is first opened.
[Screenshot: Terminological level step 1 perspective for the project DemoFinale, with the Lexical units view (Noun phrases; columns Term, Frequency, Named entity, comments) and the Occurrence view]
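Preparing a multi-document tagged corpus can be scripted. The sketch below concatenates TreeTagger output files and writes a separator line before each document. It rests on three assumptions: both occurrences of n in `Text_n TAB Document TAB n` are replaced by the document number, the separator precedes every document including the first, and the files are UTF-8. Check these against a corpus that TERMINAE loads correctly.

```python
from pathlib import Path


def concatenate_tagged_documents(tagged_files, output_file):
    """Concatenate TreeTagger outputs into one file, writing the line
    'Text_n<TAB>Document<TAB>n' before document n (n = 0 .. x-1)."""
    with open(output_file, "w", encoding="utf-8", newline="\n") as out:
        for n, path in enumerate(tagged_files):
            out.write(f"Text_{n}\tDocument\t{n}\n")
            text = Path(path).read_text(encoding="utf-8")
            if text and not text.endswith("\n"):
                text += "\n"  # keep the next separator on its own line
            out.write(text)
```

Usage, with hypothetical file names: `concatenate_tagged_documents(["doc0.tt", "doc1.tt"], "corpora/corpus.tt")`.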
15. the value of the variant.
• Modify type variant: to modify the type of the variant.
• Remove a variant: to remove a lexical variant of the selected term.
• Modify use variant: to modify the use of the variant.
Syntactical relation management submenu
• Add a syntactical relation (head): to add a phrase where the selected term is the head.
• Add a syntactical relation (modifier): to add a phrase with the selected term as a modifier.
• Remove a syntactical relation (head): to remove the selected relation.
• Remove a syntactical relation (modifier): to remove the selected relation.
Terminological relation management submenu
• Add a terminological relation: to add a terminological relation where the selected term is term1 or term2.
• Remove a terminological relation: to remove a terminological relation.
Occurrence management submenu
• Add an occurrence: to add an occurrence to the selected term. You have to specify the document identifier and type in the text of the occurrence.
• Remove occurrence(s): to remove occurrences of the selected term. Select the relevant occurrence(s) to indicate which occurrence(s) have to be removed.
• Find occurrences for a term in corpus: to find occurrences of a term in a corpus. This functionality is useful when the forms are created and the user wishes to find occurrences of a term in a specified corpus th
16. upper-middle class, and we have had a version of a POS tagger which blows out y, with some poor effects on the annotation.
The lexicalization of the ontology described in the SKOS file associates several lexical forms with a single labeling entity. Each lexical form stores the lemmatized form of the words (don't forget this if you create your own SKOS file). As this form is also computed by the morphosyntactic parser, lexicalizations are recognized independently of morphological variants. Note that the technique is a bit over-productive because lemmas are ambiguous. We plan to improve it by using the POS category. In addition, before annotating according to the SKOS file, the labeling entity is checked against the ontology: if it is not present there, the annotation is skipped. Discrepancies between the SKOS and OWL files are logged in the annotator log file in the result directory, and it is wise to check the content of this file.
Chapter 11 Collaboration perspective
This perspective is under construction. Its objective is to provide functionalities to compare and merge two Terminae projects. At the moment, you can only compare two Terminae projects (step 1 and step 2).
Comparing menu
• Open comparing view (step 1): to compare two versions of the lexical unit list. The first one is the list of the current project. The second one is the lexical unit list of another project. The lexical unit list is defined in the ensLexUnit file, which is
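The lexicalization lookup and the ontology check described above can be approximated by a greedy longest match over the lemmatized tokens. This is a simplified illustration, not the annotator's code. The data structures, the longest-match strategy and the log message format are assumptions. POS categories are ignored, which reproduces the over-productivity mentioned above.

```python
def annotate(lemmas, lexicalizations, ontology_entities, log):
    """Annotate a sequence of lemmas with labeling entities.

    lexicalizations: dict mapping a tuple of lemmas to a labeling entity.
    ontology_entities: set of entities present in the OWL ontology.
    Matches whose entity is absent from the ontology are skipped and logged.
    Returns (start, end, entity) triples with `end` exclusive.
    """
    annotations = []
    longest = max((len(key) for key in lexicalizations), default=0)
    i = 0
    while i < len(lemmas):
        match = None
        # Try the longest lexicalization first (assumed strategy).
        for n in range(min(longest, len(lemmas) - i), 0, -1):
            entity = lexicalizations.get(tuple(lemmas[i:i + n]))
            if entity is not None:
                match = (n, entity)
                break
        if match is None:
            i += 1
            continue
        n, entity = match
        if entity in ontology_entities:
            annotations.append((i, i + n, entity))
        else:
            log.append(f"{entity}: not in the ontology, annotation skipped")
        i += n
    return annotations
```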
17. used to link TERMINAE and the Neon toolkit. It supports the creation of the conceptual level and many actions to connect it to the termino-conceptual one.
• Create a Neon project: used to create a Neon toolkit project. If you want to work at the conceptual level, you have to create a Neon project and specify its name. It is recommended to use different names for the TERMINAE and Neon projects.
• Create Neon Toolkit ontology: used to create an ontology. This ontology is part of the newly created Neon project.
• Create a class (ALT C): used to create a class in the previous ontology from the selected termino-concept. A dialog window opens in which you have to give a name to the class and select a father class in the existing ontology. The class can be visualized in the Neon toolkit Conceptual level (OWL) perspective (see Figure 8.2). Note that the class is created with an annotation property in which the link to the source termino-concept and its identifier are saved. Once it has been linked to a class at the conceptual level, the termino-concept is displayed in blue in the TerminoConcept tree.
• To ontology level: used to switch from the termino-conceptual perspective to the OWL one. This action opens the OWL perspective and shows the class corresponding to the selected termino-concept.
• Link to Neon project: used when you want to exploit an existing Neon toolkit project.
• Link to Neon ontology: used when you want to exploit
18. NPS   Proper noun, plural
PDT   Predeterminer
POS   Possessive ending
PP    Personal pronoun
PP$   Possessive pronoun
RB    Adverb
RBR   Adverb, comparative
RBS   Adverb, superlative
RP    Particle
SYM   Symbol
TO    to
UH    Interjection
VB    Verb, base form
VBD   Verb, past tense
VBG   Verb, gerund or present participle
VBN   Verb, past participle
VBP   Verb, non-3rd person singular present
VBZ   Verb, 3rd person singular present
WDT   Wh-determiner
WP    Wh-pronoun
WP$   Possessive wh-pronoun
WRB   Wh-adverb
12.7 TreeTagger French Tagset
ABR        abbreviation
ADJ        adjective
ADV        adverb
DET:ART    article
DET:POS    possessive pronoun (ma, ta, ...)
INT        interjection
KON        conjunction
NAM        proper name
NOM        noun
NUM        numeral
PRO        pronoun
PRO:DEM    demonstrative pronoun
PRO:IND    indefinite pronoun
PRO:PER    personal pronoun
PRO:POS    possessive pronoun (mien, tien, ...)
PRO:REL    relative pronoun
PRP        preposition
PRP:det    preposition plus article (au, du, aux, des)
PUN        punctuation
PUN:cit    punctuation citation
SENT       sentence tag
SYM        symbol
VER:cond   verb conditional
VER:futu   verb future
VER:impe   verb imperative
VER:impf   verb imperfect
VER:infi   verb infinitive
VER:pper   verb past participle
VER:ppre   verb present participle
19. E (ID, LEMMA, FORM, List_Variants, NUMBER_OCCURRENCES, LIST_OCCURRENCES, MORPHOSYNTACTIC_FEATURES, comment)>
<!ELEMENT TERM_EXTRACTION_RESULTS (LIST_TERM_CANDIDATES)>
<!ELEMENT Texte (#PCDATA)>
<!ELEMENT Variant EMPTY>
<!ATTLIST Variant type CDATA #REQUIRED word CDATA #REQUIRED use CDATA #REQUIRED>
<!ELEMENT comment (#PCDATA)>
12.2 XML backup DTD for ENs
The DTD of the XML file which contains the named entities and their occurrences, as visualized in the Terminological level step 1 perspective:
<!ELEMENT DOC (#PCDATA)>
<!ELEMENT END_POSITION (#PCDATA)>
<!ELEMENT FORM EMPTY>
<!ELEMENT ID (#PCDATA)>
<!ELEMENT LEMMA (#PCDATA)>
<!ELEMENT LIST_EN (NAMED_ENTITY)>
<!ELEMENT LIST_OCCURRENCES (OCCURRENCE)>
<!ELEMENT LIST_SENT (SENT*)>
<!ELEMENT List_Lemme EMPTY>
<!ELEMENT List_Variants EMPTY>
<!ELEMENT NAMED_ENTITY (ID, LEMMA, FORM, List_Variants, Types, NUMBER_OCCURRENCES, LIST_OCCURRENCES, LIST_SENT
<!ELEMENT NUMBER_OCCURRENCES (#PCDATA)>
<!ELEMENT OCCURRENCE (ID, DOC, SENTENCE, START_POSITION, END_POSITION, Texte)>
<!ELEMENT SENT (ID, offset, phrase, List_Lemme)>
<!ELEMENT SENTENCE (#PCDATA)>
<!ELEMENT START_POSITION (#PCDATA)>
20. <!ELEMENT Texte (#PCDATA)>
<!ELEMENT Types (type)>
<!ELEMENT offset (#PCDATA)>
<!ELEMENT phrase (#PCDATA)>
<!ELEMENT type (#PCDATA)>
12.3 EnsLexUnit DTD
The DTD of the XML file which contains the terms, the named entities and their occurrences, as visualized in the Terminological level step 1 perspective:
<!ELEMENT DOC (#PCDATA)>
<!ELEMENT END_POSITION (#PCDATA)>
<!ELEMENT FORM (#PCDATA)>
<!ELEMENT ID (#PCDATA)>
<!ELEMENT LEMMA (#PCDATA)>
<!ELEMENT LIST_EN (NAMED_ENTITY)>
<!ATTLIST LIST_EN numeroDocument CDATA #REQUIRED>
<!ELEMENT LIST_OCCURRENCES (OCCURRENCE)>
<!ELEMENT LIST_SENT (SENT*)>
<!ELEMENT LIST_TERM_CANDIDATES (TERM_CANDIDATE)>
<!ELEMENT List_Variants (Variant*)>
<!ELEMENT MORPHOSYNTACTIC_FEATURES (SYNTACTIC_CATEGORY)>
<!ELEMENT NAMED_ENTITY (Ens_Variants, ID, LEMMA, LIST_OCCURRENCES, LIST_SENT, NUMBER_OCCURRENCES, Types
<!ELEMENT NUMBER_OCCURRENCES (#PCDATA)>
<!ELEMENT OCCURRENCE (ID, DOC, SENTENCE, START_POSITION, END_POSITION, Texte)>
<!ELEMENT SENT EMPTY>
<!ATTLIST SENT ID CDATA #REQUIRED>
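A file that follows the 12.1 (or 12.3) DTD can be read with Python's standard `xml.etree.ElementTree`. This minimal sketch collects, for each TERM_CANDIDATE, the declared NUMBER_OCCURRENCES and the position of each OCCURRENCE. It assumes that SENTENCE, START_POSITION and END_POSITION hold integers and that the element names use underscores (e.g. `TERM_CANDIDATE`). The DTD text above suggests both, but does not guarantee them.

```python
import xml.etree.ElementTree as ET


def load_terms(xml_text: str) -> dict:
    """Return {lemma: {"declared": int or None, "occurrences": [(doc, sentence, start, end), ...]}}."""
    root = ET.fromstring(xml_text)
    terms = {}
    for cand in root.iter("TERM_CANDIDATE"):
        lemma = (cand.findtext("LEMMA") or "").strip()
        occurrences = [
            (
                (occ.findtext("DOC") or "").strip(),
                int(occ.findtext("SENTENCE")),
                int(occ.findtext("START_POSITION")),
                int(occ.findtext("END_POSITION")),
            )
            for occ in cand.iterfind("LIST_OCCURRENCES/OCCURRENCE")
        ]
        declared = cand.findtext("NUMBER_OCCURRENCES")
        terms[lemma] = {
            "declared": int(declared) if declared else None,
            "occurrences": occurrences,
        }
    return terms
```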
21. [Continuation of the 12.4 DTD: declarations for rdf:RDF (xmlns:rdf and xmlns:skos attributes), rdf:Description (required rdf:about attribute), rdf:type (EMPTY, rdf:resource attribute), skos:prefLabel (xml:lang attribute, en or fr), skos:altLabel, skos:hiddenLabel, skos:note, skos:editorialNote (#PCDATA) and skos:related (EMPTY, required rdf:resource attribute), with word and use attributes (CDATA).]
12.5 Thesaurus DTD
The DTD of the XML file which contains a thesaurus, as visualized in the Terminae TerminoConceptual level perspective. A thesaurus contains a collection of termino-concepts. Each termino-concept is described by an ID, a natural language definition, corpus occurrences, a prefLabel, a set of see_also, a set of synonyms (altLabel), a set of children and its father.
<!ELEMENT DOC (#PCDATA)>
<!ELEMENT END_POSITION (#PCDATA)>
22. TERMINAE User Manual
Sylvie Szulman, Paris 13, January 2014
Abstract
TERMINAE is a platform that assists users in designing termino-ontological resources from texts. Terminologists can use it to build terminological forms, and knowledge engineers can use it to build either thesauri expressed in SKOS or ontologies that organise concepts and lexical units in a formal way supporting inferences. The platform allows textual elements to be linked to terminological and conceptual resources. The acquisition corpus may contain one or several documents. The supported languages are English and French.
Keyword list: ontology acquisition, terminology, assisting tool.
Executive Summary
This document is the user guide of TERMINAE. TERMINAE is a platform that assists users in the design of termino-ontological resources from texts. It is used to build from texts:
• thesauri expressed in SKOS, and
• ontologies that organise, in a formal way, the concepts associated with the terms and support inferences.
The platform allows textual elements to be linked to terminological and conceptual resources. The corpus may contain one or several documents. The supported languages are English and French.
TERMINAE is organised in three main levels: the first step of the terminological level enables the user to constitute the set of terms of the corpus; its second step organises these terms according to lexical and syntactic relations; the termino-conceptual level organizes the termin
23. VER:pres   verb present
VER:simp   verb simple past
VER:subi   verb subjunctive imperfect
VER:subp   verb subjunctive present
<!ELEMENT rdf:Description (skos:prefLabel, skos:altLabel*, rdf:type)>
<!ATTLIST rdf:Description rdf:about CDATA>
<!ELEMENT prefLabel (#PCDATA)>
<!ELEMENT altLabel (#PCDATA)>
<!ELEMENT rdf:type EMPTY>
<!ATTLIST rdf:type rdf:resource CDATA>
12.8 Use ANNIE to extract named entities
This annex describes the procedure to follow in order to use ANNIE to extract named entities from a given document (only one document can be processed at a time). Note that the following procedure is taken from the GATE documentation for processing English corpora (http://gate.ac.uk/sale/tao/splitch3.html).
GATE enables you to extract named entities from plain texts and annotate your corpus with them. GATE is distributed with an IE system called ANNIE. ANNIE relies on finite state algorithms and the JAPE language.
Take one large pile of text documents (emails, etc.) and call this your corpus. If you right-click on Language Resources in the resources pane and select New, then GATE Document, the window Parameters for the new GATE Document appears. Once you have indicated the corpus to work on, you can call ANNIE. From the File menu, select Load ANNIE System. To run it in its default state, choose with Defaul
24. al Terminae submenu is proposed on MacOS systems. It gives access to the standard main application operations: About Terminae, Preferences, Hide Terminae, Quit Terminae.
Chapter 5 Project management perspective
TERMINAE starts with the project management perspective. This perspective has 2 views (Fig. 5.1).
[Figure 5.1: Project management perspective]
• The left view presents the project information, if a project has already been defined: project, corpus, thesaurus and author's names.
• The right view is a text editor where the user may write comments. To save the comments, use the right-click menu of the mouse (Ctrl+S).
5.1 Project actions menu
A project consists of all the data used or created by TERMINAE when building a specific termino-ontological resource from a given corpus (see Section for a description of the project structure). The corpus is in a .txt file; UTF-8 encoding is advised. See Section 6.2 for a description of the files used. You can either:
• Create a new project (Create Terminae project), if you start building a specific termino-ontological resource from a given corpus. You have to spe
25. an existing ontology in a specified project.
• Link to a class: used to link a termino-concept to an existing class.
• Create an ObjectProperty (ALT O): used to create an objectProperty from a termino-conceptual relation. A dialog window opens and you have to enter the name of the property, the father object property, its domain and its range. The objectProperty is created with an annotation property in which the name and type of the source termino-conceptual relation are saved.
• Link a RTC and an ObjectProperty: used to link a termino-conceptual relation to an existing objectProperty.
• Link a RTC and a class: used to link a termino-conceptual relation to an existing class.
• Create classes and TCs: used to derive a set of classes from a set of selected termino-concepts. If these termino-concepts have termino-conceptual relations, objectProperties are created and linked to these source relations.
• Create classes and TCs without dialog: offers the same functionality as above, but without dialog. The default values are systematically kept: the name of a class is the name of its termino-concept; the name of an objectProperty is the name of its RTC; if termino-concepts are linked by an isKindOf link, the corresponding classes are in the same hierarchical order.
• Link to an individual: used to link a termino-concept to an individual. You have to enter the individual name and select th
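The default naming rules of Create classes and TCs without dialog can be expressed as a pure data transformation. This is a hypothetical illustration that does not call the Neon toolkit: termino-concepts are plain names, isKindOf links are (child, father) pairs, and termino-conceptual relations are (name, domain, range) triples. Keeping only the links and relations whose two ends are both selected is an assumption.

```python
def classes_from_termino_concepts(concepts, kind_of, relations):
    """Apply the default naming rules to a selection of termino-concepts.

    concepts: selected termino-concept names (class name = concept name).
    kind_of: (child, father) isKindOf pairs, kept as subclass pairs.
    relations: (rtc_name, domain, range) triples (property name = RTC name).
    """
    selected = set(concepts)
    classes = sorted(selected)
    subclass_of = [(c, f) for c, f in kind_of if c in selected and f in selected]
    object_properties = [
        {"name": name, "domain": domain, "range": range_}
        for name, domain, range_ in relations
        if domain in selected and range_ in selected
    ]
    return classes, subclass_of, object_properties
```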
26. ation view. • Load terminological form from another project loads a terminological form from another project. The terminological form file is copied into the terminological form directory and the form appears in the Terminological form list. • Compare terminological form list with another compares a list of terminological forms with another list of terminological forms. This functionality is useful when several users work on the same project. A dialog window opens to select the tableTermeFiches file to open; this file is in the terminoFormDir directory of a project. • Add a set of terminological forms from another project adds a list of terminological forms from another project. A dialog window opens to select the tableTermeFiches file to open; this file is in the terminoFormDir directory of a project. All the forms defined in the tableTermeFiches file are added. • Export skos form set exports the set of forms in SKOS. For each form, the term is described by skos:prefLabel and the variants by skos:altLabel; the use of each variant is described by skos:note (as altLabel variant use), except for the hiddenLabel type, which is described by skos:hiddenLabel. The comment is described by skos:editorialNote, and termino-concepts are described by skos:related (see the DTD in Section 12.4). • Merging 2 forms: to merge
27. cify: • the name of your project; • the name of the directory where you want to locate your project. A default directory is proposed; click the Cancel button and navigate through the file system if you want to choose another directory. • Switch from one project to another (Load Terminae project). Note that only one project can be open at a time. You are first asked to navigate through the file system to select the directory containing the project directory. Be aware that if you change project while a perspective X other than the project perspective is open, you have to reload the data of perspective X manually. • Export the current project (Export project). A zip file is created that includes all the required directories and files. If you have created a Neon project, its directory is also included in the zip file. • Import an existing project (Import project). The project to be imported is a zip file containing the project directory with all the required subdirectories and files. You do not have to unzip the file, but you have to specify: the zip file to load; the name of the directory where you want the project to be imported. • Modify author allows you to modify the project's author. • Modify corpus language defines the corpus language: French (fr) or English (en). By default it is initialize
28. current version of the TERMINAE platform is compiled with the Oracle Java 1.7 virtual machine. • It relies on UTF-8 text encoding. • It can be used for English and French. 3.1 Installation. To install TERMINAE you need Java version 1.7. Download the version of the platform for your system from the Download page of http://lipn.univ-paris13.fr/terminae/index.php and unzip the downloaded file. The default language is English, but it can be changed: if you want to work with a French platform, edit the terminae.ini file and change the line -nl en to -nl fr_FR. This file is located in the Terminae directory on Linux and Windows systems and in the Terminae.app/Contents directory on MacOS systems. 3.2 How to start. To launch the TERMINAE platform, click on the Terminae application: Terminae on Linux, Terminae.exe on Windows or Terminae.app on MacOS. Initially the project management perspective (Terminae Project perspective) is open and you have to import or create a project. 3.2.1 Project location and structure. In any case you have to define your project directory. On Linux and Windows systems it is advised to locate it in the workspace directory created by the Eclipse application. A project has a fixed structure made of the following six subdirectories: • corpora: contains the corpus data (raw and tagged) and the results of named entity recognition tools. The current
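The manual only says to change the line -nl en to -nl fr_FR in terminae.ini. The hypothetical helper below (set_platform_language is not part of TERMINAE) sketches that edit in Python, assuming the file follows the usual Eclipse launcher .ini convention, where an option and its value may sit on one line or on two consecutive lines.

```python
def set_platform_language(ini_text: str, locale: str = "fr_FR") -> str:
    """Return ini_text with the -nl launcher option set to `locale`.

    Handles both "-nl en" on one line and "-nl" followed by the value on
    the next line (assumed Eclipse-style layout).
    """
    lines = ini_text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        stripped = lines[i].strip()
        if stripped == "-nl" and i + 1 < len(lines):
            # Option and value on separate lines: replace the value line.
            out.extend(["-nl", locale])
            i += 2
            continue
        if stripped.startswith("-nl "):
            # Option and value on the same line.
            out.append(f"-nl {locale}")
        else:
            out.append(lines[i])
        i += 1
    return "\n".join(out) + "\n"
```

To apply it, read terminae.ini with pathlib.Path.read_text(encoding="utf-8"), pass the text through the function and write it back; keep a copy of the original file in case the layout differs from this assumption.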
29. d by the language of the platform. • Create corpus from many documents allows you to create a corpus from several documents. This functionality is used before opening the Terminological level step 1 perspective. Each document has to be in a .txt file and has to be processed with the TreeTagger tool. The corpus includes all the .txt files selected by the user in the corpora directory and is defined in a .txt file. A tagged file combining all the tagged files corresponding to the .txt files is created. If the term extractor used is TermoStat, a file combining the TermoStat results for each document is also created. • Add document names allows you to give names to the documents. The user types the names separated by semicolons, in the same order as the documents in the corpus. • Remove document names allows you to remove the document names. To modify a name, you have to remove all the names and add them all again. • Cluster projects corpora extraction allows you to cluster corpus extractions made with the same extraction tool. • TMX result cluster allows you to cluster extraction results obtained with the TMX tool. This tool is not open access. 5.2 Help menu. The Help information is not available yet. 5.3 Show View menu. Each perspective has several views and a main view, which is on the left side of the perspective. Clicking on an item in the main view changes the values in the other views. These views may be closed by the user or h
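The Add document names input (names separated by semicolons, in the same order as the documents in the corpus) can be checked before it is typed into the dialog. A small hypothetical Python helper, not part of TERMINAE, assuming that whitespace around names is ignored and that empty names or a wrong number of names are errors:

```python
def parse_document_names(entry: str, n_documents: int) -> list[str]:
    """Split a semicolon-separated list of document names (corpus order)."""
    names = [name.strip() for name in entry.split(";")]
    if any(not name for name in names):
        raise ValueError("empty document name (check for doubled or trailing ';')")
    if len(names) != n_documents:
        raise ValueError(f"expected {n_documents} names, got {len(names)}")
    return names
```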
30. d terms allows you to save only invalidated terms. • Save terms with comments allows you to save only terms that have a comment. 6.4.3 Cleaning submenu. This menu allows you to clean up the list of terminological units by removing a certain category of terms or named entities. Various options are proposed: • Remove terms listed in a file suppresses all the terminological units listed in a given file. You have to give the name of that file, in which the stop words are listed one per line. • Remove terms involving given characters cleans the list of terminological units on a character basis. You have to type in the list of forbidden characters. • Remove single character terms suppresses the single-character terms from the list of terminological units. • Removing adjectives suppresses the terms that are tagged as adjectives. • Removing numbers suppresses the terms that are numbers. • Removing adverbs suppresses the terms that are tagged as adverbs. • Removing terms from its frequency suppresses the terms whose frequency is lower than a given number (for example 0). 6.4.4 Terminological form actions. This menu is used to define terminological forms, described in the next chapter. • New terminological form(s) (Ctrl+T) allows you to create terminological form(s) for the selected term(s). Once terminological form(s) is/are
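The Cleaning submenu options are simple filters over the term list. The Python sketch below approximates several of them on (term, frequency) pairs; it is an illustration, not TERMINAE's code. The helper name clean_terms, the regular expression used to recognise numbers, and the strict "lower than" frequency test are assumptions, and the adjective and adverb filters are omitted because they need the POS tags from the tagged corpus.

```python
import re

# Assumption: a "number" is digits with an optional decimal part (1, 2000, 3.5, 3,5).
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def clean_terms(terms, stop_words=None, forbidden_chars="",
                drop_single_chars=False, drop_numbers=False,
                min_frequency=None):
    """Apply Cleaning-submenu-style filters to (term, frequency) pairs.

    Returns (kept, removed): kept holds the surviving (term, frequency)
    pairs, removed lists the dropped terms so they can be logged.
    """
    # Stop-word file lines may carry newlines, so strip them.
    stop = {w.strip() for w in (stop_words or []) if w.strip()}
    kept, removed = [], []
    for term, freq in terms:
        drop = (
            term in stop                                   # Remove terms listed in a file
            or any(ch in term for ch in forbidden_chars)   # Remove terms involving given characters
            or (drop_single_chars and len(term) == 1)      # Remove single character terms
            or (drop_numbers and NUMBER.fullmatch(term) is not None)  # Removing numbers
            or (min_frequency is not None and freq < min_frequency)   # Removing terms from its frequency
        )
        if drop:
            removed.append(term)
        else:
            kept.append((term, freq))
    return kept, removed
```

A stop-word file opened with open(path, encoding="utf-8") can be passed directly as stop_words, since iterating over it yields one line per stop word.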
31. ded. A variant has two attributes: its type (abbreviation, acronym or lexical variant) and its use (allowed, forbidden, hidden or recommended); hidden is used to save the variant as skos:hiddenLabel. • The Relations view presents the relations of the terminological unit. The Syntactical relations list shows the phrases to which it belongs, either as a head or as a modifier; this syntactic information is provided by the YaTeA analysis of the corpus. The Terminological relations list shows its terminological relationships. In the current version of the TERMINAE platform, terminological relations have to be filled in manually. • The Comment view is used to enter comments. You have to save the contents by right-clicking and choosing save. • The Occurrences view lists all the occurrences of the terminological unit that have been identified. They can be occurrences of the canonical form or of any of its variant forms. • The Related termino-concepts view shows the termino-concepts to which the terminological unit is related. As indicated in the second column of the Terminological form list view, a terminological form can be In progress or Completed. Each terminological form is saved in an XML file in the terminoFormDir directory. The list of terminological forms is saved in the file tableTermeFiches.xml in terminoFormDir
32. downloaded from the web service, named termostat_res.txt, or a list of lemmatized terms, one per line. You must also give the name of the corpus, if you use one, and the name(s) of the author(s) of the future resource(s). When the project is created, its main characteristics are presented in the Terminae project information view (on the left, by default) of the project perspective and you can start working on it. 3.3 Hidden files. The software creates two hidden files to manage the Terminae application: • The file Terminae contains the name of the current project. It is created in the directory where you launch the Terminae application. You normally do not need to modify it. • The file nameOfProject.xcfg defines the configuration of each project, i.e. the set of files used by the project. Experienced users can easily understand its content and may need to change it in tricky cases, e.g. when renaming directories or files. These files are plain text or editable XML files. Chapter 4 Main menu. Figure 4.1 presents the main menu of the TERMINAE platform, which is accessible from any perspective. It has four items, associated with specific actions or submenus: Perspectives, Project actions, Show View and help. [Figure 4.1: Main menu] • The action submenu gives access to the specific functionalities available at the Terminae level where you are currently working. The name of the action menu depends on the perspective fr
33. e class to which it belongs, using dialog windows. • Create an individual is used to create an individual. You have to enter the individual name and select the class to which it belongs, using dialog windows. • Synchronisation thesaurus and ontology is used when an ontology has lost the annotations described in the thesaurus. It assumes that the ontology has a good le [Figure 8.2: Neon toolkit conceptual level (OWL) perspective, showing the Ontology Navigator, the Entity Properties view of the class Airbag with its TerminoConcept annotation, and the Terminae links menu] Chapter 9 Neon toolkit Conceptual level (OWL) perspective. The conceptual perspective is a Neon toolkit plugin (version 2.4) to which a specific menu has been added for the TERMINAE platform, to link the conceptual and termino-conceptual levels. When using Neon toolkit concep
34. e corpus is defined by two files: a .txt file and the TreeTagger output for that .txt file. • Find occurrences for all terms finds all the occurrences in a corpus. This functionality is useful when the forms have been created and the user wishes to find occurrences for all terms in a specified corpus (the corpus is defined by two files: a .txt file and the TreeTagger output for that .txt file). 7.3.3 Termino-concept management submenu. This submenu proposes the following actions: • Create a termino-concept (ALT+G) creates a termino-concept linked to the selected terminological unit. The termino-concept is added to the current thesaurus. If the terminological unit is a named entity, the type of the named entity may also give rise to a termino-concept, and a kindOf link is created between the two termino-concepts. • Remove a termino-concept removes a termino-concept from the current thesaurus. • To TerminoConceptual level switches from the Terminae Terminological level step 2 perspective to the Terminae TerminoConceptual level perspective. • Rename termino-concept renames a termino-concept. Chapter 8 Terminae TerminoConceptual level perspective. This perspective must be opened from the Perspectives submenu of the main menu, by selecting Terminae TerminoConceptual level. 8.1 Perspective overview. The layout of the Terminae TerminoConceptual level perspective is very similar to that of the Termi
35. e input is a list of term candidates, i.e. words or groups of words that, on a linguistic basis, could appear in a terminology of the domain (a list of its main terms). The goal of this level is, in a first step (Chapter 6), to build, clean and improve the list, removing spurious or irrelevant proposals. A second step (Chapter 7) involves grouping the candidates that are morphological variants of the same term and collecting linguistic relations. This work relies on the list of occurrences of each term, which are gathered with linguistic information in terminological forms. • The termino-conceptual level (Chapter 8) is specific to TERMINAE. Whereas terms belong to the vocabulary level, the goal is now to analyse the use of terms in the corpus at the semantic level. The work is to recognise the various senses of a term and distribute them into several termino-concepts, also distributing the occurrences of the term among these senses. At the same time, the termino-concepts of the form can be tagged as having a synonym in another form or as being otherwise more loosely related. • The ontological level (see Chapter 9) relies on termino-concepts and their relations to build the ontology. First, synonym termino-concepts should yield only one concept. All the related termino-concepts help build the hierarchical relations and define the roles, as can some other linguistic information gathered during the process. Chapter 3 Technical Characteristics. • The
36. e/she may want to see a view of another perspective that is not in the current perspective (only one perspective may be selected). This menu is used to reopen a view that has previously been closed. Click on the single item Other to display the list of available views, and choose Other again to find the TERMINAE views. Select the view you want to reopen or see, and be aware that the view may depend on one perspective or another. Chapter 6 Terminological level step 1 perspective. The Terminological level allows you to browse and modify the list of domain-specific lexical units that have been extracted from the source corpus using term extraction and named entity recognition tools such as YaTeA, the TermoStat web service and ANNIE. You may also use a list of terms (see 6.2.4) if you have another term extractor. 6.1 Term extractor uses. TERMINAE assumes that the acquisition corpus has been processed beforehand by the term extractor and, possibly, by ANNIE. 6.1.1 TermoStat web service. TermoStat Web is usable after login. The software is still free for research purposes; you only need to create an account. You have to upload a UTF-8 .txt file containing a document. You download part of the results by clicking on a disk icon. The result is given in a .txt file named termostat_res.txt. Put this file in the repExtractTerm directory of your project. The acquisition corpus also has to be processed by Tre
37. [Screenshot: Occurrences view listing the occurrences of terms such as belt, belt anchorage and belt assembly, each with its occurrence ID, document number and sentence number]
38. eTagger, using the UTF-8 script. Put the TreeTagger file and the corpus file in the corpora directory of your project. If your corpus includes several documents, each document has to be processed by the TermoStat tool and the TreeTagger tool; you can then use the Create corpus from many documents item of the Project actions menu (Section 5.1) to build the corpus from all its documents. 6.1.2 YaTeA tool. TERMINAE assumes that the acquisition corpus has been processed by TreeTagger. YaTeA takes as input: • the corpus file; • a tagged corpus (required); • a list of terms extracted from it (required), see Section 6.2.2. Footnotes: 1 http://search.cpan.org/~thhamon/Lingua-YaTeA-0.621 2 http://olst.ling.umontreal.ca/~drouinp/termostat_web 3 http://gate.ac.uk/ie/annie.html [Figure 6.1: Term extractor used. A dialog asks you to select a term extractor: Term list, TermoStat or Yatea.] 6.2 Data: Terminological files. When you open the Terminological level perspective, you have to specify the term extractor used (see Figure 6.1). You have three choices: • Term list (see 6.2.4); • TermoStat (see 6.2.1); • Yatea (see 6.2.2). You may also want to work with named entities (see 6.2.3). 6.2.1 TermoStat Term files. First you have to specify the terminological data you want to start with (note that additional data can be loaded afterwards): • Load a term list (Load TermoStat file), which is supposed to be located
39. ed only with an unlabelled right arrowhead, which must be selected to make the available annotations visible. Open the default annotation set and select some of the annotations to see what the ANNIE application has done. Having selected an annotation type in the annotation sets view, hovering over an annotation in the main resource viewer or right-clicking on it brings up a popup box containing a list of the annotations associated with it, from which one can select an annotation to view in the annotation editor (or, if there is only one, the annotation editor for that annotation). To save your corpus annotated with ANNIE, right-click on a document in the resources tree and choose Save as XML. In addition, all documents in a corpus can be saved as individual XML files into a directory by right-clicking on the corpus in the resources tree and choosing the option Save as XML. For French corpora, you have to install TreeTagger and load the Tagger_Framework plugin. In the resource directory you will find the TreeTagger FR Tokenization application (.gapp file); load this application in the GATE platform. Also load the Lang_French plugin and the french.gapp GATE application. The selected processing resources are defined in Figure 12.1. [Figure 12.1: Selected processing resources of the TreeTagger FR Tokenization application, including Document Reset PR, RegEx Sentence Splitter and a French gazetteer]
40. erms have been extracted (.tt or .ttfr file). It is supposed to be located in the corpora subdirectory of your project. • Select the corpus file (.txt). • Specify the corpus language: English (en) or French (fr). When the terminological data is loaded, TERMINAE creates two additional files in the corpora directory: • fTempCorpus2XML.xml, which is an XML version of the corpus. If you have several documents, each one must be processed by TreeTagger and the results must be concatenated in a single file, where the various initial documents are separated by a document tag as shown below: Text_n<TAB>Document<TAB>n, where TAB is the tabulation character and n varies between 0 and x-1, x being the total number of documents. 6.2.3 Named entity files. You may also want to work with named entities. In that case you need two files output by the ANNIE named entity recognition tool (see the Annex for details on the file format), which are expected to be located in the corpora subdirectory of your project: • the first XML file indicates which named entity types you are interested in; • the second XML file contains the list of named entities extracted by ANNIE. To create such files, follow the procedure described in the Annex. 6.2.4 Term list files. • Load a term list (Load term file), which is supposed to be located in the repExtractTerm subdirectory of your project. The format is a term by
41. he terminological form of the unit that has been selected in the Terminological form list (see Section 7.2). Note that when the list of terminological forms is selected, you can find any terminological form by typing the first letter of its canonical terminological unit. 7.2 Data: Terminological forms. An example of terminological form is displayed on the right part of Figure 7.1. A terminological form gathers all the lexical and terminological information that has been collected or manually added for a given term or named entity. It is usually composed of the following views: • The Lexical information view is a form in which you can freely create, modify or delete fields. By default four lexical fields are defined: Term extractor, whose value is X if the terminological unit has been extracted by the term extractor named X; form, which gives its form; grammatical type, which gives its grammatical category; NE extractor Gate, whose value is its type if the lexical unit is a recognised named entity. The first three fields are automatically filled in with information provided by the term extractor; the last one is ANNIE (GATE) information. • The Variants view lists all the lexical forms associated as variants with the canonical form. They can be found in the corpus and added automatically if a cluster has been created beforehand, or manually ad
42. identifying documents. • Add as variant adds the selected lemma as a variant of a term already defined by a terminological form. • Sort on TC length sorts candidate terms by their length. [Figure 6.4: Select an occurrence identifier, in the Terminological level step 1 perspective with the Lexical units, Named entity type and Occurrences views] [Figure 6.5: Add occurrence for a term. The dialog "Fill the form" has the fields Term, Document ID, Sentence ID and Occurrence text; the document number and occurrence text fields are required.] • Invalidated term (Ctrl+I) invalidates a term. • Re-enable term re-enables a term that has been invalidated. • Partial saving: Save validated terms allows you to save only validated terms; Save invalidate
43. in the repExtractTerm subdirectory of your project. • Select the tagged corpus from which the terms have been extracted (.tt or .ttfr file). It is supposed to be located in the corpora subdirectory of your project. • Select the corpus file (.txt). It is supposed to be located in the corpora subdirectory of your project. • Specify the corpus language: English (en) or French (fr). When the terminological data is loaded, TERMINAE creates one additional file in the corpora directory: • fTempCorpus2XML.xml, which is an XML version of the corpus. If you have several documents (see 5.1), each one must be processed by TreeTagger and the results must be concatenated in a single file, where the various initial documents are separated by a document tag as shown below: Text_n<TAB>Document<TAB>n, where TAB is the tabulation character and n varies between 0 and x-1, x being the total number of documents. 6.2.2 Yatea Term files. First you have to specify the terminological data you want to start with (note that additional data can be loaded afterwards): • Load a term list (Load Yatea file), which is supposed to be located in the repExtractTerm subdirectory of your project. • Indicate how many documents your corpus contains. Note that documents are numbered starting from 1 if there are several of them, but that a single document has number 0. • Select the tagged corpus from which the t
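The concatenation rule above can be sketched in Python for users who prepare the tagged file by hand (the Create corpus from many documents action of Section 5.1 builds such a file automatically). This is an illustration, not TERMINAE's code: the helper name is hypothetical, and it assumes that the separator line is the literal pattern Text_n<TAB>Document<TAB>n with n substituted, written before each document.

```python
def concatenate_tagged_documents(tagged_docs: list[str]) -> str:
    """Join per-document TreeTagger outputs into one tagged corpus.

    Before document n (0-based) a separator line "Text_n<TAB>Document<TAB>n"
    is written (assumed reading of the manual's document tag).
    """
    parts = []
    for n, doc in enumerate(tagged_docs):
        parts.append(f"Text_{n}\tDocument\t{n}\n")
        # Make sure the next separator starts on its own line.
        if doc and not doc.endswith("\n"):
            doc += "\n"
        parts.append(doc)
    return "".join(parts)
```

The inputs are the TreeTagger outputs read in corpus order, for example with pathlib.Path.read_text(encoding="utf-8"), so that n matches the document order.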
44. kit perspective, to link the conceptual and termino-conceptual levels of Neon and TERMINAE projects and of the resulting termino-conceptual resources. • To terminoConceptual level is used to switch from the Neon toolkit Conceptual level (OWL) perspective to the Terminae TerminoConceptual level perspective. Clicking on this action item reopens the termino-conceptual perspective and selects the termino-concept associated with the class initially selected in the conceptual perspective. It is not yet implemented for objectProperties. • Create a termino-concept is used to create a termino-concept and link it to the selected entity. This functionality is useful when you want to add thesaurus information to an existing ontology: you start from an existing class and create a termino-concept in the thesaurus of the TERMINAE project. • To link a class to a TC is used to link a class to an existing termino-concept in the thesaurus of the TERMINAE project. • Extract thesaurus from lexicalized ontology is used to create a thesaurus from a lexicalized ontology. A lexicalized ontology includes, for many entities (classes, objectProperties), SKOS annotations such as skos:prefLabel, skos:altLabel, skos:definition and skos:hiddenLabel. From these annotations, a termino-concept network (a thesaurus) is created at the terminoConceptual level. Click on the terminoConceptual perspective to vis
45. ms. The initial forms are removed. If some field is not defined, clicking the commit button causes an error; only the cancel button can then be used to end the merge action. • Remove all terminoConcepts removes all the created termino-concepts. • Refresh triggers a refresh of the window. • Select from state selects all forms having the same state. A dialog window appears to select the desired state. 7.3.2 Feature management submenu. This submenu proposes various actions related to the detailed information provided for a given terminological unit and recorded in its terminological form. It proposes one item and five submenus, presented in the following subsections: • Modify author modifies the author of the form. By default the author is the project author. • Lexical entry management submenu • Variant submenu • Syntactical relation management submenu • Terminological relation management submenu • Occurrence management submenu. Lexical entry management submenu: • Add a lexical entry adds a lexical entry for the selected term. You have to type in the entry name and its value separated by a colon. • Modify value lexical entry modifies the value of the lexical entry. • Modify lexical entry modifies the lexical entry. • Remove a lexical entry removes a lexical entry. Variant submenu: • Add a variant adds a lexical variant of the selected term. • Modify value variant to modify
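The Add a lexical entry input is a name and a value separated by a colon. The Python sketch below shows one way to split such an entry; it is hypothetical, and splitting at the first colon (so that values such as URLs keep their own colons) and trimming surrounding spaces are assumptions about how TERMINAE reads the entry.

```python
def parse_lexical_entry(text: str) -> tuple[str, str]:
    """Split "name:value" at the first colon; the value may contain colons."""
    name, sep, value = text.partition(":")
    if not sep or not name.strip():
        raise ValueError("expected 'name:value'")
    return name.strip(), value.strip()
```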
46. n the project is imported, its main characteristics are presented in the Terminae project information view (on the left, by default) of the project perspective and you can start working on it. Footnotes: 1 http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger 2 http://gate.ac.uk/ie/annie.html 3 http://search.cpan.org/~thhamon/Lingua-YaTeA-0.621 4 http://olst.ling.umontreal.ca/~drouinp/termostat_web 3.2.3 How to create a project. To start working on a new project: • Go to the main menu. • Click on Terminae project actions. • Click on Create Terminae project. A first dialog window appears, in which you must indicate the name of the project. A second dialog window appears, in which you must indicate the directory where you want to locate the project. A directory with the same name as the project is automatically created, with six subdirectories. To start working on your project and build a termino-ontological resource from a given corpus, you need at least the following files in your project directory (more details in 6.2): • In the corpora subdirectory: the raw corpus (.txt); a tagged version of the raw corpus (.tt file) as output by TreeTagger (the extension may be .tt, .ttfr, .TT or .TTFR). • In the repExtractTerm subdirectory: the list of terms extracted from the tagged version of the corpus by YaTeA (.xml file), or the list of terms extracted by TermoStat
47. nological level step 2 perspective. It is composed of two main parts: a global view on the left and a set of more detailed, dependent views on the right (see Figure 8.1). [Figure 8.1: Terminae TerminoConceptual level perspective for the project LegilocalJorfRandoCGCT, showing the TerminoConcept tree, the Term definition view, the Occurrences view, the TC relations view, and the Synonym TC and Links TC views]
48. [Figure 6.3: Visualisation of terms and named entities, listing named entities such as dates, locations and organizations with their frequencies] 6.4.2 Term Management submenu. This menu allows you to manage terminological data, i.e. to display the list of terminological units and edit it by clustering, removing or adding some of them. For all removing actions, the lemmas of the removed lexical units are written to a blacklist file in the repExtractTerm directory. This list can be displayed: go to the Show View menu at the top of the window, click on Other, then on BlackList view. A dialog window opens in which you indicate the blacklist file to display. The Term Management menu proposes 9 different actions: • Visualize all terms redisplays the list of terminological units after a search sequence. • Visualize validated terms displays only validated terms. • Visualize non validated terms displays only non-validated terms. • Visualize invalidated terms displays invalidated terms. • Find a term searches for a specific unit on the basis of its beginning.
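Find a term searches on the beginning of a unit. How TERMINAE implements the search is not documented here; the Python sketch below only illustrates prefix lookup on a sorted term list with the standard bisect module. The helper name is hypothetical and the comparison is case-sensitive, which is an assumption.

```python
import bisect


def find_by_prefix(sorted_terms: list[str], prefix: str) -> list[str]:
    """Return all terms starting with `prefix` from an already sorted list.

    In sorted order, all strings sharing a prefix are contiguous and start
    at the insertion point of the prefix itself.
    """
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # past the contiguous block of matches
        matches.append(term)
    return matches
```

Sorting once and bisecting keeps each lookup logarithmic plus the number of matches, which matters for term lists with many thousands of candidates.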
of a morphological analyzer and POS tagger, in three tab-separated columns: word, POS, lemma.
The Annotator and SemEx can be found at http www lipn univ paris13 fr szulman Annotator annotator html or http www lipn univ paris13 fr fr rcln, or from http www lipn univ paris13 fr fr rcln logiciels.
• A lexicalization file following the SKOS standard, such as the one provided by TERMINAE when it builds an ontology. This file can also be created or modified with a plain text editor. Its DTD is defined in the Annex.
• One or several ontologies in OWL format.
10.2 How to proceed
The ontologies and their lexicalization can generally be reused for several documents. The POS file is of course document-dependent and must be generated before annotating.
When the Annotator perspective is opened, it assumes that the directory of your project is the defined workspace and that the file encoding is UTF-8.
Figure 10.1: The Annotator window
A window then opens (see Figure 10.1) with four fields in the left pane, "Used resources", and a blank right pane entitled "Annotated text view".
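The POS file described above (one token per line, three tab-separated columns: word, POS, lemma) can be checked before annotating with a small Python sketch like the one below. The parser and its names are hypothetical; replacing TreeTagger's `<unknown>` lemma marker with the word form is a choice of this sketch, not documented Annotator behaviour.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str
    pos: str
    lemma: str


def parse_pos_file(text: str) -> List[Token]:
    """Parse tagger output with one token per line and three
    tab-separated columns: word, POS, lemma."""
    tokens = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        cols = line.split("\t")
        if len(cols) != 3:
            raise ValueError(f"line {lineno}: expected 3 columns, got {len(cols)}")
        word, pos, lemma = cols
        # TreeTagger prints <unknown> when it has no lemma; fall back to the word.
        if lemma == "<unknown>":
            lemma = word
        tokens.append(Token(word, pos, lemma))
    return tokens
```

Read the file with `encoding="utf-8"` before calling `parse_pos_file`, matching the UTF-8 assumption of the Annotator perspective.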
…ology according to semantic relations; the third level, the ontological level, enables the user to create a formal ontology out of the list of termino-concepts created at the second level.
This document describes the functionalities of the TERMINAE platform. The first chapter describes the technical characteristics and the installation instructions. The following chapters present the main menus of the platform, which are accessible from its main window.
Contents
1 Introduction
2 The Terminae method
3 Technical Characteristics
3.1 Installation
3.2.1 Project location and structure
3.2.2 How to import a project
3.2.3 How to create a project
4 Main menu
5 Project management perspective
5.1 Project actions menu
6 Terminological level step 1 perspective
6.1 Term extractor uses
6.1.1 TermoStat web service
6.1.2 YaTeA tool
6.2.1 TermoStat term files
6.2.2 YaTeA term files
6.2.4 Term list files
6.4.1 File submenu
…from which it depends: Terminae project actions, Linguistics actions, Terminological actions, TerminoConceptual actions and Terminae links.
• The Perspectives item allows you to open new perspectives: simply click on the name of the perspective you want to open in the list that appears. 8 perspectives are accessible:
  - Project perspective: the default perspective, opened when a project is loaded. It is presented in Section 5.
  - Terminological level step 1 perspective: see Section 6.
  - Terminological level step 2 perspective: see Section 7.
  - TerminoConceptual level perspective: see Section 8.
  - Neon toolkit Conceptual level OWL perspective: see Section 9.
  - Annotator perspective: see Section 10.
  - TMX perspective: used only for the Legilocal project. It allows you to work with the Temis term extractor.
  - Collaboration perspective: see Section 11.
The first four perspectives make up Terminae. The OWL perspective belongs to Neon Toolkit 2.4. The Annotator perspective marks the occurrences of given terms in a text with concepts and individuals of an ontology. The Collaboration perspective allows you to compare two Terminae projects.
This main menu differs slightly from one operating system to another.
• The Help item is available in all Eclipse applications; it is not described in this document.
• An addition […]
plaining. The output format is language-independent, as are the algorithms, so the application can in principle be used for any language where its input makes sense, namely where lemmatizing and POS tagging are possible and not too ambiguous.
The Annotator is included as a plugin in SemEx and in Terminae and can be used from them if preferred; only the installation differs. In Terminae, the Annotator can be used through the Annotator perspective.
Linux specific: Eclipse's browser calls native browsing libraries to do its work, so under Linux you may have to install specific ones. The present version of the Annotator relies on Eclipse 3.7, whose browser needs a proper installation of one of: Mozilla 1.4 GTK2 to 1.7.x GTK2; XULRunner 1.8.x, 1.9.x and 3.6.x (but not 2.x); WebKitGTK 1.2.x and newer. If your installed browser is either too old or too recent, you can also install XULRunner (the autonomous core of Mozilla Firefox and Thunderbird) to enable the Eclipse browser. In this case you have to specify where XULRunner is located: modify the annotator.ini file in the executable's directory to initialize org.eclipse.swt.browser.XULRunnerPath, e.g. -Dorg.eclipse.swt.browser.XULRunnerPath=/home/szulman/outils/xulrunner-sdk/bin. Of course, you must replace /home/szulman/outils/xulrunner-sdk/bin with your own location.
10.1 Input files
To annotate a document you need 4 inputs:
• The document itself, in a single text (.txt) file;
• The output […]
presents the synonyms of the termino-concept.
• The Links view applies to termino-concepts related to named entities for which type information can be collected. Typical links are brother and father links. It is also used to describe links to ontologies or SKOS links. There are three areas: Links, to describe the type of link (OWL, SKOS, unknown); What, to describe the link (Class, individual, hiddenLabel fr, hiddenLabel Gen); Value, to describe the value.
Note that the meaning of a termino-concept is not formally defined; it is mainly described by its related occurrences.
8.3 TerminoConceptual actions menu
The action menu associated with the Terminae TerminoConceptual level perspective is the TerminoConceptual action menu. It proposes 4 submenus, which are presented in the following subsections:
• File submenu
• Termino-concept management submenu
• Feature management submenu
• Neon ontology submenu
The corresponding actions are also contextually accessible by right-clicking.
8.3.1 File submenu
This menu allows you to load and save termino-conceptual data. It proposes the following actions:
Load XML format: to load a thesaurus in XML format (see the DTD in Annex 12.5);
Save XML format: to save a thesaurus in XML format;
Import SKOS: to load an existing thesaurus in SKOS format;
Export SKOS: to export a thesaurus in SKOS format. A dialog window opens […]
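To give an idea of what a SKOS export contains, the following Python sketch serializes termino-concepts as SKOS concepts in RDF/XML with the standard library only. The input structure, the function name and the base URI are hypothetical, and the exact elements written by TERMINAE (see its SKOS DTD in the Annex) may differ.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Serialize with the conventional rdf: and skos: prefixes.
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def termino_concepts_to_skos(concepts, base_uri, lang="fr"):
    """Return RDF/XML with one skos:Concept per termino-concept.

    `concepts` maps a concept name to a dict with optional keys
    "label", "definition" and "broader" (name of a father concept)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for name, info in concepts.items():
        concept = ET.SubElement(root, f"{{{SKOS}}}Concept",
                                {f"{{{RDF}}}about": base_uri + name})
        label = ET.SubElement(concept, f"{{{SKOS}}}prefLabel", {XML_LANG: lang})
        label.text = info.get("label", name)
        if "definition" in info:
            definition = ET.SubElement(concept, f"{{{SKOS}}}definition", {XML_LANG: lang})
            definition.text = info["definition"]
        if "broader" in info:
            ET.SubElement(concept, f"{{{SKOS}}}broader",
                          {f"{{{RDF}}}resource": base_uri + info["broader"]})
    return ET.tostring(root, encoding="unicode")
```

Here a kindOf link is rendered as `skos:broader`; that mapping is an assumption of the sketch.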
…r directory.
7.3 Terminological actions menu
The action menu associated with the Terminological level step 2 perspective is the Terminological action menu. It proposes 3 submenus, which are presented in the following subsections:
• Form management submenu
• Feature management submenu
• Termino-concept management submenu
The corresponding actions are also contextually accessible by right-clicking.
7.3.1 Form management submenu
This submenu proposes the following actions related to terminological forms:
Select a form: to search for a form on the basis of its beginning characters;
Visualize all forms: to redisplay the list of all forms after a search sequence;
Modify terminological form state: to note that the work on this terminological form is in progress or completed. It acts as a comment aimed at the user;
Remove a terminological form: to remove the selected terminological form;
To terminological level step 1: to go to the previous perspective and select the lemma corresponding to the terminological form;
New terminological form: to create a terminological form from scratch. A dialog window opens to define the term for which the terminological form is created;
Modify term: to modify the term which identifies the terminological form;
Create a terminological form for a syntactical relation: to create a terminological form for a term selected in syntactical rel […]
Figure 8.1: Terminae TerminoConceptual level perspective
• The TerminoConcept tree view is by default presented on the left part of the perspective. It shows the hierarchy of all the termino-concepts that have been created.
• The other views form the termino-conceptual form of the termino-concept selected in the TerminoConcept tree (see Section 8.2).
Note that you can find a termino-concept simply by typing its first letter in the TerminoConcept tree view.
8.2 Data: termino-conceptual forms
The termino-conceptual level is a bridge between the terminological level and the conceptual level (the ontology). It is made of a set of termino-concepts, which are themselves described by termino-conceptual forms gathering the relevant information collected or defined for those termino-concepts. A termino-conceptual form is usually composed of the following views:
• The NL definition view allows you to enter a natural language definition for the selected termino-concept.
• The Occurrences view presents the occurrences in the corpus of the lexical units to which the termino-concept is linked.
• The TC relations view presents the termino-conceptual relations in which the termino-concept is domain or range.
• The Synonym TC view
…drag the son node onto the father node.
Remove kindOf link: to remove a father of the selected termino-concept.
Add a RTC: to add a termino-conceptual relation for the selected termino-concept. A first dialog window opens, in which you give the name of the relation. A second dialog window opens, in which you click OK if the selected termino-concept is the domain and Cancel if not. A third dialog window opens, in which you give the name of the range or the domain, depending on the previous answer; that termino-concept must already exist. A choice dialog window then opens, in which you select the SKOS type of the relation.
Remove a RTC: to remove the selected termino-conceptual relation.
Modify a RTC: to modify a field of a RTC. A first dialog window opens, in which the user chooses the field. A second dialog window opens, in which the user defines the new value.
Add occurrence: to add an occurrence to the selected termino-concept.
Remove occurrence: to remove an occurrence of the selected termino-concept. You have to select the identifier of the occurrence to be removed.
Create a terminological form: to create a terminological form from a termino-concept. This functionality is useful when you want to add terminological information and occurrences to an existing thesaurus. You start from an existing termino-concept and create a terminological […]
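The rules behind the kindOf and Add a RTC actions (the selected termino-concept is either the domain or the range, and the other end must already exist) can be modelled with the toy Python class below. It is a hypothetical in-memory model, not TERMINAE's data structures, and the SKOS type is passed as a plain string.

```python
from dataclasses import dataclass, field


@dataclass
class ToyThesaurus:
    """Hypothetical model of termino-concepts, kindOf links and
    termino-conceptual relations (RTC); not TERMINAE's own classes."""
    concepts: set = field(default_factory=set)
    parents: dict = field(default_factory=dict)    # son -> set of fathers
    relations: list = field(default_factory=list)  # (name, domain, range, skos_type)

    def add_concept(self, name):
        self.concepts.add(name)
        self.parents.setdefault(name, set())

    def add_kind_of(self, son, father):
        self._require(son, father)
        self.parents[son].add(father)

    def remove_kind_of(self, son, father):
        self.parents.get(son, set()).discard(father)

    def add_rtc(self, name, selected, other, selected_is_domain, skos_type):
        """Mirror the 'Add a RTC' dialogs: the selected termino-concept is the
        domain (OK) or the range (Cancel); the other end must pre-exist."""
        self._require(selected, other)
        if selected_is_domain:
            domain, rng = selected, other
        else:
            domain, rng = other, selected
        relation = (name, domain, rng, skos_type)
        self.relations.append(relation)
        return relation

    def _require(self, *names):
        missing = [n for n in names if n not in self.concepts]
        if missing:
            raise KeyError(f"termino-concept(s) must pre-exist: {missing}")
```

Raising an error for an unknown termino-concept reflects the manual's requirement that the other end of the relation already exist.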
Figure 7.1: Terminological level step 2 perspective
• The Terminological form list view is by default presented on the left part of the perspective. It lists all the canonical terminological units for which a terminological form has been created; each form can be In progress, ToDo or Done.
• The other views form t […]
…occurrence view. If it exists, it may appear in blue. If there are several groups of the same words, they appear in blue.
Chapter 7 Terminological level step 2 perspective
This perspective can be opened either by creating a terminological form or from the main Perspective menu (Terminological level step 2).
7.1 Perspective overview
The Terminological level step 2 perspective is composed of two main parts, with a global view on the left and a set of more detailed, dependent views on the right (see Figure 7.1).
…ts. This will automatically load all the ANNIE resources and create a corpus pipeline called ANNIE, with the correct resources selected in the right order and the default input and output annotation sets. If "without Defaults" is selected, the same processing resources will be loaded, but a popup window will appear for each resource, which enables the user to specify a name, location and other parameters for the resource. This is exactly the same procedure as for loading a processing resource individually, the difference being that the system automatically selects those resources contained within ANNIE. When the resources have been loaded, a corpus pipeline called ANNIE will be created as before.
The next step is to add a corpus and select this corpus from the drop-down corpus menu in the Serial Application editor. Finally, click on Run from the Serial Application editor, or right-click on the application name in the resources pane and select Run. To view the results, double-click on one of the documents contained in the processed corpus in the left-hand tree view. Neither annotation sets nor annotations will be shown until annotations are selected in the annotation sets; the Default set is indicat […]
JAPE is a Java Annotation Patterns Engine. It provides finite state transduction over annotations based on regular expressions. JAPE allows you to recognise regular expressions in annotations on documents.
Figure 12.1: Selected processing resources
12.9 Gate named entity type file
The XML file containing the named entity types, which is used when loading named entities (see Section 6.2.3), has the following form:
<?xml version="1.0" encoding="UTF-8"?>
<ensTypeEn>
  <typeEn>Organization</typeEn>
  <typeEn>Date</typeEn>
  <typeEn>Person</typeEn>
  <typeEn>Percent</typeEn>
  <typeEn>Location</typeEn>
  <typeEn>Money</typeEn>
  <typeEn>Title</typeEn>
  <typeEn>Address</typeEn>
  <typeEn>Unknown</typeEn>
  <typeEn>Jobtitle</typeEn>
  <typeEn>FirstPerson</typeEn>
  <typeEn>Location</typeEn>
  <typeEn>UrlPre</typeEn>
</ensTypeEn>
12.10 Key binding
CTRL+I: invalidate term
CTRL+R: remove term
CTRL+T: new terminological form
CTRL+X: exit
CTRL+J: find a term
CTRL+V: paste a lexical entry (merging form)
CTRL+F5: refresh forms
ALT+K: add a kindOf link
ALT+O: create an objectProperty
ALT+C: create a class
ALT+G: new terminological form
ALT+F: find in occurrences
ALT+Q: select a form
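A named entity type file like the one in Section 12.9 can be read, for example to keep only the selected types among ANNIE results, with the following Python sketch. The function names are hypothetical, and the (text, type) pairs stand in for ANNIE output, whose XML format is not described here. Since the sample file lists Location twice, the sketch keeps each type only once.

```python
import xml.etree.ElementTree as ET


def read_entity_types(xml_text):
    """Return the types listed in an <ensTypeEn> file, in order, each once."""
    root = ET.fromstring(xml_text)
    if root.tag != "ensTypeEn":
        raise ValueError(f"unexpected root element: {root.tag}")
    types = []
    for element in root.iter("typeEn"):
        name = (element.text or "").strip()
        if name and name not in types:
            types.append(name)
    return types


def filter_entities(entities, wanted_types):
    """Keep only the (text, type) pairs whose type was selected."""
    wanted = set(wanted_types)
    return [(text, kind) for text, kind in entities if kind in wanted]
```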
…tual level perspective, you need to create or import a Neon toolkit project, which is different from the Terminae project, and to create or import an ontology in this project. This can be done either from the Neon ontology submenu of the Terminae TerminoConceptual perspective (Create a Neon project and Create Neon Toolkit ontology items), or by creating the project and the ontology from the menu of the navigator view in the Neon toolkit conceptual level perspective (right-click).
In the Neon toolkit conceptual level perspective you can also import an existing project. In this case you have to refresh the view to display the imported project and to link it to the TerminoConceptual perspective (see the following section). You can also import an ontology: use the Import item from the menu of the navigator view of the Neon toolkit conceptual level perspective.
9.1 Perspective overview
The presentation of the Neon toolkit Conceptual level OWL perspective is very similar to that of the Terminae TerminoConceptual level perspective. It is composed of two main parts, with a global view on the left and a set of more detailed, dependent views on the right (see Figure 8.2). See the documentation at http://www.neon-toolkit.org/wiki/Documentation_and_Supp […]
When adding a link between an individual and a termino-concept, Neon does not keep the modification. You have to export the ontology.
9.2 Terminae links menu
Terminae links menu has been added to the Neon Tool […]
…ualize it. Each termino-concept is linked to its corresponding class. Each termino-conceptual relation is linked to its corresponding objectProperty.
• Create a lexicalized ontology is used to create a lexicalized ontology, as explained above. Note that for links between termino-conceptual relations and ObjectProperties, the objectProperty must have a defined domain and a defined range. The ontology which is open is modified; you may export it to save it.
• Extract thesaurus from Owl entities is used to create a thesaurus from an OWL ontology. Each class becomes a termino-concept named by the normalized name of the class.
• Lexicalisation is used to find a lexicalisation of OWL entities. Each class label is broken down into words, and each word is searched for in the set of candidate terms or in the set of terms. The results are written to .txt files. This functionality will be further developed in the next version.
Chapter 10 Annotator perspective
This chapter and the tool have been written by F. Lévy, A. Guissé and S. Szulman.
The LIPN Annotator marks the occurrences of given terms in a text with concepts and individuals of an ontology. It outputs a project which can be directly opened by SemEx, the LIPN semantic explorer, to explore the annotations, mark and transform rules, etc. The user can alternatively choose to produce plain result files and to work on them with her/his own programs. The output format is textual (html and txt) and self ex
…use:
• File submenu
• Term management submenu
• Cleaning submenu
• New terminological form (CTRL+T) action
• To terminological form action
Those submenus and actions are presented in the following subsections.
6.4.1 File submenu
This menu allows you to load and save terminological data. It proposes the following actions:
• Load term extractor results: to load the terms initially extracted from your corpus by the term extractor, or saved in an XML backup. The procedure is the same as that described in Section […]
• Save term extractor results: to make an XML backup (see Annex for details on the file format).
• Load named entities from ANNIE results: to load the named entities identified by the ANNIE named entity recognition tool (see Section 6.2.3). A first file dialog window opens, in which you indicate which named entity types you are interested in by selecting a named entity type XML file, which should be located in the corpora subdirectory of your project. A second file dialog window opens, in which you select another XML file containing the list of named entities extracted by ANNIE. This file should also be located in the corpora subdirectory of your project. You then have to indicate the number of the document (0 if there is only one document) for which you have used the ANNIE tool.
• Save named entities: to make an XML backup (see Annex for details on the […])
…version of the platform is designed to work with TreeTagger and the ANNIE named entity recognition tool.
terminoFormDir: contains the terminological forms that are created using TERMINAE and output by it.
linguae: contains the search patterns that have been designed and their results (no pattern design tool is available in the current version).
thesauri: contains the termino-conceptual resources that are created using TERMINAE and output by it.
system: contains some files automatically created by TERMINAE.
repExtractTerm: contains the results of term extraction tools. The current version of the platform is designed to work with the YaTeA term extractor, the TermoStat term extractor (which can be used through a web service), or a simple file of terms (one term per line); a term list (one lemma per line) can also be used together with the corpus and its version tagged with TreeTagger.
3.2.2 How to import a project
A project to be imported is represented as a zipped file containing the project directory with all the required subdirectories and files of a given project. You do not have to unzip the file. Go to the main menu, click on Terminae project actions, then click on Import project. A first dialog window appears, in which you must indicate the zipped file to load. A second dialog window appears, proposing the directory into which the project will be imported. If you do not accept it, you will be offered to choose another one. When […]
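Before importing, a zipped project can be checked for the expected layout with the following Python sketch. It is a hypothetical helper, not part of TERMINAE; it assumes the archive holds a single project directory at its root, and the subdirectory list only gathers names mentioned in this manual (corpora, terminoFormDir, linguae, thesauri, system, repExtractTerm), so the real list may be longer.

```python
import zipfile

# Subdirectory names mentioned in this manual; the full required list
# may be longer (hypothetical check, not TERMINAE's import code).
EXPECTED_SUBDIRS = ("corpora", "terminoFormDir", "linguae",
                    "thesauri", "system", "repExtractTerm")


def missing_subdirs(zip_path, expected=EXPECTED_SUBDIRS):
    """Return the expected subdirectories absent from a zipped project.

    Assumes the archive holds a single project directory at its root."""
    with zipfile.ZipFile(zip_path) as zf:
        names = [n for n in zf.namelist() if n.strip("/")]
    roots = {n.split("/", 1)[0] for n in names}
    if len(roots) != 1:
        raise ValueError(f"expected one project directory, found {sorted(roots)}")
    present = set()
    for n in names:
        parts = n.split("/")
        if len(parts) > 1 and parts[1]:
            present.add(parts[1])  # first level below the project directory
    return [d for d in expected if d not in present]
```

An empty result means every listed subdirectory is present; it does not guarantee that the files inside them are valid.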