Natural language processing and query driven information retrieval

Contents

1. [(ADJ or NOUN) (NOUN or Noun_Group)] AS Noun_Group
   [0126] IF
   [0127] left_context = empty
   [0128] right_context = empty
   [0129] UNITE
   [0130] [(DEF_ARTICLE or INDEF_ARTICLE) (NOUN or Noun_Group)]
   [0131] AS Noun_Group
   [0132] IF
   [0133] left_context = empty
   [0134] right_context = empty
   [0135] END_BACKWARD_STAGE
   Input sentence: The device has an open distal end.
   [0136] Rule 1 (ADJ and NOUN), Pass 1:
   [0137] The_DEF_ARTICLE device_NOUN has_HAVE_s an_INDEF_ARTICLE open_ADJ [Noun_Group distal_ADJ end_NOUN] ._PERIOD
   [0138] Rule 1 (ADJ and Noun_Group), Pass 2:
   [0139] The_DEF_ARTICLE device_NOUN has_HAVE_s an_INDEF_ARTICLE [Noun_Group open_ADJ [Noun_Group distal_ADJ end_NOUN]] ._PERIOD
   [0140] Rule 2 (INDEF_ARTICLE and Noun_Group), Pass 3:
   [0141] The_DEF_ARTICLE device_NOUN has_HAVE_s [Noun_Group an_INDEF_ARTICLE [Noun_Group open_ADJ [Noun_Group distal_ADJ end_NOUN]]] ._PERIOD
   [0142] Rule 1 (DEF_ARTICLE and NOUN), Pass 4:
   [0143] [Noun_Group The_DEF_ARTICLE device_NOUN] has_HAVE_s
   [0144] [Noun_Group an_INDEF_ARTICLE [Noun_Group open_ADJ
   [0145] [Noun_Group distal_ADJ end_NOUN]]] ._PERIOD
   [0146] Now there exist two nodes or groups (noun groups). Only one more rule is needed to unite a noun group, the HAS verb, and one more noun group as a sentence.
   [0147] Thus the first stage in parsing deals with POS tags; then sequences of POS tags are gradually substituted by syntactic groups; these groups are then substituted by other groups
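The backward-stage passes in the excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the `RULES` table and the function names are hypothetical, both rules are treated as having empty left and right context, and exactly one merge is performed per right-to-left pass, mirroring Passes 1 to 4.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A leaf (word + POS tag) or a syntactic group (label + children)."""
    label: str
    word: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

    def bracketed(self) -> str:
        if self.word is not None:
            return f"{self.word}_{self.label}"
        inner = " ".join(child.bracketed() for child in self.children)
        return f"[{self.label} {inner}]"


# Hypothetical encoding of the two UNITE rules: (allowed left labels, allowed right labels).
RULES = [
    ({"ADJ", "NOUN"}, {"NOUN", "Noun_Group"}),
    ({"DEF_ARTICLE", "INDEF_ARTICLE"}, {"NOUN", "Noun_Group"}),
]


def backward_pass(nodes: List[Node]) -> bool:
    """Scan right to left and unite the first matching pair; report whether a merge happened."""
    for i in range(len(nodes) - 2, -1, -1):
        left, right = nodes[i], nodes[i + 1]
        for lefts, rights in RULES:
            if left.label in lefts and right.label in rights:
                nodes[i:i + 2] = [Node("Noun_Group", children=[left, right])]
                return True
    return False


def group_nouns(nodes: List[Node]) -> List[Node]:
    """Repeat backward passes until no rule applies any more."""
    nodes = list(nodes)
    while backward_pass(nodes):
        pass
    return nodes


tagged = [("The", "DEF_ARTICLE"), ("device", "NOUN"), ("has", "HAVE_s"),
          ("an", "INDEF_ARTICLE"), ("open", "ADJ"), ("distal", "ADJ"),
          ("end", "NOUN"), (".", "PERIOD")]
result = group_nouns([Node(tag, word) for word, tag in tagged])
print(" ".join(node.bracketed() for node in result))
```

On the example sentence this leaves two noun groups around the verb, which matches the state described in paragraph [0146].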
2. [0236] c) Helpers
   [0237] enc_DO, enc_DOD, enc_DOZ, enc_MD, enc_IN, enc_XNOT, enc_TO, enc_HV, enc_HVZ, enc_HVD, enc_BE, enc_BEZ, enc_BER, enc_BED, enc_BEDZ, enc_BEM
   [0238] Example: do, did, does
   [0239] d) Personal Pronouns
   [0240] enc_PPusd, enc_PPusd2, enc_PP1A, enc_PP1AS, enc_PP1O, enc_PP1OS, enc_PP2, enc_PP3, enc_PP3A, enc_PP3AS, enc_PP3O, enc_PP3OS, enc_PPL, enc_PPLS, enc_PP
   [0241] Example: I, we, they
   [0242] e) Other Pronouns
   [0243] enc_PN, enc_PNq2, enc_PNusd, enc_PNusdq2
   [0244] Example: same, each, something
   [0245] f) Determiners: enc_DT, enc_DTusd, enc_DTI, enc_DTS, enc_DTX, enc_EX
   [0246] Example: this, those, these
   [0247] g) Because, If
   [0248] enc_CS
   [0249] Example: because, if, since, after
   [0250] h) Punctuation
   [0251]-[0253] enc_Exclamatory, enc_AmpersandFW, enc_RLBracket, enc_RRBracket, enc_LeftQuote, enc_RightQuote, enc_MultipleMinus, enc_Comma, enc_FullStop, enc_Spot3, enc_Colon, enc_Semicolon, enc_Question
   [0254] Example:
   [0255] i) Others
   [0256] enc_UH, enc_CC, enc_OD, enc_CD
   [0257] Example: Oh, and, or
   [0258] As a result, eSAO extractor 42 outputs the eSAO request in the form of a set of, for example, 8 fields, where some of the fields may be empty:
   [0259] 1. Subject
   [0260] 2. Action
   [0261] 3. Object
   [0262] 4. Preposition
   [0263] 5. Indirect Object
   [0264] 6. Adjective
   [0265] 7. Adverbial
   [0266] 8. Constraints
   [0267] Along with that Sub
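The eight-field output listed in [0258] to [0266] maps naturally onto a small record type. The sketch below is an assumption about one convenient representation, not the publication's data layout: the class name `ESAO`, the attribute names and the `filled_fields` helper are all hypothetical, and empty fields are modelled as empty strings.

```python
from dataclasses import dataclass, asdict
from typing import Dict, Tuple


@dataclass
class ESAO:
    """Hypothetical container for the eight eSAO request fields; any field may be empty."""
    subject: str = ""
    action: str = ""
    object_: str = ""
    preposition: str = ""
    indirect_object: str = ""
    adjective: str = ""
    adverbial: str = ""
    constraints: Tuple[str, ...] = ()

    def filled_fields(self) -> Dict[str, object]:
        """Return only the fields that actually carry a value."""
        return {name: value for name, value in asdict(self).items() if value}


# Field values follow the publication's example for
# "Does the moon always keep the same face towards Earth".
example = ESAO(subject="moon", action="keep", object_="same face",
               preposition="towards", indirect_object="Earth", adverbial="always")
print(example.filled_fields())
```

Keeping the empty fields explicit, rather than omitting them, makes it straightforward to compare a request against stored records field by field.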
3. [Drawing sheets; only captions and box labels are recoverable.]
   FIG. 2: Basic Types of the User Request. Visible labels: Statement (command sentence); 24; Complex query.
   Patent Application Publication, Jan. 24, 2002, Sheet 3 of 8, US 2002/0010574 A1. FIG. 3: Structural and Functional Scheme of the User Request eSAO Processor (the case of keywords). Box labels: User request (keywords); Request parsing; Linguistic KB; Parsed user request.
   Sheet 4 of 8. FIG. 4: Structural and Functional Scheme of the User Request eSAO Processor (the case of bit, command, question or complex query). Box labels: User request (bit, command, question or complex sentence); 36 Part-of-speech tagging; 37 Recognition of introductory part of the query; 38 Request parsing; Linguistic KB; Request converting; 44 eSAO request.
   Sheet 5 of 8. FIG. 5: Structural and Functional Scheme of User Request Parser. Box labels: 48 Tagged request; 50 Verb chains recognition; 12 Linguistic KB; Noun group recognition; Syntactical dependency tree construction; 56 Parsed request.
   Sheet 6 of 8. Box labels: Parsed request; 62 Structure of question sentence recognition; Linguistic KB; Request converting; Question word substit
4. question or complex query.
   [0016] FIG. 6 shows a parsed synonymic search pattern expanding routine.
   [0017] FIG. 7 shows a routine for generating the eSAO user request.
   [0018] FIG. 8 shows the principal stages of forming an eSAO Knowledge Base or Index 90 and using a user natural language search query for relevant eSAO component and source information display from the knowledge base.
   DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENT OF THE INVENTION
   [0019] The following are incorporated herein by reference:
   [0020] 1. System and on-line information service presently available at www.cobrain.com and the publicly available user manual therefor.
   [0021] 2. The software product presently marketed by Invention Machine Corporation of Boston, USA under its trademark KNOWLEDGIST and the publicly available user manual therefor.
   [0022] 3. WIPO Publication 00/14651, published Mar. 16, 2000.
   [0023] 4. U.S. patent application Ser. No. 09/541,182, filed Apr. 3, 2000.
   [0024] 5. IMC's COBRAIN server software marketed in the United States and manuals thereof.
   [0025] See references Nos. 3, 4 and 5 above for systems and methods of using an SAO format for developing an SAO extracted Knowledge Base.
   [0026] The system and method according to the present invention employ a new expanded S-A-O format for semantic processing of documents and generating a database of expanded SAOs for expanded informa
5. higher in the sentence hierarchy, thus building a multi-level syntactic structure of the sentence in the form of a tree.
   [0148] For instance, the results are first presented for the four sentences given above (parse trees are shown flattened, node labels first):
   1) The dephasing waveguide is fitted with a thin dielectric semicircle at one end, and a guide cascaded with the dephasing element completely suppresses unwanted modes.
      w___Sentence w__N_XX w_NN a_AT guide_NN w___VBN_XX cascaded_VBN w__IN_N with_IN w_NN the_ATI w_NN dephasing_NN element_NN w___VBZ_XX w___VBZ completely_RB suppresses_VBZ w_NNS unwanted_JJ modes_NNS
   2) It was found that the maximum value of x is dependent on the ionic radius of the lanthanide element.
      w___Sentence w_NN w_NN the_ATI w_NN maximum_JJ value_NN of_IN x_NP w___BEX_XX is_BEZ w__JJ_XX dependent_JJ w__IN_N on_IN w_NN w_NN the_ATI w_NN ionic_JJ radius_NN of_IN w_NN the_ATI w_NN lanthanide_NN element_NN
   3) This was true even though the BN interphase reacted and vaporized because of water vapor in the atmosphere at intermediate temperatures, and glass formation occurred at higher temperatures.
      w___Sentence w_NN glass_NN formation_NN w___VBD_XX occurred_VBD w__IN_N at_IN w_NNS higher_JJR temperatures_NNS
   4) The composites were infiltrated under vacuum, cured at 100 degree C, and precalcined in air at 700 degree C.
      w___Sentence w_NNS The_ATI composites_NNS w___BEX_XX were_BED
6. w___VBN_XX infiltrated_VBN w__IN_N under_IN vacuum_NN
   5) "bit sentence" type
      Input: clean water
      Output:
      a) <w_NN> <clean_JJ> clean_JJ <water_NN> water_NN
      b) <w__VP_XX> <clean_VB> clean_VB <water_NN> water_NN
   6) "question sentence" type
      Input: What causes fuel cell degradation?
      Output: <w___q__Sentence> <What_WDT> What_WDT <w___VBZ_XX> <causes_VBZ> causes_VBZ <w__NN> <fuel_NN> fuel_NN <w__NN> <cell_NN> cell_NN <degradation_NN> degradation_NN <_> ?
   [0149] At the stage of question transformation or conversion (FIG. 6), in case of a question sentence, the question structure is first recognized according to its general description (Unit 62). This formal description concerns only that introductory part of the query (or the whole query) which will be transformed later on, and it is given in the following Backus-Naur notation:
   [0150]-[0151] 1. <Question> ::= <Wh-group> <First Verbal Group> NG <Second Verbal Group>
   [0152] Notes: a) [x] means that element x may be absent (the brackets marking optional elements are not legible in this copy);
   [0153] b) NG: noun group.
   [0154] 2. <Wh-group> ::= Pr <Wh> NG
   [0155] Notes: Pr: preposition.
   [0156] 3. <Wh> ::= enc_WP | enc_WRB | enc_WDT | <How-RB>
   [0157] Notes: a) enc_X represents a lexical unit, with terminal symbol X being its POS tag;
   [0158] b) enc_WP, enc_WRB an
7. cylinder
   [0042] 3. Input: ruby color of Satsuma glass. Output: P = Feature Color of; Attr (Om) = ruby color; Om = Satsuma glass
   [0043] 4. Input: micro cracks situated between sintered grains. Output: P = Placement situated between; Attr (Om) = sintered grains; Om = micro cracks
   [0044] 5. Input: precursor derived from hydrocarbon gas. Output: P = Formation derived from; Attr (Om) = hydrocarbon gas; Om = precursor
   [0045] 6. Input: dissipation driver coupled to power dissipator. Output: P = Connection coupled to; Attr (Om) = power dissipator; Om = dissipation driver
   [0046] 7. Input: lymphoid cells isolated from blood of AIDS infected people. Output: P = Separation isolated from; Attr (Om) = blood of AIDS infected people; Om = lymphoid cells
   [0047] 8. Input: one-dimensional hologram pattern transferred to matrix electrode. Output: P = Transfer transferred to; Attr (Om) = matrix electrode; Om = one-dimensional hologram pattern
   [0048] It is clear that the components Om proper can also be predicate elements; in the examples given above these are, for instance, Ex. No. 2: Om = freely suspended cylinder; Ex. No. 8: Om = one-dimensional hologram pattern. It should be noted that for information retrieval purposes it is more important to recognize the structure of Subject, Object and Indirect Object, that is, Attr (Om) and Om, than the types of relation P, because it is the basis of the algorithm of transition to the less relevant search patterns.
   [0049] Semantic Processor for User Request An
8. of the natural language expressions from a plurality of downloaded documents into an eSAO Knowledge Base.
   Claim 11. The method of claim 10, said method further comprising providing communication access to said eSAO knowledge base by a plurality of user computers, processing natural language user requests into eSAO search patterns, and conveying to respective users expressions and source document links for respective expressions whose eSAO field components substantially match the eSAO components of the respective user requests.
9. tence, semantic analysis (eSAO extraction, Unit 42), and outputs a formal representation of the original request in the form of a set of predetermined fields.
   [0085] At the step of tagging (Unit 36), each word of the request is assigned a Part-Of-Speech tag (its lexical-grammatical class). The analysis used here (see above-identified references Nos. 3 and 4) is supplemented with statistical data obtained on a specially collected question corpus. This provides highly correct POS tagging. In case of a bit sentence, several variants are possible.
   [0086] For instance:
   [0087]-[0088] Input: clean water
   [0089] Output:
   [0090] a) clean_JJ water_NN
   [0091] b) clean_VB water_NN
   [0092] where JJ stands for adjective, VB for verb, and NN for noun.
   [0093] Then (Unit 37) the introductory part of the query is recognized, which is a sequence of words at the beginning of the query none of which is a keyword for the given query. For example:
   [0094] a) Could you tell me
   [0095] b) Is it true that
   [0096] c) I want
   [0097] This part of the query is excluded from further processing or analysis. The recognition of the introductory part is performed by means of patterns making use of separate lexical units and tags.
   [0098] For example:
   [0099] a) <PP BE interested wondering if whether LP
   [0100] This pattern removes, for example, the following part from the user's query:
   [0101] [0102] b) <MD PP
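A minimal sketch of the introductory-part removal described in [0093] to [0097] follows. It is a deliberate simplification: the publication's patterns combine lexical units and POS tags, whereas this version only matches the three literal phrases quoted in the text. The names `INTRO_PHRASES` and `strip_intro` are hypothetical.

```python
import re

# Literal introductory phrases quoted in the text; a real system would use tag-based patterns.
INTRO_PHRASES = ["Could you tell me", "Is it true that", "I want"]


def strip_intro(query: str) -> str:
    """Remove a leading introductory phrase, if any, and return the remainder of the query."""
    for phrase in INTRO_PHRASES:
        # \b prevents "I want" from matching inside "I wanted".
        match = re.match(re.escape(phrase) + r"\b[\s,:]*", query, flags=re.IGNORECASE)
        if match:
            return query[match.end():]
    return query


print(strip_intro("Could you tell me what causes fuel cell degradation"))
```

Queries without a recognized introductory part are returned unchanged, so the function can be applied unconditionally before parsing.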
10. to extract from the parsed text eSAOs with finite actions, non-finite actions, and verbal nouns. One example of Action extraction is:
   [0193] <HVZ><BEN><VBN> => (<A> = <VBN>)
   [0194] This rule means that if an input sentence contains a sequence of words w1, w2, w3 which at the step of part-of-speech tagging obtained HVZ, BEN, VBN tags respectively, then the word with the VBN tag in this sequence is the Action.
   [0195] For example:
   [0196] has_HVZ been_BEN produced_VBN => (A = produced)
   [0197] The rules for extraction of Subject, Action and Object are formed as follows:
   [0198] 1. To extract the Action, tag chains are built (e.g. manually) for all possible verb forms in active and passive voice with the help of the Classifier (block 3). For example, has been produced = <HVZ><BEN><VBN>.
   [0199] 2. In each tag chain the tag is indicated corresponding to the main notion verb (in the above example, <VBN>). Also, the type of the tag chain (active or passive voice) is indicated.
   [0200] 3. The tag chains with corresponding indexes formed at steps 1-2 constitute the basis for linguistic modules extracting Action, Subject and Object. Noun groups constituting Subject and Object are determined according to the type of tag chain (active or passive voice).
   [0201] The recognition of such elements as Indirect Object, Adjective and Adverbial is implemented in the same way, that is, t
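The tag-chain rule in [0193] to [0199] can be illustrated with a short lookup. This is a sketch under assumptions: the `TAG_CHAINS` table holds only the single chain given in the text, the index of the main notional verb and the "passive" voice label are filled in by hand as step 2 describes, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

# (tag chain, index of the main notional verb in the chain, voice of the chain)
TAG_CHAINS = [
    (("HVZ", "BEN", "VBN"), 2, "passive"),
]


def extract_action(tagged: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Return (action word, voice) for the first tag chain found in a tagged sentence."""
    tags = [tag for _, tag in tagged]
    for chain, main_index, voice in TAG_CHAINS:
        width = len(chain)
        for start in range(len(tags) - width + 1):
            if tuple(tags[start:start + width]) == chain:
                return tagged[start + main_index][0], voice
    return None


sentence = [("steel", "NN"), ("has", "HVZ"), ("been", "BEN"), ("produced", "VBN")]
print(extract_action(sentence))
```

The voice label returned with the action is what would later decide which neighbouring noun group is the Subject and which is the Object.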
11. EXEMPLARY EMBODIMENT OF INVENTION
   [0007] It is an object of the present invention to expand the semantic processing power of computers to include not only the SAO but to use a new, more comprehensive extended Subject-Action-Object (eSAO) format as the foundation for rule-based processing, normalization and management of natural language.
   [0008] One skilled in this art will note that prior systems' SAOs included three components: subject (S), action (A), object (O). The expanded SAO (hereafter eSAO) includes a minimum of four components and fields, and preferably seven components and fields. These additional fields include adjectives, prepositions, etc., more fully described below. In one exemplary embodiment an eighth field is preferably provided into which all other components can be placed. These other components and the eighth field are called constraints. Where the knowledge base or information in local and remote databases is to be accessed in response to a user request or query, the system preferably uses the same rules and number of fields to process the natural language user request as to process candidate access or stored documents for presentation to the user.
   [0009] Thus, Semantic Processor for User Request Analysis according to the principles of the present invention aims at analyzing and classifying different types of user requests in order to create their formal representation in the form of a set of certain
12. VB PP> I am wondering if
   [0103] This pattern removes, for example, the following part from the user's query:
   [0104] Could you tell me
   [0105] At the step of parsing (FIG. 4), verbal sequences (Unit 50) and noun phrases (Unit 52) are recognized from the tagged request (FIG. 5), and a syntactical parse tree is built (Unit 54).
   [0106] This module includes stored Recognizing Linguistic Models for Syntactic Phrase Tree Construction. They describe rules for structurization of the sentence, i.e. for correlating part-of-speech tags, syntactic and semantic classes, etc., which are used by Text parsing and SAO extraction for building Syntactic and Functional phrases (see Reference No. 4, i.e. U.S. patent application Ser. No. 09/541,182, page 36, section "Tree Construction").
   [0107] The Syntactical Phrase Tree Construction is based on context-sensitive rules to create syntactic groups, or nodes, in the parse tree.
   [0108] A core context-sensitive rule can be defined by the following formula:
   [0109] UNITE
   [0110] [element_1 ... element_n] AS Group_X
   [0111] IF
   [0112] [0113] [0114] which means that the string in brackets (element_1 ... element_n) must be united and further regarded as a syntactic group of a particular kind (Group_X in this case) if elements to the left of the string conform to the string defined by the left_context expression, and elements to the right of the string conform to the string defined by the right_c
13. aking into account the tags and the structure itself of the Syntactical Phrase Tree.
   [0202] Recognition of Subject, Object and Indirect Object attributes is carried out on the basis of corresponding Recognizing Linguistic Models. These models describe rules (algorithms) for detecting subjects, objects, their attributes (placement, inclusion, parameter, etc.) and their meanings in the syntactic tree.
   [0203] To identify parameters of an Object (Indirect Object, Subject), a Parameter Dictionary is used. A standard dictionary defines whether a noun is an object or a parameter of an object. Thus a list of such attributes can easily be developed and stored in Linguistic KB (Block 80). For example: temperature (= parameter) of water (= object). To identify attributes such as placement, inclusion, etc., Linguistic KB includes a list of attribute identifiers, i.e. certain lexical units. For example: to place, to install, to comprise, to contain, to include, etc. Using such lists, the system may automatically mark the eSAOs extracted by the eSAO extractor which correspond to said attributes.
   [0204] These algorithms work with noun groups and act like linguistic patterns that control extraction of the above-mentioned relations from the text. For example, for relations of the type parameter-object, the basic patterns are:
   [0205] <Parameter> of <Object>
   [0206] and
   [0207] <Object> <Parameter>
   [0208] The relation is valid only when t
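The two parameter-object patterns in [0205] and [0207] can be sketched as follows. The `PARAMETERS` set stands in for the Parameter Dictionary and contains only the one word the text gives as an example; the function name and the whitespace tokenization are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Stand-in for the Parameter Dictionary; "temperature" is the only entry named in the text.
PARAMETERS = {"temperature"}


def parameter_object(noun_group: str) -> Optional[Tuple[str, str]]:
    """Return (parameter, object) if the noun group matches one of the two basic patterns."""
    words = noun_group.lower().split()
    # Pattern 1: <Parameter> of <Object>
    if "of" in words:
        split_at = words.index("of")
        parameter = " ".join(words[:split_at])
        obj = " ".join(words[split_at + 1:])
        if parameter in PARAMETERS and obj:
            return parameter, obj
    # Pattern 2: <Object> <Parameter>
    if len(words) >= 2 and words[-1] in PARAMETERS:
        return words[-1], " ".join(words[:-1])
    return None


print(parameter_object("temperature of water"))
print(parameter_object("water temperature"))
```

A noun group whose candidate parameter is not in the dictionary yields no relation, which reflects the validity condition attached to these patterns.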
14. alysis according to the principles of the present invention aims at analyzing and classifying different types of user requests in order to create their formal representation in the form of a set of certain fields and relations between them, which enables more effective and efficient search for information or documents in local and remote databases, knowledge bases, information networks, etc.
   [0050] Semantic Processor (FIG. 1) receives User Request 2 as input data. Using Linguistic KB 12, Semantic Processor identifies or classifies the type of request, as described below (Unit 4), and performs eSAO analysis of the request in accordance with its type (Unit 6). Then a number of search patterns is generated corresponding to the input user request, which represent its formal description designed for answer search (Unit 10) in databases, information networks, etc.
   [0051] Semantic Processor analyzes the following basic types of requests (FIG. 2):
   [0052] 1. Keywords 18
   [0053] Keywords is a type of user request where words are organized into a Boolean expression using predetermined grammar rules. In one example it comprises 6 rules for infix, prefix, and brackets operators. The following operators are implemented: AND, OR, XOR, NEAR, NOT and brackets. The operators may be expressed in the user request in different ways; for instance, AND can be written as AND, &, &&.
   [0054] User request example:
   [0055] l
15. (19) United States
   (12) Patent Application Publication
   (10) Pub. No.: US 2002/0010574 A1
   Tsourikov et al.
   (43) Pub. Date: Jan. 24, 2002
   (54) NATURAL LANGUAGE PROCESSING AND QUERY DRIVEN INFORMATION RETRIEVAL
   (76) Inventors: Valery Tsourikov, Boston, MA (US); Igor Sovpel, Minsk (BY); Leonid Batchilo, Belmont, MA (US)
   Correspondence Address: STANGER & DREYFUS, 608 SHERWOOD PKWY, MOUNTAINSIDE, NJ 07092 (US)
   (21) Appl. No.: 09/815,260
   (22) Filed: Mar. 22, 2001
   Related U.S. Application Data
   Publication Classification
   (51) Int. Cl.: G06F 17/27
   (52) U.S. Cl.: 704/9; 704/1; 707/3
   (57) ABSTRACT
   In a digital computer, the method of processing a natural language expression entered or downloaded to the computer that includes (1) identifying in the expression expanded subject-action-object components that includes at least four components (subject-action-object (SAO) components and at least one additional component from the class of preposition, indirect object, adjective and adverbial (eSAO components)), (2) extracting each of the at least four components for designation into a respective subject, action, object field and at least a preposition field, indirect object field, adjective field and adverbial field, and (3) using the components in at least certain ones of said fields for at least one of (i) displaying components to the user, (ii) forming a search pattern of a user request for infor
16. aser NEAR beam && heating
   [0056] 2. Bit sentence 20
   [0057] Bit sentence is a type of user request representing a part of a sentence or sentence segment (incomplete sentence) which corresponds to a certain semantic element: process, object, function (action-object), etc.
   [0058] User request examples:
   [0059] a) solid state laser system
   [0060] b) decrease friction
   [0061] 3. Statement 22
   [0062] Statement is a type of request which is a grammatically correct imperative sentence.
   [0063] User request example:
   [0064] Give me the number of employees in your company.
   [0065] 4. Question sentence 24
   [0066] Question sentence is a type of request which is a grammatically correct interrogative sentence.
   [0067] User request examples:
   [0068] a) What causes fuel cell degradation?
   [0069] b) What is the chemical composition of the ocean?
   [0070] c) Do the continents move?
   [0071] 5. Complex query 25
   [0072] Complex query is a type of request which is expressed by several sentences, i.e. by a fragment of text.
   [0073] User request examples:
   [0074] a) Is there anything I can give my one-month-old son to relieve gas pain? I think he may have colic.
   [0075] b) My 15-year-old son has recently been diagnosed with recurrent shoulder dislocation. Lately he got worse. How is recurrent shoulder dislocation treated?
   [0076] c) Because I have a chronic stuffed nose and no sense of taste, I have be
17. ch patterns by gradually dropping Constraints field elements and, further, recognized Object attributes, owing to:
   [0414] recurrent = Attr (shoulder dislocation)
   [0415] shoulder = Attr (dislocation)
   [0416] Thus, the less relevant search pattern will be:
   [0417] [0418] [0419] [0420] [0421] [0422] [0423] [0424]
   [0425] Note the constraint has been removed, which can be in response to a user-entered command.
   [0426] With reference to FIG. 8, the query driven information search 84 includes a semantic eSAO processing 86, 88 for creating eSAO structures (index or Knowledge Base including links to documents) 90 of source documents 80 and eSAO search patterns 92 of user requests 82. See references Nos. 2 and 4 for further details on creating one example of a Knowledge Base. The present Knowledge Base, however, can have up to 8 fields for the eSAO structures and constraints, as described above. The search module 84 further includes comparative analysis 92 of eSAO search patterns 92 of user requests and eSAO structures (index) 90 of source documents. The comparative analysis 92 identifies the eSAO structures 96 of source documents which are most relevant for eSAO search patterns of given user requests. These structures can be displayed to the user in order of relevance, and the full source sentence of a user-selected structure and a link to the full document can be displayed. User selection of the documen
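The transition to less relevant search patterns can be sketched as a sequence of progressively shorter patterns. The order used here (constraints dropped one at a time from the end, then Object attributes dropped from the outermost inward) is an assumption chosen to match the recurrent / shoulder / dislocation nesting in [0414] and [0415]; `SearchPattern` and `relax` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class SearchPattern:
    """Hypothetical, reduced search pattern: an action, an attributed object, and constraints."""
    action: str
    object_attrs: Tuple[str, ...]   # attributes of the object, outermost first
    object_head: str
    constraints: Tuple[str, ...] = ()

    def object_text(self) -> str:
        return " ".join(self.object_attrs + (self.object_head,))


def relax(pattern: SearchPattern) -> List[SearchPattern]:
    """Return the original pattern followed by progressively less relevant ones."""
    patterns = [pattern]
    current = pattern
    while current.constraints:            # first give up constraints, one at a time
        current = replace(current, constraints=current.constraints[:-1])
        patterns.append(current)
    while current.object_attrs:           # then give up object attributes, outermost first
        current = replace(current, object_attrs=current.object_attrs[1:])
        patterns.append(current)
    return patterns


start = SearchPattern(action="treat",
                      object_attrs=("recurrent", "shoulder"),
                      object_head="dislocation",
                      constraints=("15 year old son", "diagnose"))
for candidate in relax(start):
    print(candidate.action, "|", candidate.object_text(), "|", candidate.constraints)
```

Each later pattern matches a superset of what the earlier ones match, so a search can fall back down the list until enough results are found.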
18. ction (A). Ex.: The process is slowly modified. The driver must not turn the steering wheel in such a manner.
   [0036] Examples of application of the eSAO format are:
   Example 1. Input: Is the moon really blue during a blue moon?
      Output: Subject: moon; Action: be; Object: (empty); Preposition: during; Indirect Object: blue moon; Adjective: really blue; Adverbial: (empty)
   Example 2. Input: Does the moon always keep the same face towards Earth?
      Output: Subject: moon; Action: keep; Object: same face; Preposition: towards; Indirect Object: Earth; Adjective: (empty); Adverbial: always
   Example 3. Input: The dephasing waveguide is fitted with a thin dielectric semicircle at one end, and a guide cascaded with the dephasing element completely suppresses unwanted modes.
      Output: Subject: guide cascaded with the dephasing element; Action: suppress; Object: unwanted mode; Preposition: (empty); Indirect Object: (empty); Adjective: (empty); Adverbial: completely
   Example 4. Input: It was found that the maximum value of x is dependent on the ionic radius of the lanthanide element.
      Output: Subject: maximum value of x; Action: be; Object: (empty); Preposition: on; Indirect Object: ionic radius of the lanthanide element; Adjective: dependent; Adverbial: (empty)
   Example 5. Input: This was true even though the BN interphase reacted and vaporized because of water vapor in the atmosphere at intermediate temperatures, and glass formation occurred at higher temperatures.
      Output: Sub
19. d enc_WDT tags cover all possible question words: how long, how much, how many, when, why, how, where, which, who, whom, whose, what.
   [0159] 4. <How-RB> ::= how enc_RB
   [0160] 5. <First Verbal Group> ::= enc_MD | enc_HV | enc_HVZ | enc_HVD | enc_HVN | enc_BE | enc_BEZ | enc_BEM | enc_BER | enc_BED | enc_BEDZ | enc_DO | enc_DOD | enc_DOZ
   [0161] 6. <Second Verbal Group> ::= <First Verbal Group> | enc_VB | enc_VBZ | enc_VBD | enc_VBN | enc_VBG | enc_HVG | enc_BEN | enc_BEG | enc_XNOT
   [0162] It should be noted that the above-described grammar is built so as not to process questions posed to syntactic subjects (What food can reduce cholesterol in blood? Who killed Kennedy?), because the word order in these questions is direct (statement-like) and does not need to be changed. Besides, the remaining part of the question is marked as TL (tail).
   [0163] In one example of the converting step 40, the elements in the right side of formula (1) are enumerated:
   [0164] 1. <Wh-group>
   [0165] 2. <First Verbal Group>
   [0166] 3. NG
   [0167] 4. <Second Verbal Group>, and TL is marked as 5.
   [0168] Then the formula of the query itself will be:
   [0169] request = 1 2 3 4 5
   [0170] In some cases certain elements of the formula may be absent.
   [0171] For example:
   [0172] a) What is the chemical composition of the ocean? => (1) What (2) is (3) the chemical composition of the ocean (4) (5)
   [0173] b) Do the continents move? => (1) (2) Do (3) th
20. e continents (4) move (5)
   [0174] c) How much did it help? => (1) How much (2) did (3) it (4) help (5)
   [0175] d) (1) What company (2) is (3) John (4) working (5) at the moment for => (3) John (2) is (4) working (5) at the moment for (1) what company
   [0176] e) (1) For what company (2) is (3) John (4) working (5) at the moment => (3) John (2) is (4) working (1) for what company (5) at the moment
   [0177] After the structural formula of the request has been defined, the question is converted (Unit 64) according to the following rule:
   [0178] 12345 => 32415
   [0179] or
   [0180] 12345 => 32451
   [0181] The second formula may be regarded as a special type of the first one, connected with grammatical peculiarities of the question.
   [0182] For example:
   [0183] a) (1) What (2) is (3) the chemical composition of the ocean (4) (5) => (3) the chemical composition of the ocean (2) is (4) (1) What (5)
   [0184] b) (1) (2) Do (3) the continents (4) move (5) => (3) the continents (2) Do (4) move (1) (5)
   [0185] c) (1) How much (2) did (3) it (4) help (5) => (3) it (2) did (4) help (1) How much (5)
   [0186] d) (1) What company (2) is (3) John (4) working (5) at the moment for => (3) John (2) is (4) working (5) at the moment for (1) what company
   [0187] e) (1) For what company (2) is (3) John (4) working (5) at the moment => (3) John (2) is (4) working (1) for what company (5) at the mom
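The reordering rule 12345 => 32415 (or 32451) is simple enough to state as code. The sketch assumes the five elements have already been recognized and are passed in as strings, with an empty string for an absent element; the function name and the `wh_last` flag that selects the second formula are hypothetical.

```python
def convert_question(wh: str, first_verbal: str, noun_group: str,
                     second_verbal: str, tail: str, wh_last: bool = False) -> str:
    """Reorder elements 1..5 of a question into statement-like order.

    Default order is 3 2 4 1 5; with wh_last=True the order is 3 2 4 5 1.
    Absent elements are passed as empty strings and skipped in the output.
    """
    if wh_last:
        order = (noun_group, first_verbal, second_verbal, tail, wh)
    else:
        order = (noun_group, first_verbal, second_verbal, wh, tail)
    return " ".join(part for part in order if part)


print(convert_question("What", "is", "the chemical composition of the ocean", "", ""))
print(convert_question("", "Do", "the continents", "move", ""))
print(convert_question("What company", "is", "John", "working", "at the moment for", wh_last=True))
```

Choosing between the two formulas is left to the caller here; in the examples above, the second formula is used when the tail ends in a stranded preposition.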
21. en taking a prescribed medicine, Claritin-D. Is there a time limit after which this medicine will no longer have an effect? If so, what else can I take?
   [0077] d) Three years ago, after months of extreme fatigue, general aches and pains, and stomach problems, my family doctor gave me a diagnosis of Epstein-Barr. He said my titers were 5100. Recently I went to an internist who ran numerous blood tests and said she thinks that I have fibromyalgia. She doesn't believe in the Epstein-Barr diagnosis. I am now being referred to a rheumatologist. Is there such a thing as Chronic Epstein-Barr? And what is the difference between Epstein-Barr and fibromyalgia?
   [0078] After the type of request has been classified, the request is forwarded to the eSAO module for further analysis (Unit 6).
   [0079] If the request has been recognized as Keywords, i.e. it satisfies the rules of Boolean grammar, Semantic Processor converts the request into standard notation (see FIG. 3). For example:
   [0080] Input:
   [0081] laser NEAR beam && heating
   [0082] Output:
   [0083] laser NEAR beam AND heating
   [0084] If the request is of the type bit or command or question sentence or complex query, eSAO Processor (FIG. 4) performs its tagging (Unit 36), recognition of the introductory part of the request (Unit 37), parsing (Unit 38), and conversion (Unit 40). If the request type is question sen
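The conversion of a keyword request into standard notation, as in the input and output of [0080] to [0083], can be sketched with a single substitution. Only the `&&` spelling shown in that example is handled; other operator spellings and the full Boolean grammar are outside this sketch, and the function name is hypothetical.

```python
import re


def normalize_keywords(request: str) -> str:
    """Rewrite the '&&' spelling of the AND operator into the standard 'AND' notation."""
    replaced = re.sub(r"&&", " AND ", request)
    # Collapse any doubled whitespace introduced by the substitution.
    return " ".join(replaced.split())


print(normalize_keywords("laser NEAR beam && heating"))
```

Padding the replacement with spaces means the operator is also recognized when it is written without surrounding whitespace.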
22. ent
   [0188] The described transformations of the questions make it possible to transform them into narrative form, which can be easily translated into the search pattern.
   [0189] Then the converted request is subjected to question word substitution. In accordance with special rules, question words are substituted with certain so-called "null-words" so as not to corrupt sentence structure:
      What => Something
      Which => Some
      How => Somehow
      Who => Someone1
      How long => Sometime
      Whom => Someone2
      How much => Something2
      How many => Something3
      Where => Somewhere
      When => Time clause
      Why => Reason clause
      Whose => Somebody's
   [0190] Then the parsed converted request is submitted to User Request eSAO extraction 44.
   [0191] At the stage of eSAO extraction (FIG. 7), in the user request (in all cases except the keywords case) semantic elements are recognized of the type S (subject, Unit 74), A (action, Unit 72), O (object, Unit 74), as well as their attributes expressed via preposition, indirect object, adjective, adverbial, as well as inner structure (the components proper and the attributes) of Subject (S), Object (O) and Indirect Object (IO).
   [0192] The recognition of all these elements is implemented by means of corresponding Recognizing Linguistic Models (see Reference No. 4, i.e. U.S. patent application Ser. No. 09/541,182, page 41, section "SAO Recognition"). These models describe rules that use part-of-speech tags, lexemes and syntactic categories, which are then used
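The question word substitution of [0189] amounts to a table lookup over the converted request. The sketch below is an assumption about one way to apply it: two-word question phrases are tried before single words, matching is case-insensitive, and the request is tokenized on whitespace. `NULL_WORDS` and the function name are hypothetical.

```python
# Question word => null-word, as listed in the substitution table.
NULL_WORDS = {
    "what": "Something", "which": "Some", "how": "Somehow", "who": "Someone1",
    "how long": "Sometime", "whom": "Someone2", "how much": "Something2",
    "how many": "Something3", "where": "Somewhere", "when": "Time clause",
    "why": "Reason clause", "whose": "Somebody's",
}


def substitute_question_words(text: str) -> str:
    """Replace question words with null-words, preferring two-word phrases such as 'how much'."""
    words = text.split()
    output = []
    i = 0
    while i < len(words):
        pair = " ".join(words[i:i + 2]).lower()
        single = words[i].lower()
        if pair in NULL_WORDS:
            output.append(NULL_WORDS[pair])
            i += 2
        elif single in NULL_WORDS:
            output.append(NULL_WORDS[single])
            i += 1
        else:
            output.append(words[i])
            i += 1
    return " ".join(output)


print(substitute_question_words("the chemical composition of the ocean is What"))
print(substitute_question_words("it did help How much"))
```

Trying the two-word phrases first matters because "How much" and "How" map to different null-words.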
23. ex Query
   [0324] Input: My 15-year-old son has recently been diagnosed with recurrent shoulder dislocation. Lately he got worse. How is recurrent shoulder dislocation treated?
   [0325]-[0333] Output: Subject: (empty); Action: treat; Object: recurrent shoulder dislocation; Preposition: (empty); Indirect Object: (empty); Adjective: (empty); Adverbial: (empty); Constraints: 15 year old son, diagnose
   [Output block of the example preceding this excerpt: Subject: continents; Action: move; Object, Preposition, Indirect Object, Adjective, Adverbial, Constraints: (empty)]
   [0334] At the final stage of processing the user request, Semantic Processor forms Search Patterns, which are Boolean expressions in the case of keywords and eSAOs in other cases. Also, a sign may be present in some eSAO fields to signal that this field must be filled in to answer the user request.
   [0335] For example:
   [0336] Bit Sentence
   [0337] Input: clean water
   [0338] Output:
   [0339] a)
   [0340]-[0347] Subject: any; Action: any; Object: clean water; Preposition: any; Indirect Object: any; Adjective: any; Adverbial: any; Constraints: any
   [0348] b)
   [0349]-[0356] Subject: any; Action: clean; Object: water; Preposition: any; Indirect Object: any; Adjective: any; Adverbial: any; Constraints: any
   [0357] Statement
   [0358] Input: Give me the nu
24. f (i) component display to the user, (ii) forming a search pattern of a user request for information search of local or on-line databases, and (iii) forming an eSAO knowledge base.
   Claim 2. In the method of claim 1, wherein the expression comprises a user request for information search, said method further comprising classifying the expression into at least one category from the class that includes bit sentence, statement sentence, question sentence and complex query, and simplifying the user request search pattern by applying rules in accordance with the respective expression category.
   Claim 3. In the method of claim 2, wherein the rules include transforming a question sentence according to 12345 => 32415 or 12345 => 32451, wherein 1 = <Wh-group>, 2 = <First Verbal Group>, 3 = NG (Noun Group), 4 = <Second Verbal Group>, 5 = TL (tail).
   Claim 4. The method of claim 1, wherein the expression comprises a sentence of a document downloaded to the computer, and wherein said process comprises using the components for forming an indexed eSAO knowledge base entry, and selecting the eSAO entry for display of the eSAO components or of the source expression that includes the eSAO components in response to a user request that includes at least a subset of the expression eSAO components.
   Claim 5. The method of claim 1, wherein the expression includes constraint components that includes components that are not classif
25. fields and relations between them, which enables more effective and efficient answer search in local and remote databases, information networks, etc. Also, the output search patterns can be used to search for matching eSAOs in the eSAO Knowledge Base in the system with much more accuracy and reliability than prior systems and methods, even for requests in the form of questions. In addition, the eSAO format enables greater accuracy in obtaining precise information of interest. One exemplary system according to the present invention also forms an eSAO knowledge base or index of stored processed information that can be managed by various rules related to the eSAO components and fields.
   DRAWINGS
   [0010] Other and further objects and benefits shall become apparent with the following detailed description when taken in view of the appended drawings, in which:
   [0011] FIG. 1 shows a schematic view of one example of a digital computer system in accordance with the principles of the present invention.
   [0012] FIG. 2 is an example of a classification routine for classifying the type of user request usable in the system of FIG. 1.
   [0013] FIG. 3 is an example of a parsing routine for the case of the user request being keywords.
   [0014] FIG. 4 is similar to FIG. 3, where the user request is a bit (segment) sentence, command sentence or question sentence.
   [0015] FIG. 5 shows a parsing routine for the case of the user request being bit, command,
26. he lexeme which corresponds to <parameter> is found in the list of parameters included in Linguistic KB.
[0209] These models are used by Unit 76 of the eSAO extraction module. The output of the unit is a set of 7 fields, where some of the fields may be empty.
[0210] For example, for the highlighted fragments of the first two sentences given above:
[0211] 1. The dephasing waveguide is fitted with a thin dielectric semicircle at one end, and a guide cascaded with the dephasing element completely suppresses unwanted modes.
[0212] Subject: guide cascaded with the dephasing element
[0213] Action: suppresses
[0214] Object: unwanted modes
[0215] Preposition:
[0216] IndirectObject:
[0217] Adjective:
[0218] Adverbial: completely
[0219] 2. It was found that the maximum value of x is dependent on the ionic radius of the lanthanide element.
[0220] Subject: maximum value of x
[0221] Action: be
[0222] Object:
[0223] Preposition: on
[0224]-[0225] IndirectObject: the ionic radius of the lanthanide element
[0226] Adjective: dependent
[0227] Adverbial:
[0228] At the stage 77, the User Request eSAO Extractor recognizes constraints, i.e. those lexical units of the query which are not parts of the eSAO.
[0229] The constraints can be represented by any lexical unit except:
[0230] a) Question Words
[0231] enc_WP, enc_WRB, enc_WDT
[0232] Example: what, how, where
[0233] b) Articles
[0234] enc_AT, enc_ATI
[0235] Example: a, an, the
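The constraint-recognition step described in [0228]-[0229] can be sketched as a simple filter. This is only an illustration under stated assumptions: the function name `recognize_constraints`, the `(word, tag)` input format, and the tags `enc_JJ` and `enc_NN` in the usage note are hypothetical, and the exclusion set lists only the two tag groups quoted in this fragment (question words and articles).

```python
# Tags that may not become constraints. Only the question-word and article
# tags quoted in this fragment are listed; the full exclusion list is longer.
EXCLUDED_TAGS = {"enc_WP", "enc_WRB", "enc_WDT", "enc_AT", "enc_ATI"}


def recognize_constraints(tagged_tokens, esao_words):
    """Return the words that are outside every eSAO field and not excluded by tag.

    tagged_tokens: list of (word, tag) pairs (hypothetical input format).
    esao_words: words already assigned to some eSAO field.
    """
    used = {word.lower() for word in esao_words}
    return [
        word
        for word, tag in tagged_tokens
        if word.lower() not in used and tag not in EXCLUDED_TAGS
    ]
```

For instance, with the tokens `[("the", "enc_AT"), ("15-year-old", "enc_JJ"), ("son", "enc_NN"), ("what", "enc_WDT"), ("dislocation", "enc_NN")]` and `esao_words={"dislocation"}`, the article and the question word are dropped by tag, "dislocation" is dropped because it already belongs to an eSAO field, and the remaining two words are returned as constraints.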
27. ied in any other component type, said extracting step further includes extracting constraint components for designation into a constraint field, and said using step further includes using the components in at least certain ones of said fields for at least one of (i) component display to the user, (ii) forming a search pattern of a user request for information search of local or on-line databases, and (iii) forming an eSAO knowledge base.

6. The method of claim 5, wherein the object field includes an object component field segment and an attribute field segment.

7. The method of claim 6, said method further comprising forming a less relevant user request search pattern by deleting one or more components from the constraint field or one or more attributes from the object field.

8. The method of claim 4, wherein the expression comprises part of a downloaded document, said method further comprising classifying the expression into at least one category from the class that includes bit sentence, statement sentence, question sentence.

9. The method of claim 8, wherein the expression includes a question sentence, and transforming the sentence according to the rule 12345 → 32415 or 12345 → 32451, wherein the groups are, in order, <wh-group>, <First Verbal Group>, NG (Noun Group), <Second Verbal Group>, and TL (tail).

10. The method of claim 8, said method comprising processing all
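The question-sentence rule in claim 9 reads as a permutation of five groups. The sketch below assumes that the digits 1 to 5 stand for the groups in the order they are listed (wh-group, First Verbal Group, Noun Group, Second Verbal Group, tail); the function name `transform_question` and the list-of-strings input are hypothetical, and splitting a sentence into these groups is taken as already done.

```python
def transform_question(groups, rule="32415"):
    """Reorder the five groups of a question sentence according to `rule`.

    groups: five strings in original order 1..5; an empty string marks a
            group that is absent from the sentence.
    rule:   target order written as digits, e.g. "32415" or "32451".
    """
    if len(groups) != 5:
        raise ValueError("expected exactly five groups")
    if sorted(rule) != list("12345"):
        raise ValueError("rule must be a permutation of the digits 1-5")
    reordered = [groups[int(digit) - 1] for digit in rule]
    # Absent groups are skipped so that no double spaces appear.
    return " ".join(group for group in reordered if group)
```

With `["How", "is", "recurrent shoulder dislocation", "treated", ""]` and the rule `"32415"`, the noun group moves to the front and the wh-group follows both verbal groups, which gives a statement-like word order.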
28. ject, Object and Indirect Object each have inner structure, as described above.
[0268] In case of bit sentence and complex query, more than one set of fields is possible. For instance:
[0269] Bit Sentence
Input: clean water
Output:
a)
Subject:
Action:
Object: clean water
Preposition:
Indirect Object:
Adjective:
Adverbial:
Constraints:
b)
Subject:
Action: clean
Object: water
Preposition:
Indirect Object:
Adjective:
Adverbial:
Constraints:
[0290] Statement
[0291] Input: Give me the number of employees in IMC company
[0292] Output:
Subject:
Action:
Object: number of employees in IMC company
Preposition:
Indirect Object:
Adjective:
Adverbial:
Constraints:
[0301] Question
[0302] Input: What is the chemical composition of the ocean
Output:
Subject: chemical composition of the ocean
Action: is
Object: What
Preposition:
Indirect Object:
Adjective:
Adverbial:
Constraints:
[0312] Question
[0313] Input: Do the continents move
Compl
29. ject: glass formation
Action: occur
Object:
Preposition: at
IndirectObject: higher temperature
Adjective:
Adverbial:

6. Input: The composites were infiltrated under vacuum, cured at 100 degree C and precalcined in air at 700 degree C.
Output:
Subject:
Action: infiltrate
Object: composite
Preposition: under
IndirectObject: vacuum
Adjective:
Adverbial:

[0037] In addition, Subject S, Object O and Indirect Object iO have their inner structure, which is recognized by the system and includes the components proper (Sm, Om, iOm) and their attributes (Attr(Sm), Attr(Om), Attr(iOm)). The elements of each of the pairs are in semantic relation P between each other.
[0038] If, for purposes of the following description, we denote any of the elements Sm, Om, iOm as m, then Subject S, Object O and Indirect Object iO are predicate elements of the type P(Attr(m), m). The system considers and recognizes the following types of relation P: Feature (Parameter, Color, etc.), Inclusion, Placement, Formation, Connection, Separation, Transfer, etc.
[0039] Examples (only sentence fragments are given here, which correspond to the S or O or iO):
[0040] 1. Input: Ce-TZP materials with CeO content
Output: P: Formation, with; Attr(Om): CeO content; Om: Ce-TZP materials
[0041] 2. Input: rotational speed of freely suspended cylinder
Output: P: Feature (Parameter), of; Attr(Om): rotational speed; Om: freely suspended
30. ld ("any") means that this field can contain anything.
[0402] Functionality of all modules of the Semantic Processor is maintained by Linguistic Knowledge Base 12, which includes a Database (dictionaries, classifiers, statistical data, etc.) and a Database of Recognizing Linguistic Models (for text-to-words splitting, recognition of noun phrases, verb phrases, subject, object, action, attribute, type-of-sentence recognition, etc.). See References Nos. 3, 4 and 5 above.
[0403] Thus, the output search patterns at 10 in FIG. 1 can be used to search for matching eSAOs in the eSAO Knowledge Base in the system with much more accuracy and reliability than prior systems and methods, even for requests in the form of questions. In addition, the eSAO format enables greater accuracy in obtaining precise information of interest.
[0404] Simultaneously, the user is offered the opportunity to receive possibly less relevant information, owing to the strategy of less strict identity between the corresponding fields in search patterns and in documents processed during the search. Thus, for example, in the case of the last example:
[0405] Subject: something
[0406] Action: treat
[0407] Object: recurrent shoulder dislocation
[0408] Preposition: any
[0409] Indirect object: any
[0410] Adjective: any
[0411] Adverbial: any
[0412] Constraints: 15 year old son diagnose
[0413] Semantic Processor additionally can form a set of less relevant sear
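One way to picture the "less strict identity" strategy of [0404] is to generate progressively weaker copies of a search pattern by deleting constraints. The sketch below is an assumption-laden illustration, not the patented procedure: the function name `relax_constraints` is hypothetical, patterns are plain dictionaries, and the Constraints field is assumed to be stored as a list of separate items.

```python
from itertools import combinations


def relax_constraints(pattern):
    """Yield copies of `pattern` with one or more constraints deleted.

    Patterns that keep more constraints (and are therefore closer to the
    original request) are yielded first; the last one has no constraints.
    """
    constraints = pattern.get("Constraints", [])
    for keep in range(len(constraints) - 1, -1, -1):
        for kept in combinations(constraints, keep):
            relaxed = dict(pattern)  # shallow copy; the original is untouched
            relaxed["Constraints"] = list(kept)
            yield relaxed
```

For a pattern with the two constraints `["15 year old son", "diagnose"]`, this yields three weaker patterns: two that keep a single constraint each, then one with an empty constraint list.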
31. mation search of local or on-line databases, and (iii) forming an eSAO knowledge base. A constraint field can also be provided to accept non-classified components.

(63) Non-provisional of provisional application No. 60/198,782, filed on Apr. 20, 2000.

[Cover figure: Query driven information search. Block labels: User Requests; Sources of Documents to be processed; Semantic eSAO Processing; eSAO Search Patterns of User Requests; eSAO Structures Index of Source Documents, including links to documents; Comparative analysis of eSAO Search Patterns of User Requests and eSAO Structures of Source Documents; Relevant eSAO Structures of Source Documents.]

Patent Application Publication, Jan. 24, 2002, Sheet 1 of 8, US 2002/0010574 A1

[FIG. 1: Structural and Functional Scheme of the Semantic Processor for User Request Analysis. Block labels: Digital Computer; User Input; Type of user request recognition; User request eSAO analysis; Linguistic KB; Search pattern generation; Output (10): search patterns.]

[Sheet 2 of 8, figure fragment. Block labels: User request; User request classification; Bit sentence; Linguistic KB.]
32. mber of employees in IMC company
[0359] Output:
[0360] Subject: something1
[0361] Action: any
[0362] Object: number of employees in IMC company
[0363] Preposition: any
[0364] Indirect Object: any
[0365] Adjective: any
[0366] Adverbial: any
[0367] Constraints: any
[0368] Question
[0369] Input: What is the chemical composition of the ocean
[0370] Output:
[0371] Subject: chemical composition of the ocean
[0372] Action: be
[0373] Object:
[0374] Preposition: any
[0375] Indirect Object: any
[0376] Adjective: any
[0377] Adverbial: any
[0378] Constraints: any
[0379] Question
[0380] Input: Do the continents move
[0381] Output:
[0382] Subject: continents
[0383] Action: move
[0384] Object: any
[0385] Preposition: any
[0386] Indirect Object: any
[0387] Adjective: any
[0388] Adverbial: any
[0389] Constraints: any
[0390] Complex Query
[0391] Input: My 15 year old son has recently been diagnosed with recurrent shoulder dislocation. Lately he got worse. How is recurrent shoulder dislocation treated
[0392] Output:
[0393] Subject: something1
[0394] Action: treat
[0395] Object: recurrent shoulder dislocation
[0396] Preposition: any
[0397] Indirect object: any
[0398] Adjective: any
[0399] Adverbial: any
[0400] Constraints: 15 year old son diagnose
[0401] If no eSAO field contains the sign, that means the question is general. Absence of an element in a fie
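The search patterns above can be compared against a stored eSAO record field by field. The following is a minimal sketch under stated assumptions: the names `field_matches` and `pattern_matches` are hypothetical, records are plain dictionaries of strings, comparison is exact after lower-casing, and a placeholder such as "something1" is assumed to require any non-empty value (the text does not spell out its matching semantics).

```python
FIELDS = (
    "Subject", "Action", "Object", "Preposition",
    "Indirect Object", "Adjective", "Adverbial", "Constraints",
)


def field_matches(wanted, actual):
    """Compare one pattern field with the same field of a stored record."""
    wanted = wanted.strip().lower()
    actual = actual.strip().lower()
    if wanted == "any":
        return True            # "any": the field can contain anything
    if wanted.startswith("something"):
        return actual != ""    # assumed: placeholder needs a non-empty value
    return wanted == actual


def pattern_matches(pattern, record):
    """True if every field of `record` satisfies `pattern`.

    Fields missing from the pattern default to "any"; fields missing from
    the record are treated as empty.
    """
    return all(
        field_matches(pattern.get(name, "any"), record.get(name, ""))
        for name in FIELDS
    )
```

With the pattern for "Do the continents move" (Subject "continents", Action "move", everything else "any"), a record whose Adverbial is "slowly" still matches, while a record with a different Subject does not.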
33. ontext expression: left_context = L_context_1 ... L_context_n; right_context = R_context_1 ... R_context_n
[0115] Elements here can be POS tags or groups formed by the UNITE command.
[0116] All sequences of elements can consist of one or more elements.
[0117] One or both of the context strings defined by left_context and right_context may be empty.
[0118] The context-sensitive rules are applied to a sentence in a backward scanning, from the end of the sentence to the beginning, element by element, position by position. If the present element or elements are the ones defined in brackets in one of the context-sensitive rules, and the context-restricting conditions are satisfied, these elements are united as a syntactic group, or node, in the parse tree. After that, the scanning process returns to the last position of the sentence and the scan begins again. The scanning process is over only when it reaches the beginning of the sentence without starting any rule. Preferably, after a context-sensitive rule has been implemented, elements united into a group become inaccessible for further context-sensitive rules; instead, this group represents these elements.
[0119] A simple example illustrates the above-mentioned stages:
[0120] Input Sentence
[0121] The device has an open distal end.
[0122] The_DEF_ARTICLE device_NOUN has_HAVE_s an_INDEF_ARTICLE open_ADJ distal_ADJ end_NOUN ._PERIOD
Grammar
[0123] BEGIN BACKWARD_STAGE
[0124] UNITE
[0125]
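The backward scan of [0118] can be sketched in a few lines. This is an illustration under assumptions, not the patent's implementation: the function name `backward_parse` and the tuple representation of nodes are hypothetical, only two-element rules are supported, and the left and right context conditions are omitted because both rules in the example grammar leave them empty. The two rules encode the grammar of the example as it can be read from the surrounding text.

```python
# Each rule: (tags allowed for the first element,
#             tags allowed for the second element,
#             name of the group produced by UNITE).
RULES = [
    ({"ADJ", "NOUN"}, {"NOUN", "Noun_Group"}, "Noun_Group"),
    ({"DEF_ARTICLE", "INDEF_ARTICLE"}, {"NOUN", "Noun_Group"}, "Noun_Group"),
]


def backward_parse(tagged):
    """Unite adjacent elements into groups by repeated backward scanning.

    tagged: list of (word, tag) pairs.
    Returns a list of (tag, payload) nodes; for a group, the payload is the
    list of its two child nodes, which are no longer visible to the rules.
    """
    nodes = [(tag, word) for word, tag in tagged]
    fired = True
    while fired:                      # the scan ends when a full pass fires no rule
        fired = False
        for i in range(len(nodes) - 2, -1, -1):   # from the end to the beginning
            left, right = nodes[i], nodes[i + 1]
            for first, second, group in RULES:
                if left[0] in first and right[0] in second:
                    nodes[i:i + 2] = [(group, [left, right])]
                    fired = True
                    break
            if fired:
                break                 # return to the last position and rescan
    return nodes
```

Applied to the tagged sentence of [0122], the scan fires four times (distal + end, open + group, an + group, The + device) and stops with the top-level sequence Noun_Group, HAVE_s, Noun_Group, PERIOD.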
34. t link shall access the full source document for display of the paragraph or paragraph segment that includes the eSAO components, which can be highlighted for quick recognition. This document display is scrollable through the entire document (see References Nos. 2, 4 and 5 for further details of these functions).
[0427] It will be understood that various modifications and improvements can be made to the herein disclosed exemplary embodiments without departing from the spirit and scope of the present invention.

Subject: something
Action: treat
Object: dislocation
Preposition: any
Indirect object: any
Adjective: any
Adverbial: any
Constraints: any

We claim:

1. In a digital computer, the method of processing a natural language expression entered or downloaded to the computer, comprising:
identifying in the expression expanded subject-action-object (eSAO) components comprising at least four components, including subject-action-object components and at least one additional component from the class of preposition component, indirect object component, adjective component, and adverbial component; and
extracting each of said at least four components for designation into a respective subject, action, object field and at least one respective field from the class of preposition field, indirect object field, adjective field, and adverbial field; and
using the components in at least certain ones of said fields for at least one o
35. tion search and management.
[0027] Note that the prior systems' SAOs included three components, Subject (S), Action (A), and Object (O), whereas one example of expanded SAOs (hereafter eSAO) includes a minimum of 4 and up to 7 classified components, preferably 7 classified fields, and optionally an 8th field for unclassified components.
[0028] In one example, the Extended SAO (eSAO) components include:
[0029] 1. Subject (S), which performs action A on an object O;
[0030] 2. Action (A), performed by subject S on an object O;
[0031] 3. Object (O), acted upon by subject S with action A;
[0032] 4. Adjective (Adj): an adjective which characterizes subject S or action A which follows the subject in a SAO with empty object O (ex.: "The invention is efficient." "The water becomes hot.");
[0033] 5. Preposition (Prep): a preposition which governs Indirect Object (ex.: "The lamp is placed on the table." "The device reduces friction by ultrasound.");
[0034] 6. Indirect object (iO): a component of a sentence manifested, as a rule, by a notional phrase, which together with a preposition characterizes action, being an adverbial modifier (ex.: "The lamp is placed on the table." "The light at the top is dim." "The device reduces friction by ultrasound.");
[0035] 7. Adverbial (Adv): a component of a sentence which characterizes, as a rule, the conditions of performing a
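The seven classified components plus the optional field for unclassified components map naturally onto a small record type. The sketch below is one possible representation, not something the text prescribes: the class name `ESAO`, the attribute names, and the helper `classified_count` are hypothetical, and empty strings stand for empty fields.

```python
from dataclasses import dataclass, field


@dataclass
class ESAO:
    """One expanded Subject-Action-Object record; empty string = empty field."""
    subject: str = ""
    action: str = ""
    obj: str = ""                # "Object"; renamed to avoid the Python builtin
    adjective: str = ""
    preposition: str = ""
    indirect_object: str = ""
    adverbial: str = ""
    constraints: list = field(default_factory=list)  # optional 8th field

    def classified_count(self):
        """Number of non-empty classified components (constraints excluded)."""
        values = (
            self.subject, self.action, self.obj, self.adjective,
            self.preposition, self.indirect_object, self.adverbial,
        )
        return sum(1 for value in values if value)
```

For the example sentence "The device reduces friction by ultrasound", a record with subject "device", action "reduce", object "friction", preposition "by" and indirect object "ultrasound" has five classified components, which is within the 4 to 7 range described in [0027].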
36. uter such that use of the processed data or representation shall lead to more reliable and accurate results than heretofore possible with conventional systems.
[0003] One example of such use includes processing user queries into search, retrieval, verification, and display of desired information.
[0004] Another example is to analyze the content of processed information or documents and use such information to create a detailed and indexed knowledge base for user access and interactive display of precise information.
[0005] Reference is made to known systems for extracting, processing, and using SAO (Subject-Action-Object) data embodied in natural language text documents in digital electronic form. These prior systems process native language user requests and/or documents to extract and store the SAO triplets existing throughout the document, as well as the text segment associated with each SAO and the link between each SAO and the text segment. Links are also stored in association with each text segment and the full source document, which is accessible by user interaction and input.
[0006] Although SAO extraction, processing, and management has advanced the science of artificial intelligence in both stand-alone computer and web-based systems, there is a need in the art for yet greater accuracy in computer reliability in the semantic processing of user requests, knowledge base data, and information accessed and obtained on the web.

SUMMARY OF
37. ution 2; Parsed converted request

[FIG. 6: Structural and Functional Scheme of User Request Convertor.]

[FIG. 7 (Sheet 7 of 8): Structural and Functional Scheme of User Request eSAO extractor. Block labels: Parsed or parsed converted request; Action recognition; Subject and object recognition; Attributes recognition; Constraints recognition; Linguistic KB; eSAO request.]

[FIG. 8 (Sheet 8 of 8): Query driven information search. Block labels: User Requests; Sources of Documents to be processed; Semantic eSAO Processing (for requests and for documents); eSAO Search Patterns of User Requests; eSAO Structures Index of Source Documents, including links to documents; Comparative analysis of eSAO Search Patterns of User Requests and eSAO Structures of Source Documents; Relevant eSAO Structures of Source Documents.]

US 2002/0010574 A1

NATURAL LANGUAGE PROCESSING AND QUERY DRIVEN INFORMATION RETRIEVAL

RELATED APPLICATION

[0001] U.S. patent application Ser. No. 60/198,782, filed Apr. 20, 2000.

BACKGROUND

[0002] The present invention relates to methods and apparatus for semantically processing natural language text in a digital comp
