
NaturalLanguageProcessing-Lecture01 Instructor (Christopher Manning)


Contents

1. … like that. So one reaction to that is to say: well, natural languages just aren't the right thing for computers. And so that's actually been a dominant strand in the computing industry. So what people do is find different ways of doing things, whether it's using XML, the idea of the semantic web, or designing GUIs with things like menus or drop boxes, where effectively what we're saying is: look, the computer just can't deal with the kind of stuff that humans produce and how they naturally interact, and so instead what we're going to do is substitute different means of communication and interaction which are easy for computers. And, well, to some extent that has been a very successful strategy, because things like GUIs and XML have been very successful. But the basis of that success is essentially accepting that you can't get computers to do clever things, but noticing that it's very easy to get human beings to adapt and do clever things in different ways. And so we're exploiting human cleverness rather than working out how to have computer cleverness. And so eventually it seems like what we want to do is actually to work out some of the hard problems, and work out how we can get computers to be able to understand and process and produce human languages, and perhaps to learn them, just the way two-year-old kids somehow manage to listen to this linguistic signal and learn languages. And so …
2. But that's again a perfectly good semantic translation according to how natural languages work. Okay, I have one more example of that, but maybe I should start to get ahead, so I get to say a bit of stuff about machine translation before we run out of time. But the lesson to take away is that natural languages are highly ambiguous at all levels. The reason that they work for human beings is because natural language understanding essentially depends on making complex and subtle use of context: both the context of the rest of the sentence, and the context of prior discourse and things around you in the world. So for humans it's obvious what people are saying, but that means that it's hard for computers to do things, and so somehow there are a lot of different factors that influence how things are interpreted. And so that suggests this approach where we want to somehow make use of probabilistic factors and reasoning to try and do a good job of simulating how human beings can understand natural language. So [inaudible] doing natural language understanding is extremely, extremely difficult; it's sometimes referred to as one of the AI-complete tasks. But it's also important to counter that with the other extreme, which is that it turns out there are some other things you can do with NLP which turn out to be surprisingly easy and to work rather well. And so a lot of the time we have a vast amount of text available to us …
3. … We need to be optimized to do the best we can. Right now linguistics is right on the edge of what the processor can do. As we get another factor of 2, then speech will start to be on the edge of what it can do. Well, that sounds a nice story, and the story that's being told is: well, just give us a few more CPU cycles and everything will work fine. But the problem is, if you then look up at the top of the slide, these are actually remarks that Bill Gates made in 1997. And if you can remember back to 1997, I think that was maybe when what people had was original Pentiums that ran at 166 megahertz, or maybe Pentium Pros at 200 megahertz had come out. I mean, either way you look at it, since that time computers have gotten an order of magnitude faster in terms of their CPUs, and we now have our dual-core and quad-core chips and all of these things. So it's sort of fairly easy to be getting two orders of magnitude of performance compared to what we had then. But somehow that hasn't solved all of the problems of natural language processing. It seems like, oftentimes, it still seems like we're just on the edge of what computer technology can do. Natural language processing and speech recognition sort of work, but still don't work quite well enough to satisfy what people need out of them. And so I think one of the important things to realize, from both this lecture and the class as a whole, is just to really actually get …
4. … as a decoding process, and let's see if we can just automatically get our computers to build models of the translation process. And so in a couple of lectures I'm going to go through the details of how the IBM models started to do that. In the very next lecture I'll first of all talk about language models, which is a key ingredient we'll need along the way. But for the last five minutes of this lecture, what I'd like to do is sort of illustrate this process by just showing how it works at a human level, working with a small simulated parallel text. And so this is following a kind of a nice example that was done by Kevin Knight, who's at ISI at the University of Southern California, and ISI has been one of the most prominent centers where people have worked on statistical machine translation in the last decade. So the Kevin Knight scenario is: what we want to do is translate between Centauri and Arcturan, or from Centauri into Arcturan. So we start off with this sentence in Centauri [foreign language], and somehow we want to learn how to translate this into Arcturan. Well, fortunately for us, we've actually also managed to find a little bit of parallel text: some examples of sentences in Centauri for which we know their translation in Arcturan. So here's our parallel corpus: we've got these 12 pairs of sentences in the two languages. Now obviously, when we ask our computers to do this task, typically, well, if we're building large systems …
5. … of words of data in major languages that anyone can get their hands on. Of course, now we also know more about linguistic structure, and that helps. The final thing that has helped progress is this item that I've got down there as change in expectations. The change-in-expectations part is saying that something interesting has happened: there have come to be new opportunities for kind of so-so translation. In the early days, essentially the only use for translation was for producers of documents. So people had their user manual and they wanted to translate it into Spanish, German, French, Italian, et cetera. And by and large those people had to pay human beings, because machine translation was too bad for them to be able to release their product on the market; frankly, it would look bad. Notwithstanding those consumer electronics products where, even when human beings translate them, commonly the translation is still really bad and makes the product look bad. But the thing that's changed now is that we have much more of a system, through things like the World Wide Web, where there are users that can originate their own translations. So I might want to do something like go and look at the Al Jazeera webpage and wonder what it's saying, and my Arabic isn't very good, so I want it translated for me. And I'll be happy provided I can understand the gist of what the page says; I don't really mind if the …
6. … some sense of why this problem is hard. I mean, some things we can do really well with computers these days. Right, the visual effects that you see in your Hollywood movies are completely amazing, and amazingly realistic and lifelike. So why isn't our ability to do human language understanding equally as good? And I think one way to think about that is to think about the history of NLP: what kind of computers people had, and where things headed from that. So if you look at the early history of NLP, NLP essentially started in the 1950s. It started just after World War II, at the beginning of the Cold War. And what NLP started off as is the field of machine translation: can you use computers to translate automatically from one language to another language? Something that's been noticed about the field of computing, actually, is that you can tell the really old parts of computing because the old parts of computer science are the ones that have "machine" in the name. So the Association for Computing Machinery is a really old organization, and you have finite state machines; those were things that were invented a long time ago, and machine translation was something that was studied a long time ago. Whereas for something like computer vision or graphics, there's no "machine" in the name, because really those were fields that didn't even begin until decades later. So the beginning of NLP was essentially in this Cold War context, where the Soviet Union and the …
7. … talk to False Maria. And that's just a very natural interface modality for human beings to think about. So in general, the goal of the field of NLP is to say that computers could be a ton more useful if they could do stuff for us, and as soon as you want computers to do stuff for us, well, then you notice that a lot of human communication is by means of natural language, and a lot of the information that is possessed by human beings, whether it's Amazon product catalogs or research articles that are telling you about proteins, that's information in natural language. So computers could be a ton more useful if they could read our email, do our library research, chat to us; all of these things involve dealing with natural language. But it's precisely at that point that there's this problem that, well, computers are fazed by human languages. They're pretty good at dealing with machine languages that are made for them, but human languages, not so. Now, maybe that's not the computer's fault; really it's the programmer's fault, and partly that's because natural languages are hard, as I'll discuss, and partly it's because a lot of thinking in computer science is directed in a direction that's very different from how natural languages are. So computer scientists generally tend to want precision: precisely specified, unambiguous APIs and meanings for things, and natural languages aren't …
8. … talking a little bit about the class. I mean, in some sense this class is sort of like an AI systems class, in that there's less of a pure build-up of coherent theory from the ground up in a concerted way. It's more of a class that's built around: okay, there are these problems that we want to deal with in understanding natural language, what kind of methods can we use to deal with those problems, and how can we build systems that work on them effectively? So there's a lot of hands-on doing things in assignments, working-out-how-to-get-things-to-work kinds of issues, and one of the things that you'll find doing that is that practical issues, working out how to define things right, and making sure that the data's being tokenized correctly, that a lot of these things can be just as important as theoretical niceties. And I think that's true of the field of NLP as a whole as well: if you look at papers in the field, papers tend to emphasize their key research idea that is cool and novel, but in practice, the kind of systems people build for the experimental results of the papers, yes, they contain that idea, but they also contain a lot of other hard work getting all of the details right. And so that means that in this class we're going to sort of assume that people have some background and can exploit knowledge of a bit of linear algebra, a bit of probability and statistics, and that they have decent programming skills …
9. … that's in some sense the very lowest level of natural language technologies. I'm really going to be interested, in this talk, in how we can start to do things which don't fully understand the text, but somehow go beneath the surface and start to recognize certain parts of the structure and meaning of human language text, and can do cleverer things because of that. And you start to see that happening even these days in even sort of effectively basic text technologies like information retrieval and web search: once upon a time you just put in the keywords and you indexed those keywords, where now all the search engines are increasingly starting to do things like, how can we have different forms of a word count as equivalent? How can we do query expansion to find [inaudible] related terms and things of that sort? Okay, well, what is it that's holding back the field of natural language understanding? Here's a quote that I've always rather liked. It's from Bill Gates, who's actually always been an extremely big supporter of natural language understanding, as part of his general interest in using natural language interface technologies. So Bill Gates says: Applications always become more demanding. Until the computer can speak to you in perfect English, and understand everything you say to it, and learn in the same way that an assistant would learn, until it has the power to do that, we need all the cycles …
10. … the web, and then you spider the web, and all those things; it's all metaphorical extensions that are going on. Juvenile Court to Try Shooting Defendant. This is then again an ambiguity of syntactic structure, in terms of what's modifying what. So one possibility is that you have shooting as a modifier of defendant, so you have a "shooting defendant," and then you put together the rest of the sentence, so that to try is then the main verb, and you're trying the shooting defendant. But the alternative is that you have shooting as a verb and defendant as the object of that verb. So you're shooting the defendant, and that's something the court's going to try. And then we can continue on from there, and there are a bunch of other examples. Teachers Strikes Idle Kids. Stolen Painting Found by Tree. This is more a sort of semantic interpretation of "by tree," as to whether that's being regarded as the agent of the finding or the location of the finding. Local High School Dropouts Cut in Half. Red Tape Holds Up New Bridges. Clinton Wins on Budget, But More Lies Ahead. Hospitals are Sued by Seven Foot Doctors. Kids Make Nutritious Snacks. Minister Accused of Having Eight Wives in Jail. So these are the sort of funny examples where there are real ambiguities that human beings notice, and in a sense I went looking for these in a place where you can find them, because there's a fact about the way that headlines are constructed that makes these …
11. … this mysterious word crok. Now we're left with no underlined words here. That doesn't mean we know the exact answer. One possibility is really we should take a pair of these words and translate them as one word in the other language. There are lots of cases in which you have something that can be expressed as two words in one language and one word in the other language. It can even happen in the same language, where you just have an alternative. So you can either say "I went into the room" or you can say "I entered the room." So "went into" and "entered," they're kind of two possible paraphrases; one is two words, one is one word. So we have to allow for those kinds of many-to-one alignments in various ways. But the other possibility is that this is just a word that isn't translated. And in the statistical MT literature those are referred to as zero-fertility words: the idea is that this word just doesn't have anything it corresponds to; there's nothing that it becomes in the other language. And so that's a possible scenario. And so this is the kind of process by which we can use parallel text to learn how to translate. And the cute thing about this example, for me to close on, is that, well, actually this example is really Spanish-English, except that Kevin Knight took all of the words and changed them into these sort of made-up Centauri and Arcturan words. But you actually have exactly the same translation problem, where what we want to …
12. … to understand. For a human being, it doesn't seem like there's any problem there; it doesn't seem like there's any ambiguity at all if you just read it casually. But if you actually start looking at the sentence more carefully, there are tons and tons of ambiguities. So there's the word rates. That can be either a noun or a verb, so it can be "She rates highly" or "Our water rates are high": noun or verb. The word interest, that can be a noun or a verb. You can say "Japanese movies interest me," a verb, or you can say "The interest rate is 8 percent," a noun. Raises, that can be a noun or a verb as well. So it can be "Fed raises," a verb, or "The raises we received were small," a noun. So you notice this fundamental fact that in English a ton of words can be nouns or verbs. And very generally you can take nouns and turn them into verbs. So you have a noun like butter, and you can turn that into "butter your bread." You have a noun that's a protocol, like SSH, and then you can immediately turn that into a verb and say "I SSH'd the file to you." So that kind of already introduces a ton of ambiguity. And so what does that mean? Well, what that means is that there are a ton of different syntactic structures that we can build because of that noun-verb ambiguity. So you combine the two facts: that raises, interest, and rates can be nouns and verbs, and the fact that you can make noun …
13. NaturalLanguageProcessing-Lecture01 Instructor (Christopher Manning): Hi, everyone. Welcome to the first class of Stanford's CS224N, which is an intensive introduction to natural language processing, concentrating primarily, but not exclusively, on using probabilistic methods for doing natural language processing. So let me just say a teeny bit about the structure of the course and some of the administration. If you want more information on any of these things, the main thing to do is go to the website, which is cs224n.stanford.edu. So I'm the instructor, Christopher Manning, and there are two TAs this quarter, Paul Boundstark and Dejuan Chang. The lectures right now are Monday and Wednesday, 11:00 to 12:15, being broadcast live. Most weeks there'll be a section on Fridays from 11:00 to 12:15. So for the handouts for today, there's the course syllabus, there's the first lecture, and then, most importantly, I'm handing out the first assignment already today, and I'll say a bit more about that as the class proceeds. But this would be a good time, over the weekend, to look at the first assignment and check that you know how to do all the kinds of things that you need to be able to do to get productively working on the assignment next week in particular. So for this class there are three programming assignments that are specified for you, and then there's a final project, and most of the grade is that work. In addition to that, there are going to be just a few percent on …
14. … syntactic structure of the translation is just a ton better, because it's been correctly recognized in the source, and it's actually starting to be quite a passable translation. Okay, so this is a very quick history of machine translation research. I've already mentioned the work in the 1950s and 60s, essentially just direct word-for-word replacement. There was then the kind of shutdown of MT funding and research in the U.S. following the ALPAC report in 1966. There then started to be a resurgence in other parts of the world, maybe parts of the world that needed translation more: in Europe and Japan. These were the days of the Japanese Fifth Generation project, when they were going to take over the world with AI. But that sort of started to slowly lead, in the late 80s and the early 90s, to a gradual resurgence of machine translation work in the U.S. But what really turned the world around was when these IBM machine translation models were produced, and the approach of statistical machine translation started to be hotly pursued. So over the time that I've been doing this job, things have just been completely turned around, from the late 1990s, when almost no one worked in machine translation and most of the interesting, exciting research work was being done in other areas of natural language processing, like parsing, information extraction, and various other topics. Whereas really in the last five years, machine translation, and in particular statistical …
15. … and we can do a surprisingly good job for many tasks by just looking through a large amount of text and counting things, and then predicting based on counting. So that's the kind of thing that's done with context-sensitive spelling correction: you just count up how often different spellings occur in different contexts, and you say that's the way it should be spelled when a gap appears in a similar context. And ideas like that just work incredibly successfully and are incredibly useful to people. So somehow we want to be in this middle ground where we're doing things that are sometimes not even that hard, but give useful results. Okay, and so that's led into this direction of: how can we, in a fairly robust and reliable way, combine at least an approximation of knowledge about language, in terms of how language is used in other contexts, with what kind of context we're in at the moment, and put that information together to come up with the right interpretation of sentences in a straightforward way? And the answer that's been getting a lot of traction in the last decade and a half or so in natural language processing is to use probabilistic models. That's essentially the whole idea of probability theory: we have uncertain knowledge about the world, but we want to combine all the evidence that we have, which may in turn be uncertain [inaudible], and combine it together to give the best prediction or understanding …
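The counting idea in that excerpt is simple enough to mechanize directly. Below is a minimal sketch, in Java since that is the course's language, of count-based context-sensitive spelling correction. The corpus string, the confusion set, and all names here are made-up illustrations, not the course's support code: it counts (left word, candidate, right word) triples seen in a corpus, then fills a gap with the candidate most often observed in that context.

```java
import java.util.*;

/** Minimal sketch of count-based context-sensitive spelling correction.
    The corpus text and the confusion set are made-up examples. */
public class SpellContextSketch {
    public static void main(String[] args) {
        String corpus = "they went to their house . it is over there now . their dog barked";
        Set<String> confusion = Set.of("their", "there");

        // Count (leftWord, candidate, rightWord) trigrams seen in the corpus.
        Map<String, Integer> counts = new HashMap<>();
        String[] w = corpus.split(" ");
        for (int i = 1; i < w.length - 1; i++)
            if (confusion.contains(w[i]))
                counts.merge(w[i - 1] + " " + w[i] + " " + w[i + 1], 1, Integer::sum);

        // To fill the gap in "went to ___ house", pick the candidate whose
        // trigram with this context was counted most often.
        String left = "to", right = "house", best = null;
        int bestCount = -1;
        for (String cand : confusion) {
            int c = counts.getOrDefault(left + " " + cand + " " + right, 0);
            if (c > bestCount) { bestCount = c; best = cand; }
        }
        System.out.println(best); // prints "their" on this toy corpus
    }
}
```

A real system would back off to shorter contexts and smooth the counts, but the core of the trick really is just counting, as the lecture says.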
16. … as a bigger noun phrase, and there's a ban on that. So that's the funny reading. Then the reading that you're meant to get is that there's a ban on nude dancing, and that ban is on the governor's desk. So what you effectively then have is this prepositional phrase, On Governor's Desk, actually modifying the word Ban. And so that's referred to in a lot of the literature as: where does the prepositional phrase attach? Meaning, what does it modify? This one's dated by a few years, and so I guess this one comes from prior to 1991: Iraqi Head Seeks Arms. So this one is then playing on word sense ambiguity. So Arms can be real arms of animals, but here it's talking about weapons. And Head can be a real head, but here it's referring to a head of state. Both those usages are effectively illustrating the way in which a lot of new word senses come into being. Word senses often come from what are referred to as sense extensions, which occur fairly productively, where, more or less metaphorically, people will be extending the meaning of one word into a further domain. So you have the idea of a head on a person, that controls the person, and so you then extend that idea to a state or a country, and then you have a head of the country, who is the person who controls that. And that keeps on happening all the time. So you notice it all through recent developments in the computer industry. So that when we have …
17. … that during World War II, well, what were computers used for in World War II? Essentially they were used for working out the flight paths of bombs and for code breaking. And so, thinking about machine translation, he got interested in: can computers do translation as code breaking? And he wrote to a professor that he knew, who was a foreign languages professor, about this idea. And he wrote: "Also knowing nothing official about, but having guessed and inferred considerable about, the powerful new mechanized methods in cryptography, methods which I believe succeed even when one does not know what language is being coded, one naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode." So that seemed to be the idea in Warren Weaver's head. Now, in practice, in terms of what was actually built in the 1950s and 60s, really no one picked up on this idea at all. People started hand-building translation dictionaries, hand-building analysis rules and grammars, hand-building a big system, and it wasn't approached in this manner at all. It was then in the 90s, well, the late 80s and 90s, that statistical work picked up this idea again, to say: well, let's start with this parallel text and view language translation …
18. … that you have a kind of a good understanding of the main tools that people are using these days for natural language processing. Okay, so the first problem that we're going to be starting with is this problem of machine translation. So the idea is, we're going to start with this piece of text here, which most of us aren't very good at reading [inaudible]; there were these characters [inaudible] at the beginning, but my Chinese isn't very good. And we'd like to turn it into this text, where we can just understand it and say: here, the U.S. island of Guam is maintaining... But many people have thought of this as kind of the classic acid test for natural language processing: if we could have a system that could fluently translate text, that would mean we have completely solved the problem of natural language understanding. I think in modern times people are actually a little bit less convinced of that, because although one approach to doing machine translation is to do full text understanding, and presumably that's essentially what a human translator does most of the time, really, if you look at just the inputs and outputs of machine translation, well, both the input and the output are strings. So really machine translation is just a string transduction problem, and it's quite possible that you could do that quite well without really understanding all of the text. And indeed, quite a lot of the work on statistical machine translation has taken …
19. … compounds in English by just sticking nouns together, you get a lot of ambiguity. So noun compounding is things like computer network, where you just stick two nouns next to each other to build a bigger noun. Very productive in the computer industry; that's why when you read the manuals you have this sort of personal computer power supply enclosure, which is just all one compound noun where you're sticking nouns together. So here's the structure that we're meant to be getting, where we make interest rates the compound noun in Fed raises interest rates. But combining those two facts together, you could instead make Fed raises a noun phrase, as a compound noun, and then interest rates, where interest is the verb. Or you could have Fed raises interest as the noun phrase and rates as the verb. And not only are these sentences syntactically well-formed, you can tell a sort of semantic story for them as well; it's not that you can't come up with a semantic interpretation for these. So this is the normal one, where the Fed is raising interest rates. For this one, I could tell a story: there was an article in yesterday's paper that the Federal Reserve Board had decided to award everyone on the Board, i.e. themselves, a large pay raise, and then you could have a headline that says Fed Raises Interest Rates, meaning that the market rates were going down or up because people thought policy on the Fed was lax, or the Fed was corrupt, or something …
20. … you'd take your sentence in Russian, you'd look up the words in the dictionary, and replace them with the first word that you saw in the dictionary as a possible translation equivalent. And as we'll see when we start talking more about machine translation later today, that doesn't work very well; there's a lot more going on in languages that makes them difficult. And so after a while people noticed that the problem seemed intractable, and so there was this famous report that was issued by the U.S. government in the 1960s, the ALPAC report, which essentially cancelled all work on machine translation research in the United States, I think rightly arguing that, given the technology and science that existed at the time, this was just a hopeless endeavor, and essentially recommending that what should instead be done is the promotion of basic science, so that people actually had a much better understanding of how human languages worked. Okay, so let me go on for a little into talking about this question of why natural language understanding is difficult, and essentially the point that I want to bring home is that natural language understanding is in the space of what people sometimes call inverse problems. And the idea of inverse problems is that there's stuff that you see on the surface, and then you have to construct the stuff that underlies it, and doing that is always the hard problem. So the analogy is to go back to my example with …
21. … the United States were both convinced that the other had scientific advances that would allow them to dominate the world. And the problem was, they wrote about them all in their own respective languages, and so both the Soviet Union and the United States started pouring huge amounts of money into this goal of machine translation, to be able to translate from one language to the other. And at the beginning there was a lot of excitement, and I'll show you that in just a moment. But that excitement soon turned to disappointment. Maybe I should show you the excitement first. [Video playing] So this is a video from the 1950s talking about early work on machine translation. Instructor (Christopher Manning): So I'll stop there, and I'll talk about why it runs into trouble in a moment, in the context of natural language understanding. But I mean, in retrospect, it's not very surprising that this early work in the 1950s didn't go a long way, when you think of the kind of computers that were available in those days: they had far less computing power than what you now have in your cell phone, or even your average pocket calculator. And so one part of it was that there was really just about no computing power. But the other part of it is that there was almost no understanding about how human languages worked. It was really only in the late 1950s and early 1960s that work started to be done on understanding how the structure of human …
22. … there's text, or natural language understanding, and there's speech recognition. And those two subareas haven't always had as much communication between each other as they should have had. And departmentally, speech has always been mainly based in electrical engineering departments. But what happened was that at the same time that all the people in computer science, and AI in particular, were convinced of the use of symbolic methods, doing logical approaches to artificial intelligence and things like that, the speech people, off in electrical engineering departments, were separated enough that what they were studying was still how to use probabilistic methods, how to use real numbers and integrals for doing signal processing, and all of those kinds of things. So the natural way for electrical engineering and speech people to approach speech recognition was to think about it in terms of probabilistic models of reconstructing words from an uncertain audio signal. And some of those people then started thinking about how we could start applying some of those ideas further up the food chain, for doing other kinds of natural language processing. So maybe we could use those same probabilistic ideas to work out the parts of speech of words. And essentially this work really came out of two places: it came out of AT&T Labs, Bell Labs in those days, and IBM Research on the East Coast. There were groups there …
23. … the two languages have some similarities. So the fact that furock is second in both of these sentences, and ja is second in these sentences too, that's at least weak confirmatory evidence. And so, tentatively, we could say: well, we can translate furock as ja. Okay, so then we can go on to the second word, crok. That's kind of a tricky one, because crok only appears once in our parallel corpus, and, well, if we really believed word order was always the same we could count on that, but that seems kind of weak evidence, since we know word order varies a lot between languages. So that one seems kind of too hard. So maybe we should just leave it for a moment, say it could be any of those words in the translation, and go on and look at other words. Okay, so hehawk looks rather more hopeful, because we can find hehawk in three sentences. And so again we could do this process of saying: okay, which word? Well, if we look at these three sentences, it seems like arrat appears in all of them. Now, in practice a lot of the time we won't be this lucky, because a word in Centauri will be translated sometimes by one word in Arcturan and sometimes by a different word. And so then it's a bit harder, but we'll at least look for words that frequently occur in the translation of the sentence. But here we seem to be in good business, and we can go with that alignment. And so we can move on to yurock. So we can do the same kind of reasoning …
24. … different kinds of beans, the same as we have words for fava beans and lima beans, but they're different kinds of beans. And every language does that, right? This is kind of like the Eskimos having a lot of words for snow. And the problem is, if you're actually trying to design an interlingua that works for every language, that means you have to put into the interlingua the complexities of every individual language, and that makes it a very hard goal to get to. Okay, so we're not going to be heading in that direction. We're going to be looking at how statistical machine translation can be done at essentially the word-level, direct-translation level, and then talking a little bit about how to do it at the syntax level. And so I'll have you do machine translation. The essential answer to how to do machine translation by statistical methods is this picture. So this is a picture of the Rosetta Stone, which is the famous basis for how the decoding of hieroglyphics happened. And well, how did that happen? It happened because someone had a hunch that this stone contained the same text in three languages: in Greek, in Demotic, and in Hieroglyphs. And this correct assumption, that the stone had the same text in three languages, gave enough information to start to work out how you could decode hieroglyphs. And so this is exactly what we're going to use. We're not exactly going to be decoding in exactly the same …
25. … the translation isn't perfect; it'll be a lot better than having nothing. And that's precisely why there's now this opportunity that you find on all the major search engines, like Google and Yahoo and Microsoft Live Search, where they can do these low-quality translations for you. And that same facility is being used by users in lots of other places as well. So everything [inaudible] doing online chat and playing games and things like that, so you can help out billions of people whose primary language isn't English, especially by providing these kinds of facilities. Okay, so overall there are different approaches to machine translation, based on how deep you go in attempting to translate. And this was presented a very, very long time ago by someone called Vauquois, and it's called the Vauquois Triangle. And the idea here is, we've got the source text and we've got the target text, and we're able to do some amount of processing, and then at some point we're going to try and convert across to the other language. So one possibility is that we do extremely minimal processing: we presumably want to cut the text into words, we might want to normalize the words a little, and then go straight across at the word level to the target language. And so that's then called direct translation. That was what was done in the 1950s, and when statistical machine translation started in the 1990s, actually, that was again what people did. It's just that the …
26. … these kinds of funny ambiguities are more likely to arise, and that in headlines you often leave out a lot of the function words, and so that tends to increase the amount of ambiguity that's going on. So in some sense this is atypical, but I think the example that you more want to keep in your head is that example of Fed Raises Interest Rates Half a Percent in Effort to Control Inflation. I mean, that example's not a funny example; when human beings read it, they don't see any ambiguities at all. And yet still, in that example, there are tons of ambiguities present. And I'll show just one more example, which relates to where there are semantic ambiguities. So in the old days in the GRE, and still in the LSAT, they have these logic puzzle questions. And I think, probably, often for the kind of people who become computer science majors, probably some time in high school or something you've seen these kinds of logic puzzles. So the nature of the questions is that there's a scenario and then you're meant to answer a question. So six sculptures are to be exhibited in three rooms, and then there are some conditions: Sculptures C and E may not be exhibited in the same room. Sculptures D and G must be exhibited in the same room. If sculptures E and F are exhibited in the same room, no other sculpture may be exhibited in that room. And then there's a question that you're meant to answer. Now, because these are [inaudible] problems that determine whether people get …
27. … this next sentence down here, we've got two words that we haven't aligned. One was the crok that we wanted to work out at the beginning, and then we have zanszenot. So even using the process-of-elimination idea, there are still two words here and one word there, and it might not seem quite obvious what to do. But a possible source of information is to see that zanszenot and zanszena, well, they look very similar to each other, and often words that look very similar in different languages actually are related to each other. Those words are referred to as cognates. And now, for some languages which are related historically, or had a lot to do with each other for various kinds of historical reasons, there are tons of cognates. So French and English have tons of cognates, and similarly for related European languages. And even for languages which aren't actually related closely in the language families, for kind of historical and cultural reasons there may be a lot of borrowed words that give cognates. So, for example, you'll find lots of Arabic words through vast stretches of Africa where the languages aren't Arabic, because various forms of science and culture were borrowed, and so you see these borrowed words. That even works in modern times with languages like Chinese and Japanese: they've been borrowing lots of words from English for various technological things. So cognates can often be quite helpful. Well, what does that leave as …
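As a small illustration of how a cognate signal like the zanszenot/zanszena resemblance in that excerpt might be computed, here is a hedged sketch using normalized edit distance. The 0.7 threshold, the class name, and the idea of thresholding at all are assumptions for illustration, not anything from the lecture.

```java
/** Minimal sketch of using normalized edit distance as a cognate signal.
    The 0.7 threshold is an assumed, illustrative cutoff. */
public class CognateSketch {
    // Standard Levenshtein edit distance via dynamic programming.
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        String f = "zanszenot", e = "zanszena";
        // Similarity of 1.0 means identical strings; 0.0 means nothing shared.
        double sim = 1.0 - (double) editDistance(f, e) / Math.max(f.length(), e.length());
        System.out.println(sim >= 0.7 ? "likely cognates" : "probably unrelated");
    }
}
```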
28. … field has been going backwards or making no progress. I think actually it is a sign of progress, because back in the 70s and 80s, natural language understanding wasn't actually practical for any real applications, where now things are in a state where there are actually real, practical, deployed applications where people can use natural language understanding, and so that's encouraging people to work on that end of the spectrum, where there are actually things that you can use. Okay, but something I do want to emphasize about what we're going to focus on in this course is that we're going to focus on stuff that is in some sense going beneath the surface of the text, and involves some understanding of the text. So starting with either speech or text input, there's the most superficial level, right? There's an audio signal, and you can record it and you can play it back. And when you're playing it back, you're playing back human speech, if that's what you've recorded, but you've done no kind of processing or understanding of any of that. Similarly, for text-format input, you can do this very superficial level of processing where you cut your words at white space, and then you maybe index them in an information retrieval system, like the very kind of base level of web search, and then you just kind of match words and you give stuff back. And again, it seems like you're not really doing any understanding of natural language. And while …
29. … skills. Now, obviously, not everyone has exactly the same background and skills. Some people know a ton about probability and machine learning, others not so much. So we're kind of hoping that as you go along you can pick up the things that you don't know or are rusty on, and everyone can do well. And I think in practice that works out fairly well, because even people who've done a lot of machine learning often haven't seen so much of getting things to work in practical contexts. And so the class tries to do a mix between teaching the theory and actually learning techniques that can be used in robust, practical systems for natural language understanding. Okay, so where does the idea of natural language understanding come from? The idea of natural language understanding is basically as old as people thinking about computers, because as soon as people started thinking about computers and thinking about robots, they thought about wanting to communicate with them, and, well, the obvious way for human beings to communicate is to use language. [Video playing: "I'm sorry, Dave. I'm afraid I can't do that."] Instructor (Christopher Manning): Okay, so there's my little clip of HAL from 2001, but actually the idea goes much further back than that. If you go to the earliest origins of science fiction literature, you can find in 1926's Metropolis that you have False Maria, who was a nonhuman robot, and, well, not surprisingly, you could …
30. … in this course we'll talk about some of these open research problems in natural language. So what kind of things do people do in natural language processing? The goals of NLP can be extremely far-reaching: there's the idea that you should just be able to pick up any piece of text and understand it; you should be able to reason about the consequences of those texts; you should be able to have real-time spoken dialogues between you and your robot. At the other end, the goals can be very down to earth. So you might want to do context-sensitive spelling correction; that was picked out by Walt Mossberg in the Wall Street Journal as one of the three biggest innovations of Office 2007/2008. That's a very simple little bit of natural language processing. You might have other low-level goals, such as working out how much different products sell for at different retailers on the web. A funny thing about the field of natural language processing, if you look at how the field has moved, is that in the 70s and the 80s there was a lot of interest in cognitive science, and people spent a lot of time working on those far-reaching goals of how we can do complete text understanding. Well, if you look at natural language processing in the past decade, really a vast amount of the work has been on these very low-level problems, like finding prices of products and doing context-sensitive spelling correction. You might think that that just means the field …
31. … kind of stuff that happens in translation. So there's another piece of Chinese text, and the reference translation is: according to the data provided today by the Ministry of Foreign Trade and Economic Cooperation, as of November this year, China has actually utilized 46.959 billion U.S. dollars of foreign capital, et cetera. The main thing that we're going to talk about, and that you guys are going to implement some of for the second assignment, is these models. The canonical early models of machine translation by statistical means were proposed by IBM, and they built five models of increasing complexity, called IBM Models 1, 2, 3, 4, and 5; not very original names. And for the assignment we're going to get you guys to build Models 1 and 2, because you can implement Models 1 and 2 fairly straightforwardly, where things get much more complex if you go up to Model 3. But here's some output from IBM Model 4: "The Ministry of Foreign Trade and Economic Cooperation, including foreign direct investment, 40.007 billion U.S. dollars today provide data include that year to November China actually using." Well, there are some snatches of that, in particular some noun phrases, that the system just did fine. Right, "The Ministry of Foreign Trade and Economic Cooperation," that's fine. Then the "40.007 billion U.S. dollars"; somehow the number came out slightly different, but maybe that's the exchange rate. But that's kind of an okay bit. But clearly, if you then look …
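For a flavor of what building Models 1 and 2 involves, here is a minimal, hedged sketch of IBM Model 1 EM training. The toy corpus and all names are made up, and real implementations also include a special NULL source word and smoothing, both omitted here; this is emphatically not the assignment's support code, just the bare E-step/M-step loop.

```java
import java.util.*;

/** Minimal sketch of IBM Model 1 EM training on a made-up toy corpus.
    Each pair is (source sentence, target sentence); NULL word omitted. */
public class Model1Sketch {
    public static void main(String[] args) {
        String[][] corpus = {
            {"la maison", "the house"},
            {"la fleur", "the flower"},
            {"maison bleue", "blue house"},
        };
        // t(e|f): translation probabilities, initialized uniformly over
        // every target word that co-occurs with each source word.
        Map<String, Map<String, Double>> t = new HashMap<>();
        for (String[] pair : corpus)
            for (String f : pair[0].split(" "))
                for (String e : pair[1].split(" "))
                    t.computeIfAbsent(f, k -> new HashMap<>()).put(e, 1.0);
        normalize(t);

        for (int iter = 0; iter < 20; iter++) {
            // E-step: collect expected counts of aligning f to e.
            Map<String, Map<String, Double>> counts = new HashMap<>();
            for (String[] pair : corpus) {
                String[] fs = pair[0].split(" "), es = pair[1].split(" ");
                for (String e : es) {
                    double z = 0.0;                         // normalizer over source words
                    for (String f : fs) z += t.get(f).get(e);
                    for (String f : fs)
                        counts.computeIfAbsent(f, k -> new HashMap<>())
                              .merge(e, t.get(f).get(e) / z, Double::sum);
                }
            }
            // M-step: re-estimate t(e|f) from the expected counts.
            t = counts;
            normalize(t);
        }
        System.out.println(t.get("maison")); // mass should concentrate on "house"
    }

    // Renormalize each conditional distribution t(.|f) to sum to 1.
    static void normalize(Map<String, Map<String, Double>> t) {
        for (Map<String, Double> dist : t.values()) {
            double z = dist.values().stream().mapToDouble(Double::doubleValue).sum();
            dist.replaceAll((e, p) -> p / z);
        }
    }
}
```

On this tiny corpus the EM loop quickly works out that maison pairs with house rather than with the or blue, which is exactly the kind of inference the parallel-text walkthrough later in the lecture does by hand.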
32. … taken exactly that approach. But nevertheless, regardless of whether you see it as the fundamental test of natural language processing, it's clearly an immensely important practical problem. You see it a little bit less in the United States than in some other places, since English is so dominant in much of the United States, but if you look at other places, like the European Union: the European Union spends billions of dollars each year translating stuff. Even in the United States, if you're a company that does business internationally, you have to pay to translate stuff. The U.N. spends a huge amount of money translating stuff. And so here's just a cute little example that I got the other week from Scott Klemmer, which shows how even kind of modern, new-age web companies can't escape from the cost of translating stuff. So Scott was talking to some of the people at Google about Google SketchUp, and they pointed out the fact that they only release a new version of Google SketchUp every 18 months. Why do they do that? It's not because they're not working on the code and couldn't regularly release new versions of the code. The problem is that they can't get the translations of the user manual done more regularly than that. So it's really doing translation that is killing a lot of the sort of productivity for being able to be nimble. Okay, here's a few examples, just to give you a sense of the kind …
33. … human languages, in terms of how words are put together in sentences, really worked. So if people in computer science know the name Noam Chomsky, they generally know it for two reasons. One reason is because of his political take on the world, which I'm not going to discuss here, and the other reason is because of the Chomsky Hierarchy. And so the Chomsky Hierarchy gets taught as an understanding of levels of complexity of formal languages, and that's related to things like building transducers and compilers and things like that. But Chomsky didn't actually develop the Chomsky Hierarchy for the purpose of making your life grim in formal languages and automata classes. The reason the Chomsky Hierarchy was developed was so that he could try and get a handle on, and present arguments about, what the complexity of the structure of human languages was. He was wanting to argue that the structure of human languages was clearly beyond that of context-free languages, whereas most of the models of language that were actually being built in the 30s, 40s, and early 50s were actually modeling language as finite state, and were therefore [inaudible] completely inadequate to the complexity of human languages. And so, really, there was neither the science nor the technology for this work to be able to go far. So if you look at what was actually produced, what there was was essentially word lookup and substitution programs. So you …
34. … same sense, but we're going to use the same source of information. We're going to take text that we know is parallel. There are lots of sources of parallel data these days in the world, because people like the European Union translate stuff, or the Hong Kong Parliament produces proceedings in Chinese and English. And we're going to exploit this parallel text to induce statistical models of machine translation. So here's a list of just some of the places where you can get parallel text. There are lots of others as well; lots of magazines appear in multiple languages. And so what we're going to do is have our computer look at lots of this parallel text and say: hmm, every time I see a particular word in the Spanish text, the translation of the sentence has particular words in the English text, and so that will be my translation. This is actually, in a sense, also an old idea, because I mentioned already that in the early years after World War II a ton of money was thrown into machine translation research. The person who actually initiated that work was Warren Weaver, who was a prominent person in the U.S. government in the years after World War II. So Warren Weaver essentially was the person that initiated this research in machine translation, and precisely where his thinking came from was the direction of knowing about code breaking. And so Weaver got interested in where that idea could be exploited, because he had noted the fact that …
35. … movies, so that there are two twin problems. There's computer graphics, where you want to generate good graphics of a scene, and there's computer vision, where you actually want to be able to look at a scene in a photo and understand what's in it as a three-dimensional model in your head. So computer graphics is the direct problem: you have a model of what you want on the screen, and then you realize that in graphics. And direct problems are the easy problems. Computer vision, that's the inverse problem, where you're just seeing the surface picture and you want to reconstruct what objects there are in that scene, and that's still a really difficult problem. And in just the same way, natural language understanding is an inverse problem: you're going from a speech signal or a sequence of words, and you're trying to reconstruct the structure of the sentence and the meaning of the sentence that underlies it. And the reason that that's difficult is that natural languages are extremely, extremely ambiguous. So here's a fairly straightforward sentence. My example here is a few years old, but every year the Fed is either raising or lowering rates. So we have this sentence: Fed raises interest rates half a percent in effort to control inflation. Well, you know, that's just a sort of an obvious sentence; if you look at the financial section of your newspaper, that's kind of one you read every year. It's not hard …
36. … on weekly quizzes, just to check that people are vaguely keeping up with all the other topics. But we're seeing this as primarily a project-based-learning kind of class, where a lot of the learning goes on in doing these projects. And so for the first three projects, all three of those projects have a lot of support code for them, to make it easy for you to do interesting things in the two weeks you get to do them in, and all of that support code is written in Java; in particular, it's Java 1.5, for generics and stuff like that. So basically a requirement for this class is that you can handle doing the Java programming. And so hopefully you've either seen some Java before, or it won't be too difficult to kind of come up to speed. Java's obviously not that different from other languages like C or Python in the way most things work. And, you know, it's not that we're heavily using system libraries beyond a few basic things, such as collections for ArrayLists. For most other information, see the course webpage. The little note on the bottom was, I surveyed people as to whether they actually want paper handouts of everything, or whether they're completely happy getting everything off the webpage. And it seems like about 2/3 of people are perfectly happy just to deal with stuff electronically. So what we're going to do is print just a few copies of handouts, but not enough for everyone that turns up. Okay, so I started …
37. … understanding we can of what is intended and what was said, et cetera. That's exactly what probability theory is for, and it works rather nicely for a lot of natural language problems. Now, the general idea is that, even though it doesn't immediately look like it, because natural languages don't look like the things you normally see in probability theory, where you normally see numbers and normal distributions and things like that, we're going to apply just exactly the same ideas to language stuff. So we have the French word [inaudible], which might be translated as house, and we want to just learn the fact that that's a high-probability translation. Whereas when we have [inaudible] general, we don't want to translate that as "the general avocado"; we want to say that we've got enough context around the word avocat that we can see that translating it as avocado would be a very bad idea. Lately, using probabilistic models has become quite trendy and widely used in a lot of areas of artificial intelligence. You see it in vision, you see it in machine learning, you see it in various forms of uncertain knowledge representation. It's sort of just slightly ironic, historically, that the use of probabilities in natural language understanding didn't come from those recent developments in artificial intelligence. Where this strand of work came from is actually electrical engineering. So in the greater field of natural language, there's …
38. … of meanings. So it can mean something that you like: "My main interest is collecting mushrooms." It can be that you have a stake in a company: "He has a 5 percent interest in Vulcan Holdings." It can be a rate that you're paid: "the 8 percent interest rate." Lots and lots of meanings for the word interest. But then there are also ambiguities above the word level. So towards the end of the sentence you have these phrases, in effort and control inflation. So in effort is a prepositional phrase, and then you have this infinitival phrase, to control inflation. Whenever you have these kinds of prepositional phrases or infinitival phrases, you have ambiguities as to what they modify. So is in effort going to modify this noun phrase, half a percent? Or is it going to be modifying the verb raises? And we'll see some more ambiguities of that kind later. But to give us slightly more of a sense of what kinds of ambiguities occur, let me show the funny examples. These are all examples of real newspaper headlines which have ambiguities. So, Ban on Nude Dancing on Governor's Desk. What's this one? Well, this one is precisely one of these prepositional phrase attachment ambiguities. So we have the prepositional phrase On Governor's Desk, and there are two ways that you can interpret that. You can take On Governor's Desk as modifying Nude Dancing. So then you have Nude Dancing on Governor's Desk …
39. … look at other parts of the syntactic structure, it's just completely garbled, because the syntactic structure of Chinese is rather different to English. So when you have these bits like "U.S. dollars today provide data include that year to November China actually using," right, the [inaudible], it seems like most of the words are there; you're kind of looking up words and sticking the translation in, just like people were doing in the 1950s. Here it's being done with somewhat better use of context, so you're choosing the right words out of ambiguous translations, but clearly the syntax is just kind of completely garbled, and it's not making any sense. So when you do the assignment, don't expect your translations to be perfect, because really they'll probably be worse than that one. But then at the bottom, towards the end, is some of the more recent work of trying to build syntax-based machine translation models, which then do a better job at recognizing, and therefore correctly translating, the structure of sentences. And so this is work from [inaudible], and their translation is: "today's available data of the Ministry of Foreign Trade and Economic Cooperation shows that China's actual utilization of November of this year will include 40.07 billion U.S. dollars for the foreign direct investment." I mean, it's not quite perfect; it's a little bit awkward in a couple of places, but you can see how the syntactic …
40. … something like that. And then, once you have that idea, you could go to this one, and now we've got something like the Nielsen ratings of television, and one of the stations was showing a news item about these Fed raises. And so Fed raises interest is then the level of interest in the Fed raises, and that rates only half a percent: viewership, almost no one watched it. So it's a bit of a stretch, but it's not that you can't tell a semantic story about how those fit together as well. So those are all the kinds of ambiguities that our computer systems will find. Somehow, as human beings, even as we just read or hear the words, we've got all of this kind of highly contextually based plausibility deciding going on in our heads at the same time, so that mostly we just never notice those ambiguities; we just kind of go straight to the most plausible interpretation and stick with that. So that's one kind of ambiguity. There are lots of other kinds of ambiguity, so just to mention a few more. At the word level, as well as having these syntax ambiguities of noun or verb, you also get semantic ambiguities, so-called word sense ambiguities. So the word Fed here refers to the Federal Reserve Board, but the word Fed is also used for an FBI agent, so "A Fed trailed me on my way home." It can also, of course, be the past tense of the verb to feed. Interest, that's a word with tons …
41. …you can handle all of the languages of the world without very much work. Because if you were down here, or down here, and you want to translate between a lot of languages, the problem is that if there are n languages that you want to cover, you have to build n-squared systems. And that's the kind of problem that Google still has today, right? You can pull down the menu of what languages you can do translations from, and they're building systems for individual language pairs. So you can translate from Chinese to English or English to Chinese, and French to English and English to French, but if you want to translate from Chinese to French, it doesn't provide that. So you're in this sort of bad n-squared space. Whereas if you could actually do this generation up to an interlingua, then you'd only have to build order-n translation systems and you could translate between every language pair. So that seems very appealing, but on the other hand, it's been a very hard idea to work out. And this is perhaps a false idea, because the problem is that different languages have all kinds of distinctions and ways of looking at the world that are their own and aren't reproduced in other languages. So, you know, there are particular things that we make distinctions about in English: we distinguish peas and beans as separate words, whereas most languages don't actually distinguish peas and beans, right? They could have words for diff…
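As a back-of-the-envelope check on that argument, here is a small sketch (the language counts are chosen arbitrarily) of how the two system counts grow:

```python
# Directed pairwise translation needs a system per ordered language pair,
# i.e. n * (n - 1), roughly the "n squared" in the lecture; an interlingua
# needs only one analyzer and one generator per language, i.e. 2 * n.
def pairwise_systems(n: int) -> int:
    return n * (n - 1)

def interlingua_systems(n: int) -> int:
    return 2 * n

for n in (5, 20, 100):
    print(n, pairwise_systems(n), interlingua_systems(n))
# 5 -> 20 vs 10; 20 -> 380 vs 40; 100 -> 9900 vs 200
```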
42. …that started spreading from speech into using probabilistic techniques for natural language understanding, and essentially that led to natural language processing being revolutionized by probabilistic methods earlier than, and quite separately from, any of the work that happened in using probabilistic methods in other parts of AI. Okay, so in the rest of this course, what we're going to do is look at a bunch of different problems in natural language processing and how they can be approached. So the very first problem we're going to start off with is that first problem of NLP, machine translation, and we'll look at how statistical machine translation systems are built. And I'm going to use that as a backdrop for introducing some of the other key technologies we need for using probabilistic models in NLP, such as building language models and algorithms for estimation and smoothing, and things like that. I'll then go on and start talking about sequence models, for tasks like information extraction and part-of-speech tagging, and then move on to doing natural language parsing, looking at context-free grammars and probabilistic equivalents of those. And then after that we'll go further into doing semantic interpretation of text and particular applications. There's only time in the quarter to look at a few applications, so there are lots of other things that I'm just going to have to leave out, but I'll try and do a few things in enough detail th…
43. …commonly we're actually giving them millions of words of parallel text, and even with an [inaudible] system we might begin at tens or hundreds of thousands of words of parallel text. But here I'll just illustrate it with this teeny, teeny text. Nevertheless, although we'll do it here as human reasoning, what we're doing is essentially exactly what we'll want to build an algorithm to simulate. So what we're going to build for algorithms is what are called EM-based alignment algorithms. I'll talk later about the EM algorithm; it's a general probabilistic estimation algorithm, which here tries to learn how to do translations. It learns alignments, which are pairings between translated words. So how might we do it in this context? Well, an obvious thing to do is to start with the first word in the Centauri sentence. So suppose we look at that word, "furock." Well, what would "furock" mean? Essentially our source of evidence is to say: can we find "furock" in our parallel corpus? Oh yes, "furock" appears twice in our parallel corpus. Well, if we don't know anything else, a reasonable thing to assume is that maybe one of these words translates the word "furock." And presumably that's true here. So a good heuristic would be to look for a word that appears in both of the translations, and there's precisely one of those words, this word "ja." Another heuristic that we might have used is that we might have guessed that word order between th…
44. …reasoning, and say: well, for "yurock" it seems quite plausible that we can align that with "maut." And so that then gets us onto this word "clock." So "clock" is again the tricky one, because "clock" only appears once in the corpus. So we might think that we're in this situation: well, it only appears once, it could be any of those words in the other sentence, we can't make any progress. But actually that's no longer true, because since we've been working hard on this problem, we can start to make some progress now. Although if you just look at this you could think it could be anything, well, we've been looking at various words, and so we can start to work out, by some kind of process of elimination, what might be going on. So if we realize that we've worked out good alignments for other things, that these are good alignments, well, now it looks like "clock" is the only word left over and "ba" hasn't been aligned to anything, so maybe what we should do is translate "clock" as "ba." And so that's giving us a process of elimination where, exploiting some words we know, we can guess that the remaining words go to untranslated words. And one of the features of the EM algorithm is that it provides an implementation of this sort of process-of-elimination idea. There are other sources of information we can also use for helping us along in some cases. So if we look at t…
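To show how EM arrives at this process-of-elimination behavior automatically, here is a minimal IBM-Model-1-style sketch (the toy corpus is a made-up stand-in, not the lecture's example, and real systems of course train on vastly more data):

```python
# Minimal IBM-Model-1-style EM on a toy parallel corpus. Words that
# co-occur consistently ('das'/'the') soak up probability mass, and the
# leftover singletons are resolved by elimination, as described above.
from collections import defaultdict

corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]

t = defaultdict(lambda: 1.0)              # t(e | f), uniform at the start

for _ in range(10):                       # a few EM iterations
    count = defaultdict(float)            # expected counts c(e, f)
    total = defaultdict(float)            # normalizer per source word f
    for f_sent, e_sent in corpus:
        for e in e_sent:
            # E-step: spread each target word's mass over the source words.
            z = sum(t[(e, f)] for f in f_sent)
            for f in f_sent:
                p = t[(e, f)] / z
                count[(e, f)] += p
                total[f] += p
    # M-step: re-estimate t(e | f) from the expected counts.
    for (e, f), c in count.items():
        t[(e, f)] = c / total[f]

for pair in sorted(t, key=t.get, reverse=True)[:6]:
    print(pair, round(t[pair], 3))
```

After a few iterations the shared words pin each other down and the singletons fall out by elimination, which is exactly the hand reasoning above, run to a fixed point.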
45. …Statistical machine translation has just become the hot area of machine translation. So these days, of all the young graduate students or assistant professors, when they get jobs, it seems like fully half of them want to be doing statistical machine translation. And so there's a ton of new and interesting work leading to much better statistical machine translation systems. Okay, so what happened between ALPAC, when it was deemed impossible, and now? Well, a lot of things have happened. The need for machine translation certainly didn't go away; in fact, the need for machine translation has just increased and increased, for various kinds of economic and political reasons. So the greater internationalization of trade and multinational companies, the greater unification that occurs through things like the European Union or Hong Kong becoming part of China: everywhere there's more and more need for translation. The technology has gotten a ton better, and that's made things a lot more possible. The other big thing that's made things a lot more possible is not just the computers, but actually having the data. In the early days there just wasn't online digital data in large amounts to do things with, whereas now we have a ton of such data, the most obvious part of which is the World Wide Web, but more generally all of the data that's available in digital form. There are now just literally billions…
46. …get into graduate school; that the people who build these at ETS, their whole desire is to make these questions completely unambiguous, so that people don't sue ETS saying that they couldn't get into grad school successfully. But it turns out that with human languages you just can't make them unambiguous, because just by nature, natural languages are ambiguous. So you take a sentence like this last one: "At least one sculpture must be exhibited in each room, and no more than three sculptures may be exhibited in any room." These phrases, "at least one sculpture," "each room," "no more than three," "any room," those are what are referred to as quantifiers. And it's just a fact about English that whenever you have multiple quantifiers, you have scope ambiguities, so there are different ways to interpret things. So if we take this last bit, "no more than three sculptures may be exhibited in any room," the way you're meant to interpret it is: for any room, it has at most three sculptures in it. But it's also possible to interpret things differently, with the opposite order of quantifiers. So you could have the interpretation that there's a special set of sculptures, which numbers no more than three (maybe they're ones that the museum owns, or aren't precious), and these ones are capable of being exhibited in any room in the museum, whereas for all the other sculptures, somehow there are restrictions on which rooms you're allowed to show them in.
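One way to write down the two scope readings is the following first-order sketch (the predicate exhibited(s, r) is my shorthand, not the lecture's, and the modal "may" is flattened away):

```latex
% Intended reading: the universal quantifier over rooms takes wide scope,
% so every room individually holds at most three sculptures.
\forall r \; \big|\{\, s \mid \mathit{exhibited}(s, r) \,\}\big| \le 3

% Alternative reading: a fixed set of at most three sculptures takes wide
% scope, and only members of that set may be exhibited in any room.
\exists S \, \big(\, |S| \le 3 \;\wedge\; \forall s\, \forall r\,
    (\mathit{exhibited}(s, r) \rightarrow s \in S) \,\big)
```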
47. …to translate is "Clients do not sell pharmaceuticals in Europe," and this is our parallel corpus. And what you can see in the alignment is exactly the kind of thing we were talking about. So here we get the word order mismatches: in Spanish you have the adjective appearing after the noun, so you get these crossed alignments that we were seeing here. And then here, this was the example with the cognates and the zero fertilities. So here are the cognates; I'm not sure what it is, I presume it's some kind of drug, and so it's similar in both languages. And what was our zero-fertility word? The zero-fertility word was "do." So this is just the kind of thing that happens commonly with languages: you get these function words. So in English you're putting in this auxiliary verb "do" to go along with the negation, "do not sell," whereas in Spanish there just is no equivalent; you just have "no sell," and that's all there is. So these kinds of function words often appear as zero-fertility words. Okay, that's where I'll stop for today. Next time we'll first of all take a little detour into [inaudible] language modeling, which is a kind of core technology that we'll need to do machine translation, and then on Wednesday get back to machine translation proper. [End of audio. Duration: 75 minutes]
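As a tiny illustration of the zero-fertility point (the Spanish tokenization here is my own guess at the lecture's example, not a quote from it, and it omits the adjective-noun crossing):

```python
# Sketch: an English-to-Spanish word alignment in which the auxiliary
# 'do' aligns to nothing, i.e. it has zero fertility.
english = "clients do not sell pharmaceuticals in Europe".split()
spanish = "clientes no venden fármacos en Europa".split()

# English position -> Spanish position (None marks an unaligned word).
alignment = {0: 0, 1: None, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5}

for i, e in enumerate(english):
    j = alignment[i]
    print(f"{e:15} -> {spanish[j] if j is not None else '(zero fertility)'}")
```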
48. …they were doing it with statistics to try and do it better. But most people have believed that ultimately, to get better translation, you're actually going to want to understand more of the source text and use that information to make a better translation. So you might want to do syntactic analysis that produces a syntactic structure for the sentence, and then you could do syntactic transfer to a syntactic tree in the other language, and then do generation from there to the target text. And that's the area that's really the research area these days: how to do syntax-based statistical machine translation. But you could want to, and there was work in the 1980s especially that did, go even deeper than that. You might want to go up to the level of having semantic analysis of a particular language and then do semantic transfer. And then people have had the idea, and again this was worked on actively in the 1980s, that maybe we could go even deeper than that. We could go beyond the source-language semantic structure, whatever it is, say the Chinese semantic structure: why can't we define a universal semantic structure which could be used for all languages? That is what's referred to as the interlingua. And so it's that interlingua-based machine translation where you translate from one language into the interlingua and then from the interlingua back to the other language. There's an enormous appeal to that approach, because if you could actually do it, then y…
