Lecture 8
Applications and demos
Building applications
• Previous lectures have discussed stages in processing: each algorithm has addressed a particular aspect of modelling language.
• All but the simplest applications combine multiple components.
• Issues: suitability of the application, interoperability of components, evaluation, etc.
• Avoiding error multiplication: robustness to imperfections in prior modules.
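To make "error multiplication" concrete: if each module in a pipeline is correct on some fraction of its inputs and the errors are independent (an assumption; in practice they are often correlated), then the fraction of inputs that pass through every stage unscathed is the product of the module accuracies. The accuracies below are made up for illustration, and `pipeline_accuracy` is a hypothetical helper.

```python
import math

def pipeline_accuracy(module_accuracies):
    """Fraction of inputs handled correctly by every module in sequence,
    assuming errors in different modules are independent."""
    return math.prod(module_accuracies)

# Hypothetical accuracies for, say, a tagger, a parser and a semantic component.
overall = pipeline_accuracy([0.95, 0.9, 0.9])
print(f"end-to-end: {overall:.2f}")  # prints "end-to-end: 0.77"
```

Three individually respectable modules leave fewer than 80% of inputs fully intact, which is why later modules should degrade gracefully on imperfect input rather than fail outright.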
Demos
• Limited domain systems
  – CHAT-80
  – BusTUC
• OSCAR: named entity recognition for chemistry
• DELPH-IN: parsing and generation
• Automatic construction of research web pages
• Rhetorical structure: Argumentative Zoning of scientific text
• Note also: demo systems mentioned in the exercises.
CHAT-80
• CHAT-80: a micro-world system implemented in Prolog in 1980
• CHAT-80 demo
  – question: What is the population of India?
  – logical form: which(X:exists(X:(isa(X,population) and of(X,india))))
  – answer: have(india,(population=574))
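CHAT-80 itself is a Prolog program with a real grammar; the Python sketch below only mimics the shape of its pipeline (question, then logical form, then database lookup) for the single question pattern in the demo. The regular expression, the tuple encoding of the logical form, the dictionary database and the function names are all assumptions made for illustration; only the India figure comes from the slide.

```python
import re

# Micro-world database: (attribute, entity) -> value.
# The India figure is the one shown in the slide's demo answer.
DATABASE = {("population", "india"): 574}

# Hypothetical single pattern standing in for CHAT-80's real grammar.
QUESTION = re.compile(r"what is the (\w+) of (\w+)\?", re.IGNORECASE)

def to_logical_form(question):
    """Map a question to a which/isa/of logical form, or reject it."""
    m = QUESTION.fullmatch(question.strip())
    if m is None:
        raise ValueError(f"outside the micro-world: {question!r}")
    attribute, entity = m.group(1).lower(), m.group(2).lower()
    # Mirrors which(X: isa(X, attribute) and of(X, entity))
    return ("which", "X", [("isa", "X", attribute), ("of", "X", entity)])

def evaluate(logical_form):
    """Answer the logical form against the database (None if unknown)."""
    _, _var, conjuncts = logical_form
    attribute = next(c[2] for c in conjuncts if c[0] == "isa")
    entity = next(c[2] for c in conjuncts if c[0] == "of")
    value = DATABASE.get((attribute, entity))
    return None if value is None else (entity, attribute, value)

print(evaluate(to_logical_form("What is the population of India?")))
```

The printed triple `('india', 'population', 574)` plays the role of CHAT-80's `have(india,(population=574))`. Anything outside the single pattern is rejected, which is the essence of a limited-domain system.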
Bus Route Oracle
• Query bus departures in Trondheim, Norway; built by students and faculty at NTNU
  – 42 bus lines, 590 stops, 60,000 entries in the database
  – Norwegian and English
  – in daily use: half a million logged queries
• Prolog-based: the parser analyses each question into a query language, which is mapped onto the bus timetable database
• BusTUC demo
  – When is the earliest bus to the airport?
  – When is the next bus from Dragvoll to the centre?
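The final step described above, mapping an already-parsed query onto the timetable, can be sketched as a filter followed by a minimum over departure times. This assumes the parser has already reduced each question to a destination, an optional origin and an optional time; the function `next_departure`, its parameters and the timetable rows are hypothetical, and BusTUC's actual Prolog query language is not modelled.

```python
from datetime import time

# Hypothetical timetable rows (origin, destination, departure), invented for
# illustration; the real BusTUC database has around 60,000 entries.
TIMETABLE = [
    ("Dragvoll", "centre", time(8, 35)),
    ("Dragvoll", "centre", time(8, 5)),
    ("centre", "airport", time(6, 10)),
    ("centre", "airport", time(5, 40)),
]

def next_departure(destination, origin=None, after=time(0, 0)):
    """Earliest departure to `destination` at or after `after`.

    origin=None means "from anywhere", as in "the earliest bus to the airport".
    Returns None when nothing matches.
    """
    candidates = [
        row for row in TIMETABLE
        if row[1] == destination
        and (origin is None or row[0] == origin)
        and row[2] >= after
    ]
    return min(candidates, key=lambda row: row[2], default=None)

# "When is the earliest bus to the airport?"
print(next_departure("airport"))
# "When is the next bus from Dragvoll to the centre?" asked at 08:10
print(next_departure("centre", origin="Dragvoll", after=time(8, 10)))
```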
Chemistry named entity recognition
• SciBorg: the OSCAR 3 system recognises chemistry named entities in documents
  – e.g. 2,4-dinitrotoluene; citric acid
• Series of classifiers using n-grams, affixes and context, plus external dictionaries
• Used in the RSC's Project Prospect
• Also used as a preprocessor for full parsing
• Precision/recall balance adjusted for different uses
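The feature types listed above (character n-grams, affixes, context and dictionary membership) can be illustrated with a feature extractor for a single token. This is a hypothetical sketch, not OSCAR 3's actual feature set: the name `token_features`, the trigram size, the three-character affixes and the digit-and-hyphen feature are all assumptions. The resulting feature sets would be passed to whatever classifier is used.

```python
def token_features(tokens, i, dictionary, n=3):
    """Hypothetical features for deciding whether tokens[i] is a chemical name."""
    token = tokens[i]
    padded = f"^{token}$"  # boundary markers so n-grams capture word edges
    # Character n-grams: chemical names have distinctive substrings.
    feats = {f"ngram={padded[k:k + n]}" for k in range(len(padded) - n + 1)}
    # Affixes, e.g. suffixes such as "-ene" or "-ol".
    feats.add(f"prefix={token[:3].lower()}")
    feats.add(f"suffix={token[-3:].lower()}")
    # Context: the preceding word (or a sentence-start marker).
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    feats.add(f"prev={prev}")
    # External dictionary lookup.
    if token.lower() in dictionary:
        feats.add("in_dictionary")
    # Systematic names often mix locant digits and hyphens.
    if any(ch.isdigit() for ch in token) and "-" in token:
        feats.add("digits_and_hyphen")
    return feats

print(sorted(token_features(["treated", "with", "2,4-dinitrotoluene"], 2, {"toluene"})))
```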
Enhanced browsing of chemistry documents: RSC using OSCAR
Precision and recall in OSCAR: from Corbett and Copestake (2008)
Modest precision, high recall: text preprocessing
High precision, modest recall: text viewing
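The trade-off above can be reproduced by thresholding a confidence score: a low threshold keeps most candidate mentions (high recall, suited to preprocessing), while a high one keeps only the surest (high precision, suited to viewing). The sketch assumes each candidate mention already carries a confidence; the mentions, scores and gold set are invented for illustration and are not results from Corbett and Copestake (2008).

```python
def precision_recall(scores, gold, threshold):
    """Precision and recall when keeping candidates with confidence >= threshold."""
    predicted = {m for m, s in scores.items() if s >= threshold}
    true_pos = len(predicted & gold)
    # Convention: precision is 1.0 when nothing is predicted.
    precision = true_pos / len(predicted) if predicted else 1.0
    recall = true_pos / len(gold) if gold else 1.0
    return precision, recall

# Invented candidates with confidences; "lead" stands for a false positive
# (e.g. the verb "lead" mistaken for the element).
scores = {"citric acid": 0.95, "toluene": 0.9, "acetone": 0.6, "iron": 0.4, "lead": 0.3}
gold = {"citric acid", "toluene", "acetone", "iron"}

print(precision_recall(scores, gold, 0.2))  # (0.8, 1.0): preprocessing setting
print(precision_recall(scores, gold, 0.8))  # (1.0, 0.5): viewing setting
```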
DELPH-IN
• DELPH-IN: an informal consortium of 18 groups (EU, Asia, US) that develops multilingual resources for deep language processing
  – hand-written grammars in a feature structure formalism, plus statistical ranking
  – English Resource Grammar (ERG): approx. 90% coverage of edited text
• ERG demo
  – Metal reagents are compounds often utilized in synthesis.
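Grammars like the ERG are written in a typed feature structure formalism whose core operation is unification. The sketch below is a deliberately simplified, untyped version over nested Python dicts, without reentrancy (structure sharing) or a type hierarchy, both of which the real formalism has. It shows how agreement between a plural subject such as "Metal reagents" and the plural verb "are" succeeds while a singular requirement fails; the feature names `CAT`, `AGR`, `NUM` and `PER` are hypothetical.

```python
def unify(fs1, fs2):
    """Unify two feature structures (nested dicts with atomic leaves).

    Returns a new unified structure, or None if they are incompatible.
    Simplified: no types and no reentrancy; inputs are never mutated.
    """
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)
        for feat, val in fs2.items():
            if feat in result:
                sub = unify(result[feat], val)
                if sub is None:
                    return None  # clash somewhere below this feature
                result[feat] = sub
            else:
                result[feat] = val  # information only fs2 has
        return result
    if isinstance(fs1, dict) or isinstance(fs2, dict):
        return None  # atom versus complex structure
    return fs1 if fs1 == fs2 else None  # atoms must be identical

subject = {"CAT": "NP", "AGR": {"NUM": "pl", "PER": "3"}}  # "Metal reagents"
are_needs = {"AGR": {"NUM": "pl"}}                          # "are"
is_needs = {"AGR": {"NUM": "sg"}}                           # "is"

print(unify(subject, are_needs))  # succeeds: agreement holds
print(unify(subject, is_needs))   # None: number clash
```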
Some uses of the ERG
• Automatic email response (YY Corp, commercial use)
• Machine translation
  – LOGON research project: Norwegian to English
  – smaller-scale MT with other language pairs
• Semantic search
  – SciBorg (chemistry, research)
  – WeSearch (Wikipedia, University of Oslo, new research)
• English teaching (EPGY, Stanford: 20,000 users)
  – http://www.delph-in.net/2010/epgy.pdf
• Smaller-scale projects in question answering, information extraction, paraphrase ...
Diagram: application- and domain-independent DELPH-IN tools, alongside application- (and maybe domain-) specific components
Automatic web page generation
• Using publication lists to find links between people and to construct summaries
  – the title "Generating research websites using summarisation techniques" gives NPs like "summarisation techniques"
  – cluster these terms
  – locate co-authors, summarise collaborations
• Web page generation demo
Collaboration summaries
Lawrence C Paulson collaborated with Cristiano Longo and Giampaolo Bella from 1997 to 2003 on ‘formal verification’, ‘industrial payment and nonrepudiation protocol’, ‘kerberos authentication system’ and ‘secrecy goals’ and in 2006 on ‘cardholder registration in Set’ and ‘accountability protocols’.
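Once key noun phrases have been extracted from titles and clustered, summarising collaborations is largely aggregation over co-authored papers. The sketch below assumes each publication is a (year, authors, terms) record with the terms already extracted; the function names, record layout and sample data are hypothetical, and the sentence template only approximates the style of the summary above.

```python
from collections import defaultdict

def collaborations(person, publications):
    """Map each co-author of `person` to (first_year, last_year, sorted terms)."""
    years = defaultdict(list)
    terms = defaultdict(set)
    for year, authors, key_terms in publications:
        if person not in authors:
            continue
        for other in authors:
            if other != person:
                years[other].append(year)
                terms[other].update(key_terms)
    return {other: (min(ys), max(ys), sorted(terms[other]))
            for other, ys in years.items()}

def summarise(person, collabs):
    """Render one template sentence per co-author, ordered by name."""
    sentences = []
    for other, (first, last, terms) in sorted(collabs.items()):
        period = f"in {first}" if first == last else f"from {first} to {last}"
        topics = ", ".join(f"'{t}'" for t in terms)
        sentences.append(f"{person} collaborated with {other} {period} on {topics}.")
    return sentences

# Hypothetical records: (year, authors, key terms already extracted from the title).
pubs = [
    (1997, ["A. Author", "B. Coauthor"], ["formal verification"]),
    (2003, ["A. Author", "B. Coauthor"], ["secrecy goals"]),
    (2006, ["A. Author", "C. Coauthor"], ["accountability protocols"]),
]
for sentence in summarise("A. Author", collaborations("A. Author", pubs)):
    print(sentence)
```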
Argumentative Zoning
• Finding rhetorical structure in scientific texts automatically
  – research goals
  – criticism and contrast
  – intellectual ancestry
• Robust Argumentative Zoning demo
  – input text (ASCII via Acrobat)
• Uses: search, bibliometrics, reviewing support, training new researchers
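Argumentative Zoning assigns each sentence a rhetorical category. Real AZ systems are trained classifiers using many features; the toy labeller below uses only cue phrases, one such kind of feature, to show the input and output shape. The zone names loosely follow the categories above (AIM for research goals, CONTRAST for criticism and contrast, BASIS for intellectual ancestry); the cue lists, the function name and the fallback label are assumptions.

```python
# Hypothetical cue phrases per zone, checked in this order.
CUES = {
    "AIM": ("in this paper we", "we propose", "our goal"),
    "CONTRAST": ("however", "in contrast", "fails to", "unlike"),
    "BASIS": ("following", "based on", "we adopt"),
}

def zone(sentence, fallback="UNCLASSIFIED"):
    """Label a sentence with the first zone whose cue phrase it contains."""
    text = sentence.lower()
    for label, cues in CUES.items():
        if any(cue in text for cue in cues):
            return label
    return fallback

paper = [
    "In this paper we present a robust parser.",
    "However, their method fails to scale.",
    "We adopt the tag set of earlier work.",
]
print([zone(s) for s in paper])
```

A trained classifier would replace the first-match rule with weighted evidence from position, tense, citations and other features, but the interface (sentence in, zone label out) stays the same.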
NLP course conclusions
Theme: ambiguity
• levels: morphological, syntactic, semantic, lexical, discourse
• resolution: local ambiguity, syntax as filter for morphology, selectional restrictions.
• ranking: parse ranking, WSD, anaphora resolution.
• processing efficiency: chart parsing
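Chart parsing addresses efficiency by storing each sub-analysis once and reusing it. The sketch below is a CKY-style chart recogniser for a toy grammar in Chomsky normal form (grammar and lexicon are invented for illustration, not taken from the course) that counts analyses per chart cell instead of enumerating trees, so a PP-attachment ambiguity shows up as a count without duplicating the shared sub-parses.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (an illustrative assumption).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "NP", "PP"),
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "with": ["P"], "telescope": ["N"],
}

def cky_count(words, goal="S"):
    """Number of distinct parses of `words` as `goal`, computed bottom-up."""
    n = len(words)
    # chart[(i, j)][cat] = number of ways words[i:j] can be analysed as cat
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, []):
            chart[(i, i + 1)][cat] += 1
    for width in range(2, n + 1):          # shorter spans are complete first
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):      # split point
                left, right = chart[(i, k)], chart[(k, j)]
                for parent, lcat, rcat in BINARY_RULES:
                    if left[lcat] and right[rcat]:
                        chart[(i, j)][parent] += left[lcat] * right[rcat]
    return chart[(0, n)][goal]

print(cky_count("I saw the man with the telescope".split()))  # 2
```

The two analyses attach "with the telescope" either to the verb phrase or to "the man"; both reuse the same chart entries for "the man" and "with the telescope".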
Theme: evaluation
• training data and test data
• reproducibility
• baseline
• ceiling
• module evaluation vs application evaluation
• nothing is perfect!
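Baselines and ceilings bracket a system's score: a simple baseline such as always predicting the most frequent training label gives a floor, and agreement between human annotators gives a practical ceiling. The labels below are invented purely to show the computation, and `accuracy` and `majority_baseline` are hypothetical helper names.

```python
from collections import Counter

def accuracy(predicted, gold):
    """Fraction of positions where the two label sequences agree."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold labels must align")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

def majority_baseline(train_labels, test_gold):
    """Accuracy of always predicting the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return accuracy([majority] * len(test_gold), test_gold)

# Invented labels (e.g. N = noun, V = verb) for ten test items.
train = ["N", "N", "V"]
gold = ["N", "V", "N", "N", "V", "N", "V", "N", "N", "N"]         # annotator 1
annotator_2 = ["N", "V", "N", "N", "V", "N", "V", "N", "N", "V"]
system = ["N", "N", "N", "N", "V", "V", "V", "N", "N", "N"]

print("baseline", majority_baseline(train, gold))   # 0.7
print("system  ", accuracy(system, gold))           # 0.8
print("ceiling ", accuracy(annotator_2, gold))      # 0.9
```

Reporting the system score between these two numbers says more than the raw 0.8 alone: here the system closes two thirds of the gap between baseline and ceiling.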
Modules and algorithms
• different processing modules
• different applications blend modules differently
• many different styles of algorithm:
  – FSAs and FSTs
  – Markov models and HMMs
  – CFGs (and probabilistic CFGs)
  – constraint-based frameworks
  – inheritance hierarchies (WordNet), decision trees (WSD)
  – classifiers (Naive Bayes)
More about language and speech processing ...
• Information Retrieval course
• MPhil in Advanced Computer Science:
  – language and speech modules
  – in collaboration with the speech group from Engineering
  – http://www.cl.cam.ac.uk/research/nl/postgrads/
  – http://www.cl.cam.ac.uk/admissions/acs/modules/