Transcript of: IE, QA, and Dialog (mausam/courses/col864/spring2017/slides/22-w…)
Wrapup: IE, QA, and Dialog
Mausam
Grading
• 40% project (revised from 50%)
• 20% final exam
• 20% regular reviews (revised from 15%)
• 10% midterm survey (revised from 15%)
• 10% presentation
• Extra credit: participation
Plan (1st half of the course)
• Classical papers/problems in IE: Bootstrapping, NELL, Open IE
• Important techniques for IE: CRFs, tree kernels, distant supervision, joint inference, deep learning, reinforcement learning
• IE++
• coreference
• paraphrases
• inference
Plan (2nd half of the course)
• QA:
• Conversational agents:
Plan (1st half++ of the course)
• Classical papers/problems in IE: Bootstrapping, NELL, Open IE
• Important techniques for IE: Semi-CRFs, tree kernels, distant supervision, joint inference, topic models, deep learning (CNNs), reinforcement learning
• IE++: coreference
• paraphrases
• Inference: random walks, neural models
Plan (2nd half of the course)
• QA: open QA, semantic parsing, LSTM, attention, more attention, Recursive NN, deep feature fusion network
• Conversational agents: Gen. Hierarchical nets, GANs, MemNets
NLP (or any application course)
• Techniques/Models
• Bootstrapping
• (coupled) Semi-SSL
• PGMs: semi-CRF, MultiR, LDA
• Tree Kernels
• Multi-instance learning
• Random walks over graphs
• Reinforcement learning
• CNN, LSTM, Bi-LSTM, Recursive NN
• Attention, MemNets
• GANs
• Problems
• NER
• Entity/Rel/Event Extraction
• Open Rel/Event Extraction
• Multi-task learning
• KB inference
• Open QA
• Machine comprehension
• Task-oriented dialog w/ KB
• General dialog
How much data?
• Large supervised dataset: supervised learning
• Trick to compute a large supervised dataset w/o noise
• Semi-CRF, Twit-NER/POS, QuizBowl, SQuAD QA, CNN QA, Movies, Ubuntu, OQA, random walks… (negative data can be artificial)
• Small supervised dataset: semi-supervised learning
• Bootstrapping, co-training, Graph-based SSL
• No supervised dataset: unsupervised learning/rules
• TwitIE
• ReVerb
• Trick to compute a large supervised dataset with noise: distant supervision
• MultiR, PCNNs
Non-deep Learning Ideas: Semi-supervised
• Bootstrapping
• (in a loop) automatic generation of training data by matching known facts
• Multi-view / Multi-task co-training
• Constraints between tasks; agreement between multiple classifiers for the same concept
• Graph-based SSL
• Agreement between nodes of the graph
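The bootstrapping loop above can be sketched in a few lines. This is a toy illustration, not any particular system's code: the sentences, the seed pair, and the helper names (`induce_patterns`, `apply_patterns`) are all hypothetical, and real systems add confidence scoring to fight semantic drift.

```python
import re

def induce_patterns(sentences, seeds):
    """Turn each sentence that mentions a known (x, y) pair into a pattern."""
    patterns = set()
    for s in sentences:
        for x, y in seeds:
            if x in s and y in s:
                patterns.add(s.replace(x, "<X>").replace(y, "<Y>"))
    return patterns

def apply_patterns(sentences, patterns):
    """Match every pattern against every sentence to harvest new pairs."""
    facts = set()
    for p in patterns:
        parts = re.split(r"(<X>|<Y>)", p)
        regex = "".join(r"(?P<X>\w+)" if t == "<X>" else
                        r"(?P<Y>\w+)" if t == "<Y>" else
                        re.escape(t) for t in parts)
        for s in sentences:
            m = re.search(regex, s)
            if m:
                facts.add((m.group("X"), m.group("Y")))
    return facts

def bootstrap(sentences, seeds, rounds=2):
    """The loop: facts -> patterns -> more facts -> more patterns -> ..."""
    known = set(seeds)
    for _ in range(rounds):
        known |= apply_patterns(sentences, induce_patterns(sentences, known))
    return known
```

Given "Paris is the capital of France" and "Tokyo is the capital of Japan" with seed ("Paris", "France"), one round induces the pattern "<X> is the capital of <Y>" and harvests the new fact ("Tokyo", "Japan").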
Non-deep Learning Ideas: Distant Supervision
• KB of facts: known. Extraction supervision: unknown
• Bootstrap a training dataset: matching sentences with facts
• Hypothesis 1: all such sentences are positive training for a fact: NOISY
• Hypothesis 2: all such sentences form a bag. Each bag must have a unique relation: BETTER
• Hypothesis 3: each bag can have multiple labels: EVEN BETTER
• Multi-Instance Learning
• Noisy-OR in PGMs
• maximize the max probability in the bag
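The two bag-level objectives named above reduce to a few lines each. A minimal sketch with made-up per-sentence probabilities; the function names are my own:

```python
def noisy_or(sentence_probs):
    """Bag-level probability that at least one sentence in the bag expresses
    the relation, treating per-sentence extractors as independent (noisy-OR)."""
    prob_none = 1.0
    for p in sentence_probs:
        prob_none *= 1.0 - p
    return 1.0 - prob_none

def max_in_bag(sentence_probs):
    """At-least-one approximation: credit the bag with its single most
    confident sentence, which is the quantity maximized during training."""
    return max(sentence_probs)
```

A bag with two 0.5-confidence sentences gets noisy-OR probability 0.75 but max-in-bag credit only 0.5, which is why the two objectives behave differently on large bags.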
Non-deep Learning Ideas: No Intermediate Supervision
• QA tasks: (Question, Answer) pairs known; inference chain: unknown
• Distant Supervision: KB fact known; which sentence to extract from: unknown
• OQA (which proof is better is not known)
• Random walk inference (which path is better is not known)
• MultiR (which sentence in the corpus expresses the fact is not known)
• Approach
• create a model for scoring each path/proof using weights on properties of each constituent
• train using known supervision (perceptron-style updates)
• Differences: OQA scores each edge separately; PRA scores the whole path; MultiR uses multi-instance learning.
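The perceptron-style recipe above (score each candidate path/proof with a weighted feature sum, then nudge weights toward the gold answer) can be sketched as follows. The feature dicts and names are hypothetical, not taken from any of the systems:

```python
def score(weights, feats):
    """Weighted sum of a candidate's feature values."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def perceptron_update(weights, candidates, gold_index, lr=1.0):
    """candidates: one feature dict per candidate path/proof. Only the final
    answer is supervised, so we score every candidate and, if the top-scoring
    one is not the gold one, shift weight mass from it to the gold candidate."""
    pred = max(range(len(candidates)),
               key=lambda i: score(weights, candidates[i]))
    if pred != gold_index:
        for f, v in candidates[gold_index].items():
            weights[f] = weights.get(f, 0.0) + lr * v
        for f, v in candidates[pred].items():
            weights[f] = weights.get(f, 0.0) - lr * v
    return weights
```

Repeating this over (question, answer) pairs learns which path/proof properties predict correct answers, without ever observing the inference chain itself.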
Non-deep Learning Ideas: Sparsity
• Tree Kernels: two features (paths) are similar if one shares many constituent elements with the other; similarity is down-weighted by a penalty on the non-shared elements
• Paraphrase dataset for QA
• Open relations as supplements in KB inference
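The tree-kernel idea above (shared constituents add similarity, mismatches are penalized) can be caricatured in one function. This is a much-simplified stand-in for a real tree/path kernel, which works over ordered subsequences rather than sets; the function name and penalty scheme are my own:

```python
def path_similarity(path1, path2, penalty=0.5):
    """Toy stand-in for a path/tree kernel: every shared element contributes 1,
    and every non-shared element in either path multiplies the score by
    `penalty`, so near-identical paths score far higher than disjoint ones."""
    shared = set(path1) & set(path2)
    mismatched = (set(path1) | set(path2)) - shared
    return len(shared) * penalty ** len(mismatched)
```

This is the sparsity fix in miniature: two dependency paths that never co-occur verbatim can still share evidence through their common elements.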
Deep Learning Models
• Convolutional NNs
• Handle fixed-length contexts
• Recurrent NNs
• Handle small variable-length histories
• LSTMs/GRUs
• Handle larger variable-length histories
• Bi-LSTMs
• Handle larger variable-length histories and futures
• Recursive NNs
• Handle variable-length, partially ordered histories
Deep Learning Models (contd)
• Hierarchical Recurrent NNs
• RNN over RNNs
• Attention models
• attach non-uniform importance to histories based on evidence (the question)
• Co-attention models
• attach non-uniform importances to histories in two different NNs
• MemNets
• add an external storage with explicit reads, writes, and updates
• Generative Adversarial Nets
• a better training procedure using an actor-critic architecture
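The attention bullet above (non-uniform importance over histories, driven by the question) is just a softmax over query-key similarities. A minimal dot-product attention sketch in plain Python; list-of-floats vectors stand in for tensors:

```python
import math

def attention(query, keys, values):
    """Dot-product attention: softmax over query-key similarities gives
    non-uniform weights on the histories (values); the context vector is
    their weighted sum."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights
```

A query aligned with the first key pulls most of the weight onto the first value; "more attention" stacks or iterates this same operation.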
Hierarchical Models
• Semi-CRFs: joint segmentation and labeling
• A sentence is a sequence of segments, each of which is a sequence of words
• Allows segment-level features to be added
• HRED: LSTM over LSTM
• A document is a sequence of sentences, each of which is a sequence of words
• A conversation is a sequence of utterances, each of which is a sequence of words
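The semi-CRF bullet above is easiest to see in its decoding step: a Viterbi pass over segments rather than words. A minimal sketch; the caller supplies a made-up scoring function, and `max_len` bounds segment length to keep decoding fast:

```python
def best_segmentation(n, labels, seg_score, max_len=4):
    """Segment-level Viterbi for a semi-CRF: best[i] holds the score of the
    best labeled segmentation of tokens [0, i). Because seg_score(j, i, y)
    sees a whole candidate segment tokens[j:i] with label y, segment-level
    features (lengths, gazetteer hits, ...) fit in naturally."""
    NEG_INF = float("-inf")
    best = [0.0] + [NEG_INF] * n
    back = [None] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            for y in labels:
                s = best[j] + seg_score(j, i, y)
                if s > best[i]:
                    best[i], back[i] = s, (j, y)
    segments, i = [], n
    while i > 0:                      # follow back pointers to recover segments
        j, y = back[i]
        segments.append((j, i, y))
        i = j
    return best[n], segments[::-1]
```

With a scorer that rewards tagging tokens 0-1 as one LOC segment, the decoder picks the multi-token segment over per-word labels, which an order-1 CRF cannot score directly.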
RL for Text
• Two uses
• Use 1: search the Web to find easy documents for IE
• Use 2: Policy gradient algorithm for updating the generator's weights in GANs
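The policy-gradient use above can be sketched with REINFORCE on a log-linear policy. This is a generic illustration, not the GAN papers' training code: the feature dicts are hypothetical, and in the GAN setting the reward would be the discriminator's score on the generated text:

```python
import math

def softmax_policy(weights, action_feats):
    """Probability of each action under a log-linear (softmax) policy."""
    scores = [sum(weights.get(f, 0.0) * v for f, v in feats.items())
              for feats in action_feats]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_update(weights, action_feats, chosen, reward, lr=0.5):
    """One REINFORCE step: move weights along reward * grad log pi(chosen)."""
    probs = softmax_policy(weights, action_feats)
    # grad log pi(chosen) = f(chosen) - sum_a pi(a) * f(a)
    grad = dict(action_feats[chosen])
    for feats, p in zip(action_feats, probs):
        for f, v in feats.items():
            grad[f] = grad.get(f, 0.0) - p * v
    for f, g in grad.items():
        weights[f] = weights.get(f, 0.0) + lr * reward * g
    return weights
```

A positive reward makes the chosen action more probable on the next step, which is exactly how discriminator feedback reaches a non-differentiable text sampler.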
Bootstrapping
• [Akshay] Fuzzy matching between seed tuples and text
• [Shantanu] Named entity tags in patterns
• [Gagan, Barun] Confidence level for each pattern and fact
• Semantic drift
NELL
• Never-ending/lifelong learning
• Human supervision to guide the learning
• [many] multi-view multi-task co-training
• [many] coupling constraints for high precision.
• [Dinesh] ontology to define the constraints
Open IE
• [many] ontology-free, scalability
• [Surag] data-driven research through extensive error analysis
• [Dinesh] reusing datasets from one task to another
• [Partha] open relations as supplementary knowledge to reduce sparsity
Tree Kernels
• [Shantanu] major info about the relation lies in the shortest path of the dependency parse
Semi-CRFs
• [many] segment level features in CRF
• [Dinesh] joint segmentation and labeling?
• Order-L CRFs vs. Semi-CRFs
MultiR
• [Rishab] Use of KB to create a training set
• [Surag] multi-instance learning in PGMs
• [Akshay] relationship between sentence-level and aggregate extractions
• [Gagan] Viterbi approximation (replace expectation with max)
PCNNs
• [Haroun] Max pooling to make layers independent of sentence size
• [Akshay] Piecewise max pooling to capture arg1, rel, arg2
• [Akshay] Multi-instance learning in neural nets
• Positional embeddings
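The piecewise max pooling noted above can be shown directly: instead of one max over the whole sentence, the per-token feature sequence is split at the two argument positions and each piece is pooled separately. A minimal sketch over lists-of-floats (a real PCNN pools convolution outputs); the function name and zero-padding for empty pieces are my own choices:

```python
def piecewise_max_pool(features, e1_pos, e2_pos):
    """PCNN-style pooling sketch: split the per-token feature sequence into
    three pieces (up to arg1, between the args, after arg2) and max-pool each
    piece separately, so coarse positional structure survives pooling."""
    pieces = [features[:e1_pos + 1],
              features[e1_pos + 1:e2_pos + 1],
              features[e2_pos + 1:]]
    dim = len(features[0])
    pooled = []
    for piece in pieces:
        if piece:
            pooled.extend(max(tok[d] for tok in piece) for d in range(dim))
        else:
            pooled.extend([0.0] * dim)   # assumed padding for an empty piece
    return pooled
```

The output is three times the feature dimension, one max per piece per channel, which is what lets the classifier see where (relative to arg1/rel/arg2) each strong feature fired.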
TwitIE
• [Haroun] tweets are challenging, but redundancy is good
• [Dinesh] G2 test for ranking entities for a given date
• [Shantanu] event type discovery using topic models
RL for IE
• [many] active querying for gathering external evidence
PRA for KB inference
• [Haroun, Akshay] low variance sampling
• [Arindam] learning non-functional relations
• [Nupur] paths as features in a learning model
Joint MF-TF
• [Akshay, Shantanu] OOV handling
• [Nupur] loss function in joint modeling
Open QA
• [Surag] structured perceptron in a pipeline model
• [Akshay] paraphrase corpus for question rewriting
• [Shantanu] mining paraphrase operators from corpus
• [Arindam] decomposition of scoring over derivation steps
LSTMs
• [Haroun] attention > depth
• [Akshay] cool way to construct the dataset
• [Dinesh] two types of readers
Co-attention
• [many] iterative refinement of answer span selection
HRED
• [Akshay] pretraining dialog model with a QA dataset
• [Arindam] passing intermediate context improves coherence?
• [Barun] split of local dialog generator and global state tracker
MSQU
• [many] partially annotated data
• [many] natural language -> SQL
GANs
• [many] teacher forcing
• [Akshay] interesting heuristics
• [Arindam] discriminator feedback can be backpropagated despite being non-differentiable
MemNets
• [Surag] typed OOVs
• [Haroun] hops
• [Shantanu, Gagan] subtask-styled evaluation
Open/Next Issues
• IE: mature?
• Event extraction
• Temporal extraction
• Rapid retargetability
• KB Inference
• Long way to go
• Combining DL and path-based models
Open/Next Issues
• QA systems
• Dataset-driven research: [MC] SQuAD, tremendous progress
• Answering in the wild: not clear (large answer spaces?)
• Deep learning for large-scale QA
• Conversational agents
• [Task driven] how to get a DL model to issue a variety of queries
• [General] how to get the system to say something interesting?
• DL: what are the systems really capturing!?
Conclusions
• Learn key historical developments in IE
• Learn (some) state of the art in IE, inference, QA and dialog
• Learn how to critique strengths and weaknesses of a paper
• Learn how to brainstorm next steps and future directions
• Learn how to summarize an advanced area of research
• Learn to do research at the cutting edge
Exam
• Bring a laptop
• Internet enabled
• pdflatex enabled
• Bring a mobile
• for taking a picture
• Extension cords
• It is ok even if you have not deeply understood every paper
Project Presentations
• Motivation & Problem definition
• 1 Slide of Contribution
• Background
• Technical Approach
• Experiments
• Analysis
• Conclusions
• Future Work