Transcript of: IE, QA, and Dialog (mausam/courses/col864/spring2017/slides/22-w…)
Wrapup: IE, QA, and Dialog
Mausam
Grading
• 40% project (revised from 50%)
• 20% final exam
• 20% regular reviews (revised from 15%)
• 10% midterm survey (revised from 15%)
• 10% presentation
• Extra credit: participation
Plan (1st half of the course)
• Classical papers/problems in IE: Bootstrapping, NELL, Open IE
• Important techniques for IE: CRFs, tree kernels, distant supervision, joint inference, deep learning, reinforcement learning
• IE++
• coreference
• paraphrases
• inference
Plan (2nd half of the course)
• QA:
• Conversational agents:
Plan (1st half++ of the course)
• Classical papers/problems in IE: Bootstrapping, NELL, Open IE
• Important techniques for IE: Semi-CRFs, tree kernels, distant supervision, joint inference, topic models, deep learning (CNNs), reinforcement learning
• IE++: coreference
• paraphrases
• Inference: random walks, neural models
Plan (2nd half of the course)
• QA: open QA, semantic parsing, LSTM, attention, more attention, Recursive NN, deep feature fusion network
• Conversational agents: Gen. Hierarchical nets, GANs, MemNets
NLP (or any application course)
• Techniques/Models
• Bootstrapping
• (coupled) Semi-SSL
• PGMs: semi-CRF, MultiR, LDA
• Tree Kernels
• Multi-instance learning
• Random walks over graphs
• Reinforcement learning
• CNN, LSTM, Bi-LSTM, Recursive NN
• Attention, MemNets
• GANs
• Problems
• NER
• Entity/Rel/Event Extraction
• Open Rel/Event Extraction
• Multi-task learning
• KB inference
• Open QA
• Machine comprehension
• Task-oriented dialog w/ KB
• General dialog
How much data?
• Large supervised dataset: supervised learning
• Trick to compute a large supervised dataset w/o noise
• Semi-CRF, Twit-NER/POS, QuizBowl, SQuAD QA, CNN QA, Movies, Ubuntu, OQA, random walks… (negative data can be artificial)
• Small supervised dataset: semi-supervised learning
• Bootstrapping, co-training, Graph-based SSL
• No supervised dataset: unsupervised learning/rules
• TwitIE
• ReVerb
• Trick to compute a large supervised dataset with noise: distant supervision
• MultiR, PCNNs
Non-deep Learning Ideas: Semi-supervised
• Bootstrapping
• (in a loop) automatic generation of training data by matching known facts
• Multi-view / Multi-task co-training
• Constraints between tasks; agreement between multiple classifiers for the same concept
• Graph-based SSL
• Agreement between nodes of the graph
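The bootstrapping loop above can be sketched in a few lines. This is a toy illustration, not any particular system's code: the sentences, the seed pair, and the helper names (`induce_patterns`, `apply_patterns`) are all hypothetical, and real systems add confidence scoring to fight semantic drift.

```python
import re

def induce_patterns(sentences, seeds):
    """Turn each sentence that mentions a known (x, y) pair into a pattern."""
    patterns = set()
    for s in sentences:
        for x, y in seeds:
            if x in s and y in s:
                patterns.add(s.replace(x, "<X>").replace(y, "<Y>"))
    return patterns

def apply_patterns(sentences, patterns):
    """Match every pattern against every sentence to harvest new pairs."""
    facts = set()
    for p in patterns:
        parts = re.split(r"(<X>|<Y>)", p)
        regex = "".join(r"(?P<X>\w+)" if t == "<X>" else
                        r"(?P<Y>\w+)" if t == "<Y>" else
                        re.escape(t) for t in parts)
        for s in sentences:
            m = re.search(regex, s)
            if m:
                facts.add((m.group("X"), m.group("Y")))
    return facts

def bootstrap(sentences, seeds, rounds=2):
    """The loop: facts -> patterns -> more facts -> more patterns -> ..."""
    known = set(seeds)
    for _ in range(rounds):
        known |= apply_patterns(sentences, induce_patterns(sentences, known))
    return known
```

Given "Paris is the capital of France" and "Tokyo is the capital of Japan" with seed ("Paris", "France"), one round induces the pattern "<X> is the capital of <Y>" and harvests the new fact ("Tokyo", "Japan").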
Non-deep Learning Ideas: Distant Supervision
• KB of facts: known. Extraction supervision: unknown
• Bootstrap a training dataset: matching sentences with facts
• Hypothesis 1: all such sentences are positive training for a fact: NOISY
• Hypothesis 2: all such sentences form a bag. Each bag must have a unique relation: BETTER
• Hypothesis 3: each bag can have multiple labels: EVEN BETTER
• Multi-Instance Learning
• Noisy-OR in PGMs
• maximize the max probability in the bag
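The two bag-level objectives named above reduce to a few lines each. A minimal sketch with made-up per-sentence probabilities; the function names are my own:

```python
def noisy_or(sentence_probs):
    """Bag-level probability that at least one sentence in the bag expresses
    the relation, treating per-sentence extractors as independent (noisy-OR)."""
    prob_none = 1.0
    for p in sentence_probs:
        prob_none *= 1.0 - p
    return 1.0 - prob_none

def max_in_bag(sentence_probs):
    """At-least-one approximation: credit the bag with its single most
    confident sentence, which is the quantity maximized during training."""
    return max(sentence_probs)
```

A bag with two 0.5-confidence sentences gets noisy-OR probability 0.75 but max-in-bag credit only 0.5, which is why the two objectives behave differently on large bags.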
Non-deep Learning Ideas: No Intermediate Supervision
• QA tasks: (Question, Answer) pairs known; inference chain: unknown
• Distant Supervision: KB fact known; which sentence to extract from: unknown
• OQA (which proof is better is not known)
• Random walk inference (which path is better is not known)
• MultiR (which sentence in the corpus expresses the fact is not known)
• Approach
• create a model for scoring each path/proof using weights on properties of each constituent
• train using known supervision (perceptron-style updates)
• Differences: OQA scores each edge separately; PRA scores the whole path; MultiR uses multi-instance learning.
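The perceptron-style recipe above (score each candidate path/proof with a weighted feature sum, then nudge weights toward the gold answer) can be sketched as follows. The feature dicts and names are hypothetical, not taken from any of the systems:

```python
def score(weights, feats):
    """Weighted sum of a candidate's feature values."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def perceptron_update(weights, candidates, gold_index, lr=1.0):
    """candidates: one feature dict per candidate path/proof. Only the final
    answer is supervised, so we score every candidate and, if the top-scoring
    one is not the gold one, shift weight mass from it to the gold candidate."""
    pred = max(range(len(candidates)),
               key=lambda i: score(weights, candidates[i]))
    if pred != gold_index:
        for f, v in candidates[gold_index].items():
            weights[f] = weights.get(f, 0.0) + lr * v
        for f, v in candidates[pred].items():
            weights[f] = weights.get(f, 0.0) - lr * v
    return weights
```

Repeating this over (question, answer) pairs learns which path/proof properties predict correct answers, without ever observing the inference chain itself.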
Non-deep Learning Ideas: Sparsity
• Tree Kernels: two features (paths) are similar if one shares many constituent elements with the other; similarity is down-weighted by a penalty on the non-shared elements
• Paraphrase dataset for QA
• Open relations as supplements in KB inference
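The tree-kernel idea above (shared constituents add similarity, mismatches are penalized) can be caricatured in one function. This is a much-simplified stand-in for a real tree/path kernel, which works over ordered subsequences rather than sets; the function name and penalty scheme are my own:

```python
def path_similarity(path1, path2, penalty=0.5):
    """Toy stand-in for a path/tree kernel: every shared element contributes 1,
    and every non-shared element in either path multiplies the score by
    `penalty`, so near-identical paths score far higher than disjoint ones."""
    shared = set(path1) & set(path2)
    mismatched = (set(path1) | set(path2)) - shared
    return len(shared) * penalty ** len(mismatched)
```

This is the sparsity fix in miniature: two dependency paths that never co-occur verbatim can still share evidence through their common elements.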
Deep Learning Models
• Convolutional NNs
• Handle fixed-length contexts
• Recurrent NNs
• Handle small variable-length histories
• LSTMs/GRUs
• Handle larger variable-length histories
• Bi-LSTMs
• Handle larger variable-length histories and futures
• Recursive NNs
• Handle variable-length, partially ordered histories
Deep Learning Models (contd)
• Hierarchical Recurrent NNs
• RNN over RNNs
• Attention models
• attach non-uniform importance to histories based on evidence (the question)
• Co-attention models
• attach non-uniform importances to histories in two different NNs
• MemNets
• add an external storage with explicit reads, writes, and updates
• Generative Adversarial Nets
• a better training procedure using an actor-critic architecture
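The attention bullet above (non-uniform importance over histories, driven by the question) is just a softmax over query-key similarities. A minimal dot-product attention sketch in plain Python; list-of-floats vectors stand in for tensors:

```python
import math

def attention(query, keys, values):
    """Dot-product attention: softmax over query-key similarities gives
    non-uniform weights on the histories (values); the context vector is
    their weighted sum."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights
```

A query aligned with the first key pulls most of the weight onto the first value; "more attention" stacks or iterates this same operation.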
Hierarchical Models
• Semi-CRFs: joint segmentation and labeling
• A sentence is a sequence of segments, each of which is a sequence of words
• Allows segment-level features to be added
• HRED: LSTM over LSTM
• A document is a sequence of sentences, each of which is a sequence of words
• A conversation is a sequence of utterances, each of which is a sequence of words
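The semi-CRF bullet above is easiest to see in its decoding step: a Viterbi pass over segments rather than words. A minimal sketch; the caller supplies a made-up scoring function, and `max_len` bounds segment length to keep decoding fast:

```python
def best_segmentation(n, labels, seg_score, max_len=4):
    """Segment-level Viterbi for a semi-CRF: best[i] holds the score of the
    best labeled segmentation of tokens [0, i). Because seg_score(j, i, y)
    sees a whole candidate segment tokens[j:i] with label y, segment-level
    features (lengths, gazetteer hits, ...) fit in naturally."""
    NEG_INF = float("-inf")
    best = [0.0] + [NEG_INF] * n
    back = [None] * (n + 1)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            for y in labels:
                s = best[j] + seg_score(j, i, y)
                if s > best[i]:
                    best[i], back[i] = s, (j, y)
    segments, i = [], n
    while i > 0:                      # follow back pointers to recover segments
        j, y = back[i]
        segments.append((j, i, y))
        i = j
    return best[n], segments[::-1]
```

With a scorer that rewards tagging tokens 0-1 as one LOC segment, the decoder picks the multi-token segment over per-word labels, which an order-1 CRF cannot score directly.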
RL for Text
• Two uses
• Use 1: search the Web to find easy documents for IE
• Use 2: Policy gradient algorithm for updating the generator's weights in GANs
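The policy-gradient use above can be sketched with REINFORCE on a log-linear policy. This is a generic illustration, not the GAN papers' training code: the feature dicts are hypothetical, and in the GAN setting the reward would be the discriminator's score on the generated text:

```python
import math

def softmax_policy(weights, action_feats):
    """Probability of each action under a log-linear (softmax) policy."""
    scores = [sum(weights.get(f, 0.0) * v for f, v in feats.items())
              for feats in action_feats]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_update(weights, action_feats, chosen, reward, lr=0.5):
    """One REINFORCE step: move weights along reward * grad log pi(chosen)."""
    probs = softmax_policy(weights, action_feats)
    # grad log pi(chosen) = f(chosen) - sum_a pi(a) * f(a)
    grad = dict(action_feats[chosen])
    for feats, p in zip(action_feats, probs):
        for f, v in feats.items():
            grad[f] = grad.get(f, 0.0) - p * v
    for f, g in grad.items():
        weights[f] = weights.get(f, 0.0) + lr * reward * g
    return weights
```

A positive reward makes the chosen action more probable on the next step, which is exactly how discriminator feedback reaches a non-differentiable text sampler.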
Bootstrapping
• [Akshay] Fuzzy matching between seed tuples and text
• [Shantanu] Named entity tags in patterns
• [Gagan, Barun] Confidence level for each pattern and fact
• Semantic drift
NELL
• Never-ending/lifelong learning
• Human supervision to guide the learning
• [many] multi-view multi-task co-training
• [many] coupling constraints for high precision.
• [Dinesh] ontology to define the constraints
Open IE
• [many] ontology-free, scalability
• [Surag] data-driven research through extensive error analysis
• [Dinesh] reusing datasets from one task to another
• [Partha] open relations as supplementary knowledge to reduce sparsity
Tree Kernels
• [Shantanu] major info about the relation lies in the shortest path of the dependency parse
Semi-CRFs
• [many] segment level features in CRF
• [Dinesh] joint segmentation and labeling?
• Order-L CRFs vs. Semi-CRFs
MultiR
• [Rishab] Use of KB to create a training set
• [Surag] multi-instance learning in PGMs
• [Akshay] relationship between sentence-level and aggregate extractions
• [Gagan] Viterbi approximation (replace expectation with max)
PCNNs
• [Haroun] Max pooling to make layers independent of sentence size
• [Akshay] Piecewise max pooling to capture arg1, rel, arg2
• [Akshay] Multi-instance learning in neural nets
• Positional embeddings
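The piecewise max pooling noted above can be shown directly: instead of one max over the whole sentence, the per-token feature sequence is split at the two argument positions and each piece is pooled separately. A minimal sketch over lists-of-floats (a real PCNN pools convolution outputs); the function name and zero-padding for empty pieces are my own choices:

```python
def piecewise_max_pool(features, e1_pos, e2_pos):
    """PCNN-style pooling sketch: split the per-token feature sequence into
    three pieces (up to arg1, between the args, after arg2) and max-pool each
    piece separately, so coarse positional structure survives pooling."""
    pieces = [features[:e1_pos + 1],
              features[e1_pos + 1:e2_pos + 1],
              features[e2_pos + 1:]]
    dim = len(features[0])
    pooled = []
    for piece in pieces:
        if piece:
            pooled.extend(max(tok[d] for tok in piece) for d in range(dim))
        else:
            pooled.extend([0.0] * dim)   # assumed padding for an empty piece
    return pooled
```

The output is three times the feature dimension, one max per piece per channel, which is what lets the classifier see where (relative to arg1/rel/arg2) each strong feature fired.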
TwitIE
• [Haroun] tweets are challenging, but redundancy is good
• [Dinesh] G2 test for ranking entities for a given date
• [Shantanu] event type discovery using topic models
RL for IE
• [many] active querying for gathering external evidence
PRA for KB inference
• [Haroun, Akshay] low variance sampling
• [Arindam] learning non-functional relations
• [Nupur] paths as features in a learning model
Joint MF-TF
• [Akshay, Shantanu] OOV handling
• [Nupur] loss function in joint modeling
Open QA
• [Surag] structured perceptron in a pipeline model
• [Akshay] paraphrase corpus for question rewriting
• [Shantanu] mining paraphrase operators from corpus
• [Arindam] decomposition of scoring over derivation steps
LSTMs
• [Haroun] attention > depth
• [Akshay] cool way to construct the dataset
• [Dinesh] two types of readers
Co-attention
• [many] iterative refinement of answer span selection
HRED
• [Akshay] pretraining dialog model with a QA dataset
• [Arindam] passing intermediate context improves coherence?
• [Barun] split of local dialog generator and global state tracker
MSQU
• [many] partially annotated data
• [many] natural language -> SQL
GANs
• [many] teacher forcing
• [Akshay] interesting heuristics
• [Arindam] discriminator feedback can be backpropagated despite being non-differentiable
MemNets
• [Surag] typed OOVs
• [Haroun] hops
• [Shantanu, Gagan] subtask-styled evaluation
Open/Next Issues
• IE: mature?
• Event extraction
• Temporal extraction
• Rapid retargetability
• KB Inference
• Long way to go
• Combining DL and path-based models
Open/Next Issues
• QA systems
• Dataset-driven research: [MC] SQuAD, tremendous progress
• Answering in the wild: not clear (large answer spaces?)
• Deep learning for large-scale QA
• Conversational agents
• [Task driven] how to get a DL model to issue a variety of queries
• [General] how to get the system to say something interesting?
• DL: what are the systems really capturing!?
Conclusions
• Learn key historical developments in IE
• Learn (some) state of the art in IE, inference, QA and dialog
• Learn how to critique strengths and weaknesses of a paper
• Learn how to brainstorm next steps and future directions
• Learn how to summarize an advanced area of research
• Learn to do research at the cutting edge
Exam
• Bring a laptop
• Internet enabled
• pdflatex enabled
• Bring a mobile
• for taking a picture
• Extension cords
• It is ok even if you have not deeply understood every paper
Project Presentations
• Motivation & Problem definition
• 1 Slide of Contribution
• Background
• Technical Approach
• Experiments
• Analysis
• Conclusions
• Future Work