Text classification
Day 35
LING 681.02 Computational Linguistics
Harry Howard
Tulane University
18-Nov-2009 LING 681.02, Prof. Howard, Tulane University
Course organization
http://www.tulane.edu/~ling/NLP/
Learning to classify text
NLPP §6
Classification
What is it?
Supervision
A classifier is supervised if it is built on training corpora containing the correct label for each input.
This usually means that the program can calculate an error when the predicted label does not match the correct label.
A classifier is unsupervised if it is built on training corpora that do not contain the correct label for each input.
There is no way to calculate an error.
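The error mentioned above can be made concrete with a minimal sketch. The names `error_rate`, `gold`, and `predicted` and the four labels are illustrative assumptions, not taken from the course materials.

```python
def error_rate(predicted, gold):
    """Fraction of inputs whose predicted label differs from the correct label."""
    if len(predicted) != len(gold):
        raise ValueError("need exactly one predicted label per correct label")
    if not gold:
        raise ValueError("need at least one labelled input")
    mistakes = sum(1 for p, g in zip(predicted, gold) if p != g)
    return mistakes / len(gold)


# Hypothetical data: the correct labels exist only in the supervised case.
gold = ["female", "male", "male", "female"]
predicted = ["female", "male", "female", "female"]

print(error_rate(predicted, gold))
```

Without the `gold` list, as in the unsupervised case, there is nothing to compare `predicted` against, so this function cannot be called.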
Diagram of supervised classification
Philosophical question
Does supervised classification work for the majority of stuff that you learned spontaneously as a child?
NO, life does not come neatly labelled.
Algorithm
Divide the corpus into three sets:
  training set
  test set
  development (dev-test) set
Choose an initial set of features that will be used to classify the corpus.
  The part of the program that looks for the features in the corpus is called a feature extractor.
Train the classifier on the training set.
Run it on the development set.
Refine the feature extractor from any errors produced on the development set.
Run the improved classifier on the test set.
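The steps above can be sketched in plain Python. Everything here is an assumption made for illustration: the 80/10/10 split, the toy name corpus, the helper names, and the lookup-table "classifier", which stands in for a real learner such as the ones in NLTK.

```python
import random
from collections import Counter, defaultdict


def split_corpus(labelled, seed=0):
    """Shuffle, then split into training (80%), dev-test (10%), and test (10%)."""
    items = list(labelled)
    random.Random(seed).shuffle(items)
    train_end = int(len(items) * 0.8)
    dev_end = int(len(items) * 0.9)
    return items[:train_end], items[train_end:dev_end], items[dev_end:]


def last_letter(name):
    """Feature extractor: the only feature is the final letter of the name."""
    return name[-1].lower()


def train(train_set, extract):
    """Toy classifier: remember the most common label for each feature value."""
    counts = defaultdict(Counter)
    for item, label in train_set:
        counts[extract(item)][label] += 1
    # Unseen feature values fall back to the most common training label.
    default = Counter(label for _, label in train_set).most_common(1)[0][0]
    table = {feat: c.most_common(1)[0][0] for feat, c in counts.items()}
    return lambda item: table.get(extract(item), default)


def errors(classify, labelled):
    """Return (item, correct label, guessed label) for every mistake."""
    return [(x, y, classify(x)) for x, y in labelled if classify(x) != y]


# Hypothetical labelled corpus.
names = [
    ("Anna", "female"), ("Maria", "female"), ("Julie", "female"),
    ("Sophie", "female"), ("Naomi", "female"), ("Laura", "female"),
    ("Alice", "female"), ("Heidi", "female"), ("Emma", "female"),
    ("Grace", "female"), ("Mark", "male"), ("Leo", "male"),
    ("Peter", "male"), ("James", "male"), ("Robert", "male"),
    ("Frank", "male"), ("Hugo", "male"), ("Oscar", "male"),
    ("Thomas", "male"), ("Albert", "male"),
]

train_set, dev_set, test_set = split_corpus(names)
classifier = train(train_set, last_letter)
dev_errors = errors(classifier, dev_set)    # inspect these to refine last_letter
test_errors = errors(classifier, test_set)  # look at these only once, at the end
print(dev_errors, test_errors)
```

The design point is that the feature extractor is revised only in light of `dev_errors`; the test set stays unseen until the final run, so its error count is an honest estimate.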
Choosing the right features
Use too few, and the data will be underfitted.
  The classifier is too vague and makes too many mistakes.
Use too many, and the data will be overfitted.
  The classifier is too specific and will not generalize to new examples.
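A toy illustration of both failure modes, under heavy assumptions: the names are hand-picked so that the effect is visible, the lookup-table classifier is a stand-in for a real learner, and the three extractors (`too_few`, `last_letter`, `too_many`) are hypothetical.

```python
from collections import Counter, defaultdict


def train(train_set, extract):
    """Toy classifier: most common label per feature value, with a default."""
    counts = defaultdict(Counter)
    for item, label in train_set:
        counts[extract(item)][label] += 1
    default = Counter(label for _, label in train_set).most_common(1)[0][0]
    table = {feat: c.most_common(1)[0][0] for feat, c in counts.items()}
    return lambda item: table.get(extract(item), default)


def accuracy(classify, labelled):
    """Fraction of items labelled correctly."""
    return sum(classify(x) == y for x, y in labelled) / len(labelled)


def too_few(name):
    return "any"              # no information at all: underfits


def last_letter(name):
    return name[-1].lower()   # one informative feature


def too_many(name):
    return name.lower()       # the whole name: memorizes the training set


# Hand-picked, hypothetical data.
train_set = [
    ("Anna", "female"), ("Maria", "female"), ("Julie", "female"),
    ("Naomi", "female"), ("Sophie", "female"), ("Mark", "male"),
    ("Leo", "male"), ("Peter", "male"), ("James", "male"),
    ("Robert", "male"), ("Frank", "male"),
]
new_names = [
    ("Laura", "female"), ("Alice", "female"), ("Heidi", "female"),
    ("Hugo", "male"), ("Oscar", "male"), ("Albert", "male"),
]

for extract in (too_few, last_letter, too_many):
    classify = train(train_set, extract)
    print(extract.__name__,
          accuracy(classify, train_set),
          accuracy(classify, new_names))
```

The whole-name extractor is perfect on the training names but has never seen any of the new ones, so it can only guess the default label for them. The constant extractor gives every name the same label on both sets.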
Example: gender id
What would the features be?
  A female name ends in a, e, i.
  A male name ends in k, o, r, s, t.
Explain how classification would work.
  NLTK code pp. 223-4.
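The two rules above can be written out directly as a hand-coded sketch. This is not the NLTK code from pp. 223-4, which is not reproduced here; a trained classifier would learn such tendencies from labelled names instead of having them hard-wired. The names `gender_features` and `guess_gender` and the "unknown" fallback are assumptions.

```python
FEMALE_ENDINGS = set("aei")
MALE_ENDINGS = set("korst")


def gender_features(name):
    """Feature extractor: map a feature name to its value for this input."""
    return {"last_letter": name[-1].lower()}


def guess_gender(name):
    """Apply the two slide rules; other endings are left undecided."""
    letter = gender_features(name)["last_letter"]
    if letter in FEMALE_ENDINGS:
        return "female"
    if letter in MALE_ENDINGS:
        return "male"
    return "unknown"  # assumption: the slide gives no rule for other letters


for name in ["Anna", "Mark", "John"]:
    print(name, guess_gender(name))
```

Keeping the extractor separate from the decision step means the same `gender_features` dictionary could later be handed to a learned classifier.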
More examples
Classify movie reviews as positive or negative.
  How?
Classify POS of words.
  How?
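One possible answer to each "How?", sketched as feature extractors only. The vocabulary, the suffix list, and both function names are assumptions chosen for illustration; in practice both lists would be derived from the corpus.

```python
def document_features(document_words, vocabulary):
    """Movie reviews: record which vocabulary words occur in the document."""
    present = {w.lower() for w in document_words}
    return {f"contains({w})": (w in present) for w in vocabulary}


def suffix_features(word, suffixes=("ing", "ed", "ly", "s")):
    """POS of words: record which common suffixes the word ends with."""
    lowered = word.lower()
    return {f"endswith({s})": lowered.endswith(s) for s in suffixes}


# Hypothetical review and vocabulary.
review = "A wonderful film with a terrible ending".split()
print(document_features(review, ["wonderful", "terrible", "boring"]))
print(suffix_features("running"))
```

Either dictionary would then be paired with its correct label (positive or negative, or a POS tag) to form one training example.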
Beyond the word
Look at a word's context.
  As we have seen, this is crucial to POS tagging.
Classify IMs as to the dialogue acts that they instantiate.
  What could be some such acts?
    statement, emotion, yes-no question
  How?
Recognizing textual entailment … is the task of determining whether a given piece of text T entails another text called the "hypothesis".
  How?
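For the first point, a feature extractor can look beyond the word itself by taking the whole sentence and a position. A minimal sketch, assuming a tokenized sentence; the feature names and the `<START>` and `<END>` markers are illustrative choices.

```python
def context_features(sentence, i):
    """Features for the word at position i, including its neighbours."""
    word = sentence[i]
    return {
        "suffix(2)": word[-2:].lower(),
        "prev-word": sentence[i - 1].lower() if i > 0 else "<START>",
        "next-word": sentence[i + 1].lower() if i < len(sentence) - 1 else "<END>",
    }


sentence = ["They", "refuse", "to", "permit", "us"]
print(context_features(sentence, 3))
```

Here the preceding "to" is the kind of contextual cue that suggests "permit" is a verb, which the word alone does not reveal.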
RTE example
T: Parviz Davudi was representing Iran at a meeting of the Shanghai Co-operation Organisation (SCO), the fledgling association that binds Russia, China and four former Soviet republics of central Asia together to fight terrorism.
H: China is a member of SCO.
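One simple family of RTE features is word overlap: how many hypothesis words appear in the text, and how many do not. The sketch below applies that idea to the example above. The tokenizer and the feature names are assumptions, and overlap alone is a crude cue, not a full entailment test.

```python
import re


def tokens(text):
    """Lowercased set of word tokens (letters, digits, underscore)."""
    return set(re.findall(r"\w+", text.lower()))


def rte_features(text, hypothesis):
    """Count hypothesis words that are covered, or not, by the text."""
    t, h = tokens(text), tokens(hypothesis)
    return {
        "word_overlap": len(h & t),
        "word_hyp_extra": len(h - t),
    }


T = ("Parviz Davudi was representing Iran at a meeting of the Shanghai "
     "Co-operation Organisation (SCO), the fledgling association that binds "
     "Russia, China and four former Soviet republics of central Asia "
     "together to fight terrorism.")
H = "China is a member of SCO."

print(rte_features(T, H))
```

The hypothesis words missing from T are "is" and "member". Membership is only implied by "binds", so counts like these serve as input features for a classifier and do not decide entailment by themselves.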
Next time
Finish NLPP §6
Go on to NLPP §7
Extracting info from text