Transcript of Bay Area NLP Reading Group - 7.12.16

Page 1:

Bay Area NLP Reading Group

July 12, 2016

Page 2:

Announcements

Join our Slack channel!

https://bay-area-nlp-reading.slack.com/

To join, message me (Katie Bauer) on Meetup, talk to me after the meeting or email [email protected]

Page 3:

Want to help out?

Present a paper you love

Demo your favorite NLP tool or library

Host a future meetup

Participate!

Page 4:

What is NER?

Extracting proper nouns and classifying them into categories (see the example below)
- Universal categories: person, location, organization
- Also date/time, currencies, and domain-specific categories

Traditional Approaches:
- Gazetteers (list lookup)
- Shallow parsing: 'based in San Francisco'

Difficulties:
- Reconciling different versions of names: Noam Chomsky vs. Professor Chomsky
- Washington: person, place, or collective name for the US government
- May: person or month?
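To make the ambiguities above concrete, here is a minimal sketch of what NER output looks like, using spaCy purely as an off-the-shelf illustration (spaCy and the model name are not part of the paper; the output shown is plausible, not guaranteed).

```python
# Illustrative NER run with spaCy (not the system discussed in the paper).
import spacy

nlp = spacy.load("en_core_web_sm")   # assumes this English model is installed
doc = nlp("Noam Chomsky spoke in Washington in May.")
for ent in doc.ents:
    print(ent.text, ent.label_)

# Plausible output (labels and spans vary by model):
#   Noam Chomsky  PERSON
#   Washington    GPE
#   May           DATE
```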

Page 5:

What are Convolutional Neural Nets?

1. Divide the input into windows
2. Calculate some sort of summary
3. Feed that summary to the next layer
4. Divide the summary into windows
5. Summarize the summary

And so on and so forth
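A toy NumPy sketch of this windowing idea (my own illustration, not code from the slides; conv1d and max_pool are just helper names): one convolution pass over windows, then a pooled summary of those summaries.

```python
import numpy as np

def conv1d(x, w, b):
    """Slide a window of len(w) across x and compute a weighted sum for each window."""
    k = len(w)
    return np.array([x[i:i + k] @ w + b for i in range(len(x) - k + 1)])

def max_pool(x, size=2):
    """Summarize each non-overlapping window of x by its maximum."""
    return np.array([x[i:i + size].max() for i in range(0, len(x) - size + 1, size)])

x = np.random.randn(10)                                  # input
h = np.maximum(conv1d(x, np.random.randn(3), 0.1), 0)    # windows -> summaries (ReLU)
y = max_pool(h)                                          # summarize the summaries
print(x.shape, h.shape, y.shape)                         # (10,) (8,) (4,)
```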

Page 6:

What does that look like for language?

Windows are word contexts

If w_i = 'movie', then [w_{i-2}, w_{i-1}, w_i, w_{i+1}, w_{i+2}] = [like, this, movie, very, much]

Each w_i is a column vector
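A minimal sketch of building that window of column vectors around w_i = 'movie' (the sentence, lookup table, and embedding dimension below are made up for illustration). Words near a sentence boundary would need padding vectors, which this sketch skips.

```python
import numpy as np

sentence = ["I", "like", "this", "movie", "very", "much"]
emb_dim = 5
embeddings = {w: np.random.randn(emb_dim) for w in sentence}   # hypothetical lookup table

i, half = sentence.index("movie"), 2
window = sentence[i - half:i + half + 1]      # ['like', 'this', 'movie', 'very', 'much']
vectors = [embeddings[w] for w in window]     # one column vector per context word
r = np.concatenate(vectors)                   # flattened window, fed to the next layer
print(window, r.shape)                        # (5 * emb_dim,) = (25,)
```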

Page 7:

Model

Task: Given a sentence, score the likelihood of each named entity class for each word

Input:

Sentence of N words: {w_1, w_2, …, w_{N-1}, w_N}

Words: w_n = [w^wrd, w^wch] (a word-level embedding concatenated with a character-level embedding)
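A rough sketch of how w_n = [w^wrd, w^wch] can be formed: a word-level embedding lookup concatenated with a character-level embedding produced by convolving over character windows and max-pooling, as in CharWNN. The dimensions and the char_embedding helper below are assumptions for illustration, not the authors' code.

```python
import numpy as np

d_wrd, d_chr, d_wch, k = 50, 10, 20, 3   # assumed sizes: word emb, char emb, char conv units, char window

def char_embedding(word, char_table, W, b):
    """Convolve over character windows of the word and max-pool into a fixed-size vector."""
    chars = [char_table[c] for c in word]
    pad = [np.zeros(d_chr)] * (k // 2)
    padded = pad + chars + pad
    windows = [np.concatenate(padded[i:i + k]) for i in range(len(chars))]
    return np.max([W @ z + b for z in windows], axis=0)   # max over character positions

char_table = {c: np.random.randn(d_chr) for c in "movie"}    # hypothetical char lookup table
W, b = np.random.randn(d_wch, k * d_chr), np.random.randn(d_wch)
w_wrd = np.random.randn(d_wrd)                     # word-level embedding (a lookup in practice)
w_wch = char_embedding("movie", char_table, W, b)  # character-level embedding
w_n = np.concatenate([w_wrd, w_wch])               # w_n = [w^wrd, w^wch]
print(w_n.shape)                                   # (70,)
```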

Page 8:

Model

Scoring

- Concatenate all word vectors centered around word n to get vector r
- Pass r through two layers of the neural network
- Check transition score A_{u,t} to see the likelihood of tags given previous tags
- Store all possible tag sequences
- Pick the most likely sequence at the end of the sentence

Optimization

- Sentence score is a conditional probability, so minimize the negative log-likelihood
- Stochastic gradient descent with backpropagation
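A compressed sketch of the decoding step described above (my own simplification, not the authors' code; training with the negative log-likelihood is omitted): per-word tag scores from the network are combined with a tag-transition matrix A and decoded with Viterbi.

```python
import numpy as np

def viterbi(scores, A):
    """scores: (N, T) per-word tag scores from the network; A: (T, T) transition scores."""
    N, T = scores.shape
    delta = scores[0].copy()                 # best score of any sequence ending in each tag
    back = np.zeros((N, T), dtype=int)
    for n in range(1, N):
        cand = delta[:, None] + A + scores[n][None, :]   # previous tag -> current tag
        back[n] = cand.argmax(axis=0)
        delta = cand.max(axis=0)
    tags = [int(delta.argmax())]
    for n in range(N - 1, 0, -1):
        tags.append(int(back[n][tags[-1]]))
    return tags[::-1], float(delta.max())

N, T = 6, 5                                  # 6 words, 5 entity tags (hypothetical)
scores = np.random.randn(N, T)               # output of the two-layer network per word
A = np.random.randn(T, T)                    # learned tag-transition scores
best_tags, best_score = viterbi(scores, A)
print(best_tags, best_score)
```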

Page 9:

Corpora

Portuguese

- Word embeddings initialized with three corpora
- Trained and tested on HAREM
- HAREM 1 for training, miniHAREM for test

Spanish
- Word embeddings initialized with Spanish Wikipedia
- Trained and tested on SPA CoNLL-2002
- SPA CoNLL-2002 has pre-divided training, development, and test sets

Page 10:

Experiments

Comparable Architectures:

- CharWNN
- WNN
- CharNN
- WNN + capitalization feature + suffix feature

Page 11:

Experiments

State of the Art:

- AdaBoost for Spanish
- ETLCMT for Portuguese

Page 12:

Experiments

Portuguese results by entity type

Page 13:

Experiments

Pretrained word embeddings vs. randomly initialized word embeddings

Page 14:

Takeaways

Different types of information are captured at the word and character level

Prior knowledge (pretrained word embeddings) improves performance

With no prior knowledge, a bigger data set is better

Page 15:

Additional Resources

Introduction to Named Entity Recognition

https://gate.ac.uk/sale/talks/stupidpoint/diana-fb.ppt

Understanding Convolutional Neural Networks for NLP
http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/

Implementing a CNN for Text Classification in TensorFlow
http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/

Page 16:

Thank you!

Bay Area NLP Reading Group

July 12, 2016