NLP Crash Course

NLP “Crash Course” Charlie Greenbacker dcnlp.org

Description

Charlie Greenbacker, founder and co-organizer of the DC NLP meetup group, provides a "crash course" in Natural Language Processing techniques and applications.

Transcript of NLP Crash Course

Page 1: NLP Crash Course

NLP “Crash Course” Charlie Greenbacker

dcnlp.org

Page 2: NLP Crash Course

Agenda

• Introduction & Motivation

• Famous Examples

• Basics

• Major Task Areas

• Protips

• Resources

Page 3: NLP Crash Course

Introduction & Motivation

By “NLP” we mean...

Natural Language Processing (#NLProc)

aka Computational Linguistics, Text Analytics, etc.

not Neuro-linguistic Programming! (#NLP)

Page 4: NLP Crash Course

Introduction & Motivation

Natural Language Processing is...

Using computers to process (i.e., analyze, understand, generate, etc.) natural human languages (e.g., English, Chinese, Klingon).

Hello, world! 你好,世界!

Page 5: NLP Crash Course

Introduction & Motivation

That sounds hard... why should I care?

• Most of the knowledge created by humans is unstructured text (information overload)

• Need some way to make sense of it all

• Enable quantitative analysis of text data

Page 6: NLP Crash Course

Famous Examples

Siri (Apple, SRI, Nuance): Speech Recognition/Generation

IBM Watson: Question Answering

Google Translate: Machine Translation

Page 7: NLP Crash Course

Basics

• Segmentation

• Part-of-speech tagging

• Noun phrase (NP) chunking

• Parsing

• Word sense disambiguation
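A minimal pure-Python sketch can illustrate segmentation and NP chunking. The regex segmenter and tokenizer are deliberately naive; for example, they would split "Dr. Smith" into two sentences. The part-of-speech tags are supplied by hand as an assumption and stand in for the output of a real tagger. The chunker matches an optional determiner, any number of adjectives, and then one or more nouns. This is the same idea as a regular-expression chunk grammar such as `NP: {<DT>?<JJ>*<NN.*>+}`. All function names are illustrative.

```python
import re

def split_sentences(text):
    # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
    # Abbreviations such as "Dr." will be split incorrectly.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Words are runs of letters/digits/underscores; every other
    # non-space character becomes its own token.
    return re.findall(r"\w+|[^\w\s]", sentence)

def chunk_noun_phrases(tagged):
    # tagged: list of (word, Penn Treebank tag) pairs.
    # Greedily match DT? JJ* NN+ (NN, NNS, NNP, ... all start with "NN").
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:  # at least one noun: emit the chunk and skip past it
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:      # no noun here: move on one token
            i += 1
    return chunks

# Hand-tagged example sentence (tags are assumed, not produced by a tagger).
tagged = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
          ("jumped", "VBD"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
          ("dog", "NN")]
print(chunk_noun_phrases(tagged))  # ['The quick brown fox', 'the lazy dog']
```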

Page 8: NLP Crash Course

Basics

• Stop words, stemming/lemmatization

• Frequency analysis (terms, n-grams, TF-IDF)

• Machine learning (classification, clustering, recommendation)
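Stop-word removal, n-gram extraction, and TF-IDF weighting fit in a few lines of standard-library Python. This sketch makes several simplifying assumptions: a tiny made-up stop-word list, whitespace tokenization, raw counts for term frequency, and the plain log(N / df) form of inverse document frequency. Real libraries differ in their smoothing and normalization. Stemming and lemmatization are left out.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list (real lists have hundreds of entries).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}

def content_words(text):
    # Lowercase, split on whitespace, drop stop words.
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def ngrams(tokens, n):
    # All runs of n consecutive tokens, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_idf(docs):
    # One {term: weight} dict per document, where
    # weight = count in document * log(number of docs / docs containing term).
    tokenized = [content_words(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [{t: c * math.log(n_docs / df[t]) for t, c in Counter(toks).items()}
            for toks in tokenized]

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "the cat chased the dog"]
print(ngrams(content_words(docs[0]), 2))  # [('cat', 'sat'), ('sat', 'mat')]
weights = tf_idf(docs)
print(max(weights[0], key=weights[0].get))  # mat (it appears in only one document)
```

Under this formula, a term that occurs in every document gets a weight of zero. Rare terms such as "mat" are ranked as the most characteristic.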

Page 9: NLP Crash Course

Major Task Areas

Question Answering

• Match query with knowledge base

• Closed domain vs open domain

• Reasoning about intent of question
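In a closed-domain system, matching a query with the knowledge base can be as simple as picking the stored question that shares the most words with it. The sketch below assumes a small hypothetical question/answer table built from terms in this talk. It returns nothing when the overlap is too small. It does no reasoning about the intent of the question, which real question-answering systems must also handle.

```python
import re

# Hypothetical closed-domain knowledge base: stored question -> answer.
KNOWLEDGE_BASE = {
    "What does NLP stand for?": "Natural Language Processing",
    "What does NER stand for?": "Named Entity Recognition",
    "Which toolkit is written in Python?": "NLTK",
}

def words(text):
    # Lowercased alphabetic words as a set.
    return set(re.findall(r"[a-z]+", text.lower()))

def answer(query, kb=KNOWLEDGE_BASE, min_overlap=2):
    # Pick the stored question sharing the most words with the query;
    # give up (return None) if even the best match shares too few words.
    q = words(query)
    best = max(kb, key=lambda stored: len(q & words(stored)))
    if len(q & words(best)) < min_overlap:
        return None
    return kb[best]

print(answer("what does NER stand for"))  # Named Entity Recognition
print(answer("Tell me a joke"))           # None
```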

Page 10: NLP Crash Course

Major Task Areas

Speech Recognition

• Speech to text

• Trained/untrained user models

• Voice-based interfaces

Page 11: NLP Crash Course

Major Task Areas

Named Entity Recognition

• Entity extraction

• Persons, organizations, locations

• Grammar, syntax, phrasing
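The simplest form of entity extraction is a gazetteer lookup: a list of known names paired with their types. This hypothetical sketch uses a hand-made dictionary and whole-word regex matching, and it prefers longer names when two matches overlap. A lookup like this cannot tell "Washington" the person from "Washington" the place. That limitation is why practical NER systems also draw on grammatical, syntactic, and phrasing cues.

```python
import re

# Hypothetical gazetteer: known name -> entity type.
GAZETTEER = {
    "Charlie Greenbacker": "PERSON",
    "Apple": "ORGANIZATION",
    "IBM": "ORGANIZATION",
    "Washington": "LOCATION",
}

def tag_entities(text, gazetteer=GAZETTEER):
    # Return (name, type, start_offset) for each whole-word match, trying
    # longer names first so they win over any shorter overlapping name.
    found, taken = [], set()
    for name in sorted(gazetteer, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            span = set(range(m.start(), m.end()))
            if not span & taken:
                found.append((name, gazetteer[name], m.start()))
                taken |= span
    return sorted(found, key=lambda item: item[2])

for name, etype, _ in tag_entities(
        "Charlie Greenbacker compared Apple and IBM in Washington."):
    print(name, etype)
```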

Page 12: NLP Crash Course

Major Task Areas

Entity Resolution

• Linking names to ground truth

• Disambiguating similar names
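Entity resolution links a mention to a canonical record. One simple approach, sketched here under stated assumptions, first normalizes case and punctuation. It then compares the mention against known name variants using `difflib.SequenceMatcher` from the Python standard library. The table of IDs and variants and the 0.8 similarity cutoff are hypothetical. Real systems also use context to separate entities whose names are genuinely similar.

```python
import difflib

# Hypothetical ground-truth table: canonical ID -> known name variants.
CANONICAL = {
    "ORG:IBM": ["International Business Machines", "IBM"],
    "ORG:APPLE": ["Apple Inc.", "Apple", "Apple Computer"],
}

def normalize(name):
    # Lowercase, drop periods, collapse runs of whitespace.
    return " ".join(name.lower().replace(".", "").split())

def resolve(mention, table=CANONICAL, cutoff=0.8):
    # Return the ID whose closest variant is most similar to the mention,
    # or None if no variant reaches the cutoff.
    best_id, best_score = None, cutoff
    target = normalize(mention)
    for entity_id, variants in table.items():
        for variant in variants:
            score = difflib.SequenceMatcher(None, target, normalize(variant)).ratio()
            if score >= best_score:
                best_id, best_score = entity_id, score
    return best_id

print(resolve("I.B.M."))           # ORG:IBM
print(resolve("Apple Computers"))  # ORG:APPLE
print(resolve("Microsoft"))        # None
```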

Page 13: NLP Crash Course

Major Task Areas

Co-reference Resolution

• Finding antecedents for pronouns

• Name resolution
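A classic baseline for finding a pronoun's antecedent is to choose the most recent preceding name that agrees with the pronoun. Here, agreement means gender only. The name and pronoun tables below are hypothetical and tiny. The heuristic ignores syntax and number, so it is meant only to make the task concrete.

```python
# Hypothetical lexicons: known names with grammatical gender, and pronouns.
NAME_GENDER = {"Alice": "F", "Bob": "M"}
PRONOUN_GENDER = {"he": "M", "him": "M", "his": "M",
                  "she": "F", "her": "F", "hers": "F"}

def resolve_pronouns(tokens):
    # Map each pronoun's token index to the most recent earlier name
    # with the same gender; pronouns with no match are left out.
    links, names_seen = {}, []
    for i, tok in enumerate(tokens):
        if tok in NAME_GENDER:
            names_seen.append(tok)
        elif tok.lower() in PRONOUN_GENDER:
            gender = PRONOUN_GENDER[tok.lower()]
            for name in reversed(names_seen):
                if NAME_GENDER[name] == gender:
                    links[i] = name
                    break
    return links

tokens = "Alice met Bob because she wanted his advice".split()
print(resolve_pronouns(tokens))  # {4: 'Alice', 6: 'Bob'}
```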

Page 14: NLP Crash Course

Major Task Areas

Relationship Extraction

• Attribute values

• SVO (subject-verb-object) triples

• Populating ontologies
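Relationship extraction often starts from subject-verb-object patterns. This toy sketch scans hand-tagged (word, Penn Treebank tag) pairs for a fixed pattern: a noun, then a verb, then an optional determiner, then a second noun. The tags are an assumption, and a real extractor would work from a full parse. The resulting triples are the kind of facts that could then be used to populate an ontology.

```python
def extract_svo(tagged):
    # tagged: list of (word, Penn Treebank tag) pairs, assumed hand-tagged.
    # Emit (subject, verb, object) for the pattern NN* VB* (DT)? NN*.
    triples = []
    for i in range(len(tagged) - 2):
        (subj, s_tag), (verb, v_tag) = tagged[i], tagged[i + 1]
        if s_tag.startswith("NN") and v_tag.startswith("VB"):
            j = i + 2
            if tagged[j][1] == "DT" and j + 1 < len(tagged):
                j += 1  # skip a determiner before the object
            if tagged[j][1].startswith("NN"):
                triples.append((subj, verb, tagged[j][0]))
    return triples

tagged = [("IBM", "NNP"), ("built", "VBD"), ("Watson", "NNP"), (".", "."),
          ("Watson", "NNP"), ("won", "VBD"), ("the", "DT"), ("game", "NN")]
print(extract_svo(tagged))
# [('IBM', 'built', 'Watson'), ('Watson', 'won', 'game')]
```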

Page 15: NLP Crash Course

Major Task Areas

Information Retrieval

• Query expansion

• Relevancy of results

• “More like this”
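Query expansion and "more like this" can both be sketched with bag-of-words vectors. In this sketch, expansion adds synonyms from a hypothetical hand-made table. Relevance is measured as the cosine similarity of raw term-count vectors, with no IDF weighting or stemming. Both of those choices are simplifying assumptions rather than how any particular search engine works.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table used for query expansion.
SYNONYMS = {"car": {"automobile", "vehicle"}, "buy": {"purchase"}}

def terms(text):
    return re.findall(r"[a-z]+", text.lower())

def expand_query(query):
    # Add known synonyms of each query term to the query.
    expanded = set(terms(query))
    for t in list(expanded):
        expanded |= SYNONYMS.get(t, set())
    return expanded

def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(count * b[t] for t, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def more_like_this(doc, corpus):
    # Rank every other document in the corpus by similarity to doc.
    target = Counter(terms(doc))
    others = [d for d in corpus if d != doc]
    return sorted(others, key=lambda d: cosine(target, Counter(terms(d))),
                  reverse=True)

corpus = ["used car for sale", "car dealers sell used cars", "recipes for apple pie"]
print(sorted(expand_query("buy a car")))
print(more_like_this("used car for sale", corpus)[0])  # car dealers sell used cars
```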

Page 16: NLP Crash Course

Major Task Areas

Assistive Technologies

• Text simplification

• Predictive text input

• Alternative interfaces
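Predictive text input can be approximated with a bigram model. The model counts which word follows which in the user's past text and then suggests the most frequent continuations. The typing history below is hypothetical. Production keyboards use far larger models, but the counting idea is the same.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    # For every word, count the words that immediately follow it.
    following = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def suggest(model, prev_word, k=3):
    # Up to k most frequent continuations of prev_word, most frequent first.
    return [w for w, _ in model.get(prev_word.lower(), Counter()).most_common(k)]

# Hypothetical typing history used as training data.
history = ["see you later", "see you soon", "see you later today", "thank you"]
model = train_bigrams(history)
print(suggest(model, "you"))  # ['later', 'soon']
```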

Page 17: NLP Crash Course

Major Task Areas

NLG + Automatic Summarization

• Generating text from data

• Extractive summarization

• Abstractive summarization
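Extractive summarization selects existing sentences rather than writing new ones. A common frequency-based baseline, sketched here with a made-up stop-word list and a naive sentence splitter, works in three steps. It scores each sentence by the average frequency of its content words across the whole text. It then keeps the top-scoring sentences. Finally, it returns them in their original order. Abstractive summarization, which generates new wording, is not attempted.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "was"}

def content_terms(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def summarize(text, n_sentences=1):
    # Score each sentence by the average frequency (over the whole text)
    # of its content words; keep the top n in their original order.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_terms(text))

    def score(sentence):
        terms = content_terms(sentence)
        return sum(freq[t] for t in terms) / len(terms) if terms else 0.0

    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return " ".join(s for s in sentences if s in top)

text = ("NLP helps computers read text. Computers process text quickly. "
        "The weather was nice.")
print(summarize(text))  # Computers process text quickly.
```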

Page 18: NLP Crash Course

Major Task Areas

Machine Translation

• From source to target, and back!

• Single terms work... sometimes

• Idioms, metaphors, cultural references
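A word-for-word dictionary lookup shows why single terms work only sometimes. In Spanish, color adjectives usually follow the noun, so translating each word independently gets the word order wrong. Words missing from the lexicon, including any idiom, simply fall through. The five-entry English-to-Spanish lexicon is hypothetical and exists only for illustration.

```python
# Hypothetical English -> Spanish word list, for illustration only.
LEXICON = {"the": "el", "cat": "gato", "dog": "perro",
           "black": "negro", "sleeps": "duerme"}

def word_by_word(sentence):
    # Translate each word independently; unknown words are flagged as [word?].
    return " ".join(LEXICON.get(w, f"[{w}?]") for w in sentence.lower().split())

print(word_by_word("The cat sleeps"))   # el gato duerme    (correct)
print(word_by_word("The black cat"))    # el negro gato     (should be "el gato negro")
print(word_by_word("The bird sleeps"))  # el [bird?] duerme
```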

Page 19: NLP Crash Course

Major Task Areas

Sentiment Analysis

• Polarity, intensity, direction

• "Easy" for movie/product reviews

• "Impossible" for nearly anything else
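A lexicon-based scorer makes the polarity and intensity dimensions concrete: the sign gives polarity and the magnitude gives intensity. It also shows why the task gets hard outside reviews. The word scores and the one-word negation rule below are assumptions for illustration, not a published lexicon. Sarcasm, as in the last example, defeats the scorer.

```python
import re

# Hypothetical polarity lexicon: sign = polarity, magnitude = intensity.
LEXICON = {"excellent": 3, "great": 2, "good": 1,
           "bad": -1, "boring": -2, "awful": -3}
NEGATORS = {"no", "not", "never"}

def sentiment(text):
    # Sum word scores; a negator flips the sign of the very next word only.
    score, negate = 0, False
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in NEGATORS:
            negate = True
            continue
        if word in LEXICON:
            score += -LEXICON[word] if negate else LEXICON[word]
        negate = False
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, score

print(sentiment("A great movie, not boring."))  # ('positive', 4)
print(sentiment("It was not good"))             # ('negative', -1)
print(sentiment("Oh great, another delay."))    # ('positive', 2), though sarcastic
```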

Page 20: NLP Crash Course

Protips

• Domain adaptation (retrain your models, social media != news)

• Assume everything is in beta (error rates compound, translate last, consult the research literature)

• Evaluation is essential (human judges, “gold standard” data, cross-validation, appropriate metrics)
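Two of the evaluation tools mentioned above are easy to sketch. The first is precision, recall, and F1 against gold-standard annotations. The second is a k-fold split for cross-validation. In this sketch the folds are assigned round-robin without shuffling; that is a simplifying assumption, since many setups shuffle or stratify first. The entity sets are made-up examples.

```python
def precision_recall_f1(predicted, gold):
    # Compare predicted items (e.g. extracted entities) with gold-standard items.
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def k_fold_indices(n_items, k):
    # Yield (train, test) index lists; item i goes to fold i % k.
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

p, r, f = precision_recall_f1({"IBM", "Apple", "Siri"},
                              {"IBM", "Apple", "Watson", "Google"})
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```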

Page 21: NLP Crash Course

Resources (toolkits)

Stanford CoreNLP: Java, GPL

Apache OpenNLP: Java, Apache License

NLTK: Python, Apache License

Page 22: NLP Crash Course

Resources (books)

Natural Language Processing with Python: Bird, Klein, and Loper

Speech and Language Processing: Jurafsky and Martin

Foundations of Statistical Natural Language Processing: Manning and Schütze

Page 23: NLP Crash Course

Resources (groups)

ACL (Association for Computational Linguistics): Conferences, Workshops, Journals, SIGs

DC NLP: NLP Meetups

Data Community DC: NLP Workshops

Page 24: NLP Crash Course

Questions?

Charlie Greenbacker

dcnlp.org

@greenbacker