NLP Crash Course
NLP “Crash Course”
Charlie Greenbacker
dcnlp.org
Agenda
• Introduction & Motivation
• Famous Examples
• Basics
• Major Task Areas
• Protips
• Resources
Introduction & Motivation
By “NLP” we mean...
Natural Language Processing (#NLProc)
aka Computational Linguistics, Text Analytics, etc.
not Neuro-linguistic Programming! (#NLP)
Introduction & Motivation
Natural Language Processing is...
Using computers to process (i.e., analyze, understand, generate, etc.) natural human languages (e.g., English, Chinese, Klingon).
Hello, world! 你好,世界!
Introduction & Motivation
That sounds hard... why should I care?
• Most of the knowledge created by humans is unstructured text (information overload)
• Need some way to make sense of it all
• Enable quantitative analysis of text data
Famous Examples
• Siri (Apple, SRI, Nuance): Speech Recognition/Generation
• IBM Watson: Question Answering
• Google Translate: Machine Translation
Basics
• Segmentation
• Part-of-speech tagging
• Noun phrase (NP) chunking
• Parsing
• Word sense disambiguation
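As a rough illustration of segmentation, the first basic above, here is a minimal Python sketch using only regular expressions. The function names `split_sentences` and `tokenize` are made up for this example, and the rules are deliberately naive (abbreviations such as "Dr." would be split incorrectly); real toolkits use trained models for this step.

```python
import re

def split_sentences(text):
    """Naive sentence segmentation: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Naive word segmentation: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "Hello, world! NLP sounds hard. Why should I care?"
for sentence in split_sentences(text):
    print(tokenize(sentence))
```

Note that whitespace-based rules like these do not carry over to languages such as Chinese, where words are not separated by spaces.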
Basics
• Stop words, stemming/lemmatization
• Frequency analysis (terms, n-grams, TF-IDF)
• Machine learning (classification, clustering, recommendation)
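The frequency-analysis ideas above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the stop-word list is a tiny made-up sample, the helper names (`terms`, `ngrams`, `tf_idf`) are hypothetical, and the weighting shown (relative term frequency times log inverse document frequency) is just one common TF-IDF variant.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "of", "is", "on", "and"}

def terms(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def ngrams(tokens, n):
    """All contiguous n-token windows, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [terms(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(t for toks in tokenized for t in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        scores.append({t: (c / len(toks)) * math.log(n_docs / df[t])
                       for t, c in tf.items()})
    return scores

docs = ["the cat sat on the mat", "the dog sat on the log"]
print(ngrams(terms(docs[0]), 2))
print(tf_idf(docs)[0])
```

A term that appears in every document (here "sat") gets a weight of zero, which is the point of the IDF factor: it down-weights terms that do not help tell documents apart.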
Major Task Areas
Question Answering
• Match query with knowledge base
• Closed domain vs open domain
• Reasoning about intent of question
Major Task Areas
Speech Recognition
• Speech to text
• Trained/untrained user models
• Voice-based interfaces
Major Task Areas
Named Entity Recognition
• Entity extraction
• Persons, organizations, locations
• Grammar, syntax, phrasing
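One of the simplest ways to approach entity extraction is a dictionary ("gazetteer") lookup. The sketch below assumes a small hand-made gazetteer and a hypothetical `extract_entities` helper; production systems typically rely on trained sequence models that also use grammar, syntax, and phrasing cues.

```python
import re

# Hypothetical gazetteer mapping known names to entity types.
GAZETTEER = {
    "Charlie Greenbacker": "PERSON",
    "IBM": "ORGANIZATION",
    "Apple": "ORGANIZATION",
    "Washington": "LOCATION",
}

def extract_entities(text):
    """Return (name, type) pairs for gazetteer entries found in text, in order of appearance."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), name, label))
    return [(name, label) for _, name, label in sorted(found)]

print(extract_entities("IBM and Apple met in Washington."))
```

A lookup like this cannot tell whether "Washington" is a place or a person, which is exactly the ambiguity that context-aware models try to handle.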
Major Task Areas
Entity Resolution
• Linking names to ground truth
• Disambiguating similar names
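Linking a mention to a known entity can be approximated with fuzzy string matching. This is a minimal sketch using `difflib.SequenceMatcher` from the Python standard library; the list of known entities, the `resolve` helper, and the 0.6 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical "ground truth" list of canonical entity names.
KNOWN_ENTITIES = [
    "International Business Machines",
    "Apple Inc.",
    "Nuance Communications",
]

def resolve(mention, candidates=KNOWN_ENTITIES, threshold=0.6):
    """Return the most similar canonical name, or None if nothing is close enough."""
    def score(candidate):
        return SequenceMatcher(None, mention.lower(), candidate.lower()).ratio()
    best = max(candidates, key=score)
    return best if score(best) >= threshold else None

print(resolve("Apple Inc"))
print(resolve("zzzz"))
```

Surface similarity alone will not link "IBM" to "International Business Machines"; real systems also use alias tables and surrounding context.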
Major Task Areas
Co-reference Resolution
• Finding antecedents for pronouns
• Name resolution
Major Task Areas
Relationship Extraction
• Attribute values
• SVO triples
• Populating ontologies
Major Task Areas
Information Retrieval
• Query expansion
• Relevancy of results
• “More like this”
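Relevancy ranking and "more like this" are often built on vector similarity. Here is a minimal sketch using raw term counts and cosine similarity; the function names are hypothetical, and a real search engine would add TF-IDF or similar weighting, stemming, and an index.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words vector: term -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def more_like_this(query, docs):
    """Rank docs from most to least similar to the query."""
    q = vectorize(query)
    return sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)

docs = [
    "machine translation of idioms",
    "speech recognition for voice interfaces",
    "statistical machine translation",
]
for doc in more_like_this("machine translation", docs):
    print(doc)
```

The same function covers both use cases: pass a short query for search, or pass a whole document to find others like it.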
Major Task Areas
Assistive Technologies
• Text simplification
• Predictive text input
• Alternative interfaces
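Predictive text input can be sketched with a bigram model: count which word tends to follow which, then suggest the most frequent followers. The training sentences and the names `train_bigrams` and `suggest` are invented for this example; real keyboards use much larger corpora and longer context.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Map each word to a Counter of the words that follow it."""
    following = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def suggest(model, word, k=3):
    """Return up to k most frequent next words after `word`."""
    return [w for w, _ in model.get(word.lower(), Counter()).most_common(k)]

corpus = [
    "natural language processing",
    "natural language generation",
    "natural language processing",
]
model = train_bigrams(corpus)
print(suggest(model, "language"))
```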
Major Task Areas
NLG + Automatic Summarization
• Generating text from data
• Extractive summarization
• Abstractive summarization
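Extractive summarization selects existing sentences rather than writing new ones. The sketch below scores each sentence by the average corpus frequency of its words and keeps the top few; this scoring scheme and the `summarize` helper are assumptions for illustration, one of many possible heuristics.

```python
import re
from collections import Counter

def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        words = re.findall(r"\w+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return [s for s in sentences if s in top]

text = "NLP is hard. NLP is useful. Cats sleep."
print(summarize(text, n_sentences=2))
```

Abstractive summarization, by contrast, would generate new wording and needs a language generation component that a frequency heuristic cannot provide.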
Major Task Areas
Machine Translation
• From source to target, and back!
• Single terms work... sometimes
• Idioms, metaphors, cultural references
Major Task Areas
Sentiment Analysis
• Polarity, intensity, direction
• "Easy" for movie/product reviews
• "Impossible" for nearly anything else
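A word-list approach shows both why polarity detection looks easy on reviews and why it breaks elsewhere. The lexicons and the `polarity` function below are tiny, made-up examples, not a real sentiment resource.

```python
# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "boring"}

def polarity(text):
    """Count positive minus negative words and map the total to a label."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("I love this movie, great cast!"))   # review-style text
print(polarity("Terrible and boring."))
print(polarity("Oh great, another delay"))          # sarcasm fools the lexicon
```

The last call is labeled positive even though a human reader would hear sarcasm, which hints at why sentiment outside of reviews is so much harder.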
Protips
• Domain adaptation (retrain your models; social media != news)
• Assume everything is in beta (error rates compound, translate last, consult the research literature)
• Evaluation is essential (human judges, “gold standard” data, cross-validation, appropriate metrics)
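Two of the protips above lend themselves to quick arithmetic. The sketch below computes precision, recall, and F1 against a gold standard, and shows how accuracy shrinks when pipeline stages are chained. The function names are hypothetical, and the compounding calculation assumes stage errors are independent, which is a simplification.

```python
def precision_recall_f1(predicted, gold):
    """Compare a set of predicted items against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def pipeline_accuracy(stage_accuracies):
    """Overall accuracy if every stage must be right and errors are independent."""
    result = 1.0
    for accuracy in stage_accuracies:
        result *= accuracy
    return result

print(precision_recall_f1({"Apple", "IBM", "Siri"}, {"Apple", "IBM", "Nuance", "SRI"}))
print(pipeline_accuracy([0.9, 0.9, 0.9]))  # three 90%-accurate stages
```

Under the independence assumption, three stages at 90% accuracy each leave roughly 73% end-to-end, which is why it pays to put the noisiest step (such as translation) last.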
Resources (toolkits)
• Stanford CoreNLP (Java, GPL)
• Apache OpenNLP (Java, Apache License)
• NLTK (Python, Apache License)
Resources (books)
• Natural Language Processing with Python (Bird, Klein, and Loper)
• Speech and Language Processing (Jurafsky and Martin)
• Foundations of Statistical Natural Language Processing (Manning and Schütze)
Resources (groups)
• ACL (Association for Computational Linguistics): Conferences, Workshops, Journals, SIGs
• DC NLP: NLP Meetups
• Data Community DC: NLP Workshops
Questions?
Charlie Greenbacker
dcnlp.org
@greenbacker