NLP Crash Course

Charlie Greenbacker, founder and co-organizer of the DC NLP meetup group, provides a "crash course" in Natural Language Processing techniques and applications.

Transcript of NLP Crash Course

  • NLP Crash Course, Charlie Greenbacker, dcnlp.org
  • Agenda: Introduction & Motivation; Famous Examples; Basics; Major Task Areas; Protips; Resources
  • Introduction & Motivation: By NLP we mean Natural Language Processing (#NLProc), aka Computational Linguistics, Text Analytics, etc.; not Neuro-linguistic Programming (#NLP)!
  • Introduction & Motivation: Natural Language Processing is using computers to process (i.e., analyze, understand, generate, etc.) natural human languages (e.g., English, Chinese, Klingon). Hello, world!
  • Introduction & Motivation: That sounds hard... why should I care? Most of the knowledge created by humans is unstructured text (information overload); Need some way to make sense of it all; Enable quantitative analysis of text data
  • Famous Examples: Siri (Apple, SRI, Nuance): Speech Recognition/Generation; IBM Watson: Question Answering; Google Translate: Machine Translation
  • Basics: Segmentation; Part-of-speech tagging; Noun phrase (NP) chunking; Parsing; Word sense disambiguation
  • Basics: Stop words, stemming/lemmatization; Frequency analysis (terms, n-grams, TF-IDF); Machine learning (classification, clustering, recommendation)
  • Major Task Areas: Question Answering. Match query with knowledge base; Closed domain vs. open domain; Reasoning about intent of question
  • Major Task Areas: Speech Recognition. Speech to text; Trained/untrained user models; Voice-based interfaces
  • Major Task Areas: Named Entity Recognition. Entity extraction; Persons, organizations, locations; Grammar, syntax, phrasing
  • Major Task Areas: Entity Resolution. Linking names to ground truth; Disambiguating similar names
  • Major Task Areas: Co-reference Resolution. Finding antecedents for pronouns; Name resolution
  • Major Task Areas: Relationship Extraction. Attribute values; SVO triples; Populating ontologies
  • Major Task Areas: Information Retrieval. Query expansion; Relevancy of results; More like this
  • Major Task Areas: Assistive Technologies. Text simplification; Predictive text input; Alternative interfaces
  • Major Task Areas: NLG + Automatic Summarization. Generating text from data; Extractive summarization; Abstractive summarization
  • Major Task Areas: Machine Translation. From source to target, and back!; Single terms work... sometimes; Idioms, metaphors, cultural references
  • Major Task Areas: Sentiment Analysis. Polarity, intensity, direction; "Easy" for movie/product reviews; "Impossible" for nearly anything else
  • Protips: Domain adaptation (retrain your models, social media != news); Assume everything is in beta (error rates compound, translate last, consult the research literature); Evaluation is essential (human judges, gold standard data, cross-validation, appropriate metrics)
  • Resources (toolkits): Stanford CoreNLP (Java, GPL); Apache OpenNLP (Java, Apache License); NLTK (Python, Apache License)
  • Resources (books): Natural Language Processing with Python (Bird, Klein, and Loper); Speech and Language Processing (Jurafsky and Martin); Foundations of Statistical Natural Language Processing (Manning and Schütze)
  • Resources (groups): ACL (Association for Computational Linguistics): Conferences, Workshops, Journals, SIGs; DC NLP: NLP Meetups; Data Community DC: NLP Workshops
  • Questions? Charlie Greenbacker, dcnlp.org, @greenbacker
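
The frequency-analysis items on the second Basics slide (stop words, n-grams, TF-IDF) can be sketched in plain Python. This is only a minimal illustration, not code from the talk: the stop-word list, the three-sentence corpus, and the helper names (`tokenize`, `remove_stop_words`, `ngrams`, `tf_idf`) are all assumptions made for the example, and the weighting used is the simplest textbook variant (relative term frequency times log of inverse document frequency). Toolkits such as NLTK provide more complete versions of each step.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "and", "is", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, invented for this sketch.
corpus = [
    "The cat sat on the mat",
    "The dog sat on the log",
    "The cat chased the dog",
]
docs = [remove_stop_words(tokenize(text)) for text in corpus]
bigrams = ngrams(docs[0], 2)
scores = tf_idf(docs)
top_term = max(scores[0], key=scores[0].get)
```

In this toy corpus, terms that occur in only one document (such as "mat" in the first sentence) receive a higher weight than terms shared across documents, which is the point of the IDF factor.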
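
The Protips slide calls evaluation against gold standard data with appropriate metrics essential. As a rough sketch of what that can look like for an extraction task, the snippet below computes precision, recall, and F1 by comparing predicted (text, label) pairs with a gold set. The function name `precision_recall_f1` and both entity sets are hypothetical, made up purely for illustration; the talk does not prescribe specific metrics.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted items against gold standard items (exact match)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold annotations and system output, invented for this sketch.
gold_entities = {
    ("Alice Smith", "PERSON"),
    ("Acme Corp", "ORG"),
    ("Springfield", "LOC"),
}
predicted_entities = {
    ("Alice Smith", "PERSON"),   # correct
    ("Acme Corp", "LOC"),        # right span, wrong label: counts as an error
    ("Springfield", "LOC"),      # correct
    ("Monday", "ORG"),           # spurious entity
}

precision, recall, f1 = precision_recall_f1(predicted_entities, gold_entities)
```

Exact matching on both span and label is a deliberately strict choice here; partial-credit schemes exist, and which one is "appropriate" depends on the task.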