CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing...

17
CS 4705 CS4705 Natural Language Processing Fall 2010

Transcript of CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing...

Page 1: CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.

CS 4705

CS4705

Natural Language Processing

Fall 2010

Page 2: CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.

What is Natural Language Processing?

• Designing software to recognize, analyze and generate text and speech

• Current real world applications– Searching very large text and speech corpora: e.g. the

Web, Facebook, online news sources, telephone calls– Translating from one language to another: e.g.

Arabic/English, – Summarizing very large amounts of text or speech: e.g.

your email, the news– Building spoken dialogue systems: e.g. Amtrak’s ‘Julie

Page 3: CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.

What’s the Problem?

• Something Went Wrong In Jet Crash, Expert Says

• Police Begin Campaign To Run Down Jaywalkers

• Drunk Gets Nine Months In Violin Case

• Farmer Bill Dies In House

• Iraqi Head Seeks Arms

• Enraged Cow Injures Farmer With Ax

• Stud Tires Out

• Eye Drops Off Shelf

• Teacher Strikes Idle Kids

• Squad Helps Dog Bite Victim

Page 4: CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.

What will we learn about in this course?

• Morphology: the way words are formed• Syntax: the way words are grouped together

into larger constituents and phrases and the way these phrases can be ordered – how sentences are related

• Semantics: the context-independent ‘meaning’ of utterances

• Pragmatics: the context-dependent ‘meaning’ of utterances

• Goal: what is a speaker/writer meaning to convey

Page 5: CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.

Morphology

Stud tires out: Is `Tires’: a noun or a verb?• Internet search: `union activities in New York’

– What to look for?

• Union/unions; activities/activity

• Active? Action? Actor?

• New vs. New York

Page 6: CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.

Syntax

• Constituent Structure:– Teacher Strikes Idle Kids

– Enraged Cow Injures Farmer With Ax

• Word Order and Meaning– John hit Bill.

– Bill was hit by John.– Bill, John hit.– Who John hit was Bill.– I said John hit Bill.– John hits Bill.

Page 7: CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.

Semantics

• Word meaning– John picked up a bad cold.

– John picked up a large rock.

– John picked up Radio Netherlands on his radio.

• Is meaning compositional?– Squad helps dog bite victim

– Enraged cow injures farmer with ax

Page 8: CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.

Pragmatics

• “Going Home'' -A play in one act (thanks to Bonnie Dorr)– Scene 1: Pennsylvania Station, NY

• Bonnie: Long Beach?

• Passerby: Downstairs, LIRR Station.

– Scene 2: Ticket Counter, LIRR Station

• Bonnie: Long Beach?

• Clerk: $4.50.

Page 9: CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.

– Scene 3: Information Booth, LIRR Station

• Bonnie: Long Beach?

• Clerk: 4:19, Track 17.

– Scene 4: On the train, vicinity of Forest Hills

• Bonnie: Long Beach?

• Conductor: Change at Jamaica.

– Scene 5: On the next train, vicinity of Lynbrook

• Bonnie: Long Beach?

• Conductor: Right after Island Park.

Page 10: CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.

Algorithms

• Rule-based– Symbolic Parsers and morphological analyzers

– Finite state automata

• Probabilistic/statistical– Learned from observation of (labeled) data

– Predicting new data based on old

– Machine learning

Page 11: CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.

Current Real-World Applications

• Search: very large corpora, e.g. Google• Question answering: e.g. IBM’s Jeopardy!, DARPA

who/what/where…, Ask Jeeves• Translating between one language and another: e.g.

Google Translate, Babelfish• Summarizing very large amounts of text or speech:

e.g. your email, the news, voicemail• Sentiment analysis: restaurant or movie reviews• Dialogue systems: e.g. Amtrak’s ‘Julie’

Page 12: CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.

Instructor

• Julia Hirschberg– CEPSR 705, [email protected]

– Focus: Spoken Language Processing

– Lab: The Speech Lab, CEPSR 7LW3-A

– Research:

• Deceptive speech

• Charismatic speech:

• Emotional speech: anger, uncertainty

• Speech summarization: Broadcast News

• Spoken Dialogue Systems: Games Corpus

• `Translating Prosody’: English – Mandarin

• Text2Scene Synthesis

Page 13: CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.

Course Details

• Teaching Assistant: Mohamed Altantawy– ([email protected])

– Office Hours: CEPSR 7LW1 (Speech Lab), W 5-6, Th 5:30-6:30

• Syllabus http://www1.cs.columbia.edu/~julia/courses/CS4705/syllabus10.htm

Page 14: CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.

• Text: Daniel Jurafsky and James H. Martin, Speech and Language Processing, second edition – Note errata available on website

• Check courseworks for additional information on class, homework assignments, posting questions

• Assignments: – 3 homework assignments: Question-answering, text

classification, parsing

– Midterm and final exams

– Five ‘free’ late days for homeworks – not usable on HW1 though – after that 10% off per late day

– You will need a CS account

Page 15: CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.

Grading

• HW1: 10%• Hw2: 20%• Hw3: 20%• Midterm: 15%• Final: 25%• Class participation: 10%

Page 16: CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.

Academic Integrity

Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is forbidden, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you are going to have trouble completing an assignment, talk to the instructor or TA in advance of the due date please. Everyone: Read/write protect your homework files at all times.

Page 17: CS 4705 Natural Language Processing Fall 2010 What is Natural Language Processing? Designing software to recognize, analyze and generate text and speech.

For Next Class

• Look at syllabus – ask questions about anything you don’t understand

• Read Chapters 1-2 of J&M