6 Nov 2001 IS202: Information Organization and Retrieval Information Extraction Ray Larson & Warren...


Information Extraction

Ray Larson & Warren Sack

IS202: Information Organization and Retrieval

Fall 2001

UC Berkeley, SIMS

lecture author: Warren Sack

Cognitive Science

• 10/30/01 – AI, knowledge representation and common sense

• 11/01/01 – Computational Linguistics, Cognitive Psychology and Lexical Knowledge

• 11/06/01 – AI and information extraction

• 11/08/01 – Linguistics, Philosophy, Psychology, categories, and cognition

Last Time

• Lexical relations
  – Linguistics
    • Two approaches to semantics:
      – Compositional
      – Relational
  – Psycholinguistics

• WordNet
  – Description
  – Structure
  – Applications

Levels of Linguistic Analysis

• Sentences
  – Phonological/Morphological analysis
  – Syntactic analysis
  – Semantic analysis

• More than one sentence
  – Pragmatic analysis

Pragmatics

• Deixis
  – E.g., “I’ll be back in an hour” depends upon the time of the utterance.

• Conversational implicature
  – A: “Can you tell me the time?”
  – B: “Well, the milkman has come.” [I don’t know exactly, but perhaps you can deduce it from some extra information I give you.]

• Presupposition
  – “Are you still such a bad driver?”

• Speech acts
  – Constatives vs. performatives
  – E.g., “I second the motion.” (a performative)

• Conversational structure
  – E.g., turn-taking rules

Last Last Time

• What is Cognitive Science?

• What is Artificial Intelligence?
  – Knowledge Representation
    • Languages
  – Representing Common Sense
    • Common Sense Interfaces

• Story Understanding, Story Generation, and Common Sense

Today: Information Extraction

• A short history: AI Story Understanding, SAM, and FRUMP

• Basic Techniques: Lexical analysis, name recognition, syntax, scenario, coreference, inference, template

• Evaluation: MUC-3 to MUC-7

• What else can you do with an IE system? SpinDoctor and PLUM

History: Story Understanding

• Roger Schank and Robert Abelson, Scripts, Plans, Goals and Understanding, 1977

• Richard Cullingford, SAM, 1979

• Robert Wilensky, PAM, 1978

• Gerald DeJong, FRUMP, 1979

SAM: Script Applier Mechanism

#| restaurant script: (1) go to the restaurant; (2) order a meal; (3)
   eat the meal; (4) pay; (5) leave the restaurant |#
(events-script '$restaurant
  '((ptrans (actor ?diner) (object ?diner) (to ?restaurant))  ; (1) PTRANS: diner moves to the restaurant
    (mtrans (actor ?diner)                                     ; (2) MTRANS: diner communicates the order
            (object (ingest (actor ?diner) (object ?meal))))
    (ingest (actor ?diner) (object ?meal))                     ; (3) INGEST: diner eats the meal
    (atrans (actor ?diner) (object (money))                    ; (4) ATRANS: money passes to the restaurant
            (from ?diner) (to ?restaurant))
    (ptrans (actor ?diner) (object ?diner)                     ; (5) PTRANS: diner leaves
            (from ?restaurant) (to ?elsewhere))))
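Reading the $restaurant script as data suggests how script application works: match each event the story states against a script step, bind roles such as ?diner and ?restaurant, then instantiate the remaining steps to infer events the story leaves unsaid. The following is a minimal Python sketch under stated assumptions: CD forms are encoded as nested tuples, the story arrives already parsed into CD events, and the names `match`, `instantiate`, and `apply_script` are hypothetical. It is not Cullingford's SAM code.

```python
from typing import Optional, Union

# A conceptual-dependency (CD) form is an atom or a tuple of CD forms.
# This tuple encoding is an illustrative assumption, not SAM's own format.
CD = Union[str, tuple]
Bindings = dict  # maps "?role" variables to CD forms

# Hypothetical Python transcription of the $restaurant script.
RESTAURANT_SCRIPT = [
    ("ptrans", ("actor", "?diner"), ("object", "?diner"), ("to", "?restaurant")),
    ("mtrans", ("actor", "?diner"),
     ("object", ("ingest", ("actor", "?diner"), ("object", "?meal")))),
    ("ingest", ("actor", "?diner"), ("object", "?meal")),
    ("atrans", ("actor", "?diner"), ("object", ("money",)),
     ("from", "?diner"), ("to", "?restaurant")),
    ("ptrans", ("actor", "?diner"), ("object", "?diner"),
     ("from", "?restaurant"), ("to", "?elsewhere")),
]


def is_var(x: CD) -> bool:
    """Script roles are ?-prefixed symbols, as in the Lisp notation."""
    return isinstance(x, str) and x.startswith("?")


def match(pattern: CD, datum: CD, bindings: Bindings) -> Optional[Bindings]:
    """One-way match (variables only in the pattern); returns new bindings or None."""
    if is_var(pattern):
        if pattern in bindings:
            return bindings if bindings[pattern] == datum else None
        return {**bindings, pattern: datum}
    if isinstance(pattern, tuple) and isinstance(datum, tuple):
        if len(pattern) != len(datum):
            return None
        for p, d in zip(pattern, datum):
            result = match(p, d, bindings)
            if result is None:
                return None
            bindings = result
        return bindings
    return bindings if pattern == datum else None


def instantiate(form: CD, bindings: Bindings) -> CD:
    """Substitute bound roles; unbound roles (e.g. ?elsewhere) stay open."""
    if is_var(form):
        return bindings.get(form, form)
    if isinstance(form, tuple):
        return tuple(instantiate(f, bindings) for f in form)
    return form


def apply_script(script: list, story: list) -> tuple:
    """Bind roles from the events the story states, then instantiate every
    script step, which fills in the events the story leaves implicit."""
    bindings: Bindings = {}
    for event in story:
        for step in script:
            result = match(step, event, bindings)
            if result is not None:
                bindings = result
                break  # first matching step wins
    return bindings, [instantiate(step, bindings) for step in script]


# "John went to Leone's. He ate lobster." as pre-parsed CD events (hypothetical).
story = [
    ("ptrans", ("actor", "john"), ("object", "john"), ("to", "leones")),
    ("ingest", ("actor", "john"), ("object", "lobster")),
]
bindings, events = apply_script(RESTAURANT_SCRIPT, story)
print(bindings)   # {'?diner': 'john', '?restaurant': 'leones', '?meal': 'lobster'}
print(events[3])  # the payment step, which the story never mentions
```

Inferring the unstated ATRANS (payment) step is the point of the exercise. A real script applier must also choose which script to activate and resolve events that could match more than one step; this sketch simply takes the first step that matches.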

PAM: Plan Applier Mechanism

#| restaurant plan: goal: you’re hungry and you want to eat;
   plan: go to a restaurant |#
;; Top-level goal: ?x is no longer hungry, pursued via do-restaurant-plan.
(goal (planner ?x)
      (objective (is (actor ?x) (state (hunger (val 0)))))
      (do-restaurant-plan (planner ?x) (restaurant ?y)))

;; Subgoal of the plan: ?x must be near ?y, where ?y is a restaurant.
(subgoal (do-restaurant-plan (planner ?x) (restaurant ?y))
         (goal (planner ?x)
               (objective (proximity (actor ?x) (location ?y)))
               (isa restaurant ?y)))
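PAM explains a character's actions by linking them to goals and plans: in the rules above, being near a restaurant is a subgoal of do-restaurant-plan, which in turn serves the goal of no longer being hungry. A minimal Python sketch of that upward chaining follows; the symbols `satisfy-hunger`, `be-near-restaurant`, and `drive-to-restaurant` and the dictionary encoding are hypothetical simplifications, not Wilensky's implementation.

```python
from typing import Optional

# Hypothetical knowledge base echoing the PAM rules:
# the goal "not hungry" can be pursued with do-restaurant-plan,
# and do-restaurant-plan has the subgoal "be near the restaurant".
GOAL_PLANS = {"satisfy-hunger": ["do-restaurant-plan"]}
PLAN_SUBGOALS = {"do-restaurant-plan": ["be-near-restaurant"]}
# Hypothetical: the (sub)goal an observed action directly achieves.
ACTION_ACHIEVES = {"drive-to-restaurant": "be-near-restaurant"}


def plan_needing(subgoal: str) -> Optional[str]:
    """Find a plan that lists this subgoal, if any."""
    for plan, subgoals in PLAN_SUBGOALS.items():
        if subgoal in subgoals:
            return plan
    return None


def goal_served_by(plan: str) -> Optional[str]:
    """Find the goal this plan is a way of achieving, if any."""
    for goal, plans in GOAL_PLANS.items():
        if plan in plans:
            return goal
    return None


def explain(action: str) -> list:
    """Chain upward from an action to the top-level goal it serves.
    Assumes the knowledge base is acyclic."""
    chain = [action]
    goal = ACTION_ACHIEVES.get(action)
    while goal is not None:
        chain.append(goal)
        plan = plan_needing(goal)
        if plan is None:
            break  # reached a top-level goal
        chain.append(plan)
        goal = goal_served_by(plan)
    return chain


print(" -> ".join(explain("drive-to-restaurant")))
# drive-to-restaurant -> be-near-restaurant -> do-restaurant-plan -> satisfy-hunger
```

An action the knowledge base cannot connect to any goal comes back unexplained, as a one-element chain.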

FRUMP: Fast Reading Understanding and Memory Program

$demonstration script

• The demonstrators arrive at the demonstration location.

• The demonstrators march.

• Police arrive on the scene.

• The demonstrators communicate with the target of the demonstration.

• The demonstrators attack the target of the demonstration.

• The demonstrators attack the police.

(From DeJong, 1979; pp. 19-20)

FRUMP: I/O Example

Information Extraction: Basic Techniques

• Lexical analysis

• Name recognition

• Syntax

• Scenario

• Coreference

• Inference

• Template


Lexical Analysis

• Input: Sam Schwartz retired as executive vice president of the famous hot dog manufacturer, Hupplewhite Inc. He will be succeeded by Harry Himmelfarb.

• Output:
  – Sam/name Schwartz/name retired/verb as/prep executive/adj vice/adj president/noun of/prep the/det famous/adj hot/adj dog/noun manufacturer/noun ,/comma Hupplewhite/name Inc/name ./period <end of sentence>
  – He/pron will/verb be/verb succeeded/verb by/prep Harry/name Himmelfarb/name ./period <end of sentence>
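The input/output pair above can be reproduced with a very small tokenizer and dictionary lookup. This Python sketch assumes a hypothetical hand-built lexicon covering only the example and splits sentences at every period. It also adds a naive `group_names` helper (likewise hypothetical) that merges adjacent /name tokens, a first approximation of name recognition rather than a real named-entity recognizer.

```python
import re

# Hypothetical hand-built lexicon that covers only the example text;
# a real system would use a large dictionary plus a part-of-speech tagger.
LEXICON = {
    "Sam": "name", "Schwartz": "name", "Hupplewhite": "name", "Inc": "name",
    "Harry": "name", "Himmelfarb": "name",
    "retired": "verb", "will": "verb", "be": "verb", "succeeded": "verb",
    "as": "prep", "of": "prep", "by": "prep",
    "executive": "adj", "vice": "adj", "famous": "adj", "hot": "adj",
    "president": "noun", "dog": "noun", "manufacturer": "noun",
    "the": "det", "He": "pron",
    ",": "comma", ".": "period",
}

TEXT = ("Sam Schwartz retired as executive vice president of the famous hot dog "
        "manufacturer, Hupplewhite Inc. He will be succeeded by Harry Himmelfarb.")


def tag_sentences(text: str) -> list:
    """Split text into word and punctuation tokens, tag each from the lexicon,
    and end a sentence at every period (naive about abbreviations)."""
    sentences, current = [], []
    for token in re.findall(r"\w+|[^\w\s]", text):
        current.append(f"{token}/{LEXICON.get(token, 'unknown')}")
        if token == ".":
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def group_names(tagged: list) -> list:
    """Merge runs of adjacent name-tagged tokens into candidate names."""
    names, run = [], []
    for item in tagged + ["<end>/end"]:  # sentinel flushes the last run
        word, tag = item.rsplit("/", 1)
        if tag == "name":
            run.append(word)
        elif run:
            names.append(" ".join(run))
            run = []
    return names


for sentence in tag_sentences(TEXT):
    print(" ".join(sentence), "<end of sentence>")
    print("  names:", group_names(sentence))
```

The period after "Inc" ends the sentence correctly here only because the abbreviation happens to fall at a sentence boundary; a mid-sentence "Inc." would be split wrongly, which is one reason real lexical analyzers keep abbreviation lists.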

Name Recognition

Syntactic Analysis

Syntactic Analysis (continued)

Scenario Matching

Scenario Matching (continued)

Coreference analysis

Inference and Event Matching

Event Template Matching

Evaluation

MUC/Tipster

MUC 3 to MUC 7

What else can you do with an IE system?

• SpinDoctor (Sack, 1994)

• PLUM (Elo, 1995)

PLUM: Peace Love and Understanding Machine

SpinDoctor: Categorizing News Stories by Ideological Point of View

Next Time

• Categories and Cognition according to George Lakoff