People in CALO’s World: Contact Info, Expertise, Groups & Roles

Information Extraction, Coreference, Group/Topic Models

Andrew McCallum, Aron Culotta, Xuerui Wang, Charles Sutton, Wei Li

UMass Amherst

DEX Example

To: “Andrew McCallum” [email protected]
Subject ...

First Name: Andrew
Middle Name: Kachites
Last Name: McCallum
JobTitle: Associate Professor
Company: University of Massachusetts
Street Address: 140 Governor’s Dr.
City: Amherst
State: MA
Zip: 01003
Company Phone: (413) 545-1323
Links: Fernando Pereira, Sam Roweis,…
Key Words: Information extraction, social network,…

Search for new people

Outline

Information Extraction
– Learning in the wild
– Transfer learning

Identity Uncertainty

Modeling Groups, Roles and Topics

User feedback “in the wild” as labeling

Labeling for Classification

Easy: often found in user interfaces, e.g. CALO IRIS, Apple Mail

Seminar: How to Organize your Life
by Jane Smith, Stevenson & Smith
Mezzanine Level, Papadopoulos Sq
3:30 pm Thursday March 31
In this seminar we will learn how to use CALO to...

Label choices: Seminar announcement | Todo request | Other

Labeling for Extraction

Painful: difficult even for paid labelers; complex tools

Seminar: How to Organize your Life
by Jane Smith, Stevenson & Smith
Mezzanine Level, Papadopoulos Sq
3:30 pm Thursday March 31
In this seminar we will learn how to use CALO to...

Click, drag, adjust, label, click, drag, adjust, label, ...

Multiple-choice Annotation for Learning Extractors “in the wild”

[Culotta, McCallum 2005]

Jane Smith , Stevenson & Smith , Mezzanine Level, Papadopoulos Sq.

Task: Information Extraction. Fields: NAME COMPANY ADDRESS (and others)

Jane Smith , Stevenson & Smith Mezzanine Level , Papadopoulos Sq.

Jane Smith , Stevenson & Smith Mezzanine Level , Papadopoulos Sq.

Jane Smith , Stevenson & Smith Mezzanine Level , Papadopoulos Sq.

user corrects labels, not segmentations

Interface presents top hypothesized segmentations
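
The multiple-choice idea on this slide can be sketched as follows. This is only an illustration under stated assumptions: the text is pre-split into chunks at commas, hand-set scores stand in for a trained extractor, and the names `emission`, `transition`, `score` and `top_k_labelings` are hypothetical. It ranks candidate labelings so that a user can pick one option instead of dragging span boundaries.

```python
import heapq
from itertools import product

LABELS = ("NAME", "COMPANY", "ADDRESS")
ORDER = {"NAME": 0, "COMPANY": 1, "ADDRESS": 2}


def emission(chunk, label):
    """Hypothetical hand-set score for giving `label` to `chunk`."""
    s = 0.0
    if label == "COMPANY" and "&" in chunk:
        s += 2.0
    if label == "ADDRESS" and any(w in chunk for w in ("Level", "Sq.", "St.")):
        s += 2.0
    if label == "NAME" and "&" not in chunk and len(chunk.split()) == 2:
        s += 1.0
    return s


def transition(prev, cur):
    """Hypothetical preference for the field order NAME, COMPANY, ADDRESS."""
    return 0.5 if ORDER[cur] >= ORDER[prev] else -1.0


def score(chunks, labels):
    total = sum(emission(c, l) for c, l in zip(chunks, labels))
    total += sum(transition(a, b) for a, b in zip(labels, labels[1:]))
    return total


def top_k_labelings(chunks, k=3):
    """Brute-force the k best labelings (fine for a handful of chunks)."""
    candidates = product(LABELS, repeat=len(chunks))
    return heapq.nlargest(k, candidates, key=lambda ls: score(chunks, ls))


chunks = ["Jane Smith", "Stevenson & Smith", "Mezzanine Level", "Papadopoulos Sq."]
for rank, labels in enumerate(top_k_labelings(chunks), start=1):
    print(rank, list(zip(chunks, labels)))
```

If the correct labeling is among the printed options, the user confirms it with one choice; a real system would also hypothesize segment boundaries rather than fixing them in advance.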

Multiple-choice Annotation for Learning Extractors “in the wild”

[Culotta, McCallum 2005]

Task: Information extraction. Fields: NAME COMPANY ADDRESS (and others)

Interface presents top hypothesized segmentations

Result: 29% reduction in user actions needed to train

Outline

Information Extraction
– Learning in the wild
– Transfer learning

Identity Uncertainty

Modeling Groups, Roles and Topics

Piecewise Training in Factorial CRFs for Transfer Learning

Emailed seminar announcement entities

Email English words

[Sutton, McCallum, 2005]

Too little labeled training data.

60k words training.

GRAND CHALLENGES FOR MACHINE LEARNING

Jaime Carbonell
School of Computer Science
Carnegie Mellon University

3:30 pm
7500 Wean Hall

Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on.

Piecewise Training in Factorial CRFs for Transfer Learning

Newswire named entities

Newswire English words

[Sutton, McCallum, 2005]

Train on “related” task with more data.

200k words training.

CRICKET - MILLNS SIGNS FOR BOLAND

CAPE TOWN 1996-08-22

South African provincial side Boland said on Thursday they had signed Leicestershire fast bowler David Millns on a one year contract. Millns, who toured Australia with England A in 1992, replaces former England all-rounder Phillip DeFreitas as Boland's overseas professional.

Piecewise Training in Factorial CRFs for Transfer Learning

Newswire named entities

Email English words

[Sutton, McCallum, 2005]

At test time, label email with newswire NEs...

Piecewise Training in Factorial CRFs for Transfer Learning

Newswire named entities

Emailed seminar announcement entities

Email English words

[Sutton, McCallum, 2005]

…then use these labels as features for final task

Piecewise Training in Factorial CRFs for Transfer Learning

Newswire named entities

Seminar Announcement entities

English words

[Sutton, McCallum, 2005]

Use joint inference at test time.

An alternative to hierarchical Bayes.
Needn’t know anything about parameterization of subtask.

Accuracy: No transfer < Cascaded Transfer < Joint Inference Transfer
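
The cascaded variant from the preceding slides (label email with a newswire tagger, then use those labels as features for the final task) can be sketched as a feature function. `newswire_tag` is a hypothetical stand-in for a model trained on newswire text, and the feature names are illustrative; the joint-inference variant that the slide reports as more accurate is not shown.

```python
def newswire_tag(tokens):
    """Hypothetical stand-in for a named-entity tagger trained on newswire."""
    known_people = {"Jaime", "Carbonell"}
    known_orgs = {"Carnegie", "Mellon", "University"}
    tags = []
    for tok in tokens:
        if tok in known_people:
            tags.append("PER")
        elif tok in known_orgs:
            tags.append("ORG")
        else:
            tags.append("O")
    return tags


def seminar_features(tokens):
    """Per-token features for the seminar task, including newswire NE labels."""
    ne_tags = newswire_tag(tokens)  # step 1: label email with newswire NEs
    feats = []
    for i, tok in enumerate(tokens):
        f = {
            "word=" + tok.lower(): 1.0,
            "is_capitalized": float(tok[:1].isupper()),
            "newswire_ne=" + ne_tags[i]: 1.0,  # step 2: label used as a feature
        }
        if i > 0:
            f["prev_newswire_ne=" + ne_tags[i - 1]] = 1.0
        feats.append(f)
    return feats


for tok, f in zip(["Jaime", "Carbonell", "3:30", "pm"],
                  seminar_features(["Jaime", "Carbonell", "3:30", "pm"])):
    print(tok, sorted(f))
```

In this cascade the seminar tagger never needs to know how the newswire model is parameterized; it only sees the predicted labels.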

Outline

Information Extraction
– Learning in the wild
– Transfer learning

Identity Uncertainty

Modeling Groups, Roles and Topics

Joint Co-reference Decisions, Discriminative Model

[Culotta & McCallum 2005]

[Figure: pairwise Y/N coreference decisions among the People mentions “Stuart Russell”, “Stuart Russell” and “S. Russel”]

Co-reference for Multiple Entity Types

[Culotta & McCallum 2005]

[Figure: pairwise Y/N coreference decisions within each entity type. People: “Stuart Russell”, “Stuart Russell”, “S. Russel”. Organizations: “University of California at Berkeley”, “Berkeley”, “Berkeley”.]

Joint Co-reference of Multiple Entity Types

[Culotta & McCallum 2005]

[Figure: pairwise Y/N coreference decisions over two entity types. People: “Stuart Russell”, “Stuart Russell”, “S. Russel”. Organizations: “University of California at Berkeley”, “Berkeley”, “Berkeley”.]

Reduces error by 22%
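
A rough sketch of why deciding both entity types together can help, using the mentions from the slide. This is not the discriminative model of the cited work: string heuristics stand in for learned compatibility scores, the two-pass scheme (organizations first, then people) is a crude approximation of joint inference, and `threshold`, `bonus` and the assumption that person i is affiliated with organization i are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def cluster(n, same):
    """Merge i and j whenever same(i, j) answers Y; return a cluster id per item."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if same(i, j):
            parent[find(j)] = find(i)
    return [find(i) for i in range(n)]


def org_score(a, b):
    """1.0 if one organization name's tokens are contained in the other's."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return 1.0 if ta <= tb or tb <= ta else 0.0


def person_score(a, b):
    """Half first-initial match, half last-name string similarity."""
    fa, la = a.split()[0], a.split()[-1]
    fb, lb = b.split()[0], b.split()[-1]
    initial = 1.0 if fa[0] == fb[0] else 0.0
    return 0.5 * initial + 0.5 * SequenceMatcher(None, la.lower(), lb.lower()).ratio()


def joint_coreference(people, orgs, threshold=0.98, bonus=0.1):
    """Assumes people[i] is affiliated with orgs[i] (same length lists)."""
    org_ids = cluster(len(orgs), lambda i, j: org_score(orgs[i], orgs[j]) >= threshold)

    def same_person(i, j):
        s = person_score(people[i], people[j])
        if org_ids[i] == org_ids[j]:  # shared organization supports a merge
            s += bonus
        return s >= threshold

    return cluster(len(people), same_person), org_ids


people = ["Stuart Russell", "Stuart Russell", "S. Russel"]
orgs = ["University of California at Berkeley", "Berkeley", "Berkeley"]
print(joint_coreference(people, orgs))             # org evidence merges "S. Russel"
print(joint_coreference(people, orgs, bonus=0.0))  # without it, "S. Russel" stays apart
```

With these toy scores, the misspelled mention “S. Russel” falls just below the merge threshold on its own and is merged only when the organization decision is taken into account.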

Outline

Information Extraction
– Learning in the wild
– Transfer learning

Identity Uncertainty

Modeling Groups, Roles and Topics

Social network from my email

From LDA to Author-Recipient-Topic (ART)
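
For orientation, the ART generative story for the words of one message can be sketched as below: each word picks a recipient uniformly from the message's recipients, then a topic from the distribution tied to the (author, recipient) pair, then a word from that topic. The people, topics and probabilities in `THETA` and `PHI` are made-up toy values, the Dirichlet priors are omitted, and `generate_message` is a hypothetical name.

```python
import random

# Toy parameters (assumptions for illustration only).
# THETA[(author, recipient)] is a distribution over topics.
THETA = {
    ("A", "B"): {"logistics": 0.8, "planning": 0.2},
    ("A", "C"): {"logistics": 0.1, "planning": 0.9},
}
# PHI[topic] is a distribution over words.
PHI = {
    "logistics": {"pipeline": 0.6, "schedule": 0.4},
    "planning": {"budget": 0.5, "meeting": 0.5},
}


def generate_message(author, recipients, n_words, theta, phi, rng):
    """Sample (recipient, topic, word) triples for one message."""
    out = []
    for _ in range(n_words):
        x = rng.choice(recipients)  # recipient chosen uniformly
        topics, t_weights = zip(*theta[(author, x)].items())
        z = rng.choices(topics, weights=t_weights)[0]  # topic given (author, recipient)
        words, w_weights = zip(*phi[z].items())
        w = rng.choices(words, weights=w_weights)[0]  # word given topic
        out.append((x, z, w))
    return out


for triple in generate_message("A", ["B", "C"], 5, THETA, PHI, random.Random(0)):
    print(triple)
```

Compared with LDA, where the topic distribution belongs to the document, here it is indexed by who is writing to whom.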

Enron Email Corpus

250k email messages
23k people

Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)
From: [email protected]
To: [email protected]
Subject: Enron/TransAlta Contract dated Jan 1, 2001

Please see below. Katalin Kiss of TransAlta has requested an electronic copy of our final draft? Are you OK with this? If so, the only version I have is the original draft without revisions.

DP

Debra Perlingiere
Enron North America Corp.
Legal Department
1400 Smith Street, EB 3885
Houston, Texas
[email protected]

Topics, and prominent sender/receivers discovered by ART

Titles chosen by me

Topics, and prominent sender/receivers discovered by ART

Beck = “Chief Operations Officer”
Dasovich = “Government Relations Executive”
Shapiro = “Vice President of Regulatory Affairs”
Steffes = “Vice President of Government Affairs”

Comparing Role Discovery

[Figure: each person is represented in three ways, one per method: Traditional SNA, Author-Topic, ART. Labels recovered from the figure: “connection strength (A,B) =”, “distribution over recipients”, “distribution over authored topics” (twice).]
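
Once each person is represented by a distribution, two people can be compared with a divergence between their distributions. The sketch below uses Jensen-Shannon divergence, which is an assumption here since the slide does not name the measure, and the two toy people and their numbers are invented for illustration.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, 0 for identical, 1 for disjoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Two hypothetical people who write to the same recipients about different topics.
person_a = {"recipients": [0.5, 0.5, 0.0], "topics": [0.9, 0.1, 0.0]}
person_b = {"recipients": [0.5, 0.5, 0.0], "topics": [0.0, 0.1, 0.9]}

print("recipient view:", js_divergence(person_a["recipients"], person_b["recipients"]))
print("topic view:    ", js_divergence(person_a["topics"], person_b["topics"]))
```

In this toy case a recipient-only view sees the two people as identical, while a topic-based view separates them, which is the kind of disagreement between methods that the role comparisons illustrate.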

Comparing Role Discovery: Tracy Geaconne and Dan McCarty

Traditional SNA: Similar roles
Author-Topic: Different roles
ART: Different roles

Geaconne = “Secretary”
McCarty = “Vice President”

Comparing Role Discovery: Lynn Blair and Kimberly Watson

Traditional SNA: Different roles
Author-Topic: Very different
ART: Very similar

Blair = “Gas pipeline logistics”
Watson = “Pipeline facilities planning”

McCallum Email Corpus 2004

January - October 2004
23k email messages
825 people

From: [email protected]
Subject: NIPS and ....
Date: June 14, 2004 2:27:41 PM EDT
To: [email protected]

There is pertinent stuff on the first yellow folder that is completed either travel or other things, so please sign that first folder anyway. Then, here is the reminder of the things I'm still waiting for:

NIPS registration receipt.
CALO registration receipt.

Thanks,
Kate

Four most prominent topics in discussions with ____?

Two most prominent topics in discussions with ____?

Words       Prob
love        0.030514
house       0.015402
(blank)     0.013659
time        0.012351
great       0.011334
hope        0.011043
dinner      0.00959
saturday    0.009154
left        0.009154
ll          0.009009
(blank)     0.008282
visit       0.008137
evening     0.008137
stay        0.007847
bring       0.007701
weekend     0.007411
road        0.00712
sunday      0.006829
kids        0.006539
flight      0.006539

Role-Author-Recipient-Topic Models

Year Three Plans: “People”

Extraction, for Expert-finding and Group/Role Analysis
• Make learning-in-the-wild practical for extraction.
• Transfer from noisy/incomplete databases to improve IE.
• Support questions about contact info, organizational affiliation, etc.

Identity Uncertainty
• Central problem for going from text to knowledge base.
• Many interacting entity types, relationships.

Group/Role/Topic Analysis
• Explicit “topic models” of groups, roles, expertise, tasks, and their interaction with extraction...
• Support Qs about topical expertise, forwarding messages, team building.

Etc.
• Continue to support and enhance MALLET toolkit, in collaboration with UPenn and others.