1 SEMEF: A Taxonomy-Based Discovery of Experts, Expertise and Collaboration Networks Delroy Cameron Masters Thesis Computer Science, University of Georgia 11/27/2007 Advisor: I. Budak Arpinar Committee: Prashant Doshi Robert J. Woods


SEMEF: A Taxonomy-Based Discovery of Experts, Expertise and Collaboration Networks

Delroy Cameron
Masters Thesis
Computer Science, University of Georgia

11/27/2007

Advisor: I. Budak Arpinar
Committee: Prashant Doshi, Robert J. Woods


OUTLINE

Background
Expertise Profiles
Ranking Experts
Collaboration Networks Expansion
Results and Evaluation
Conclusion
Demo


BACKGROUND

Semantic Web
  What?
    Extension of Current Web
    Attach Meaning to Data
  Why?
    Underutilization of Current Web
    HTML Limitations
  Goal
    Enhance Information Exchange
    Automatic Information Discovery
    Interoperability of Services


BACKGROUND

Semantic Web
  Technologies
    XML
    RDF/RDFS/OWL
    URI
    Ontology


“David Billington is a Professor of Mathematics”

<course name="Mathematics">
  <lecturer>David Billington</lecturer>
</course>

<lecturer name="David Billington">
  <teaches>Mathematics</teaches>
</lecturer>

<teachingOffering>
  <lecturer>David Billington</lecturer>
  <course>Mathematics</course>
</teachingOffering>

<rdf:Description rdf:id="mynamespace:Professor_2">
  <rdf:has_name>David Billington</rdf:has_name>
  <rdf:teaches rdf:resource="#Mathematics"/>
</rdf:Description>
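
The three XML layouts on this slide state the same fact in three different shapes, so a program needs layout-specific code for each one. The following is a minimal Python sketch of that point (it is not part of the original slides): the function name `to_triple` and the predicate label `"teaches"` are assumptions made for illustration, and the XML strings are the slide's examples written with straight quotes.

```python
import xml.etree.ElementTree as ET

# The three XML encodings of the same fact, as shown on the slide.
VARIANTS = [
    '<course name="Mathematics"><lecturer>David Billington</lecturer></course>',
    '<lecturer name="David Billington"><teaches>Mathematics</teaches></lecturer>',
    '<teachingOffering><lecturer>David Billington</lecturer>'
    '<course>Mathematics</course></teachingOffering>',
]


def to_triple(xml_text):
    """Return a (subject, predicate, object) triple for one of the layouts.

    Each layout needs its own extraction rule, which is the weakness of
    plain XML that the slide illustrates.
    """
    root = ET.fromstring(xml_text)
    if root.tag == "course":
        return (root.findtext("lecturer"), "teaches", root.get("name"))
    if root.tag == "lecturer":
        return (root.get("name"), "teaches", root.findtext("teaches"))
    if root.tag == "teachingOffering":
        return (root.findtext("lecturer"), "teaches", root.findtext("course"))
    raise ValueError("unknown layout: " + root.tag)


# All three layouts collapse to a single subject-predicate-object statement.
triples = {to_triple(v) for v in VARIANTS}
print(triples)
```

Once the fact is held as one triple, the layout no longer matters, which is the shape the RDF description states directly.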


BACKGROUND

Semantic Web
  Common Challenges
    Entity Disambiguation
    Ontology Mapping/Alignment
    Trust/Provenance
    Semantic Association Discovery
  Application
    Social Networks
    Bio-Informatics
    National Security
    GPS Data Mining


BACKGROUND

Social Networks
  What?
    Connected through Social Relationships
  Characteristics
    Clustering Coefficient (connectedness to neighbors)
    Centrality (average shortest path length)
    Geodesic (shortest path length)
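
The three characteristics listed on this slide can be computed on a small graph. The sketch below is an illustration only, not the thesis implementation: the graph `GRAPH` and all function names are hypothetical, the network is assumed to be undirected and unweighted, and the geodesic is found with a breadth-first search.

```python
from collections import deque

# Hypothetical undirected coauthorship graph, stored as adjacency sets.
GRAPH = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}


def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in graph[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def geodesic(graph, source, target):
    """Shortest path length between two nodes (BFS); None if unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None


def average_path_length(graph, node):
    """Average geodesic from `node` to every other reachable node."""
    lengths = [geodesic(graph, node, n) for n in graph if n != node]
    reachable = [d for d in lengths if d is not None]
    return sum(reachable) / len(reachable) if reachable else float("inf")


print(clustering_coefficient(GRAPH, "C"))
print(geodesic(GRAPH, "A", "D"))
print(average_path_length(GRAPH, "C"))
```

In this toy graph, node C reaches every other node in one step, so it has the lowest average path length even though only one of its three neighbour pairs is connected.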


BACKGROUND

Peer-Review Process
  What?
    Review Scholarly Manuscripts
  Challenges
    Slow
    Conflict of Interest
    Finding Suitable Reviewers
      Arbitrary Knowledge Approach
      Research Diversification
      Emerging Fields


CONTRIBUTIONS

Applicability of Semantics

Finding Expertise
  Fine Levels of Granularity

Finding Experts
  Taxonomy

Collaboration Networks
  Discovery of Unknown Experts


SEMEF

SEMantic Expert Finder

Finding Expertise (Expertise Profiles)
  Collecting Expertise
  Quantifying Expertise

Finding (Ranking) Experts
  w/ and w/o Taxonomy

Collaboration Networks
  Geodesic
  C-Nets


EXPERTISE PROFILES

Collecting Expertise
  Collect All Publications
  Map Papers to Topics
  Quantify All Papers

Publications Dataset
  DBLP: 473,296 papers (conference/session names, Nov. 2007)
  ACM, IEEE, Science Direct: 29,454 papers (abstracts/index terms)
  Combined: 476,299 papers


EXPERTISE PROFILES

Collecting Expertise
  Papers-to-Topics Dataset
    Combined (476,299)
    Topics (320)
    Relationships (676,569)
    Expertise Profiles (560,792)


EXPERTISE PROFILES

Quantifying Expertise
  Mapping each paper to a distinct value

Publication Impact
  Hector Garcia-Molina (248 papers - 2003)
  E. F. Codd (49 papers - 2003)
  Citeseer Impact Statistics (1221 venues)
  DBLP URIs


EXPERTISE PROFILES

Figure 1: Expertise Profile (diagram: author_A linked to topic1 (4.50), topic2 (1.86) and topic3 (3.08); the topics are linked to paper1 through paper6 with edge weights 1.54, 1.54, 1.10, 1.86, 1.54 and 1.86)


RANKING EXPERTS

Taxonomy of Topics
  Session Names
  Conference Names
  O’CoMMA
  Paper Abstracts
  Index Terms

Figure 2: Taxonomy of Topics (diagram; numeric labels in the figure: 192, 128, 320, 216, 60, 50)


RANKING EXPERTS

Case 1: Single Topic without Taxonomy
  Traverse all Expertise Profiles
  Sum impact of papers mapped to the topic

Case 2: Single Topic with Taxonomy
  Traverse all Expertise Profiles
  Sum impact of papers mapped to the topic or its subtopics


Prevent Expertise Overestimation
  Map papers to leaf nodes only


RANKING EXPERTS

Case 3: Array of Topics without Taxonomy
  Same as Case 2

Case 4: Array of Topics with Taxonomy
  Filter input topics
  Sum impact of papers mapped to the topics or their subtopics
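
The ranking cases can be sketched in a few lines of Python. This is a hedged illustration rather than the thesis code: the taxonomy, the profiles and every function name are hypothetical, and "filter input topics" is read here as one possible interpretation, namely dropping an input topic that already lies in the subtree of another input topic.

```python
# Hypothetical taxonomy: parent topic -> direct subtopics.
TAXONOMY = {
    "Semantic Web": ["Ontologies", "RDF"],
    "Ontologies": ["Ontology Alignment"],
}

# Hypothetical expertise profiles: author -> {leaf topic: summed impact}.
PROFILES = {
    "author_A": {"RDF": 1.54, "Ontology Alignment": 1.10},
    "author_B": {"RDF": 1.86, "Databases": 3.08},
}


def descendants(topic, taxonomy):
    """All subtopics below `topic`, excluding the topic itself."""
    found, stack = set(), list(taxonomy.get(topic, []))
    while stack:
        t = stack.pop()
        if t not in found:
            found.add(t)
            stack.extend(taxonomy.get(t, []))
    return found


def filter_topics(topics, taxonomy):
    """Assumed filter: drop topics covered by another input topic's subtree."""
    covered = set()
    for t in topics:
        covered |= descendants(t, taxonomy)
    return [t for t in dict.fromkeys(topics) if t not in covered]


def rank(profiles, topics, taxonomy=None):
    """Rank authors by summed impact; with a taxonomy, subtopics count too."""
    if taxonomy is None:
        wanted = set(topics)
    else:
        wanted = set()
        for t in filter_topics(topics, taxonomy):
            wanted |= {t} | descendants(t, taxonomy)
    scores = {
        author: sum(w for topic, w in profile.items() if topic in wanted)
        for author, profile in profiles.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(rank(PROFILES, ["Semantic Web"]))            # without taxonomy
print(rank(PROFILES, ["Semantic Web"], TAXONOMY))  # with taxonomy
```

Because the hypothetical profiles hold leaf topics only, a query for "Semantic Web" finds nothing without the taxonomy and finds both authors with it, which is the motivation for Cases 2 and 4.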


COLLABORATION NETWORKS EXPANSION

Geodesic

Figure 3: Geodesic Relationships (diagram: paths between author_A and author_B through opus:author edges on opus:Article_in_Proceedings nodes, through intermediate authors author_1 and author_2, and through opus:isIncludedIn edges to opus:Proceedings_543; the paths are labeled STRONG, MEDIUM, WEAK and UNKNOWN)


COLLABORATION NETWORKS EXPANSION

C-Net
  Ordering Cluster of Experts
  Collaboration Strength*

* Newman, M. E. J.: Coauthorship Networks and Patterns of Scientific Collaboration. Proceedings of the National Academy of Sciences of the United States of America, 101(suppl. 1): 5200-5205 (2004).

Super Node {14.80}
  coauthor_1 {0.73, 0.5}
  coauthor_2 {1.81, 1.0}
  coauthor_3 {0.73, 0.5}
  coauthor_4 {0.73, 0.5}
  coauthor_5 {1.54, 1.0}
  coauthor_n {1.1, 0.8}

Figure 3: Geodesic Relationships
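
Collaboration strength in the sense of Newman's weighted coauthorship measure gives each paper with n authors a contribution of 1/(n - 1) to every pair of its coauthors, so that small author lists count for more than large ones. The Python sketch below illustrates that measure under stated assumptions: the paper list and the function name are hypothetical, and the slides do not say whether SEMEF applies the measure in exactly this form.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data: each entry is the author list of one paper.
PAPERS = [
    ["author_A", "coauthor_1"],
    ["author_A", "coauthor_1", "coauthor_2"],
    ["author_A", "coauthor_2", "coauthor_3"],
    ["coauthor_3"],  # a single-author paper adds no collaboration weight
]


def collaboration_strength(papers):
    """Sum 1/(n - 1) over the papers shared by each pair of authors."""
    strength = defaultdict(float)
    for authors in papers:
        authors = sorted(set(authors))
        n = len(authors)
        if n < 2:
            continue
        for a, b in combinations(authors, 2):
            strength[(a, b)] += 1.0 / (n - 1)
    return dict(strength)


strengths = collaboration_strength(PAPERS)
for pair, value in sorted(strengths.items()):
    print(pair, value)
```

Pairs are keyed in sorted order, so `("author_A", "coauthor_1")` is the only key for that pair; a C-Net style ordering could then sort an expert's coauthors by this value.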


RESULTS AND EVALUATION

Evaluation
  WWW Search Track (2005/6/7)
  Input Topics (Call For Papers)
  SWETO-DBLP Subset (67,366 authors)
  DBLP (560,792)

Validation

Collaboration Networks Expansion


RESULTS AND EVALUATION

Validation

Table 1: Past PC Lists comparison with SEMEF

Search Track (Number of PC Members in SEMEF List)

Percentage in   Search   Search   Search              Cumulative Percentage
SEMEF List      2005     2006     2007     Average    in PC List
(top) 0-10%     10       13       13       12         35%
10-20%          5        8        6        6          52%
20-30%          6        0        0        2          58%
30-40%          4        1        1        2          65%
40-50%          6        2        0        3          73%
50-60%          3        1        1        2          79%
60-70%          4        0        0        1          82%
70-80%          1        1        0        1          85%
80-90%          1        0        0        0          85%
90-100%         0        0        0        0          85%
Total           40/48    26/29    21/25    29/34
Total (%)       83       89       84       85


RESULTS AND EVALUATION

Validation

Figure 4: Average Number of PC in SEMEF List


RESULTS AND EVALUATION

Validation

Figure 5: Average PC Distribution in SEMEF List


RESULTS AND EVALUATION

Collaboration Networks Expansion

Table 4: PC Chair – SEMEF List Geodesic Relationships

10141120151731WEAK

2

2

0

Chair2

1

6

3

Chair1

Search2006

0

7

3

Chair1

Search 2007

PC List (Number of Expert Relationships)

EXTREMELY WEAK

MEDIUM

STRONG

Relationships

1

10

2

Chair1

Search 2005

2

7

0

Chair2

00

48

00

Chair2

Above Average Expertise

(in PC)

58576605582608293649WEAK

26

55

3

Chair2

66

88

10

Chair1

Search2006

66

88

10

Chair1

Search 2007

SEMEF (Number of Expert Relationships)

EXTREMELY WEAK

MEDIUM

STRONG

Relationships

99

106

6

Chair1

Search 2005

26

53

2

Chair2

32

1676

343

Chair2

Above Average Expertise

(in PC)

Table 3: PC Chair – PC Member Geodesic Relationships


CONCLUSION

Expertise Profiles
  Publication Data
  Publication Impact Statistics
  Papers-to-Topics Relationships

Ranking Experts
  w/ and w/o Taxonomy
  Single and Array of Topics

Collaboration Networks Expansion
  Semantic Association Discovery
  Geodesic
  C-Nets


DEMO

Web Application
  Apache Tomcat 6.0
  Java Server Pages
  Ubuntu 7.10


RELATED WORK

Particle Swarm Algorithm

ExpertiseNets

Expertise Browser
  Experience Atoms

Expertise Recommender
  Change History
  Tech Support Heuristics
  Profiling, Identification, Supervisor


RELATED WORK

Web-Based Communities
  Expert Rank

Formal Probabilistic Models
  Candidate Models
  Document Models

RDF-Matcher


EXPERTISE PROFILE ALGORITHM

Algorithm findExpertiseProfile(researcherURI, list of publications)
  create ‘empty expertise profile’
  foreach paper of researcher do
    get ‘topics’ list of paper (using papers-to-topics dataset)
    get ‘publication impact’
    if ‘publication impact’ is null do ‘publication impact’ ← default weight
    foreach ‘topic’ in ‘topics’ list do
      if ‘expertise profile’ contains ‘topic’ do
        ‘weight’ ← ‘publication impact’ + existing ‘weight’ from expertise profile
        update ‘expertise profile’ with <‘topic’, ‘weight’>
      else add <‘topic’, ‘publication impact’> pair to ‘expertise profile’
    end
  end
  return ‘expertise profile’
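
A minimal Python sketch of the expertise profile algorithm follows. It is an illustration under assumptions, not the thesis implementation: the function and variable names are hypothetical, the default weight of 1.0 is a guess because the slides do not state it, and the assignment of papers to topics is invented so that the sums reproduce the topic weights shown in Figure 1 (4.50, 1.86 and 3.08).

```python
DEFAULT_WEIGHT = 1.0  # assumption: the slides do not state the default weight


def find_expertise_profile(papers, papers_to_topics, impact):
    """Build {topic: summed publication impact} for one researcher."""
    profile = {}
    for paper in papers:
        weight = impact.get(paper)
        if weight is None:
            weight = DEFAULT_WEIGHT  # no impact statistic for this venue
        for topic in papers_to_topics.get(paper, []):
            # Add the paper's impact to the topic's running weight.
            profile[topic] = profile.get(topic, 0.0) + weight
    return profile


# Hypothetical inputs; the weights are the edge labels from Figure 1.
PAPERS = ["paper1", "paper2", "paper3", "paper4", "paper5", "paper6"]
PAPERS_TO_TOPICS = {
    "paper1": ["topic1"],
    "paper2": ["topic1"],
    "paper3": ["topic1"],
    "paper4": ["topic2"],
    "paper5": ["topic3"],
    "paper6": ["topic3"],
}
IMPACT = {
    "paper1": 1.54, "paper2": 1.10, "paper3": 1.86,
    "paper4": 1.86, "paper5": 1.54, "paper6": 1.54,
}

profile = find_expertise_profile(PAPERS, PAPERS_TO_TOPICS, IMPACT)
print({topic: round(weight, 2) for topic, weight in profile.items()})
```

Using a dictionary with a default of 0.0 merges the "update" and "add" branches of the pseudocode into one statement.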


RANKING EXPERTS ALGORITHM

Algorithm rankValue(researcherURI, list of topics)
  set ‘expertRank’ to zero
  create temp ‘expertise profile’
  filter topics
  foreach topic in filtered topics list do
    get ‘papers’ for this topic (using papers-to-topics dataset)
    foreach paper in papers list do
      if researcher is author do
        get ‘publication impact’ as ‘weight’
        ‘expertRank’ ← ‘expertRank’ + ‘weight’
        add <‘topic’, ‘weight’> pair to temporary ‘expertise profile’
      end if
    end
  end
  return ‘expertRank’
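
The ranking algorithm can be transcribed into Python almost line by line. The sketch below is hedged accordingly: all names and sample data are hypothetical, the "filter topics" step is reduced to removing duplicate topics because the slide does not define it, and the default weight of 1.0 for papers without an impact value is an assumption.

```python
def rank_value(researcher, topics, papers_by_topic, authors_of, impact,
               default_weight=1.0):
    """Return (expert rank, temporary expertise profile) for one researcher."""
    expert_rank = 0.0
    temp_profile = {}
    # Assumed filter: only drop duplicate topics, keeping the input order.
    for topic in dict.fromkeys(topics):
        for paper in papers_by_topic.get(topic, []):
            if researcher in authors_of.get(paper, ()):
                weight = impact.get(paper, default_weight)
                expert_rank += weight
                temp_profile[topic] = temp_profile.get(topic, 0.0) + weight
    return expert_rank, temp_profile


# Hypothetical papers-to-topics data, author lists and impact values.
PAPERS_BY_TOPIC = {"topic1": ["paper1", "paper2"], "topic3": ["paper5"]}
AUTHORS_OF = {
    "paper1": ["author_A"],
    "paper2": ["author_B"],
    "paper5": ["author_A", "author_B"],
}
IMPACT = {"paper1": 1.54, "paper5": 1.54}

rank_a, profile_a = rank_value(
    "author_A", ["topic1", "topic3"], PAPERS_BY_TOPIC, AUTHORS_OF, IMPACT)
rank_b, profile_b = rank_value(
    "author_B", ["topic1", "topic3"], PAPERS_BY_TOPIC, AUTHORS_OF, IMPACT)
print(round(rank_a, 2), round(rank_b, 2))
```

As in the pseudocode, a paper mapped to two of the input topics is counted once per topic, so a stricter filter would be needed if each paper should contribute only once.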
