Dr. Virendra C. Bhavsar - Faculty of Computer Science | UNBwdu/ssworkshop/submissions/bhavsar... ·...

Post on 21-Jun-2018


1

Dr. Virendra C. Bhavsar

Professor and Director, Advanced Computational Research Laboratory

Faculty of Computer Science

University of New Brunswick (UNB)

Fredericton, Canada

bhavsar@unb.ca

Thanks:

BCS Student: Marcel Ball

MCS Students: Anurag Singh, Jin Jing, Sebastien Mathieu, Jie Li

PhD Student: Lu Yang

Post-Doctoral Fellows: Dr. Biplab Sarker and Dr. Manish Joshi

Collaborators: Dr. Riyanarto Sarno and Dr. Harold Boley

June 14, 2010

Semantic Matching

2

Virendra C. Bhavsar

• UNB: since 1983; > 35 years of software research and development experience

• Interests: real-time embedded systems, computer graphics, software engineering, natural language processing, databases, bioinformatics, parallel computing, artificial intelligence, …

• Bioinformatics - Canadian Potato Genomics Project

• Atlantic Computational Excellence Network (ACEnet): ~30 million Atlantic Canada project in high performance computing

• Semantic Matching

3

Outline

• Syntactic Matching

• Semantic Matching

• Semantic Matching: Taxonomy, Ontology and Partonomy

• UNB Semantic Matching Engines – Applications

• Conclusion

4

Exact String Matching

• Binary result 0.0 or 1.0

Permutation of strings

“Java Programming” versus “Programming in Java”

similarity = (number of identical words) / (maximum length of the two strings)

Example 1

For two node labels “a b c” and “a b d e”, their similarity is:

2/4 = 0.5

Syntactic Matching
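
The word-overlap measure on this slide can be sketched in a few lines of Python. This is only an illustration, not the UNB engine's code: the function name `word_overlap_sim` is hypothetical, string length is assumed to be counted in words, and repeated words are assumed to count once.

```python
def word_overlap_sim(label_a: str, label_b: str) -> float:
    """Number of identical words divided by the longer label's word count."""
    words_a = label_a.split()
    words_b = label_b.split()
    if not words_a or not words_b:
        return 0.0  # assumption: an empty label matches nothing
    # Assumption: a word shared by both labels is counted once.
    identical = len(set(words_a) & set(words_b))
    return identical / max(len(words_a), len(words_b))


def exact_match_sim(label_a: str, label_b: str) -> float:
    """Exact string matching: the result is binary, 0.0 or 1.0."""
    return 1.0 if label_a == label_b else 0.0


print(word_overlap_sim("a b c", "a b d e"))                         # Example 1
print(exact_match_sim("Java Programming", "Programming in Java"))   # permuted strings
print(word_overlap_sim("Java Programming", "Programming in Java"))
```

Exact matching rejects the permuted pair outright, while the word-overlap measure gives it partial credit because two of the three words are shared.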

5

Example 2

Node labels “electric chair” and “committee chair”

1/2 = 0.5 (meaningful?)

• Syntactic matching does not consider additional domain knowledge

• Semantic matching techniques are needed for the above problems

Syntactic Matching

6

Semantic Matching Applications

• Semantic searching, e.g. Google

• e-Business

• e-Learning

• Matchmaking portals

• Information Retrieval

• Web Services

• Information Integration

• Semantic Web

7

Semantic Matching

• Examples

{Car : Truck} {Toyota Corolla : Toyota Camry}

{Car : Automobile} {Car : Apple}

• Semantic Similarity versus Semantic Distance

• Matching of: words, short texts, documents, schemas/structures, pictures, videos

• Taxonomy

• Partonomy

• Ontology

8

Taxonomy

• Practice and science of classification

9

Ontology

• Domain Ontology: Explicit formal specifications of the terms in a domain and relations among them

• Upper Ontology: Across domains

10

Concept Similarity in a Taxonomy

Given a taxonomy and two concepts (e.g., A and B), find the semantic similarity of the two concepts

[Figure: taxonomy tree containing concepts A and B]

11

[Figure: taxonomy fragment; each concept is annotated with a numeric value]

{Produce, Green goods} 3.034

{Fruit} 3.374

{Apple} 3.945

{Berry} 4.907

{Banana} 5.267

{Boxberry} 7.576

{Cranberry} 6.285

Concept Similarity in a Taxonomy
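
One way to read the numbers on this slide is sketched below, under loudly stated assumptions. The slide does not say which measure the values belong to. The sketch treats them as information-content values and uses Resnik's measure, where the similarity of two concepts is the information content of their lowest common ancestor. The parent links in `PARENT` are also an assumption, inferred from the concept names rather than read off the figure.

```python
# Assumption: the slide's values are information-content scores per concept.
IC = {
    "Produce": 3.034, "Fruit": 3.374, "Apple": 3.945, "Berry": 4.907,
    "Banana": 5.267, "Boxberry": 7.576, "Cranberry": 6.285,
}

# Assumption: hypothetical child -> parent links inferred from the names.
PARENT = {
    "Fruit": "Produce", "Apple": "Fruit", "Berry": "Fruit",
    "Banana": "Fruit", "Boxberry": "Berry", "Cranberry": "Berry",
}


def ancestors(concept):
    """Return the concept followed by its ancestors, nearest first."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def resnik_sim(a, b):
    """Information content of the lowest common ancestor of a and b."""
    above_b = set(ancestors(b))
    for concept in ancestors(a):  # walks upward, so the first hit is the lowest
        if concept in above_b:
            return IC[concept]
    return 0.0  # no shared ancestor


print(resnik_sim("Boxberry", "Cranberry"))  # shared ancestor: Berry
print(resnik_sim("Apple", "Cranberry"))     # shared ancestor: Fruit
```

Under these assumptions the two berries score higher than an apple and a berry, because their common ancestor is more specific.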

12

• More and more on-line transactions (e.g. e-Bay, Kijiji, etc.)

• Buyers and sellers input keywords and/or specify values for some product features

• A list of recommended sellers (with product advertisements) and/or buyers (with product requests) is presented

• Flat representation of products cannot represent the hierarchical “part-of” relationship of product parts

• Match-making is not precise

• Negotiation space is large

Motivation

13

[Figure: e-Market architecture. Labels: User, Web Browser, Main Server, User Info, User Profiles, User Agents, Agents (Matcher 1 ... Matcher n), To other sites (network)]

• e-business, e-learning …

• Buyer-Seller matching

• Metadata for buyers and sellers

• Keywords/keyphrases

e-Business Applications

14

[Figure: taxonomy tree rooted at “Programming Techniques”. Nodes: General, Applicative Programming, Automatic Programming, Concurrent Programming, Sequential Programming, Object-Oriented Programming, Distributed Programming, Parallel Programming. Arc weights shown: 0.6, 0.5, 0.8, 0.5, 0.9, 0.7, 0.7, 0.5]

• The taxonomy tree of “Programming Techniques” according to the ACM Computing Classification System

• Arc Weights

Semantic Matching ─ A Taxonomy Tree

15

Partonomy

• Tree representation for product/service descriptions

• Weights

[Figure: partonomy tree. Root: Car; arcs: Make, Color, Year; leaves: Ford, Black, 2002; arc weights: 0.3, 0.2, 0.5]

16

Similarity of Buyer and Sellers

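[Figure: four partonomy trees, each rooted at Car with arcs Make, Color, Year]

buyer: Ford, Black, 2002; arc weights 0.1, 0.1, 0.8

seller1: Ford, Red, 2002; arc weights 0.05, 0.05, 0.9; similarity to buyer 0.925

seller2: Ford, Red, 2002; arc weights 0.2, 0.2, 0.6; similarity to buyer 0.85

seller3: Ford, Red, 2002; arc weights 0.1, 0.6, 0.3; similarity to buyer 0.65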
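
The three scores on this slide can be reproduced with a short sketch that sums, over the arcs, a per-arc similarity multiplied by the average of the two trees' arc weights: s_i (w_i + w'_i)/2. Several things here are assumptions. The function name `weighted_sim` is hypothetical, leaf values are compared by exact equality, and the assignment of each weight to Make, Color, or Year is inferred so that the slide's scores come out, since the transcript does not preserve it.

```python
def weighted_sim(tree_a, tree_b):
    """Sum of s_i * (w_i + w'_i) / 2 over the arcs both trees share."""
    total = 0.0
    for arc in tree_a.keys() & tree_b.keys():
        value_a, weight_a = tree_a[arc]
        value_b, weight_b = tree_b[arc]
        s = 1.0 if value_a == value_b else 0.0  # assumption: exact-match leaves
        total += s * (weight_a + weight_b) / 2
    return total


# arc label -> (leaf value, arc weight); the weight placement is an assumption.
buyer = {"Make": ("Ford", 0.8), "Color": ("Black", 0.1), "Year": (2002, 0.1)}
sellers = {
    "seller1": {"Make": ("Ford", 0.9), "Color": ("Red", 0.05), "Year": (2002, 0.05)},
    "seller2": {"Make": ("Ford", 0.6), "Color": ("Red", 0.2), "Year": (2002, 0.2)},
    "seller3": {"Make": ("Ford", 0.3), "Color": ("Red", 0.6), "Year": (2002, 0.1)},
}

for name, seller in sellers.items():
    print(name, round(weighted_sim(buyer, seller), 3))
```

Only the Color arc differs between the buyer and each seller, so every score is 1 minus the averaged Color weight. A seller who puts more weight on color is therefore a worse match for this buyer.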

17

Semantic Matching ─ Local Similarity

• Local similarity measures for leaf nodes

• “Price” type

• “Date” type

• . . .

18

• Pseudo code of the price-range similarity algorithm

PriceRangeSim ([Bpref, Bmax], [Smin, Spref])

Begin

If Spref <= Bpref similarity = 1.0

else if Bmax < Smin similarity = 0.0

else if Bmax = Smin

similarity = 0.005 / (MAX - MIN)

else

{ MIN = min{MIN, Smin}

MAX = max{MAX, Bmax}

similarity = (Bmax - Smin) / (MAX - MIN)

}

return similarity

End.

• This algorithm can be easily adapted to other “price”-typed attributes, e.g. “salary range” in a job-seeking and recruiting e-Market

Semantic Matching ─ Price Matching Algorithm
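
A runnable Python sketch of the pseudocode follows, with assumptions marked. MIN and MAX are not defined on the slide, so they are passed in as `lo` and `hi` and assumed to be the lowest and highest prices known to the market. The overlap branch is taken as (Bmax - Smin) / (MAX - MIN). The small constant for ranges that only touch is taken as 0.005, which is a reading of a garbled formula rather than a confirmed value. The function and parameter names are hypothetical.

```python
def price_range_sim(b_pref, b_max, s_min, s_pref, lo, hi, touch=0.005):
    """Similarity of a buyer range [b_pref, b_max] and a seller range [s_min, s_pref].

    lo, hi: assumed global price bounds (MIN and MAX in the pseudocode).
    touch:  assumed constant used when the two ranges meet at a single price.
    """
    if s_pref <= b_pref:
        return 1.0          # seller's preferred price is within the buyer's preference
    if b_max < s_min:
        return 0.0          # ranges do not overlap
    if b_max == s_min:
        if hi <= lo:
            raise ValueError("hi must be greater than lo")
        return touch / (hi - lo)
    # Ranges overlap: widen the bounds if needed, then normalize the overlap.
    lo = min(lo, s_min)
    hi = max(hi, b_max)
    return (b_max - s_min) / (hi - lo)


print(price_range_sim(100, 150, 80, 90, lo=0, hi=1000))    # seller asks less than buyer prefers
print(price_range_sim(100, 120, 130, 150, lo=0, hi=1000))  # no overlap
print(price_range_sim(100, 150, 120, 180, lo=0, hi=1000))  # partial overlap
```

Passing the bounds explicitly keeps the function free of global state. An engine that tracks MIN and MAX across all advertisements could supply its running values instead.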

19

UNB Similarity Engines - Implementation

• Java Implementation

• Testing on systematically varied cases

20

• eduSource e-Learning project

• Learning Object Metadata Generator: LOMGen

Partonomy Tree Similarity Engine ─ eLearning Application

[Figure: e-Learning application architecture. Components: UI (Java), Similarity Engine (Java), Translator (XSLT), CANLOM (XML), Prefilter (SQL), LOMGen (Java), LOR (HTML), Database (Access), Keyword Table. Actors: End user, Administrator. Data flows: user input, prefilter parameters (Query URI), WOO RuleML files, HTML files, partial CanCore files, CanCore files, prefiltered CanCore files, administrator input, recommended results, search results. Flow steps are labeled (1) to (8) and (a) to (c).]

21

Σ (si (wi + w'i)/2)

Σ (A(si)(wi + w'i)/2), where A(si) ≥ si

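[Figure: two learning-object trees, t and t´, both rooted at lom]

t: arcs educational (edu-set), general (gen-set), technical (tec-set) with weights 0.3334, 0.3333, 0.3333; under gen-set, language = en and title = “Introduction to Oracle”; under tec-set, format = HTML and platform = WinXP; leaf arc weights 0.5, 0.5, 0.5, 0.5

t´: arcs general (gen-set), technical (tec-set) with weights 0.7, 0.3; under gen-set, language = en and title = “Basic Oracle”; under tec-set, format = * and platform = WinXP; leaf arc weights 0.1, 0.9, 0.8, 0.2

* : Don’t Care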

• Partonomy similarity [Bhavsar et al. 2004]

• Fragments of learning object trees [Boley et al. 2005] for learning object matching (http://www.cs.unb.ca/agentmatcher)

Partonomy Tree Similarity Algorithm

─ Similarity Algorithm
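
A recursive sketch of the weighted-sum form on this slide, A(si)(wi + w'i)/2 summed over arcs, is given below. It is not the published algorithm. Several choices are assumptions made only for illustration: labels are compared by exact equality, “*” matches anything with similarity 1.0, an arc present in only one tree contributes nothing, and the square root is used as one example of an adjustment function A with A(s) ≥ s on [0, 1]. The two trees are loosely modeled on the figure, and their weights are illustrative.

```python
import math


def tree_sim(t1, t2, adjust=lambda s: s):
    """Recursive weighted similarity of two trees.

    A tree is (label, {arc_label: (weight, subtree)}); a leaf has no arcs.
    """
    label1, arcs1 = t1
    label2, arcs2 = t2
    if "*" in (label1, label2):
        return 1.0  # assumption: Don't Care matches anything
    if label1 != label2:
        return 0.0
    if not arcs1 and not arcs2:
        return 1.0  # two identical leaves
    total = 0.0
    for arc in arcs1.keys() & arcs2.keys():  # assumption: unshared arcs add 0
        w1, child1 = arcs1[arc]
        w2, child2 = arcs2[arc]
        s = tree_sim(child1, child2, adjust)
        total += adjust(s) * (w1 + w2) / 2
    return total


def leaf(label):
    return (label, {})


# Illustrative trees; the weights are hypothetical, not the figure's values.
t = ("lom", {
    "general": (0.5, ("gen-set", {"language": (0.5, leaf("en")),
                                  "title": (0.5, leaf("Introduction to Oracle"))})),
    "technical": (0.5, ("tec-set", {"format": (0.5, leaf("HTML")),
                                    "platform": (0.5, leaf("WinXP"))})),
})
t_prime = ("lom", {
    "general": (0.7, ("gen-set", {"language": (0.2, leaf("en")),
                                  "title": (0.8, leaf("Basic Oracle"))})),
    "technical": (0.3, ("tec-set", {"format": (0.1, leaf("*")),
                                    "platform": (0.9, leaf("WinXP"))})),
})

print(round(tree_sim(t, t_prime), 4))
print(round(tree_sim(t, t_prime, adjust=math.sqrt), 4))
```

With the identity adjustment, a low similarity deep in the tree is scaled down again at every level above it. An adjustment with A(s) ≥ s softens that effect, so the second score is never lower than the first.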

22

• Teclantic portal: http://www.teclantic.ca

Partonomy Tree Similarity Engine ─ Matchmaking Application

23

Current Work

• Weighted Semantic Tree Similarity Engines

• Semantic searching

• Weighted Graph Similarity Engines

• Multi-core and cluster implementations

• Matchmaking portals

24

Conclusion

• UNB Weighted Tree Similarity Engines

• Semantic Global and Local Matching

• Applications: e-Learning, e-Business, Matchmaking portals, …

• Looking to license the UNB technology to commercial partners and adapt it to their needs

25

Publications

5 Journal papers

10 Conference papers

1 Book Chapter

4 MCS Theses

1 PhD Thesis

26

Looking for a Post-doctoral Fellow to start working right now!

Thank you!