CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

45
CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010

Transcript of CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Page 1: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

CMPS 561Fuzzy Set Retrieval

Ryan Benton

September 1, 2010

Page 2: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Agenda

• Fuzzy Sets• Fuzzy Operations• Fuzzy Set Retrieval• Processing Fuzzy Queries• Set Issues• -level Sets

Page 3: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Fuzzy Sets

Page 4: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Issue

• Problem– Boolean assumes documents

• Exist in 1 of 2 states• States are Non-Overlapping

– CRISP!

– Other crisp states may exist• Grades: A, B, C, D, F

– Can be in one and not the other.

– Often, exact boundaries are not known• Hot, Warm, Luke-warm, Cool, Chilly, Cold

– When does Hot become Warm?

Page 5: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Issue

• Fuzzy Sets– Informal Definition

• Fuzzy Set is a class with fuzzy boundaries.• Alternatively, fuzzy set is a class with non-crisp

boundaries.

Page 6: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Fuzzy Sets – Slightly More Formal

• Regular (Crisp) set– Let X = {xi} be the universe of objects of interest.

• Fuzzy Subset of X A.– A is a set of order pairs

• A = { < x, (x)> }

– (x) Grade of membership of x in A

– : X [0,1]

Page 7: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Example 1

• Assume Crisp Set is– X = {x1, x2, x3, x4}

• Fuzzy Subsets– A = {<x1,1.0>,<x2,0.5>< x3,0.8>}

– B = {<x2,0.8>}

Page 8: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Example 2Membership Functions

4’6” 6’

short medium tall

1

0

1

0

.75

.2

4’

tall4’medium(4’)=0.2, short(4’)=.75

Crisp

Fuzzy

Page 9: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Fuzzy Operations

Page 10: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Operations 1

• Let– A = { < x, A(x)> }

– B = { < x, B(x)> }

• Equality– A = B iff

• A(x) = B(x), for all x X.

Page 11: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Operations 2

• Containment– A B iff

• A(x) B(x), for all x X.

• Complement– Ā = { < x, Ā(x)> } iff

• Ā(x) = 1 - A(x), for all x X.

Page 12: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Operations 3

• Union– A B = {< x, AB(x)> } iff

• AB = max(A(x), B(x)), for all x X.

• Intersection– A B = {< x, AB(x)> } iff

• AB = min(A(x), B(x)), for all x X.

Page 13: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Fuzzy Set Retrieval Model

Page 14: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Document Representation Boolean

• Relation– D = { < d, ti, D(d, ti)> }

– D: D x T {0,1}

– D(d, ti)

• 1, if d contains ti

• 0, otherwise

• Dt = {d D | D(d, t) = 1}

• d Dd = {ti T | D(d, ti) = 1}

Page 15: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Document RepresentationFuzzy

• D = { < d, ti, D(d, ti)> }

• D: D x T [0,1]

• D(d, ti)

– The degree of membership of ti to d

• Dt = {<d, D(d, t)> | d D >}

• Dd = {<ti, D(d, ti)> | ti T>}

Page 16: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Processing Boolean Query (Method 2)

• Dt1t2 = {d D | D(d, t1) D(d, t2) = 1}

• Dt1t2 = {d D | D(d, t1) D(d, t2) = 1}

• Dt = set of Documents containing term t

• T = {a,b,c,d,e}

• Da, Db, Dc, Dd, De,

Page 17: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Processing Fuzzy Query (Method 2)

• Dt1t2 = {<d , D(d, t1) D(d, t2) > | d D }

• Dt1t2 = {<d , D(d, t1) D(d, t2) > | d D }

• Dt = set of <Document, membership value> pairs containing term t

• T = {a, b, c, d, e}

• Da, Db, Dc, Dd, De

Page 18: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Retrieval FunctionBoolean

• RSV F

• RSVt(d) = D(d, t)

• RSVe(d) = 1 - RSVe(d)

• RSVe1e2(d) = RSVe1(d) RSVe2(d)

• RSVe1e2(d) = RSVe1(d) RSVe2(d)

Page 19: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Retrieval FunctionFuzzy

• RSVt(d) = D(d, t)

• RSVe(d) = 1 - RSVe(d)

• RSVe1e2(d) = min(RSVe1(d),RSVe2(d))

• RSVe1e2(d) = max(RSVe1(d),RSVe2(d))

Page 20: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Processing Fuzzy Queries

Page 21: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Term-Document Incidence Matrix

d1 d2 d3 d4 d5 d6 d7 d8

t1 0.1 0.7 0.6 0.4 0.8 0.6 0.3 0.6

t2 0.3 0.8 0.9 0.8 0.6 0.5 0.7 0.2

t3 0.8 1 0.2 0.6 0.8 0.2 0.8 0.5

t4 0.4 0.6 0.7 0.5 0.7 0.3 0.1 0.9

t5 0.7 0.2 0.8 0.4 0.6 0.2 0.8 0.4

Page 22: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Fuzzy example

q = (t1 t2) (t2 t3 t4))

t1

t2

t2 t3t4

Page 23: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Fuzzy example(Method 1)

q = (t1 t2) (t2 t3 t4))

t1

t2

t2 t3t4d3 0.6

d6 0.6d8 0.6

0.90.50.2

0.90.50.2

0.20.20.5

0.70.30.9

0.10.50.8

0.80.80.5

0.10.50.6

0.70.30.2

0.70.50.6

Page 24: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Inverted File Processing(Method 2)

• Dt = {<d, Dt(d)> | Dt(d) > 0}

• De = {<d, 1-RSVe(d)> | 1-RSVe(d) > 0}

• De1e2 = {<d, RSVe1e2(d)> | RSVe1e2(d) > 0}

• De1e2 = {<d, RSVe1e2(d)> | RSVe1e2(d) > 0}

Page 25: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Processing Fuzzy Query (Method 2)

Output

• Dt

• De1

• De2

• De1 De2

• De1 De2

• D\De1

Query• t• e1• e2• e1e2• e1e2• e1

Page 26: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Set Issues

Page 27: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Set Identities

• A B = B A

• (A B) C = A (B C)

• A B C) =

(A B) C)

• A = A

• A A’ = S

• A B = B A

• (A B) C = A (B C)

• A B C) =

(A B) C)

• A S = A

• A A’ =

Page 28: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Definitions of Union and Intersection

Name Union Intersection

max, min max, product ·

probabilistic sum, product ·

Einstein

bold

Hamacher

Yager

Schweizer-Sklard ⊥p ⊤p

+ ·

+

v

v

Page 29: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Operations – Union & Intersection

• a · b = ab• a b = a + b – ab• a b = (a + b) / (1 + ab)• a b = (ab) / ( 1 + (1-a)(1-b)• a b = min(a + b, 1)• a b = max(0, a + b - 1)

+

Page 30: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Operations – Union & Intersection

• a b = (a + b – (1 – )ab) / (1- ab– ∞)

• a b = (ab) / (a+b– ∞)

• a b = min(1, (av + bv)(1/v))– v∞)

• a b = 1 - min(1, [(1-a)v + (1-b)v](1/v))– v∞)

+

·

v

v

Page 31: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Operations – Union & Intersection

1 – ((1-a)-p + (1-b)-p -1)(1/p) p > 0

a b p = 0

1 – ((1-a)-p + (1-b)-p -1)(1/p) p < 0,(1-a)-p + (1-b)-p >1

1 p < 0,(1-a)-p +(1-b)-p ≤ 1

aT

p b =

Page 32: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Operations – Union & Intersection

(a-p + b-p -1)(1/p) p > 0

ab p = 0

(a-p + b-p -1)(1/p) p < 0,a-p + b-p >1

0 p < 0,a-p +b-p ≤ 1

a Tp b =

Page 33: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Example of Set Operations

• Assume – Union = Probabilistic Sum

• = a b = a + b – ab

– Intersection = Product• = a · b = ab

Page 34: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Term-Document Incidence Matrix

d1 d2 d3 d4 d5 d6 d7 d8

t1 0.1 0.7 0.6 0.4 0.8 0.6 0.3 0.6

t2 0.3 0.8 0.9 0.8 0.6 0.5 0.7 0.2

t3 0.8 1 0.2 0.6 0.8 0.2 0.8 0.5

t4 0.4 0.6 0.7 0.5 0.7 0.3 0.1 0.9

t5 0.7 0.2 0.8 0.4 0.6 0.2 0.8 0.4

Page 35: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Fuzzy example

q = (t1 t2) (t2 t3 t4))

t1

t2

t2 t3t4

Page 36: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Fuzzy example (Method 1)Max, Min

q = (t1 t2) (t2 t3 t4))

t1

t2

t2 t3t4d3 0.6

d6 0.6d8 0.6

0.90.50.2

0.90.50.2

0.20.20.5

0.70.30.9

0.10.50.8

0.80.80.5

0.10.50.6

0.70.30.2

0.70.50.6

Page 37: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Fuzzy example (Method 1)probabilistic sum and product

q = (t1 t2) (t2 t3 t4))

t1

t2

t2 t3t4d3 0.6

d6 0.6d8 0.6

0.90.50.2

0.90.50.2

0.20.20.5

0.70.30.9

0.10.50.8

0.80.80.5

0.060.30.48

0.5040.120.09

0.53160.3840.4998

Page 38: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Differences

Document Max, Min Prob. Sum, Product

d3 0.7 0.5316

d6 0.5 0.384

d8 0.6 0.4998

Page 39: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

-level Sets

Page 40: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

-level Sets - Definitions

• -level Set– D() = {<d,t> | D (d, t) ≥ }– Dt() = { d | <d, t> D ()}

• Then meaning of Dt() of descriptor tTT* – Dt() = {<d, Dt()

(d) = Dt()(d) > | d Dt()}

Note: T* has same meaning as E used in Boolean IR

• If t’T*, then meaning of Dt() of descriptor t =t’– Dt() = {<d, Dt()

(d)=1-Dt’()(d)> |

1-Dt’()(d) > ,d Dt’() }

Page 41: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

-level Sets - Definitions

• If t’, t’’T*, then meaning of Dt() of descriptor t = t’ t’’– Dt() = {<d, Dt()

(d)=min(Dt’()(d), Dt’’()

(d))> |d Dt’() Dt’’()

• If t’, t’’T*, then meaning of Dt() of descriptor t = t’ t’’– Dt() = {<d, Dt()

(d)=max(Dt’()(d), Dt’’()

(d))> |d Dt’() Dt’’()

Page 42: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Fuzzy example

q = (t1 t2) (t2 t3 t4))

t1

t2

t2 t3t4

Page 43: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Fuzzy example Max, Min

q = (t1 t2) (t2 t3 t4))

t1

t2

t2 t3t4d3 0.6

d6 0.6d8 0.6

0.90.50.2

0.90.50.2

0.20.20.5

0.70.30.9

0.10.50.8

0.80.80.5

0.10.50.6

0.70.30.2

0.70.50.6

Page 44: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Fuzzy example Max, Min, *=0.3

q = (t1 t2) (t2 t3 t4))

t1

t2

t2 t3t4d3 0.6

d6 0.6d8 0.6

0.90.50.2 (0)

0.90.50.0

0.2 (0)0.2 (0)0.5

0.70.30.9

0.1 (0)0.51.0

1.01.00.5

0.0 0.50.6

0.70.30.0

0.70.50.6

Page 45: CMPS 561 Fuzzy Set Retrieval Ryan Benton September 1, 2010.

Questions?