GBS_Final.ppt
Mining Fuzzy Spatial Association Rules from Image Data
G. Brannon Smith
Mississippi State University
6 June 2001
Contents
• Introduction
• Motivation
• Brief Background
– Fuzzy Relative Position
– Object Co-occurrence
• Theory
– Aggregate
– Traditional
• Experiments
• Conclusion
• Future Work
• Selected References
• Acknowledgements
Introduction
• Operating on raster image data = image space
• Images can be partitioned into regions or objects
(groups of like pixels)
• Like objects compose classes
• Would like to know general spatial, i.e.,
directional, arrangement of these
Depends on DSK
Introduction (cont.)
• Association Rules seem appropriate – but
not made for raster data, so…
• Need an approach for finding generalized
fuzzy association rules on object spatial
relations pulled from image space
Motivation
• GENERAL: Periodic collection of vast
amounts of data = tedious for human to
analyze
• SPECIFIC: OKEANOS project sponsored
by NAVO collects many seafloor images
• Data Mining/Knowledge Discovery helps
Background
Fuzzy Set Theory (Zadeh)
Fuzzy Relative Position (Bloch)
Association Rules (Agrawal et al.)
Fuzzy Association Rules (Kuok, Fu & Wong)
Spatial Data Mining (Koperski & Han)
Object co-occurrence rules (Ordoñez & Omiecinski)
Fuzzy Spatial Relations
• I. Bloch applies fuzzy sets to spatial
relations
• Fuzzy concepts of position: right of is fuzzy
• Morphology (shape & size) has effect…
[Figure: two example arrangements of objects A and R]
Fuzzy Spatial Relations (cont.)
• Objects described as fuzzy sets (crisp OK)
Ex. µA(x) and µR(x) , x∈S
• Landscape: µα(R)(x) is whole image S in
relation to R in direction α
• Relation: want µA(x) and µα(R)(x) overlap
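The landscape idea above can be sketched in a few lines. This is a minimal brute-force illustration, not Bloch's morphological algorithm: it assumes a crisp reference object, the angle-based membership max(0, 1 − 2β/π) where β is the smallest angle between direction α and any vector from a point of R to the pixel, and a y-up coordinate convention. The function name `fuzzy_landscape` is hypothetical.

```python
import numpy as np

def fuzzy_landscape(ref_mask: np.ndarray, alpha: float) -> np.ndarray:
    """Brute-force fuzzy landscape of a crisp reference object R.

    ref_mask: 2-D boolean array, True where R is.
    alpha:    direction in radians (0 = to the right, counter-clockwise,
              with y pointing up; this convention is an assumption).
    Returns an array of memberships in [0, 1] with the shape of ref_mask.
    """
    rows, cols = ref_mask.shape
    direction = np.array([np.cos(alpha), np.sin(alpha)])
    # Reference points as (x, y); image rows grow downward, so y = -row.
    ref_r, ref_c = np.nonzero(ref_mask)
    ref_xy = np.stack([ref_c, -ref_r], axis=1).astype(float)

    landscape = np.zeros(ref_mask.shape, dtype=float)
    for r in range(rows):
        for c in range(cols):
            if ref_mask[r, c]:
                landscape[r, c] = 1.0  # points of R itself get full membership
                continue
            vec = np.array([c, -r], dtype=float) - ref_xy  # vectors Q -> P
            cosines = (vec @ direction) / np.linalg.norm(vec, axis=1)
            beta_min = np.arccos(np.clip(cosines.max(), -1.0, 1.0))
            landscape[r, c] = max(0.0, 1.0 - 2.0 * beta_min / np.pi)
    return landscape
```

With a single-pixel reference in the middle of a 5×5 image and α=0, the pixel directly to the right gets membership 1, the pixel directly to the left gets 0, and a pixel on the 45° diagonal gets 0.5.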
Fuzzy Landscape (single)
[Figure: "Test Image" and "Landscape RO#2, α=0". Labels: Reference Object #2; Background/Empty Space; OO#4; H, G, F]
Membership Interval
• Bloch algorithm on all points in objects
• Result: 3 stats per relation, M∈[N,Π]:
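$$\Pi_\alpha^R(A) = \sup_{x \in S} t\big[\mu_\alpha(R)(x),\ \mu_A(x)\big] \quad \text{(Possibility)}$$
$$N_\alpha^R(A) = \inf_{x \in S} s\big[\mu_\alpha(R)(x),\ 1 - \mu_A(x)\big] \quad \text{(Necessity)}$$
$$M_\alpha^R(A) = \frac{1}{|A|} \sum_{x \in S} \mu_A(x)\, \mu_\alpha(R)(x) \quad \text{(Mean)}$$
$$\Pi_\alpha^R(A) = \text{maximum membership}$$
$$N_\alpha^R(A) = \text{minimum membership}$$
$$M_\alpha^R(A) = \text{average membership}$$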
Captures imprecision
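As a rough illustration of the three statistics, the sketch below computes them from an object membership array and a landscape array. It assumes min as the t-norm and max as the t-conorm (other choices are possible), and takes |A| to be the sum of the object's memberships. The name `relation_stats` is hypothetical.

```python
import numpy as np

def relation_stats(obj_membership, landscape):
    """Return (necessity, mean, possibility) of object A w.r.t. a landscape.

    obj_membership: memberships mu_A(x) over the image (0/1 for a crisp object).
    landscape:      memberships mu_alpha(R)(x) over the same image.
    """
    mu_a = np.asarray(obj_membership, dtype=float)
    mu_r = np.asarray(landscape, dtype=float)
    possibility = np.max(np.minimum(mu_r, mu_a))       # sup of t-norm (min)
    necessity = np.min(np.maximum(mu_r, 1.0 - mu_a))   # inf of t-conorm (max)
    mean = np.sum(mu_a * mu_r) / np.sum(mu_a)          # average over A
    return necessity, mean, possibility

# Toy 1x4 image: the crisp object covers the two middle pixels.
n, m, p = relation_stats([0, 1, 1, 0], [0.2, 0.6, 0.8, 1.0])
```

For a crisp object the three values reduce to the minimum, average, and maximum landscape membership over the object's pixels, so N ≤ M ≤ Π (here 0.6, 0.7, 0.8).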
Fuzzy Relation Stats
N=0.9959, M=0.9999, Π = 1.0000
N=0.7557, M=0.9079, Π = 1.0000
[Figure: two example arrangements of R and A]
Image Data Mining
• Ordoñez and Omiecinski have done
preliminary work in image space
• Used Blobworld to convert images to
transactions, objects to item meta-data
• ARM to find simple co-occurrence rules
Hypothesis
• A unified system combining the above can be built
• Raster Image data input (K&H)
• Fuzzy Spatial Relation metadata (Bloch)
• Fuzzy Assoc Rule mining (Agrawal et al., KFW)
• Result: useful fuzzy rules describing generalities of object spatial relations
Main Problem
• How to get from Fuzzy Relation metadata tuples (Bloch) to useful rules?
• What are rule forms?
• What are Support and Confidence or analogs thereof?
• Time? Space? Usefulness?
Theory
• A pre-emptive approach
• By aggregating objects into classes first, can do pseudo-mining right away
• PRO: Few landscapes, small, quick, no mining per se
• CON: lost info (e.g., no more indiv objs)
Fuzzy Landscape (multi)
[Figure: "Test Image" and "Landscape RO#7, α=0". Labels: Reference Object #7; Background/Empty Space]
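Theory (cont.)
• “Class-class” or “Pixel-Pixel” rule form:
$$\forall x, \forall y:\ c(A) \wedge p(x) \wedge c(B) \wedge p(y) \wedge \mathrm{inC}(x, A) \wedge \mathrm{inC}(y, B) \Rightarrow \mathrm{inDir}(x, y, \alpha)$$
• S & C
$$S = \frac{\mathrm{Sum}(NP_A, NP_B)}{NP} \qquad S_2 = \frac{\mathrm{Min}(NP_A, NP_B)}{NP} \qquad S_3 = \frac{\mathrm{Max}(NP_A, NP_B)}{NP}$$
$$C_N = \text{Necessity} \qquad C_M = \text{Mean} \qquad C_\Pi = \text{Possibility}$$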
For any pixel x of class A and any given pixel y of class B, it is implied that y is in direction α of x, with some degree of confidence supported by some portion of the (meta) database.
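Theory (cont.)
• Prev. ex.:
$$\forall x, \forall y:\ c(H) \wedge p(x) \wedge c(G) \wedge p(y) \wedge \mathrm{inC}(x, H) \wedge \mathrm{inC}(y, G) \Rightarrow \mathrm{inDir}(x, y, 0)$$
$$S = 83.33\%$$
$$C_N = 99.59\% \qquad C_M = 99.99\% \qquad C_\Pi = 100\%$$
alpha  RC#  OC#  N       M       Π
0.0    H    G    0.9959  0.9999  1.0000
$$S_2 = 33.33\% \qquad S_3 = 50.00\%$$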
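The three support variants are simple ratios of class sizes to the total. A minimal sketch follows; the function name `support_variants` is hypothetical, and the counts in the usage line are made-up units chosen only so that the ratios match the percentages on the slide (the actual pixel counts are not given).

```python
def support_variants(n_a: int, n_b: int, n_total: int) -> dict:
    """Sum-, min-, and max-based support for classes A and B.

    n_a, n_b: number of pixels (or objects) in class A and class B.
    n_total:  number of pixels (or objects) in the whole image.
    """
    sizes = (n_a, n_b)
    return {
        "S": sum(sizes) / n_total,   # Sum(N_A, N_B) / N
        "S2": min(sizes) / n_total,  # Min(N_A, N_B) / N
        "S3": max(sizes) / n_total,  # Max(N_A, N_B) / N
    }

# Hypothetical counts: 3 units of one class, 2 of the other, 6 in total.
s = support_variants(3, 2, 6)
print({k: f"{v:.2%}" for k, v in s.items()})
```

The sum-based S is the most generous of the three; S2 and S3 bound the share of the image taken by the smaller and the larger class.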
Theory (cont.)
• More traditional… (aggreg loses obj id)
• Given: relations for all obj pairs in 4 dirs
• 1.
$$\forall x, \exists y:\ o(x) \wedge c(A) \wedge \mathrm{inC}(x, A) \wedge o(y) \wedge c(B) \wedge \mathrm{inC}(y, B) \wedge \mathrm{inDir}(x, y, \alpha)$$
$$S = \frac{\mathrm{Sum}(NO_A, NO_B)}{NO}$$
$$C_i = \frac{\sum_{x \in A} \max_{y \in B} \big( \mathrm{inDir}_i(x, y, \alpha) \big)}{NO_A}, \quad i \in \{N, M, \Pi\}$$
For any object x of class A, there exists some object y of class B such that y is in direction α of x, with some degree of confidence supported by some portion of the (meta) database.
Theory (cont.)
• Prev. ex. (same source objs):
alpha  RO#  RC#  OO#  OC#  N       M       Π
0.0    1    H    4    G    0.6887  0.7590  0.8233
0.0    1    H    5    G    0.9959  0.9999  1.0000
0.0    2    H    4    G    0.9959  0.9999  1.0000
0.0    2    H    5    G    0.6904  0.7603  0.8242
0.0    3    H    4    G    0.6887  0.7590  0.8233
0.0    3    H    5    G    0.4842  0.5434  0.6019
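Theory (cont.)
$$\forall x, \exists y:\ o(x) \wedge c(255) \wedge \mathrm{inC}(x, 255) \wedge o(y) \wedge c(183) \wedge \mathrm{inC}(y, 183) \wedge \mathrm{inDir}(x, y, 0)$$
$$S = 83.33\%$$
$$C_N = 89.35\% \qquad C_M = 91.96\% \qquad C_\Pi = 94.11\%$$
Object based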
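The object-based confidences for the "for all x, there exists y" form can be checked against the six-row relation table of the previous example. The sketch below assumes the confidence is the per-reference-object best value, averaged over the reference objects; the object labels ("H1", "G4", and so on) and the function name are just illustrative.

```python
# (reference object, other object) -> (N, M, Pi), copied from the relation table
RELATIONS = {
    ("H1", "G4"): (0.6887, 0.7590, 0.8233),
    ("H1", "G5"): (0.9959, 0.9999, 1.0000),
    ("H2", "G4"): (0.9959, 0.9999, 1.0000),
    ("H2", "G5"): (0.6904, 0.7603, 0.8242),
    ("H3", "G4"): (0.6887, 0.7590, 0.8233),
    ("H3", "G5"): (0.4842, 0.5434, 0.6019),
}

def exists_confidence(relations):
    """Average, over reference objects x, of the best value over objects y."""
    best = {}
    for (ref, _other), stats in relations.items():
        prev = best.get(ref, (0.0, 0.0, 0.0))
        best[ref] = tuple(max(p, s) for p, s in zip(prev, stats))
    n_ref = len(best)
    return tuple(sum(b[i] for b in best.values()) / n_ref for i in range(3))

c_n, c_m, c_pi = exists_confidence(RELATIONS)
print(f"C_N={c_n:.2%}  C_M={c_m:.2%}  C_Pi={c_pi:.2%}")
```

Rounded to two decimals this gives 89.35%, 91.96%, and 94.11%, the values quoted on the slide.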
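Theory (cont.)
• 2.
$$\forall x, \forall y:\ o(x) \wedge c(A) \wedge \mathrm{inC}(x, A) \wedge o(y) \wedge c(B) \wedge \mathrm{inC}(y, B) \Rightarrow \mathrm{inDir}(x, y, \alpha)$$
‘S’ can deal with objs OR pixels using above (C-C) formulae
$$C_i = \frac{\sum_{x \in A} \sum_{y \in B} \mathrm{inDir}_i(x, y, \alpha)}{NO_A \cdot NO_B}, \quad i \in \{N, M, \Pi\}$$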
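Theory (cont.)
$$\forall x, \forall y:\ o(x) \wedge c(255) \wedge \mathrm{inC}(x, 255) \wedge o(y) \wedge c(183) \wedge \mathrm{inC}(y, 183) \Rightarrow \mathrm{inDir}(x, y, 0)$$
$$S^o = 83.33\%$$
$$C_N = 75.73\% \qquad C_M = 80.36\% \qquad C_\Pi = 84.54\%$$
object
$$S_2^o = 33.33\% \qquad S_3^o = 50\%$$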
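For the "for all x, for all y" form, the quoted confidences are consistent with a plain average over every (reference, other) object pair. The sketch below recomputes them from the six-row relation table of the earlier example; the pair labels and the function name are illustrative only.

```python
# (reference object, other object) -> (N, M, Pi), copied from the relation table
RELATIONS = {
    ("H1", "G4"): (0.6887, 0.7590, 0.8233),
    ("H1", "G5"): (0.9959, 0.9999, 1.0000),
    ("H2", "G4"): (0.9959, 0.9999, 1.0000),
    ("H2", "G5"): (0.6904, 0.7603, 0.8242),
    ("H3", "G4"): (0.6887, 0.7590, 0.8233),
    ("H3", "G5"): (0.4842, 0.5434, 0.6019),
}

def forall_confidence(relations):
    """Average of each statistic over all object pairs (NO_A * NO_B of them)."""
    n_pairs = len(relations)
    return tuple(
        sum(stats[i] for stats in relations.values()) / n_pairs
        for i in range(3)
    )

c_n, c_m, c_pi = forall_confidence(RELATIONS)
print(f"C_N={c_n:.4f}  C_M={c_m:.4f}  C_Pi={c_pi:.4f}")
```

The results are about 0.7573, 0.8036, and 0.8454. They are lower than for the existential form because every pair counts, including the poorly aligned ones.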
Time ⇒ Parallel
• Landscape generation/Relation extraction independent for given RO,α
• “Embarrassingly Parallel”
• mpiShell by Wooley shortens development time, allows user to exploit parallelism
• Not linear: 16 CPUs → 4× speedup; BUT very useful considering minimal implementation effort…
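Because each (RO, α) pair can be processed independently, the work splits into a flat list of tasks. The sketch below shows that decomposition with Python's standard `concurrent.futures`. It is a stand-in for illustration only, not mpiShell or MPI; `extract_relations` is a hypothetical placeholder for the real landscape and relation extraction step.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
import math

def extract_relations(task):
    """Placeholder for the per-task work on one (reference object, direction)."""
    ref_object, alpha = task
    # A real implementation would build the landscape for (ref_object, alpha)
    # and measure every other object against it.
    return (ref_object, alpha, f"relations for RO#{ref_object} at {alpha:.2f} rad")

def run_all(ref_objects, directions, workers=4):
    """Run every (RO, alpha) task; no task depends on another task's result."""
    tasks = list(product(ref_objects, directions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_relations, tasks))

results = run_all([1, 2, 3], [0.0, math.pi / 2, math.pi, 3 * math.pi / 2])
print(len(results))  # 3 reference objects x 4 directions = 12 tasks
```

Threads are used here only to keep the example portable. For CPU-bound extraction, a process pool or an MPI launcher would be needed to see real speedup.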
Simple Hand Constructed
3 classes
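Hand graph
$$\forall x, \exists y:\ o(x) \wedge C(F) \wedge \mathrm{inClass}(x, F) \wedge o(y) \wedge C(G) \wedge \mathrm{inClass}(y, G) \wedge \mathrm{inDir}(x, y, 0) \quad (71.85,\ 15.86 \le 69.91 \le 99.66)$$
$$\forall x, \exists y:\ o(x) \wedge C(G) \wedge \mathrm{inClass}(x, G) \wedge o(y) \wedge C(F) \wedge \mathrm{inClass}(y, F) \wedge \mathrm{inDir}(x, y, \pi) \quad (71.85,\ 100 \le 100 \le 100)$$
Experiments: Synthetic Data Sample Graphs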
Scatter plots of rules mined from
synthetic images with a
fuzzy spatial relation extractor, using
Obj-Obj rules
Synthetic Data
• Synthetic Data Generator to produce images with bias – “loaded” images
• Can we extract rules that reflect the bias?
• Regular
• Extended
• Half
Side Effects
• Edge Effect – image edges
• Counterbias – wrong direction
• “Spillover” - other classes benefit
• Probability – bias is NOT a guarantee
Sample random 2 (6 classes)
R2 graph
Sample 4
R=G, A=H of 6 classes, bias=90% Extended, α=0
4 graph
Sample 2
R=G, A=H of 6 classes, bias=80% Extended, α=0
2 graph
Sample 10
R=G, A=H of 6 classes, bias=95%, α=0
10 graph
R=H, A=I
R=J, A=I
Half
A=I of 6 classes, bias=85% Half, α=0
Half graph
Seafloor
Seafloor graph
Seafloor Rule #148
$$\forall x, \exists y:\ o(x) \wedge C(\Phi) \wedge \mathrm{inClass}(x, \Phi) \wedge o(y) \wedge C(\Gamma) \wedge \mathrm{inClass}(y, \Gamma) \wedge \mathrm{inDir}(x, y, 3\pi/2) \quad (63.43,\ 94.50 \le 96.43 \le 100)$$
Conclusions
• Association Rules, introduced fairly recently (1993), have seen much growth (Agrawal)
• Expansion into categorical, fuzzy, etc. (Srikant, Kuok/Fu/Wong, et al.)
• Many have done work with Spatial Databases – in Object Space (Koperski & Han)
• BUT…
Conclusions
• Preliminary investigation on image object co-occurrence rules by Ordoñez and Omiecinski aside…
• Very little work done in Association Rule Mining in (raster) Image Space, esp. fuzzy
• We have endeavored to fill this gap
Conclusions
• Used Bloch Fuzzy Spatial Relations as tool for meta-data generation
• Used techniques inspired by (but not implementing) Kuok, Fu & Wong
• Showed that we can find interesting and useful rules – both “loaded” and unknown
Future Work
• Better exploitation of fuzzy membership interval
• Application of thresholding typical to most AR to prune low fuzzy values
• Addition of a distance measure attribute
• Exploration of different kinds of rules such as Spatial Relation Co-occurrence
Summary
• Introduction
• Motivation
• Brief Background
– Fuzzy Relative Position
– Object Co-occurrence
• Theory
– Aggregate
– Traditional
• Experiments
• Conclusion
• Future Work
• Selected References
• Acknowledgements
Selected References
Agrawal, R., T. Imielinski, and A. Swami. 1993. Mining associations between sets of items in massive databases. In Proceedings of the 1993 ACM SIGMOD Int’l Conference on Management of Data held in Washington, DC, May 26-28, 1993, 207-216. New York: ACM Press.
Bloch, I. 1999. Fuzzy relative position between objects in image processing: A morphological approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(7):657-664.
Fayyad, U. M., G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (Eds.). 1996. Advances in knowledge discovery and data mining. Menlo Park, CA: AAAI/MIT Press.
Knorr, E. M., and R. T. Ng. 1996. Finding aggregate proximity relationships and commonalities in spatial data mining. IEEE Transactions on Knowledge and Data Engineering 8(6):884-897.
Selected References (cont.)
Koperski, K., J. Adhikary, and J. Han. 1996. Knowledge discovery in spatial databases: Progress and challenges. In Proceedings of the 1996 ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD’96) held in Montréal, June 2, 1996, 55-70. IRIS/Precarn.
Kuok, C. M., A. W.-C. Fu, and M. H. Wong. 1998. Mining fuzzy association rules in databases. SIGMOD Record 27(1):41-46.
Luo, J. and S. M. Bridges. 2000. Mining fuzzy association rules and fuzzy frequency episodes for intrusion detection. International Journal of Intelligent Systems 15(8):687-703.
Ordonez, C. and E. Omiecinski. 1999. Discovering association rules based on image content. Proceedings of the 1999 IEEE Forum on Research and Technology Advances in Digital Libraries held in Baltimore, MD, May 19-21, 1999, 38-49. IEEE.
Selected References (cont.)
Wooley, B. 2000. mpiShell Documentation. http://www.cs.msstate.edu/~bwooley/software/mpiShellDoc.html (Accessed 02 May 2001).
Zadeh, L.A. 1965. Fuzzy sets. Information and Control 8(3):338-353.
Zimmermann, H.-J. 1996. Fuzzy set theory – and its applications (3rd ed.). Boston: Kluwer Academic Publishers.
Acknowledgements
Thanks to…
• Dr. Susan Bridges (Major Professor) for being a great editor of a very long document
• Bruce Wooley for creating mpiShell
• Sean Taylor for code review
Acknowledgements
• Grants from NAVO Research group based at Stennis Space Center in Bay St. Louis, MS
– National Science Foundation Grant #9818489
– ONR EPSCoR Grant N00014-96-1-1276
– Naval Oceanographic Office via NASA Stennis NAS1398033 DO92
URL for Thesis Materials
http://www.cs.msstate.edu/~smithg/thesis/
Includes this presentation, previous presentations (proposal, seminar, etc.), proposal text and thesis
text in PostScript and PDF formats
Questions and Comments?
Fuzzy Set Theory
• Classical/Crisp set membership is TOTAL
or NULL
• Can describe with characteristic function -
map universe onto {0,1}, a set itself
• OK, for definite sets, e.g. Turing winners
Fuzzy Set Theory (cont.)
• PROBLEM: imprecise sets such as TALL
• Where is NOT TALL/TALL boundary?
• Zadeh proposed set membership function
µA(x) mapping to [0,1] (interval), so 0.7 OK
• Exact membership function at user
discretion – domain specific
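A membership function for TALL might look like the sketch below. The piecewise-linear shape and the 160 cm and 190 cm breakpoints are arbitrary illustrative choices, in keeping with the point that the exact function is domain specific.

```python
def tall(height_cm: float, low: float = 160.0, high: float = 190.0) -> float:
    """Degree of membership in the fuzzy set TALL, in [0, 1].

    Heights at or below `low` are not tall at all (0), heights at or above
    `high` are fully tall (1), and membership rises linearly in between.
    """
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)

for h in (150, 175, 181, 195):
    print(h, round(tall(h), 2))
```

With these breakpoints a height of 181 cm has membership 0.7 in TALL, rather than being forced to one side of a crisp boundary.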
Fuzzy Set Theory (cont.)
• Classical operator analogs: complement,
cardinality, etc.
• Union & Intersection typically max & min
respectively (there are others)
• Still give proper results for crisp sets
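The max/min operators can be sketched with fuzzy sets stored as dictionaries from element to membership (missing elements have membership 0). The representation and function names are illustrative assumptions.

```python
def f_union(a: dict, b: dict) -> dict:
    """Fuzzy union: pointwise maximum of memberships."""
    return {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in a.keys() | b.keys()}

def f_intersection(a: dict, b: dict) -> dict:
    """Fuzzy intersection: pointwise minimum of memberships."""
    return {x: min(a.get(x, 0.0), b.get(x, 0.0)) for x in a.keys() | b.keys()}

def f_complement(a: dict, universe) -> dict:
    """Fuzzy complement: 1 minus the membership, over the given universe."""
    return {x: 1.0 - a.get(x, 0.0) for x in universe}

def cardinality(a: dict) -> float:
    """Fuzzy cardinality: sum of memberships."""
    return sum(a.values())

A = {"x": 0.7, "y": 0.2}
B = {"x": 0.4, "z": 1.0}
print(f_union(A, B), f_intersection(A, B), cardinality(A))
```

When every membership is 0 or 1, these definitions give the same results as ordinary set union, intersection, complement, and size.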
Association Rules
• Rules of Agrawal et al. usually of form
antecedent X ⇒ consequent Y (s,c)
• XY is set of items in a transaction and
X∩Y = ∅ i.e., disjoint
• Ex. Beer ⇒ Chips (support:3%,conf:87%)
Association Rules (cont.)
• Notions of support and confidence
• Support = % of ALL transactions with both X & Y - high support = “large”
– Measures importance (freq) in database
• Confidence = % of X transactions with Y
– Strength of relationship between X and Y
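A small worked example may help make the two measures concrete. The five transactions below are made up for illustration, and the function names are hypothetical.

```python
def support(transactions, x: set, y: set) -> float:
    """Fraction of ALL transactions that contain every item of X and Y."""
    both = x | y
    return sum(1 for t in transactions if both <= t) / len(transactions)

def confidence(transactions, x: set, y: set) -> float:
    """Fraction of the transactions containing X that also contain Y."""
    with_x = [t for t in transactions if x <= t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y <= t) / len(with_x)

transactions = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer"},
    {"chips"},
    {"milk"},
]
print(support(transactions, {"beer"}, {"chips"}))     # 2 of 5 transactions
print(confidence(transactions, {"beer"}, {"chips"}))  # 2 of 3 beer transactions
```

Here Beer ⇒ Chips has support 40% and confidence about 67%: the rule covers two of the five transactions, and two of the three transactions with beer also have chips.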
Association Rules (cont.)
• Rules use binary/boolean attributes
• Ex. “Transaction includes chips/Trans. does
NOT include chips”
• Classical Set Theory
• But what about range data (e.g., Price or
Age)?
Association Rules (cont.)
• Srikant & Agrawal offer Quantitative Rules, a mapping that lets rules use range data
• Can map values from range to booleans
• Ex. Price = 700/Price ≠ 700
Price ∈[500,999]/Price∉[500,999]
• Still use boolean algorithms
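The range-to-boolean mapping can be sketched as follows. The interval boundaries reuse the Price example above plus one made-up interval; the item label format and the function name are illustrative assumptions, not Srikant & Agrawal's notation.

```python
def to_boolean_items(record: dict, intervals: dict) -> set:
    """Map quantitative attribute values to boolean interval items.

    record:    {attribute: numeric value}
    intervals: {attribute: [(low, high), ...]} with inclusive bounds
    Returns the set of interval items that are 'present' for this record.
    """
    items = set()
    for attr, value in record.items():
        for low, high in intervals.get(attr, []):
            if low <= value <= high:
                items.add(f"{attr} in [{low},{high}]")
    return items

intervals = {"Price": [(0, 499), (500, 999)]}
print(to_boolean_items({"Price": 700}, intervals))
```

Each record becomes an ordinary set of items, so a standard boolean association rule algorithm can be run on the result unchanged.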
Association Rules (cont.)
• Kuok, Fu & Wong complaint: interval boundaries (like TALL vs. NOT TALL)
• SOLUTION: Use fuzzy set intervals
• [20,25] becomes Young Adult
• Attribute values have degrees of membership in several fuzzy sets
Association Rules (cont.)
• KFW rule: X is A ⇒ Y is B (s,c)
• A and B are sets of fuzzy sets for attribs
• s,c are fuzzy analogs of supp and conf:
Significance and Certainty
• Weighted by fuzzy vals
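One way to read "weighted by fuzzy vals" is sketched below. It assumes significance is the average, over records, of the product of the memberships involved, and certainty is the ratio of the significance of the whole rule to that of its antecedent. This is a simplified reading of Kuok, Fu & Wong (for example, it omits any membership threshold), and the records, fuzzy set names, and function names are made up for illustration.

```python
from math import prod

def significance(records, itemset) -> float:
    """Average over records of the product of memberships for the itemset.

    records: list of {(attribute, fuzzy_set): membership in [0, 1]}
    itemset: list of (attribute, fuzzy_set) pairs
    """
    total = sum(prod(rec.get(item, 0.0) for item in itemset) for rec in records)
    return total / len(records)

def certainty(records, antecedent, consequent) -> float:
    """Significance of antecedent and consequent together, relative to the antecedent."""
    s_x = significance(records, antecedent)
    if s_x == 0.0:
        return 0.0
    return significance(records, antecedent + consequent) / s_x

records = [
    {("Age", "Young"): 0.8, ("Income", "Low"): 0.5},
    {("Age", "Young"): 0.2, ("Income", "Low"): 1.0},
]
x = [("Age", "Young")]
y = [("Income", "Low")]
print(significance(records, x + y), certainty(records, x, y))
```

For these two records the rule "Age is Young ⇒ Income is Low" has significance of about 0.3 and certainty of about 0.6. With memberships restricted to 0 and 1, the two measures reduce to ordinary support and confidence.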
Spatial Data Mining
• Koperski & Han are leaders in adapting general DM to spatial data, specifically Spatial DBs
• Spatial DB stores spatial data, object attribs, does spatial ops, e.g., Spatial Join
• Object Space
• This work does NOT use a Spatial DB
Spatial Data Mining (cont.)
• But Koperski & Han do acknowledge Raster Image Data (not in Spatial DB)
• Kind of bridge between strict SDM and Image Processing