Facilitating Interactive Mining of Global and Local Association Rules
Abhishek Mukherji*, Elke A. Rundensteiner, Matthew O. Ward
Department of Computer Science, Worcester Polytechnic Institute, MA, USA.
*Samsung Research America, CA, USA.
XmdvTool is an open-source multivariate visual analytics tool developed at WPI with a series of NSF grants over the past 20 years (http://sourceforge.net/projects/xmdvtool/).
This PhD research work was partly supported by NSF under grants IIS-0812027, CCF-0811510 and IIS-1117139.
Slide 2
Era of Big Data. And we are DRIVING!
Volume, Veracity, Variety, Velocity
1. Where's the Data in the Big Data Wave? Gerhard Weikum, Research Director at Max Planck Institute, http://wp.sigmod.org/?p=786.
2. Analytic DB Technology for the Data Enthusiast. Pat Hanrahan, Stanford & Tableau, SIGMOD'12 Keynote Talk.
11/03/2014
Slide 3
XmdvTool's Efforts Towards This Paradigm Shift
Visualize Data Records
Visualize Static Data
I. Visualize Stream & Sensor Data: SNIFTool & FireStream, ViStream*
II. Visualize Mined Results: PARAS/FIRE, COLARM
* Di Yang et al., Interactive visual exploration of neighbor-based patterns in data streams, ACM SIGMOD'10 Demo.
Slide 4
Summary of Graduate Research Works
I. Stream & Sensor Data Processing
1. SNIFTool/FireStream: Discover Patterns in Live Stream [CIKM '08, ICDE Demo '07]
2. JAQPOT: High Velocity Streams MJoin Exec. [BNCOD '11]
II. Interactive Mining
1. PARAS/FIRE [VLDB'13, SIGMOD'13, CIKM'13]
2. COLARM [EDBT'14]
III. Scalable Nugget-guided Hypothesis Testing
1. SPHINX: Evidence-Hypotheses Explor. [CIKM'13]
2. Iterative Multi-Evidence-Hypotheses Model
CAPE* XMDVTool^
* http://davis.wpi.edu/dsrg/PROJECTS/CAPE/index.html
^ http://davis.wpi.edu/xmdv/index.html
Slide 5
PARAS/FIRE: Interactive Visual Support for Parameter
Space-Driven Mining of Global Rules [PVLDB 2013, SIGMOD 2013, CIKM
2013] Joint work with Xika Lin, Christopher Ryan Botaish, Jason
Whitehouse, Elke A. Rundensteiner, Matthew O. Ward Department of
Computer Science, Worcester Polytechnic Institute (WPI), MA,
USA.
Slide 6
Association Rule Mining (ARM) Basics
Which customers to target for multi-car discount promos?

RecordID  Age  Married  NumCars
100       23   No       1
200       25   Yes      1
300       29   No       0
400       34   Yes      2
500       38   Yes      2

Example rule: Support = 40%, Confidence = 100%
R. Agrawal and R. Srikant, Fast algorithms for mining association rules in large databases, VLDB'94.
R. Srikant and R. Agrawal, Mining quantitative association rules in large relational tables, SIGMOD'96.
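To make the support and confidence figures concrete, here is a minimal sketch that recomputes them on the five records of the slide's example table. The rule used, (Age in 30..39 and Married = Yes) => NumCars = 2, is an assumption chosen because it is consistent with the 40% / 100% values stated on the slide; the function and variable names are illustrative only.

```python
# Records from the slide's example table: (RecordID, Age, Married, NumCars).
RECORDS = [
    (100, 23, "No", 1),
    (200, 25, "Yes", 1),
    (300, 29, "No", 0),
    (400, 34, "Yes", 2),
    (500, 38, "Yes", 2),
]

def antecedent(rec):
    # Assumed antecedent: Age in [30, 39] and Married = Yes.
    _, age, married, _ = rec
    return 30 <= age <= 39 and married == "Yes"

def consequent(rec):
    # Assumed consequent: NumCars = 2.
    return rec[3] == 2

def support_and_confidence(records, lhs, rhs):
    """Return (support, confidence) of the rule lhs => rhs as fractions."""
    n_lhs = sum(1 for r in records if lhs(r))
    n_both = sum(1 for r in records if lhs(r) and rhs(r))
    support = n_both / len(records)                 # share of all records
    confidence = n_both / n_lhs if n_lhs else 0.0   # share of lhs matches
    return support, confidence

support, confidence = support_and_confidence(RECORDS, antecedent, consequent)
print(f"support = {support:.0%}, confidence = {confidence:.0%}")
```

Two of the five records satisfy both sides of the assumed rule, and both records that match the antecedent also match the consequent, which is where the two percentages come from.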
Slide 7
Motivation for Interactive Mining
The Data Analyst sends (minsupp, minconf) to the Data Miner, which returns {ARs}.
Limitations
Unacceptably long response time.
Trial-and-error iterations.
Forced to rerun for each subset.
Research Goals
Improve turnaround times of mining queries.
Provide parameter recommendations.
Preprocess data to enable fast interactive mining experience.
C.C. Aggarwal and P.S. Yu, A new approach to online generation of association rules, IEEE TKDE'01.
C. Hidber, Online Association Rule Mining, SIGMOD'99.
B. Nag, P. M. Deshpande, and D. J. DeWitt, Using a knowledge cache for interactive discovery of association rules, SIGKDD'99.
M. Kubat et al., Itemset trees for targeted association querying, IEEE TKDE'03.
M. Kaya and R. Alhajj, Online mining of fuzzy multidimensional weighted association rules, Applied Intelligence'08.
Slide 22
2. Pre-processing Times
PARAS requires ~10% extra offline preprocessing time compared with AdjLatticeRR.
Rule Generation: T5000k = 4 sec, Webdocs = 220 sec
Confirmed: Cost(Freq. Itemset Generation) >> Cost(Rule Generation)
C.C. Aggarwal and P.S. Yu, A new approach to online generation of association rules, IEEE TKDE'01.
B. Nag, P. M. Deshpande, and D. J. DeWitt, Using a knowledge cache for interactive discovery of association rules, SIGKDD'99.
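One reason rule generation tends to be cheap relative to frequent itemset generation is that, once itemset supports are known, rules can be derived from those supports alone without rescanning the data. The sketch below shows this standard step on a toy dictionary of itemset supports. The itemsets, support values, threshold, and function name are assumptions for illustration and do not describe PARAS internals.

```python
from itertools import combinations

# Hypothetical frequent itemsets with their supports (fractions of records).
# Every subset of a listed itemset is also listed (downward closure).
SUPPORTS = {
    frozenset({"a"}): 0.6,
    frozenset({"b"}): 0.5,
    frozenset({"c"}): 0.5,
    frozenset({"a", "b"}): 0.4,
    frozenset({"a", "c"}): 0.3,
    frozenset({"b", "c"}): 0.3,
    frozenset({"a", "b", "c"}): 0.2,
}

def generate_rules(supports, minconf):
    """Derive rules X => Y from itemset supports alone (no data scan)."""
    rules = []
    for itemset, supp in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                # confidence(X => Y) = supp(X u Y) / supp(X)
                conf = supp / supports[lhs]
                if conf >= minconf:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules

rules = generate_rules(SUPPORTS, minconf=0.65)
for lhs, rhs, supp, conf in rules:
    print(sorted(lhs), "=>", sorted(rhs), f"supp={supp:.2f} conf={conf:.2f}")
```

Each candidate rule costs one dictionary lookup and one division, whereas computing the supports in the first place requires counting over the whole dataset.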
Slide 23
FIRE: User Study Questions
Stable Region Usage Tests
T1: What are the most prominent rules by support and confidence?
T2: Which setting (out of a choice of 4) returns a different set of rules?
T3: Find the common and unique rules for two distinct parameter settings.
Filter/Redundancy Test
T4: Find the most frequent characteristics of edible and poisonous mushrooms.
Skyline View Test
T5: Find the parameter settings that produce top-k rules in the dataset, where k = 20, 50, 100.
Study setup
22 subjects
Mushroom and chess datasets
Cached Rule Miner (CRM) versus FIRE
Randomization to eliminate pre-knowledge
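Task T3 amounts to set operations over the rules returned under two (minsupp, minconf) settings. Here is a minimal sketch, assuming the rules and their support and confidence values have already been mined and cached. The rule names, values, thresholds, and helper functions are hypothetical and do not reflect how FIRE or CRM is implemented.

```python
# Hypothetical cached rules: name -> (support, confidence).
RULES = {
    "r1": (0.40, 0.90),
    "r2": (0.25, 0.95),
    "r3": (0.35, 0.70),
    "r4": (0.10, 0.99),
}

def rules_for(rules, minsupp, minconf):
    """Return the names of rules that satisfy both thresholds."""
    return {name for name, (supp, conf) in rules.items()
            if supp >= minsupp and conf >= minconf}

def compare_settings(rules, setting_a, setting_b):
    """Split rules into common, only-in-A and only-in-B for two settings."""
    a = rules_for(rules, *setting_a)
    b = rules_for(rules, *setting_b)
    return a & b, a - b, b - a

# Two assumed (minsupp, minconf) settings to compare.
common, only_a, only_b = compare_settings(RULES, (0.30, 0.60), (0.20, 0.85))
print("common:", sorted(common))
print("only in A:", sorted(only_a))
print("only in B:", sorted(only_b))
```

When two settings return identical rule sets, both difference sets are empty, which is the situation task T2 asks subjects to tell apart from a setting that changes the result.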
Slide 24
Mushroom Dataset: Tasks 1, 2 and 3
Overall, FIRE outperforms the competitor CRM approach: users achieve similar or better accuracy while spending significantly less time on the tasks.
Slide 25
Tasks 4 and 5
Overall, FIRE outperforms the competitor CRM approach: users achieve similar or better accuracy while spending significantly less time on the tasks.
Slide 26
Conclusion
Gains of several orders of magnitude when using PARAS for online processing outweigh the one-time minimal offline preprocessing time and storage requirements.
We proposed a novel parameter space model, developed optimal algorithms and designed effective visualizations to facilitate interactive rule exploration by tackling challenges related to both computational and visualization aspects of online rule mining.
Our user study establishes usability and effectiveness of the proposed features and interactions of the FIRE system in facilitating interactive rule mining.
Slide 27
Recent works at Samsung Research America
MobileMiner: Mining Your Frequent Behavior Patterns On Your Phone. V. Srinivasan et al., ACM UbiComp 2014 (Best Paper Nominee), HotMobile 2013.
Mobile Sequence Miner: Adding Intelligence to Your Mobile Device via On-Device Sequential Pattern Mining. A. Mukherji et al., ACM MCSS Workshop in UbiComp 2014.
User Behavior Analysis via On-device Mobile Sensing
Unobtrusively learn sequential patterns of mobile users
"Typically, when I am home on Sunday nights, I call my parents"
Association rule mining over multi-modal mobile context data
Slide 28
Thanks
Contact me with questions: Abhishek Mukherji, Samsung Research America, [email protected]