Event Mining From Distributed E-commerce Systems · Merge events with the same period length level...
Transcript of Event Mining From Distributed E-commerce Systems · Merge events with the same period length level...
![Page 1: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/1.jpg)
Event Mining From Distributed E-commerce
SystemsSheng Ma
IBM T.J Watson Research2004
![Page 2: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/2.jpg)
Contributors
Chang-shing PerngGenady GrabarnikMark BrodieIrina RishJoseph HellersteinDavid Thoenen (IBM ATS)Alina BeygelzimerRicardo Vilalta (Univ. of Huston)Carlotta Domenicon (GMU)Tao Li
![Page 3: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/3.jpg)
������������� �������������������
# Tr
ansi
stor
s pe
r C
hip
x
x
x
x
x Logic Density
1GHz
1980 1990 2000
1MHz 10MHz
100MHz
2010
x10GHz
1010
109
108
107
106
105
104
103
1011
���������������������� �����������
1980 1990 2000 20100.001
0.01
0.1
1
10
100
1000
10000
Are
al D
ensi
ty (
Gb
/in2)
Lab demos (year 2001) at 106 Gbit/in2
25% CGR 60% CGR
100% CGR
Progress from breakthroughs, including MR, GMR heads, AFC media
Products
Lab Demos
1980 1985 1990 1995 2000 2005 20100.0001
0.001
0.01
0.1
1
10
100
1000
Pri
ce/M
Byt
e (D
olla
rs)
HDD
DRAM
Flash Paper/Film
Range of Paper/Film
3.5 " HDD
2.5 " HDD
1 " Microdrive
Flash
DRAM
Since 1997 raw storage prices have been declining at 50%-60% per year
Storage Density Storage Price
Director Director and Security Servicesand Security Services
ExistingExistingApplicationsApplications
and Dataand Data
BusinessBusinessDataData
DataDataServerServer
WebWebApplicationApplication
ServerServer
Storage AreaStorage AreaNetworkNetwork
BPs andBPs andExternalExternalServicesServices
Inte
rnet
Fir
ewal
lIn
tern
et F
irew
all
WebWebServerServerDNSDNS
ServerServer
DataData
Cac
he
Cac
he
Loa
d B
alan
cer
Loa
d B
alan
cer
Inte
rnet
Fir
ewal
lIn
tern
et F
irew
all
Dozens of systems and applications
Hundreds of components
Thousands of tuning parameters
���������������������������������
![Page 4: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/4.jpg)
Self-protectingSelf-protectingAnticipate, detect, identify, and protect against attacks from anywhere
Self-configuringSelf-configuringAdapt automatically to the dynamically changing environments
Self-healingSelf-healingDiscover, diagnose, and react to disruptions
Self-optimizingSelf-optimizingMonitor and tune resources automatically
Autonomic Computing
���������� ������������ ����
![Page 5: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/5.jpg)
Informaiton Management is key for high performance and avialability
Event management system (e.g. Tivoli's product)- Correlation rules are the key - Example: rebooting signature: link_down is followed by link_up in five seconds => filteringlink_down without link_up => an operator should be paged (avialability problem)
CorrelationRules
Event Adapter
Eevent Database
Event Correlation
Sensor
Sensor
Sensor
Sensor
Data collection
Correlationanalysis
![Page 6: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/6.jpg)
IBM T.J. Watson
© 2004 IBM Corporation
A custom er environm ent
db2
db2
M Q/W AS
Tuxedo
C ognos
Many data sources, many boxes, a lot of data2GB raw data from Linux, DB2, WAS, MQ, TEC
Tier 1 Tier 2 Tier 3Userransactionrequest
![Page 7: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/7.jpg)
• Title/subtitle/confidentiality line: 10pt Arial Regular, white • Copyright: 10pt Arial
Research
© 2003 IBM Corporation
Optional slide number:
Problem motivation: Event Mining
Rule-based engine can be used for processing such events in real-time
Pain-points: - How to build knowledge
(or correlation rules)- How to identify problems- How to tune Current approach: knowledge-
basedOur approach: mine historical events to
identify repeated event patterns (sequences)
Review such patterns are reviewed with domain experts to identify causes and define action (e.g. filtering, paging)
Programmatically translatepatterns in to correlation rule or symptom DB
Network down pattern
Restart sequences:
node down, link down followed by node up, link up in 2 minutes
![Page 8: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/8.jpg)
Event Mining to discovery rules for automated run time operation
On-line: monitoring
Rule MatchingEvent/Pattern
Filter/throttling
Correlation
Modify events
Display
Discard
Database Rules
Raw events
Run an application
Off-line: analysis
EventBrowser
EventMiner
EventAnalyzer
�� ��� � �
DataAccess
Statistical importanceDomain importance
Reports
ERNs Domain expertise
ERN Validations
LogsLogs Rules
![Page 9: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/9.jpg)
Alarms in a large computer system
��������������� �����������
������������������������
�����������������������
������������������
Hosts
Alarms in computer system managementExamples: router is down at 9:00am; cpu_utilization of a file server is above 90% Many types of patterns: periodic, temporal dependency, etcPatterns signify problemsHow to discover such patterns; how to perdict
![Page 10: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/10.jpg)
Value of Event Mining
Mining many logs collected from different locationsIdentify system problems
Configuration problemDesign problem
Discover essential rules for run-time monitoring and automated operation
Discover filtering rulesDiscover correlation rules
Problem diagnosis tool and root cause analysisPredictive analysis to avoid problems if possible
![Page 11: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/11.jpg)
Research areas
VisualizationFast ordering of large categorical datasets for better visualization, A. Beygelzimer, C. Perng, and S. Ma, KDD 2001.
Ordering categorical data to improve visualization, S. Ma and J.L. Hellerstein, InfoVis99, July 1999.
Pattern discoveryMining Partially Periodic Event Patterns with Unkonw Periods, Sheng Ma, Joseph L Hellerstein, ICDE 2001. Mining Mutual Dependent event patterns, S. Ma and J.L. Hellerstein, ICDM 2001. Discovering fully dependent patterns, Feng Liang, Sheng Ma and J.L. Hellerstein, SIAMDM 2002. FARM: a framework for exploring mining spaces with multiple attributes, C. Perng, H. Wang, S. Ma and J.L. Hellerstein, ICDM 2001
PredictionRule Induction of Computer Events, Ricardo Vilalta, S. Ma and J.L. Hellerstein, DSOM 20001. Local Predictions in Event Sequences Using Asociations and Classification. , R. Vilalta and S. Ma, ICDM 2002. A classification approach for prediction of targetted events in temporal sequences, C. Domeniconi, C.S. Perng, R. Vilala, ECML 2002.
ClusteringProfiling
![Page 12: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/12.jpg)
Tools and Applications
ToolsEvent Browser
EventBrowser: exploratory analysis of event data for event management, S. Ma and J.L. Hellerstein, DSOM 1999.
Event MinerProgressive and interactive analysis of event data using event miner, S. Ma, J.L. Hellerstein, C. Perng and G. Grabarnik, submitted 2002
ERN CVCData-driven validation, completion and construction of event relationship networks, C.S. Perng, D. Thoenen, S. Ma and J.L. Hellerstein, KDD 2003.
ApplicationsSystem management
Discovering actionable pattern from event data, J.L, Hellerstein, S. Ma and C. Perng, to be appeared in IBM system journal on AI. Scalable Visualization of EPP Data, David J. Taylor, Nagui Halim, Joseph L. Hellerstein, and Sheng Ma, DSOM 2000
IDSA data mining system for network-based intrusion detection system, M. Mei, D. George, M. Brodie and S. Ma, in preparation.
BlueLightCritical Event Prediction for Proactive Management of Large-Scale Computer Clusters
![Page 13: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/13.jpg)
Parser
Parsing rules
Textual format
Table format:timestamp, event type, host name, severity, interest, importance
Domain knowledgeInterest/importace rating for each events and host
First step: normalization
DB2 tables
![Page 14: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/14.jpg)
������������������
�����������������������
�����������
![Page 15: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/15.jpg)
Event Predition
![Page 16: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/16.jpg)
Event Prediction
����
������������ ���������������������
���������������������������������������������������������������������������������
������������������������������ ��������������������������
Problem
Event Data
Technical challenges
![Page 17: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/17.jpg)
Classification-Based Prediction
����
������������ ���������������������
!!"# !"#
���$���������������������%�������
�����������
&��������������'()�����������������!�������*���������������'()�����������������+�
,�����$'������-�����������������������������������������.�������������/����0��
Range of accuracy:65%-93%
Sample results
Training Data
������������&W
with Carlotta Domeniconi and Charles Perng, ECML 2002
![Page 18: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/18.jpg)
• Title/subtitle/confidentiality line: 10pt Arial Regular, white • Copyright: 10pt Arial
IBM T.J. Watson
© 2004 IBM Corporation
Optional slide number:
![Page 19: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/19.jpg)
IBM T.J. Watson
© 2004 IBM Corporation
Rule-based prediction
False negatives and false positives through rule-based prediction
100 200 300 400 500 600 700 800
Time window before predictions
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Pro
bab
ility
of e
rro
r
Different test and training data
Same test and training data
100 200 300 400 500 600 700 800
Time window before predictions
0
0.05
0.1
0.15
0.2
0.25
0.3
Pro
babi
lity
of
err
or
Different test and training data
Same test and training data
![Page 20: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/20.jpg)
Pattern-Based Algorithms
����
������������ ������������ ���$������������������
���������
������������&
Event Data
WFrequent patterns
Predictive patterns
Prediction rules
��!�������
Frequent pattern search
Hypothesis test
Significance weighting
with Ricardo Villata ICDM 2002
![Page 21: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/21.jpg)
Data characteristicsinfrequent, important patternsnoisy environmentskewed distribution: 20/80 rulesdata volume is large
HostTypeEventType
patterns
patterns
ChallengesPattern definitionsHow to find all qualified patternsScalability: data volume and search
space
![Page 22: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/22.jpg)
����
� � � �� � � � � � � ! !� �
frequent patterns may not be significant
���
"���������������������
#����
� � $! �� �
��������
#����
� %�! ��! �&&&&
'(�)!*�+�'(�!*,'(!*�+��,�-�!+��'(!)�*�+�-��+�!.�!/������'������.�/�������0����1��2����
![Page 23: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/23.jpg)
Issues associated with frequent patterns
Frequent patterns: Find all patterns whose count is above a thresholdFrequent may not be significant
item a occurs frequentlymany false patterns related to a
Infrequent, but important patternsLow minsup => too many false patterns
Long pattern issuesSay, a pattern of 15 items happens 15%, but each item may be missing with a small probability 5%The chance to see an instance of a full pattern is slim => many subpatterns (if minsup=10%, 6435 qualified, maximal subpatterns)
Need a better measurement!
![Page 24: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/24.jpg)
P-patternICDE 2002
![Page 25: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/25.jpg)
����� � � �� �
3��������� 3���������311��������
�+% �4δ'
5����Noisy partial periodic point process
Random eventsMissing events
A stream of points
![Page 26: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/26.jpg)
Partially periodic temporal association
p-patternA set of items is called p-pattern with window size w, minsup, and time tolerance δ
It frequently occurs together (i.e. frequent temporal pattern, Mannila97) within w (> minsup)It occurs partially periodically with δ
![Page 27: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/27.jpg)
����� � � �� �
� � � � � ! ! ! ! ! !� � � � �� ��
w = 2, δ = 1, minsup = 3
![Page 28: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/28.jpg)
How to discover p-patterns
Two tasksNeed to figure out period lengthsNeed to figure out frequent itemset (>minsup)
IdeasStatistical testing on inter-arrival time to determine period lengthMerge events with the same period length level by level
{a,b} is likely to be a p-pattern, if a is p-pattern and is p-patternreduce search thorough pruning based on current results
ICDE01
![Page 29: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/29.jpg)
One More Problem
Problem: non-uniform counts High count for a small interval even for random eventsLow count for a high interval may indicate periodic behavior
Example:1000 events generated uniformly and randomly in [0, T]
pmfigure1.eps
Cτ
τ
ConstantThreshold
![Page 30: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/30.jpg)
Hypothesis test for inter-arrivals in [τ-δτ/2, τ+δτ/2]
Compared with the expected occurrences of a random processκτ = (Cτ - N*Pτ)/sqrt(N*Pτ*(1-Pτ))
Cτ: # of observations of inter-arrivals in [τ-δτ/2, τ+δτ/2]
Inter-arrival τ, and tolerance δτ
N*Pτ: expected observations for a random processN: # of samples in T time windowPτ: probability that an inter-arrival is [t-δt/2, t+δt/2] for a random processIf random process is Poison,
Pτ ~ N/T * δτ* exp(-N/T*τ)
Statistical confidence, 95% => κ�τ = ����(3.84)��τ = κ�τ∗�2��(56�6(���**�4�56�ττ�������������������������� ��τ!���τ
������������� ����������� ������ ��������������
![Page 31: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/31.jpg)
Noisy environmentHow much noise can be tolerant?
N/S = #noise events/# signal events
pm2figure.epsPerformance of our algorithm
"�#�����#���������������������������������τκ = �κ − �κ−1
τ�κ = �κ − �κ−2
Ο(2Κ)
Successrate
N/S
![Page 32: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/32.jpg)
Period-first algorithm
Inputs: D, confidence 95%, minsup, �����$��������$��������%�������������&��������'����������������������������������������� ������������'���������������������'�������������(���������� �(������������#����)����#�������������
![Page 33: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/33.jpg)
Definition: Mutually Dependent Pattern (m-pattern)
Intuition: a set of items happen together with high probabilityE2 strongly depends on E1
E1=>E2P(E2|E1) = count(E1+E2)/count(E1)
Formal definition of m-patternE is a m-pattern with minp, iff P(E2|E1)>minp is held for any two non-overlapped subsets of E
Special case: for minp=1, a m-pattern becomes deterministic. Event compression, event correlation, etc.
![Page 34: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/34.jpg)
����
� � � �� � � � � � � ! !� �
frequent patterns != m-patterns
���
"���������������������
#����
� � $! �� �
��������
#����
� %�! ��! �&&&&
'(�)!*�+�'(�!*,'(!*�+��,�-�!+��'(!)�*�+�-��+�!.�!/������'������.�/�������0����1��2����
![Page 35: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/35.jpg)
Final algorithmInput: minp and DOutput: all qualified m-patterns {L_k}
Count each itemC_2 all pairwise candidates; k=2Prune C_k based on upper boundCounting: Scan D and count patterns in C_kQualification: Compute the qualified m-patterns L_kConstruct the new candidate set C_{k+1} based on L_{k} if C_{k+1} is empty, output L_k and terminatek=k+1; go back to (4)
Algorithm
![Page 36: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/36.jpg)
"7��8���!���������������!�������79�������������������������(�0��*���!��������!����1�'��������
![Page 37: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/37.jpg)
:;���<�����'���������8������'+�&$-�=��1��1��8���8������''�����������8�������'�����������������1����'���������8���������'�������������1��2����
![Page 38: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/38.jpg)
Fully dependent patternSIAMDM 2002with Feng Liang
![Page 39: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/39.jpg)
Issues associated with frequent patterns
Frequent patterns: Find all patterns whose count is above a thresholdFrequent may not be significant
item a occurs frequentlymany false patterns related to a
Infrequent, but important patternsLow minsup => too many false patterns
Long pattern issuesSay, a pattern of 15 items happens 15%, but each item may be missing with a small probability 5%The chance to see an instance of a full pattern is slim => many subpatterns (if minsup=10%, 6435 qualified, maximal subpatterns)
Need a better measurement!
![Page 40: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/40.jpg)
Another idea
Statistical testderived from a statistical modeltreatment for infrequent, noise
But,No closure propertyHigh-order dependency is complex
Fully dependent patternClosure property for efficiencyDistinguishing different types of dependency
![Page 41: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/41.jpg)
Hypothesis test
Given observation C(E), is itemset E independent?Hypothesis test:
Given a significant level a (probability of false negative), the dependency test:
How do we compute minsup(E) (e.g. Ca)?
Exact: slowApproximation!
![Page 42: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/42.jpg)
Compute Ca
When np* is large, central limit theory can be applied
Za = 1.64 for a = 5%
How large is large?A guideline: np*>5
![Page 43: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/43.jpg)
How about rare patterns?
when p* is extremely small (np*<5), Poisson distribution provides good accuracy
![Page 44: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/44.jpg)
![Page 45: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/45.jpg)
Practical solution to test a pattern
When C(E) > 5, use normal approximationWhen C(E) <= 5, use Poisson approximationIf E passes the test, E is dependent
![Page 46: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/46.jpg)
How to discover all dependent patterns?
Dependence test is not downward nor upward closedDependence can be quit complex
![Page 47: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/47.jpg)
D-Patterns
Define a new type of dependency and enforce closure propertyA itemset E is fully dependent, iff all subsets of E are dependent (D-pattern)Rationales
Efficiency/feasibilityWhile, it may be more informative ...
Model 1: {ab, bc}Model 2 {abc}Model 3: {ab}So, we can distinguish different types of dependency
![Page 48: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/48.jpg)
Algorithm
Levelwise search strategy similar to that for frequent patterns(1) Form candidate patterns with length k based on all qualified patterns of length k-1(2) Scan data to obtain counts for each candidate(3) D-pattern test for each pattern to find qualified d-patternsIterate until no more patterns
Closure property: if {a,b} is not a d-pattern, so are all its supersetThe qualification threshold varies
C(E) > minsup(E)Recall
minsup(E) takes into the considerations of the length of Edistribution of itemsbetter noise resistance
*
![Page 49: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/49.jpg)
Some remarks
Level-wise, PF, depth-first and more can be applied to discover all d-patternsCan be combined with minsup
If qualification function Qi(E) are downward close, so are the conjunction and disjunction of {Qi}Conjunction: support > minsup and d-pattern condition
Skip patterns that only occur once or twiceSimiliar strategy for negative correlation
![Page 50: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/50.jpg)
Experiment
Model: noisy items + instances of true patternsNoise/signal ratio: #noise/#instancesNSR=0: f-pattern and d-pattern are equally goodIn the presence of noise,
![Page 51: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/51.jpg)
NSR
CPU ~ #false patterns
f-pattern
d-pattern
![Page 52: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/52.jpg)
SNR=20 SNR=10 SNR=5
Za
True patternsfound (%)
Normalizedfalse patterns
![Page 53: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/53.jpg)
Uneven factor: one group is u times more
CPU
![Page 54: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/54.jpg)
f-pattern: minsup=10 over 1k maximal patterns; minsup=3 30k for the third level
![Page 55: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/55.jpg)
���'�������'������� �'�����������������1������
�������������������� ��� �� � ���� ��������������
�������� ������� ��
>��2���������!�������'��������������7?�������1����������#�������!�'�������1��2�����'�������#���������1��!����'�����!�
��'��������'������
���������
t-pattern
Statistical test
![Page 56: Event Mining From Distributed E-commerce Systems · Merge events with the same period length level by level {a,b} is likely to be a p-pattern, if a is p-pattern and is p-pattern reduce](https://reader034.fdocuments.in/reader034/viewer/2022050521/5fa4db83af06ab4cae672faa/html5/thumbnails/56.jpg)
Backup