Artificial Immune based Approach to Association Rule Mining

By: B. Hoda Helmi
Supervisor: Adel T. Rahmani
January 2008

A Thesis Submitted in Partial Fulfillment of the Requirement for the Degree of Master of Science in Artificial Intelligence-Computer Engineering

Outline

• The Immune System: Natural and Artificial
• Association Rules
• Web Usage Mining
• Proposed Algorithm: AISWUM
• Results and Conclusion

Natural Immune System

Immune System

• A system that protects the body from foreign substances and pathogenic organisms.

Antibody

• The immune system creates antibodies which match the antigens and cause the pathogens to be destroyed

Antigen

• Substances capable of starting a specific immune response are referred to as antigens (viruses, bacteria, fungi).

A High Level Overview

Natural Immune System

[Diagram labels: Immunity; Innate; Danger Theory; Adaptive; Clonal Selection; Network Theory; Affinity Maturation; Hypermutation]

Innate versus Adaptive IS

• Innate: immediately available for combat

Adaptive Immunity

epitope

Low affinity

receptor

structurally similar – high affinity

Clonal Selection & Affinity Maturation

Network Theory

[Figure labels: 1, 2, 3, Ag; Stimulation (Positive Response); Suppression (Negative Response)]

Idiotypic network (Jerne, 1974):

• B cells stimulate each other.
• Creates an immunological memory.

Danger Theory

Artificial Immune System

A Framework for AIS

[Figure labels: Algorithms, Affinity, Representation, Application, Solution, AIS]

Association Rules

• Set of items: $I = \{I_1, I_2, \ldots, I_m\}$
• Transactions: $D = \{t_1, t_2, \ldots, t_n\}$, $t_j \subseteq I$
• Itemset: $\{I_{i1}, I_{i2}, \ldots, I_{ik}\} \subseteq I$
• Large (frequent) itemset: an itemset whose number of occurrences is above a threshold.
• Support of an itemset: the percentage of transactions which contain that itemset.


Given:

• a set of items $I = \{I_1, I_2, \ldots, I_m\}$,
• a database of transactions $D = \{t_1, t_2, \ldots, t_n\}$ where $t_i = \{I_{i1}, I_{i2}, \ldots, I_{ik}\}$ and $I_{ij} \in I$,

the Association Rule Problem is to identify all association rules $X \Rightarrow Y$ with a minimum support and confidence.
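As a small illustration of the two thresholds in the problem statement, the sketch below computes support and confidence on a toy database. The function names and the toy transactions are hypothetical, and confidence is taken in its usual sense, support(X ∪ Y) / support(X), which the slides do not spell out.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(x, y, transactions):
    """support(X ∪ Y) / support(X); returns 0.0 when X never occurs."""
    sx = support(x, transactions)
    if sx == 0:
        return 0.0
    return support(set(x) | set(y), transactions) / sx


# Hypothetical toy database: each transaction is a set of items.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
print(support({"a", "b"}, transactions))       # support of {a, b}
print(confidence({"a"}, {"b"}, transactions))  # confidence of a => b
```

A rule X ⇒ Y is reported only when both values reach their minimum thresholds.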


Association Rule Mining Steps

Find Frequent Itemsets.

Generate rules from frequent itemsets.

Challenging Step In Association Rule Mining
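The two steps can be sketched with a plain level-wise search. This is a generic, Apriori-style illustration under assumed names (`frequent_itemsets`, `association_rules`); it is not the immune-based method proposed in this thesis.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    result = {}
    while candidates:
        level = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                level[c] = s
        result.update(level)
        # Join frequent k-itemsets into candidate (k+1)-itemsets.
        keys = list(level)
        candidates = list({a | b for a, b in combinations(keys, 2)
                           if len(a | b) == len(a) + 1})
    return result


def association_rules(freq, min_confidence):
    """Step 2: split each frequent itemset into X => Y and keep confident rules."""
    rules = []
    for itemset, s in freq.items():
        for r in range(1, len(itemset)):
            for x in map(frozenset, combinations(itemset, r)):
                conf = s / freq[x]  # every subset of a frequent itemset is frequent
                if conf >= min_confidence:
                    rules.append((set(x), set(itemset - x), conf))
    return rules
```

The first function does almost all of the work: the number of candidate itemsets can grow exponentially with the number of items, whereas rule generation only revisits itemsets already known to be frequent.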


Goal

In this project our goal is to find all the frequent itemsets in Web usage data using an artificial immune system.


Web Usage Mining

• Web usage mining is also known as Web log mining.

• It applies mining techniques to discover interesting usage patterns from the secondary data derived from the interactions of users while surfing the Web.


Web Usage Mining

Applications

• Target potential customers for electronic commerce
• Enhance the quality and delivery of Internet information services to the end user
• Improve Web server system performance
• Identify potential prime advertisement locations
• Facilitate personalization/adaptive sites
• Improve site design
• Fraud/intrusion detection
• Predict user’s actions (allows prefetching)


Motivations (of choosing this application)

Web

Unstable

Noisy

Enormous

Distributed Data


WUM-Definitions

Web Logs

• The set of all accesses to the URLs of a Web site, stored on the Web server.

Session

• A sequence of URLs accessed by a user in one visit to the Web site. (Itemset)

Strong trend

• Crowded paths that are frequently traversed by users. (Frequent itemsets)


Web Log

O:0000002560 || T:1997/09/12-22:43:00 || U:/ || R:http://www.hyperreal.org/

O:0000002560 || T:1997/09/12-22:50:27 || U:/categories/software/ || R:http://www.hyperreal.org/music/machines/

O:0000002560 || T:1997/09/12-22:50:38 || U:/categories/software/Windows/ || R:http://www.hyperreal.org/music/machines/categories/software/

O:0000002560 || T:1997/09/12-22:50:47 || U:/categories/software/Windows/V909V03.TXT || R:http://www.hyperreal.org/music/machines/categories/software/Windows/

O:0000002560 || T:1997/09/12-22:51:06 || U:/categories/software/Windows/ || R:http://www.hyperreal.org/music/machines/categories/software/
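A record in this format can be split into fields as in the sketch below. The reading of the prefixes (O as the requesting origin, T as the timestamp, U as the requested URL, R as the referrer) is an assumption based on the sample lines, and `parse_log_line` is a hypothetical helper.

```python
from datetime import datetime


def parse_log_line(line):
    """Split one 'O:... || T:... || U:... || R:...' record into a dict."""
    fields = {}
    for part in line.split("||"):
        # Only the first ':' separates the key from the value,
        # so timestamps and URLs keep their own colons.
        key, _, value = part.strip().partition(":")
        fields[key] = value.strip()
    return {
        "origin": fields["O"],      # assumed: identifier of the requesting host
        "time": datetime.strptime(fields["T"], "%Y/%m/%d-%H:%M:%S"),
        "url": fields["U"],
        "referrer": fields["R"],    # assumed: referring page
    }


record = parse_log_line(
    "O:0000002560 || T:1997/09/12-22:50:27 || U:/categories/software/ "
    "|| R:http://www.hyperreal.org/music/machines/"
)
print(record["url"], record["time"])
```

Grouping the parsed records by origin and ordering them by time gives the raw material for building sessions.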


Session Construction

URL                                        IDX
/                                          0
/categories/software/                      1
/categories/software/Windows/              2
/categories/software/Windows/V909V03.TXT   3
/categories                                4
/manufacturers                             5
/samples.html/                             6
/gearlists/                                7
/features/                                 8
/ecards/                                   9

Session:   1 1 1 1 0 0 0 0 0 0
Duration:  07:27 00:11 02:10 00:19 02:01 00:00 00:00 00:00 00:00 00:00
Frequency: 1 1 2 1 0 0 0 0 0 0

$$PageFrequency(Page) = \frac{NumberOfVisits(Page)}{\max_{page \in VisitedPage} NumberOfVisits(page)}$$

$$PageDuration(Page) = \frac{TotalDuration(Page) / Length(Page)}{\max_{page \in VisitedPage} \left( TotalDuration(page) / Length(page) \right)}$$
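The two normalisations can be sketched as follows: each page's visit count, and its time per unit of page length, is divided by the largest such value among the visited pages. The function names are hypothetical, and the page lengths in the usage lines are made-up placeholders because the slides give none.

```python
def page_frequency(visits):
    """visits: {page: number of visits}; normalise by the most visited page."""
    top = max(visits.values())
    return {page: v / top for page, v in visits.items()}


def page_duration(total_seconds, length):
    """Normalise time-per-length by the largest ratio among the visited pages."""
    rate = {page: total_seconds[page] / length[page] for page in total_seconds}
    top = max(rate.values())
    return {page: r / top for page, r in rate.items()}


# Visited pages of the sample session (IDX 0 to 3), durations in seconds.
visits = {0: 1, 1: 1, 2: 2, 3: 1}
seconds = {0: 447, 1: 11, 2: 130, 3: 19}
length = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}  # hypothetical page lengths

print(page_frequency(visits))
print(page_duration(seconds, length))
```

Both measures lie in (0, 1] for visited pages, which is what lets them be combined later.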


Representation

Antibody (strong trends):

URL1 (0/1) | URL2 (0/1) | … | URLm (0/1)

Antigen (incoming sessions):

URL1 (0/1) | URL2 (0/1) | … | URLm (0/1)

Antibody features

• Age
• Stimulation Level
• Scale

Antigen features

• Validity


Scenario

• An antigen enters the body.

• Determine whether the first signal is produced (two signals are needed for an antigen to trigger the AIS; the first signal is produced if the antigen is harmful to the body).

• If the first signal is produced, present the antigen to the antibodies and compute distance, weight and influence zone.

• Determine the antibody with maximum weight. If maximum weight > threshold, compute SL and IZ for that antibody; else create a new antibody by duplication.

• Clone and mutate.


Danger Signal

• Danger Theory (two-signal approach): if the antigen is harmful, trigger an IS response; else discard the antigen.

• In the data mining context: harmful means interesting (valid).

• What is the danger signal in our system?
  ◦ We should find a measure to determine the validity of sessions.


Validity Measure

$$SessionConsistency(Session) = \frac{\sum_{k=1}^{P} \sum_{j=k+1}^{P} similarity(k, j)}{\frac{1}{2} P (P - 1)}$$

$$similarity(i, j) = 1 - \frac{d_{i,j}}{D}$$

$$PageDuration(Page) = \frac{TotalDuration(Page) / Length(Page)}{\max_{page \in VisitedPage} \left( TotalDuration(page) / Length(page) \right)}$$

$$PageFrequency(Page) = \frac{NumberOfVisits(Page)}{\max_{page \in VisitedPage} NumberOfVisits(page)}$$

Validity Measure

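$$PageInterest(Page) = \frac{2 \cdot PageFrequency(Page) \cdot PageDuration(Page)}{PageFrequency(Page) + PageDuration(Page)}$$

$$SessionInterest(Session) = \frac{\sum_{i=1}^{P} w_{p_i}}{P}$$

$$SessionValidity(Session) = \frac{2 \cdot SessionInterest(Session) \cdot SessionConsistency(Session)}{SessionInterest(Session) + SessionConsistency(Session)}$$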
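Page interest and session validity both have the form of a harmonic mean, 2ab / (a + b), so one helper covers both. In the sketch below the names are hypothetical, the per-page weights are assumed to be the page interests, and the consistency value is taken as a given number in [0, 1].

```python
def harmonic_mean(a, b):
    """2ab / (a + b); defined as 0.0 when both inputs are zero."""
    return 2 * a * b / (a + b) if (a + b) > 0 else 0.0


def page_interest(frequency, duration):
    return harmonic_mean(frequency, duration)


def session_interest(page_interests):
    """Average interest over the P pages of the session."""
    return sum(page_interests) / len(page_interests)


def session_validity(interest, consistency):
    return harmonic_mean(interest, consistency)


# Hypothetical session with two pages.
interests = [page_interest(1.0, 0.5), page_interest(0.5, 0.5)]
validity = session_validity(session_interest(interests), consistency=0.8)
print(validity)
```

A harmonic mean is low whenever either input is low, so a session needs both interest and consistency to pass the validity threshold.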


Affinity Measure

What affinity measure is used in our proposed algorithm?

$$S_{cos}(antibody_i, antigen_j) = \frac{\sum_{l=1}^{L} antibody_i[l] \cdot Interest(antigen_j[l])}{\sqrt{\sum_{l=1}^{L} antibody_i[l] \cdot \sum_{l=1}^{L} antigen_j[l]}}$$
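A minimal sketch of an interest-weighted cosine similarity between a binary antibody and a binary antigen follows. It assumes the usual square root of the product of the vector sums in the denominator, which may not survive in the extracted slide, and takes the interest of an unvisited URL to be 0; the function name and the example vectors are hypothetical.

```python
import math


def weighted_cosine(antibody, antigen, interest):
    """antibody, antigen: 0/1 lists over the same URLs;
    interest: per-URL interest of the antigen session (0 where not visited)."""
    numerator = sum(a * w for a, w in zip(antibody, interest))
    denominator = math.sqrt(sum(antibody) * sum(antigen))
    return numerator / denominator if denominator > 0 else 0.0


antibody = [1, 1, 0, 0]
antigen = [1, 1, 1, 0]
interest = [0.9, 0.4, 0.7, 0.0]  # hypothetical page interests
print(weighted_cosine(antibody, antigen, interest))
```

With all interests equal to 1 on visited pages this reduces to the plain cosine similarity of two binary vectors.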


Affinity Measure

$$w_{ij} = e^{-\frac{d_{ij}^2}{2 \sigma_{ij}^2}}$$

The weight function decreases with distance from the antigen/data location.

$\sigma_{ij}^2$ is a scale parameter that controls the decay rate of the weights along the spatial dimensions.


Stimulation Level

$$s_{iJ} = \frac{\sum_{j=1}^{J} w_{ij}}{\sigma_{iJ}^2}$$

$$s_{iJ} = \frac{W_{iJ-1} + w_{iJ}}{\sigma_{iJ}^2}$$

$$W_{iJ-1} = \sum_{j=1}^{J-1} w_{ij}$$
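The point of the incremental form is that an antibody keeps only a running sum of past weights instead of every past antigen. The sketch below assumes a Gaussian weight exp(-d² / (2σ²)) and, for brevity, a scale σ² that stays fixed, whereas the algorithm also updates the scale; the class and attribute names are hypothetical.

```python
import math


class Antibody:
    def __init__(self, sigma2):
        self.sigma2 = sigma2     # scale, kept constant in this sketch
        self.weight_sum = 0.0    # running sum of the weights seen so far
        self.stimulation = 0.0

    def present(self, distance):
        """Update the stimulation level after one more antigen."""
        w = math.exp(-distance ** 2 / (2 * self.sigma2))
        self.stimulation = (self.weight_sum + w) / self.sigma2
        self.weight_sum += w
        return w


ab = Antibody(sigma2=1.0)
for d in (0.0, 0.5, 2.0):   # hypothetical antigen distances
    ab.present(d)
print(ab.stimulation)
```

Each update costs constant time and memory per antibody, which matters when antigens arrive as a stream.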


Weighted Stimulation

$$ws_{iJ} = \frac{W_{iJ-1} + w_{iJ} \, w_{validity}(J)}{\sigma_{iJ}^2}$$


Network Stimulation & Suppression

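$$ws_{iJ} = \frac{W_{iJ-1} + w_{iJ} \, w_{validity}(J)}{\sigma_{iJ}^2} + \alpha \frac{\sum_{n=1}^{N_B} w_{in}}{\sigma_{iJ}^2} - \beta \frac{\sum_{n=1}^{N_B} w_{in}}{\sigma_{iJ}^2}$$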


Cloning

$$N_{clones_i} = K_{clones} \frac{ws_i}{\sum_{n=1}^{N_B} ws_n} \quad \text{if } age_i \geq age_{min}$$

Antibodies are cloned in proportion to their stimulation levels relative to the average network stimulation.

To avoid premature proliferation of antibodies and to encourage a diverse repertoire, new antibodies do not clone before they are mature (their age exceeds a threshold).
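The cloning rule can be sketched as below: each mature antibody receives a share of K_clones proportional to its stimulation, and immature antibodies receive none. Truncating the share to an integer is an assumption of this sketch, as are the names.

```python
def clone_counts(stimulation, ages, k_clones, age_min):
    """Number of clones per antibody, proportional to stimulation.

    stimulation, ages: parallel lists, one entry per antibody.
    Antibodies younger than age_min do not clone.
    """
    total = sum(stimulation)
    counts = []
    for s, age in zip(stimulation, ages):
        if age < age_min or total == 0:
            counts.append(0)
        else:
            counts.append(int(k_clones * s / total))  # assumed rounding: truncate
    return counts


print(clone_counts([1.0, 3.0], ages=[5, 5], k_clones=8, age_min=2))
print(clone_counts([1.0, 3.0], ages=[1, 5], k_clones=8, age_min=2))
```

In the second call the first antibody is still immature, so only the second one is cloned.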


Hypermutation

Somatic hypermutation is a powerful natural exploration mechanism in the IS that allows it to learn how to respond to new antigens that have never been seen before.

It is a very costly and inefficient operation, since its complexity is exponential in the number of features.

We model this operation in the AIS by an instant antigen duplication whenever an antigen is encountered that fails to activate the entire immune network.


Directed Mutation

Antibodies that are added to the population via mutation are always superior individuals.

With this mutation mechanism, whenever the system realizes that there are not enough good antibodies to confront the antigens, new antibodies are added to the population.

It is a new form of DANGER THEORY.

The directed mutation mechanism is as follows:


Directed Mutation

0 1 1 0 0 0 0 1 0

0 1 0 0 1 1 1 0 0

0 1 1 1 0 1 1 0 0

1 1 0 0 1 1 1 0 0

1 0 0 0 0 1 1 1 1

0 1 0 0 0 1 1 1 0

Web log

In to the system


Directed Mutation

0 1 1 0 0 0 0 -1 0

0 1 0 0 +2 1 1 -1 0

0 1 +2 +2 0 1 1 -1 0

1 0 0 0 1 1 1 0 0

+2 -1 0 0 0 1 1 1 +2

0 1 0 0 0 1 1 1 0


Directed Mutation

1 1 0 1 0 0 0 1 0

0 1 1 1 0 0 0 1 0

1 1 0 0 0 1 0 1 0

1 0 0 1 0 1 0 0 0

0 1 0 1 0 0 0 1 0

1 1 0 1 0 1 0 1 0

0 1 1 0 0 0 0 -1 0

0 1 0 0 +2 1 1 -1 0

0 1 +2 +2 0 1 1 -1 0

1 0 0 0 1 1 1 0 0

+2 -1 0 0 0 1 1 1 +2


Directed Mutation

1 1 0 1 0 -1 0 1 0

0 1 1 1 0 0 0 1 0

1 1 0 -1 0 1 0 1 0

1 0 0 1 0 1 0 0 0

0 1 0 1 0 0 0 1 0

1 1 0 1 0 1 0 1 0

0 1 1 0 0 0 0 -1 0

0 1 0 0 +2 1 1 -1 0

0 1 +2 +2 0 1 1 -1 0

1 0 0 0 1 1 1 0 0

+2 -1 0 0 0 1 1 1 +2


Directed Mutation

0 1 1 0 0 0 0 1 0

0 1 0 0 +2 1 1 -1 0

0 1 +3 1 0 1 1 -1 0

1 0 0 0 1 1 1 0 0

+2 -1 0 0 0 1 1 1 +2

0 1 0 1 0 1 1 0 0

1 1 0 1 0 -1 0 1 0

0 1 1 1 0 0 0 1 0

1 1 0 -1 0 1 0 1 0

1 0 0 1 0 1 0 0 0

0 1 0 1 0 0 0 1 0


Directed Mutation

1 1 0 1 0 0 -1 1 0

0 1 1 1 0 0 0 1 0

1 1 0 -1 0 1 0 1 0

1 0 0 1 0 1 0 0 0

0 1 0 1 0 0 0 1 0

1 1 0 1 0 0 1 0 0

0 1 1 0 0 0 0 1 0

0 1 0 0 +2 1 1 -1 0

0 1 +3 1 0 1 1 -1 0

1 0 0 0 1 1 1 0 0

+2 -1 0 0 0 1 1 1 +2


Directed Mutation

1 1 0 1 0 0 -1 +2 0

0 1 1 1 0 0 0 1 0

1 1 0 -1 0 1 0 1 0

1 0 0 1 0 1 0 0 0

0 1 0 1 0 0 0 1 0

1 1 0 1 0 0 1 0 0

0 1 1 0 0 0 0 1 0

0 1 0 0 +2 1 1 -1 0

0 1 +3 1 0 1 1 -1 0

1 0 0 0 1 1 1 0 0

+2 -1 0 0 0 1 1 1 +2


Decide to Mutate

After some time

1 1 -9 1 0 0 -1 +8 0

0 1 1 1 0 0 0 1 0

1 1 0 -10 0 -9 0 1 0

1 0 0 1 0 1 0 0 0

0 1 0 1 0 0 0 1 0

0 1 1 0 0 0 0 1 0

0 1 0 0 +9 1 1 -7 0

0 1 +9 1 0 1 1 -8 0

1 0 0 0 1 1 1 0 0

+2 -1 0 0 0 1 1 1 +2

Mutation Occurs

After some time

1 1 -9 1 0 0 -1 +8 0

0 1 1 1 0 0 0 1 0

1 1 0 -10 0 -9 0 1 0

1 0 0 1 0 1 0 0 0

0 1 0 1 0 0 0 1 0

0 1 1 0 0 0 0 1 0

0 1 0 0 +9 1 1 -7 0

0 1 +9 1 0 1 1 -8 0

1 0 0 0 1 1 1 0 0

+2 -1 0 0 0 1 1 1 +2

0 1 0 0 0 1 1 1 0

0 1 0 1 0 1 1 1 0

1 1 1 1 0 0 -1 0 0

1 1 0 1 0 0 0 1 0


Directed Mutation

Directed mutation is not computationally complex.

It does not cause antibodies to be destroyed before they have to leave the population.

It makes the system intelligent: the system can decide when to create new individuals.

Directed mutation happens after every T antigens enter the system.


Compression

Compression: cluster the antibody population into k clusters.

External interactions: those occurring between an antigen (external agent) and the antibodies in the immune network.

Internal interactions: those occurring between one antibody and all other antibodies in the immune network.

The most expensive computation and storage overhead stems from calculating and storing all the internal network interactions (quadratic complexity with respect to the network size, $N_B^2$).

After compression:
◦ internal interactions: $\left( \frac{N_B}{k} \right)^2 + k - 1$
◦ external interactions: $k$

Choosing an appropriate number of clusters, $k \approx \sqrt{N_B}$, reduces the internal interactions to $O(N_B)$.
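The saving can be checked with a little arithmetic. The sketch below assumes the counts N_B² without compression and (N_B / k)² + k - 1 with k subnets, which is one reading of the slide's partly garbled formulas; the function name is hypothetical.

```python
import math


def internal_interactions(n_b, k=None):
    """Internal interaction count for a network of n_b antibodies.

    k=None: no compression, every antibody interacts with every other.
    k given: interactions inside one subnet plus the other k - 1 centroids.
    """
    if k is None:
        return n_b ** 2
    return (n_b / k) ** 2 + k - 1


n_b = 100
k = round(math.sqrt(n_b))  # k close to sqrt(N_B)
print(internal_interactions(n_b))      # without compression
print(internal_interactions(n_b, k))   # with k = 10 subnets
```

With k near √N_B the first term becomes N_B and the second grows only as √N_B, so the total is linear in the network size.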

Algorithm Visualization

[Animated figure spanning 24 slides; only the numeric labels of the plotted points survived extraction.]

Pseudocode

1- Fix the maximal population size, $N_{B_{max}}$.

2- Initialize antibodies using a cross section of input data, $\sigma_i^2 = \sigma_{init}$.

3- Compress the immune network into K subnets using five iterations of K-means.

4- Repeat for each antigen $antigen_j$ {

    4-1 Compute $validity(antigen_j)$;

    4-2 If $validity(antigen_j)$ < validity threshold
        4-2-1 discard $antigen_j$ and continue with a new antigen;

    4-3 Present $antigen_j$ to each subnet centroid ($C_k$, $k = 1, \ldots, K$) in the network; compute distance and weight.

    4-4 Determine the most activated subnet (ma subnet), which has maximum $w_{kj}$.

    4-5 If all antibodies in the ma subnet have $w_{ij} < w_{min}$ (antigen too weak to activate the subnet) {
        4-5-1 create by duplication a new antibody (antibody = $antigen_j$, $\sigma_i^2 = \sigma_{init}$)
    } else {
        4-5-1 Increment the number of stimulations of antibody i;
        4-5-2 Compute the $antibody_i$ stimulation level ($ws_{ij}$);
        4-5-3 Update the $antibody_i$ scale value ($\sigma_{ij}^2$)
    }

    4-6 Clone antibodies;

    4-7 If population size > $N_{B_{max}}$ {
        4-7-1 For each antibody i in the network
            4-7-1-1 If $antibody_i.age < age_{min}$: $antibody_i.s = \frac{\sum_{n=1}^{N_B} ws_{nJ}}{N_B}$;
        4-7-2 Sort antibodies in ascending order of their stimulation level;
        4-7-3 Kill the worst excess (top $(N_B - N_{B_{max}})$) antibodies.
    }

    4-8 Mutate antibodies after every T antigens.

    4-9 After every T antigens, use five iterations of K-means with the previous centroids as initial centroids.

}
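The population-capping step of the pseudocode (run when the population exceeds the maximal size) can be sketched as follows. It assumes that an immature antibody is ranked by the network's average stimulation rather than its own, so that it is not removed before it has had time to be stimulated; the dictionary keys and the function name are hypothetical.

```python
def cap_population(antibodies, n_max, age_min):
    """Keep at most n_max antibodies, dropping the least stimulated ones.

    antibodies: list of dicts with keys 'age' and 's' (stimulation level).
    """
    if len(antibodies) <= n_max:
        return list(antibodies)
    mean_s = sum(a["s"] for a in antibodies) / len(antibodies)

    def effective(a):
        # Immature antibodies are ranked by the network average (assumption).
        return mean_s if a["age"] < age_min else a["s"]

    ranked = sorted(antibodies, key=effective)   # ascending stimulation
    excess = len(antibodies) - n_max
    return ranked[excess:]                       # drop the worst `excess`


population = [
    {"age": 5, "s": 0.1},
    {"age": 0, "s": 0.0},   # newly created, still immature
    {"age": 5, "s": 0.9},
]
print(cap_population(population, n_max=2, age_min=2))
```

Here the mature but weakly stimulated antibody is removed, while the newly created one survives.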


Data

Data set 1

• One week of HTTP requests to the Music Machine Web site (www.hyperreal.org).
• 220146 requests.
• 19542 sessions.
• 4756 URLs.

Data set 2

• One week of HTTP requests to the University of Saskatchewan’s WWW server.
• 44298 requests.
• 9188 sessions.
• 1519 URLs.


Ground Profiles

For evaluating learned profiles, it should be shown that the learned profiles are good representatives of the input data:

Summarization ability of AISWUM

In order to show this ability, a comparison between distribution of the learned profiles and input data should be done, so:

we need some ground profiles

Ground profiles are extracted using:

Scalable K-Means


Evaluation Metrics

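$$prc(Ab_i(t), g_c) = \frac{\sum_{k=1}^{L} Ab_{i,k}(t) \, g_{c,k}}{\sum_{k=1}^{L} Ab_{i,k}(t)}$$

$$cvg(Ab_i(t), g_c) = \frac{\sum_{k=1}^{L} Ab_{i,k}(t) \, g_{c,k}}{\sum_{k=1}^{L} g_{c,k}}$$

$$PRC(AB(t), g_c) = \begin{cases} 1 & \text{if } \max_{i=1}^{N_{Ab}(t)} prc(Ab_i(t), g_c) > prc_{min} \\ 0 & \text{otherwise} \end{cases}$$

$$CVG(AB(t), g_c) = \begin{cases} 1 & \text{if } \max_{i=1}^{N_{Ab}(t)} cvg(Ab_i(t), g_c) > cvg_{min} \\ 0 & \text{otherwise} \end{cases}$$

$$S_{PRC,CVG}(t, c) = PRC(AB(t), g_c) \cdot CVG(AB(t), g_c)$$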
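These metrics can be sketched for binary profiles as follows: precision is the share of an antibody's URLs that also belong to the ground profile, coverage is the share of the ground profile's URLs that the antibody contains, and a category counts as matched at time t when some antibody passes each threshold. The strict comparison with the thresholds and all names are assumptions of this sketch.

```python
def prc(antibody, ground):
    """Precision of one antibody with respect to one ground profile."""
    overlap = sum(a * g for a, g in zip(antibody, ground))
    size = sum(antibody)
    return overlap / size if size else 0.0


def cvg(antibody, ground):
    """Coverage of one ground profile by one antibody."""
    overlap = sum(a * g for a, g in zip(antibody, ground))
    size = sum(ground)
    return overlap / size if size else 0.0


def matched(antibodies, ground, prc_min, cvg_min):
    """1 if the population is both precise and complete for this profile, else 0."""
    precise = max(prc(ab, ground) for ab in antibodies) > prc_min
    complete = max(cvg(ab, ground) for ab in antibodies) > cvg_min
    return int(precise) * int(complete)


antibodies = [[1, 1, 0, 0], [0, 0, 0, 1]]   # hypothetical learned profiles
ground = [1, 1, 1, 0]                       # hypothetical ground profile
print(matched(antibodies, ground, prc_min=0.5, cvg_min=0.5))
```

Evaluating this indicator for every ground profile at every time step gives the per-category distributions reported in the result slides.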


Results (Music Machine)

Distribution of the learned antibodies that are simultaneously precise and complete per input category at time t.


Precision

Distribution of precise antibodies per input category at time t.


Coverage

Distribution of complete antibodies per input category at time t.


Results (Saskatchewan University)

Distribution of the learned antibodies that are simultaneously precise and complete per input category at time t.


Precision

Distribution of precise antibodies per input category at time t.


Coverage

Distribution of complete antibodies per input category at time t.


Evaluation Metrics

$P(t)$: overall precision of the learned antibodies with respect to the input data.

[Equation: $P(t)$, a ratio of double sums of $S_{PRC,CVG}$ terms, with upper limits $N_x$ and $N_c$.]

Ratio of learned antibodies that accurately represent past input data to all learned antibodies.


Evaluation Metrics

$C(t)$: overall coverage of the learned antibodies with respect to the input data.

[Equation: $C(t)$, a ratio of double sums of $S_{PRC,CVG}$ terms, with upper limits $N_x$ and $N_c$.]

Ratio of past input data that are accurately summarized by antibodies to all input data.
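The two ratios described in words on these slides can be illustrated with a small Python sketch. It does not reproduce the thesis's formulas: it assumes a hypothetical boolean matrix `matches[i][c]` that is true when antibody i accurately represents input category c, and it counts categories as a stand-in for "past input data".

```python
from typing import List


def overall_precision(matches: List[List[bool]]) -> float:
    """Share of antibodies that accurately represent at least one input category."""
    if not matches:
        return 0.0
    good = sum(1 for row in matches if any(row))
    return good / len(matches)


def overall_coverage(matches: List[List[bool]], n_categories: int) -> float:
    """Share of input categories accurately summarized by at least one antibody."""
    if n_categories == 0:
        return 0.0
    covered = sum(1 for c in range(n_categories)
                  if any(row[c] for row in matches))
    return covered / n_categories


# Toy example: four antibodies, three input categories.
matches = [
    [True, False, False],
    [True, False, False],
    [False, False, False],  # represents nothing, so it lowers precision
    [False, True, False],
]
print(overall_precision(matches))    # 3 of 4 antibodies are useful
print(overall_coverage(matches, 3))  # 2 of 3 categories are covered
```

The sketch makes the trade-off visible: adding spurious antibodies lowers the first ratio without changing the second, while an unrepresented category lowers only the second.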


Results (Music Machine)

Ratio of learned antibodies that accurately represent past input data to all learned antibodies.

Ratio of past input data that are accurately summarized by antibodies to all input data.


Results (Saskatchewan)

Ratio of learned antibodies that accurately represent past input data to all learned antibodies.

Ratio of past input data that are accurately summarized by antibodies to all input data.

Results

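| State | Maximum Contentment | Minimum Contentment | Average Contentment of 50 users |
| --- | --- | --- | --- |
| State 1 | 41% | 15% | 28% |
| State 2 | 60% | 40% | 51% |
| State 3 | 67% | 45% | 56% |

| State | Danger Theory | Weighted Items | Weighted Sessions |
| --- | --- | --- | --- |
| State 1 | No | No | No |
| State 2 | Yes | No | No |
| State 3 | Yes | Yes | Yes |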


Run Time

One scan of the data, using non-optimized C++ code on a Pentium 4 PC, took:

◦ For the first data set: less than 6 min.

◦ For the second data set: less than 3 min.


Comparison with other methods

| Method | AIS-WUM | SKM | DBSCAN | BIRCH | aiNet | Fuzzy AIS | SOSDM |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Reliability / insensitivity to initial condition | Yes | No | Yes | No | Yes | Yes | Yes |
| Noise tolerance | Yes | No | Yes | No | No | Yes | Moderately |
| Need to scan before learning | No | Yes | Yes | Yes | Yes | Yes | No |
| Time complexity | O(N) | O(N) | O(N log N) | O(N) | O(N²) | O(N²) | O(N) |
| Buffer data | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Number of clusters specified | No | Yes | No | Yes | No | No | Yes |
| Handle evolving clusters | Yes | No | No | No | Yes | Yes | Yes |
| Automatic scale estimation | Yes | No | No | No | No | Yes | No |
| Clustering model | Network | Centroids | Medoids | Centroids | Network | Network | Network |
| Handle different similarity measures | Yes | No | Yes | No | Yes | Yes | Yes |
| Density/partition based | Density | Partition/Distance | Density | Partition | Partition/Distance | Density | Partition/Distance |


Novelties of the proposed algorithm

Low Computational Complexity.

Danger Theory in Two Forms

Directed Mutation

Weighted Stimulation

Learning the Data in a Single Pass

Natural Mechanism

Applicable to Stream Data

Bi-functionality: Frequent Itemsets Mining + Finding Centroids of Clusters in Large Datasets

Clear and fast identification of outliers.


Conclusion

A robust and scalable algorithm for frequent itemset mining has been designed; it is well suited to noisy, sparse data such as Web usage data.


Conclusion

The ability of the proposed algorithm to learn in a single pass rests mainly on two factors: the richness of the immune network structure, which forms a dynamic synopsis of the data, and danger theory, which decides which antigens are dangerous and when new antibodies are needed to combat them.


Publications

B. Hoda Helmi, Adel T. Rahmani, Nona Helmi, “An Evolutionary Control Model for a Generic Multiagent System Using Artificial Immune Systems”, in Proceedings of the First Joint Congress on Fuzzy and Intelligent Systems, Ferdowsi University, 2007.

B. Hoda Helmi, Adel T. Rahmani, “Image Segmentation with a New Texture Feature Based on AIS”, in Proceedings of the First Conference on Data Mining, AmirKabir University, Tehran, Iran, 2007. (In Farsi)

B. Hoda Helmi, Adel T. Rahmani, “An AIS Algorithm for Web Usage Mining with Directed Mutation”, accepted at the IEEE World Congress on Computational Intelligence, CEC division, Hong Kong, 2008.

B. Hoda Helmi, Adel T. Rahmani, “An Enhanced AIS for WUM, inspired by Danger Theory”, submitted to ICEE 2008, Tarbiat Modarres University, Tehran, Iran, 2008. (In Farsi)


Publications

Adel T. Rahmani, B. Hoda Helmi, “EIN-WUM an AIS-based Algorithm for Web Usage Mining”, submitted to the Genetic and Evolutionary Computation Conference, Atlanta, Georgia, 2008.

B. Hoda Helmi, Adel T. Rahmani, “A New Web Usage Mining Method based on An Artificial Immune System Solution with Enhanced Network and Danger Theory”, submitted to the International Journal of Control, Automation, and Systems.

B. Hoda Helmi, Adel T. Rahmani, “Evolutionary based Combining of Evolved Neural Network Classifiers”, accepted at the IASTED International Conference on Signal Processing, Pattern Recognition and Applications, Austria, 2006. (Unrelated)


The End

Thanks
