Parallel Apriori Algorithm Using MPI

Congressional Voting Records

Çankaya University

Computer Engineering Department

Ahmet Artu YILDIRIM

January 2010

Efficient Association Rules Mining Using MPI

Overview

• The Apriori algorithm is used to discover association rules

• Computation time is the major issue when the dataset is large

• The aim is to reduce the running time of the mining process by using multiple computers for parallel computation

Apriori Algorithm (Example)

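• Min support = 50%

• Min support count = 0.5 × 4 transactions = 2

• Min confidence = 0.50

• Confidence({5}→{2,3}) = support_count({2,3,5}) / support_count({5}) = 2/3 ≈ 0.67, which meets the minimum confidence of 0.50 (and {2,3,5} is frequent, since its support count of 2 meets the minimum support count of 2)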
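The transaction table for this example is not part of the transcript. The sketch below computes the same quantities under stated assumptions:

- Itemsets are encoded as bitmasks (bit k = item k).
- The four transactions are hypothetical. They are chosen only to match the slide's counts: support_count({5}) = 3 and support_count({2,3,5}) = 2.
- The minimum support count is rounded up. This is also an assumption.
- The helper names are illustrative, not the author's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding: an itemset is a bitmask, bit k set = item k (1..31). */
typedef unsigned int itemset_t;
#define ITEM(k) (1u << (k))

/* Hypothetical 4-transaction database, chosen only to match the slide's
 * counts: support_count({5}) = 3 and support_count({2,3,5}) = 2. */
static const itemset_t example_db[4] = {
    ITEM(1) | ITEM(3) | ITEM(4),
    ITEM(2) | ITEM(3) | ITEM(5),
    ITEM(1) | ITEM(2) | ITEM(3) | ITEM(5),
    ITEM(2) | ITEM(5),
};

/* Number of transactions that contain every item of `set`. */
static int support_count(const itemset_t *db, size_t n, itemset_t set)
{
    int count = 0;
    for (size_t i = 0; i < n; i++)
        if ((db[i] & set) == set)
            count++;
    return count;
}

/* Minimum support count from a percentage, rounded up (assumed rounding). */
static int min_support_count(int percent, int n_transactions)
{
    assert(percent >= 0 && percent <= 100 && n_transactions >= 0);
    return (percent * n_transactions + 99) / 100;
}

/* confidence(A -> B) = support_count(A u B) / support_count(A). */
static double confidence(const itemset_t *db, size_t n, itemset_t a, itemset_t b)
{
    int sa = support_count(db, n, a);
    return sa == 0 ? 0.0 : (double)support_count(db, n, a | b) / sa;
}
```

With these definitions, `confidence(example_db, 4, ITEM(5), ITEM(2) | ITEM(3))` evaluates to 2/3, and `min_support_count(50, 4)` gives 2.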

Technology and Methodology

• Platform: GNU/Linux 2.6.20.7 i386

• Programming language: ISO C99

• Cross-platform APIs: MPICH (MPI implementation) and GLib (utility library)

• Compiler suite: GNU toolchain

• Division methodologies:

1. Dataset division

2. Large frequent itemset division

• Dataset division is the methodology used in this work
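Under dataset division, each of the p processes scans only its own share of the transactions. The slides do not say how the records were split. Below is a minimal sketch of one way to do it, assuming contiguous blocks whose sizes differ by at most one. `block_begin` and `block_end` are hypothetical helper names.

```c
#include <assert.h>

/* First transaction index owned by `rank` when n transactions are split into
 * nprocs contiguous blocks; the first n % nprocs ranks get one extra record.
 * rank == nprocs is allowed so that block_end can reuse this function. */
static int block_begin(int n, int nprocs, int rank)
{
    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank <= nprocs);
    int base = n / nprocs;
    int rem = n % nprocs;
    return rank * base + (rank < rem ? rank : rem);
}

/* One past the last transaction index owned by `rank`. */
static int block_end(int n, int nprocs, int rank)
{
    return block_begin(n, nprocs, rank + 1);
}
```

Consider the 435 voting records on 4 processes. Ranks 0 to 2 each get 109 records and rank 3 gets 108. In an MPI program, `rank` and `nprocs` would come from `MPI_Comm_rank` and `MPI_Comm_size`.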

Data Division (Merging Local Support)
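Merging local support works in two phases:

1. Each process counts every candidate itemset only over its own transactions.
2. The per-process counts are summed, so every process sees the global support.

The sketch below is minimal and rests on three assumptions:

- Itemsets are bitmask-encoded.
- The helper names are hypothetical.
- The processes are simulated as rows of an array, so the code runs without MPI.

In a real MPI program, the merge could be a single collective such as `MPI_Allreduce(local, global, ncand, MPI_INT, MPI_SUM, MPI_COMM_WORLD)`. That call leaves the summed counts on every rank.

```c
#include <assert.h>

/* Hypothetical encoding: an itemset is a bitmask, bit k set = item k. */
typedef unsigned int itemset_t;
#define ITEM(k) (1u << (k))

/* Local phase: support of each candidate over this process's block
 * [begin, end) of the transaction array. */
static void count_local_support(const itemset_t *db, int begin, int end,
                                const itemset_t *cand, int ncand, int *local)
{
    assert(0 <= begin && begin <= end);
    for (int c = 0; c < ncand; c++) {
        local[c] = 0;
        for (int i = begin; i < end; i++)
            if ((db[i] & cand[c]) == cand[c])
                local[c]++;
    }
}

/* Merge phase: element-wise sum over processes. `local` holds nprocs rows
 * of ncand counts each (row p = process p); this stands in for an
 * MPI_Allreduce with MPI_SUM in the real program. */
static void merge_local_support(const int *local, int nprocs, int ncand,
                                int *global)
{
    for (int c = 0; c < ncand; c++) {
        global[c] = 0;
        for (int p = 0; p < nprocs; p++)
            global[c] += local[p * ncand + c];
    }
}
```

After the merge, each process can drop the candidates whose global count is below the minimum support count. No process needs to see the other processes' transactions.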

Parallel Apriori Algorithm Flowchart
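Assume the flowchart follows the usual level-wise scheme for dataset division. In each pass, every process:

1. Builds the same candidate set from the previous pass's frequent itemsets.
2. Counts those candidates over its own transactions.
3. Merges the local counts with the other processes before dropping infrequent candidates.

The loop ends when no frequent itemsets remain.

Below is a minimal sketch of the candidate-generation step, assuming the standard Apriori join-and-prune (apriori-gen) over bitmask itemsets. It uses a union-based variant of the join that yields the same candidates after pruning. The encoding and helper names are hypothetical, not taken from the slides.

```c
#include <assert.h>

/* Hypothetical encoding: an itemset is a bitmask, bit k set = item k. */
typedef unsigned int itemset_t;
#define ITEM(k) (1u << (k))

/* Number of items in an itemset (clears the lowest set bit each step). */
static int item_count(itemset_t s)
{
    int n = 0;
    for (; s != 0; s &= s - 1)
        n++;
    return n;
}

/* Linear membership test; adequate for a sketch. */
static int contains(const itemset_t *sets, int n, itemset_t s)
{
    for (int i = 0; i < n; i++)
        if (sets[i] == s)
            return 1;
    return 0;
}

/* apriori-gen for k >= 2: join pairs of frequent (k-1)-itemsets whose union
 * has exactly k items, then prune every candidate that has an infrequent
 * (k-1)-subset. Writes at most max_out candidates and returns the count. */
static int apriori_gen(const itemset_t *prev, int nprev, int k,
                       itemset_t *out, int max_out)
{
    assert(k >= 2);
    int nout = 0;
    for (int i = 0; i < nprev; i++) {
        for (int j = i + 1; j < nprev; j++) {
            itemset_t cand = prev[i] | prev[j];
            if (item_count(cand) != k || contains(out, nout, cand))
                continue;
            /* Prune: drop each item in turn; the remainder must be frequent. */
            int all_frequent = 1;
            for (itemset_t rest = cand; rest != 0 && all_frequent; rest &= rest - 1) {
                itemset_t lowest = rest & (~rest + 1u);
                if (!contains(prev, nprev, cand & ~lowest))
                    all_frequent = 0;
            }
            if (all_frequent && nout < max_out)
                out[nout++] = cand;
        }
    }
    return nout;
}
```

For example, suppose the frequent 2-itemsets are {1,3}, {2,3}, {2,5} and {3,5}. The only surviving 3-candidate is {2,3,5}: {1,2,3} is pruned because {1,2} is not frequent, and {1,3,5} is pruned because {1,5} is not frequent.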

Dataset

• 1984 United States congressional voting records

• Attribute information: party (Democrat, Republican) and yes/no votes on handicapped infants, water project cost sharing, adoption of the budget resolution, physician fee freeze, El Salvador aid, religious groups in schools, aid to Nicaraguan contras, MX missile, immigration, synfuels corporation cutback, education spending, superfund right to sue, crime, duty-free exports, and export administration act South Africa

Preprocessing of Dataset

• Each record is transformed into a transaction of numeric item IDs before mining

• Each attribute value is numbered, e.g. democrat = 1, republican = 2, handicapped infants yes = 3, handicapped infants no = 4, water project cost sharing yes = 5 …
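Below is a sketch of that transformation for one record. It extends the numbering on the slide so that vote j (counting from 0) maps to 3 + 2j for yes and 4 + 2j for no. Several details are assumptions, not taken from the slides:

- The input is a comma-separated line like "democrat,y,n,?,...".
- Votes other than 'y' or 'n' (such as '?') are skipped. The slides do not say how missing votes were handled.
- `encode_record` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical encoder: democrat = 1, republican = 2, and vote j (0-based)
 * yes = 3 + 2j, no = 4 + 2j. Values other than 'y'/'n' are skipped (an
 * assumption). Writes at most max_items IDs and returns how many. */
static int encode_record(const char *line, int *items, int max_items)
{
    const char *comma = strchr(line, ',');
    assert(comma != NULL && max_items > 0);
    size_t party_len = (size_t)(comma - line);
    int n = 0;

    if (party_len == 8 && strncmp(line, "democrat", 8) == 0)
        items[n++] = 1;
    else if (party_len == 10 && strncmp(line, "republican", 10) == 0)
        items[n++] = 2;

    /* Walk the vote fields that follow the class label. */
    int vote = 0;
    for (const char *p = comma + 1; *p != '\0'; p++) {
        if (*p == ',') {
            vote++;
            continue;
        }
        if (n == max_items)
            break;
        if (*p == 'y')
            items[n++] = 3 + 2 * vote;
        else if (*p == 'n')
            items[n++] = 4 + 2 * vote;
    }
    return n;
}
```

Under this scheme, "democrat,y,n,?" becomes the transaction {1, 3, 6}.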

Config File and Run Command

Config File:

attributecount=34

transactioncount=435

minsupportpercent=50

minconfidencepercent=80

Command (x is the number of MPI processes; machines is the file listing the cluster hosts):

mpirun -np x -machinefile machines ./aprioriparallel
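A program has to read the key=value config file above before mining starts. Below is a minimal sketch of such a reader; the struct and function names are hypothetical. It uses plain `sscanf` rather than GLib's GKeyFile, because GKeyFile expects `[group]` headers, which this file does not have.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of the config file. */
struct apriori_config {
    int attributecount;
    int transactioncount;
    int minsupportpercent;
    int minconfidencepercent;
};

/* Parse one "key=value" line; returns 1 if the key was recognised. */
static int parse_config_line(const char *line, struct apriori_config *cfg)
{
    char key[64];
    int value;
    assert(line != NULL && cfg != NULL);
    if (sscanf(line, " %63[^= ] = %d", key, &value) != 2)
        return 0;
    if (strcmp(key, "attributecount") == 0)
        cfg->attributecount = value;
    else if (strcmp(key, "transactioncount") == 0)
        cfg->transactioncount = value;
    else if (strcmp(key, "minsupportpercent") == 0)
        cfg->minsupportpercent = value;
    else if (strcmp(key, "minconfidencepercent") == 0)
        cfg->minconfidencepercent = value;
    else
        return 0;
    return 1;
}

/* Read every line of an open config file; returns the number of recognised keys. */
static int load_config(FILE *f, struct apriori_config *cfg)
{
    char line[256];
    int recognised = 0;
    while (fgets(line, sizeof line, f) != NULL)
        recognised += parse_config_line(line, cfg);
    return recognised;
}
```

A caller would typically open the file with `fopen` and pass it to `load_config`. It would then check that all four keys were found before starting the parallel mining.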

Program Output

Rules

Rules at a confidence threshold of 80%:

• Democrats support:

  ◦ Adoption of the budget resolution

  ◦ Aid to Nicaraguan contras

• Democrats do NOT support:

  ◦ Physician fee freeze

Rules (cont.)

Rules at a confidence threshold of 80%:

• Those who do not support the physician fee freeze support the adoption of the budget resolution

• Those who support the adoption of the budget resolution do not support the physician fee freeze
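Rules like these are typically derived from the frequent itemsets. Each frequent itemset F is split into an antecedent A and a consequent F \ A. The rule is kept when support(F) / support(A) reaches the confidence threshold (80% here).

Below is a minimal sketch of that step. It assumes:

- The support counts of all frequent itemsets are already in a lookup table.
- Itemsets are bitmask-encoded.
- The helper names are hypothetical, not taken from the slides.

```c
#include <assert.h>

/* Hypothetical encoding: an itemset is a bitmask, bit k set = item k. */
typedef unsigned int itemset_t;
#define ITEM(k) (1u << (k))

/* Support count of s from a table filled during the counting passes;
 * returns -1 if s is not in the table. */
static int lookup_support(const itemset_t *sets, const int *counts, int n,
                          itemset_t s)
{
    for (int i = 0; i < n; i++)
        if (sets[i] == s)
            return counts[i];
    return -1;
}

/* Try every rule A -> f \ A where A is a non-empty proper subset of f, and
 * keep those with support(f) / support(A) >= minconf_percent / 100. The
 * test is done in integers to avoid rounding at the threshold. Antecedents
 * of kept rules go to out (consequent = f & ~A); returns how many were kept. */
static int generate_rules(itemset_t f, const itemset_t *sets, const int *counts,
                          int n, int minconf_percent, itemset_t *out, int max_out)
{
    int sup_f = lookup_support(sets, counts, n, f);
    assert(f != 0 && sup_f >= 0);
    int kept = 0;
    /* Enumerate the non-empty proper subsets of f. */
    for (itemset_t a = (f - 1) & f; a != 0; a = (a - 1) & f) {
        int sup_a = lookup_support(sets, counts, n, a);
        if (sup_a > 0 && 100 * sup_f >= minconf_percent * sup_a && kept < max_out)
            out[kept++] = a;
    }
    return kept;
}
```

Take f = {2,3,5} with support count 2. Suppose {2,3} and {3,5} also have support count 2, while the other subsets have support count 3. Then only {3,5} → {2} and {2,3} → {5} pass an 80% threshold, while all six rules pass at 50%.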

Parallel Computation Speedup

• Run on the Çankaya University wee cluster

• Processor specs: 600 MHz CPU, 250 MB RAM

• Speedup = ts / tp, where ts is the sequential running time and tp is the parallel running time
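Below is a worked instance of the speedup formula, with made-up timings used only as inputs (not measurements from the cluster). Parallel efficiency E = S / p is a common companion metric; it is added here and does not appear on the slide. In an MPI program, ts and tp would usually be measured with `MPI_Wtime()` around the mining phase.

```c
#include <assert.h>

/* Speedup as defined on the slide: S = ts / tp. */
static double speedup(double ts, double tp)
{
    assert(ts > 0.0 && tp > 0.0);
    return ts / tp;
}

/* Parallel efficiency E = S / p for p processes (not on the slide). */
static double efficiency(double ts, double tp, int p)
{
    assert(p > 0);
    return speedup(ts, tp) / p;
}
```

For example, suppose a run took 120 s sequentially and 40 s on 4 processes. That gives a speedup of 3 and an efficiency of 0.75.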

Conclusion

• The parallel version of the Apriori algorithm reduces running time on large datasets

• Scalability is gained by adding nodes (computers) or memory without modifying the code

• A high price-performance ratio is achieved by using less powerful computers

Thank You

Questions?