1 Powerset Explorer: A Datamining Application Jordan Lee.

67
1 Powerset Explorer: A Datamining Application Jordan Lee
  • date post

    15-Jan-2016
  • Category

    Documents

  • view

    226
  • download

    0

Transcript of 1 Powerset Explorer: A Datamining Application Jordan Lee.

Page 1: 1 Powerset Explorer: A Datamining Application Jordan Lee.

1

Powerset Explorer: A Datamining Application

Jordan Lee

Page 2: 1 Powerset Explorer: A Datamining Application Jordan Lee.

2

Background

Page 3: 1 Powerset Explorer: A Datamining Application Jordan Lee.

3

Background

PAST– Datamining accomplished with human intuition

Page 4: 1 Powerset Explorer: A Datamining Application Jordan Lee.

4

Background

PAST– Datamining accomplished with human intuition

PRESENT– Computer aided with AI and brute force CPU cycles

Page 5: 1 Powerset Explorer: A Datamining Application Jordan Lee.

5

Background

PAST– Datamining accomplished with human intuition

PRESENT– Computer aided with AI and brute force CPU cycles

FUTURE– Enter PowersetViewer….

Page 6: 1 Powerset Explorer: A Datamining Application Jordan Lee.

6

Dataset

Page 7: 1 Powerset Explorer: A Datamining Application Jordan Lee.

7

Dataset

Alphabet– Items that can be found in transactions– Eg. Apples, bread, chips

Page 8: 1 Powerset Explorer: A Datamining Application Jordan Lee.

8

Dataset

Alphabet– Items that can be found in transactions– Eg. Apples, bread, chips

Transaction– Sets of items (unordered)– Eg. Tx1 = { Apples, Chips }– Eg. Tx2 = { Bread }

Page 9: 1 Powerset Explorer: A Datamining Application Jordan Lee.

9

Dataset

Alphabet– Items that can be found in transactions– Eg. Apples, bread, chips

Transaction– Sets of items (unordered)– Eg. Tx1 = { Apples, Chips }– Eg. Tx2 = { Bread }

Transaction database– Collection of transactions (unordered, possibly repetitive)– Eg. Walmart transaction logs

Page 10: 1 Powerset Explorer: A Datamining Application Jordan Lee.

10

Example Dataset

Student enrollment database

Page 11: 1 Powerset Explorer: A Datamining Application Jordan Lee.

11

Example Dataset

Student enrollment database– Alphabet = courses

{ CPSC124, CPSC126, PHIL120, ANTH100, ENGL112 }

Page 12: 1 Powerset Explorer: A Datamining Application Jordan Lee.

12

Example Dataset

Student enrollment database– Alphabet = courses

{ CPSC124, CPSC126, PHIL120, ANTH100, ENGL112 }

– Transaction = courses student is enrolled in #29389002 -> { CPSC 124, PHIL120, ENGL112 }

Page 13: 1 Powerset Explorer: A Datamining Application Jordan Lee.

13

Example Dataset

Student enrollment database– Alphabet = courses

{ CPSC124, CPSC126, PHIL120, ANTH100, ENGL112 }

– Transaction = courses student is enrolled in #29389002 -> { CPSC 124, PHIL120, ENGL112 }

– Transaction DB = list of student course schedules

Page 14: 1 Powerset Explorer: A Datamining Application Jordan Lee.

14

Example Dataset (cont’d)

72423298 5 676 1701 3046 3900 1327 38578546 7 175 178 1182 1701 3038 680 39127660625 5 326 676 1701 3038 390843359163 3 1177 1699 4317 26495781 6 676 1177 1701 3038 3900 4275 48536452 4 1699 2339 1327 2826 64251972 6 676 1177 1701 3038 3900 2549 23212318 5 676 1701 3040 3813 3900 19820119 5 104 676 1699 3038 3900 65954629 4 480 676 3040 3908 54392012 5 676 1701 3038 3813 3899 85833501 5 676 1699 3040 3813 3900 65136197 5 676 1699 3038 3900 2580

Page 15: 1 Powerset Explorer: A Datamining Application Jordan Lee.

15

Why?

Why is this interesting?

Page 16: 1 Powerset Explorer: A Datamining Application Jordan Lee.

16

Why?

Why is this interesting?– Consumer transaction logs -> trends in consumer

buying

Page 17: 1 Powerset Explorer: A Datamining Application Jordan Lee.

17

Why?

Why is this interesting?– Consumer transaction logs -> trends in consumer

buying– Student enrollment database -> trends in

enrollment What electives do most undergrad computer science

students take? Departments can determine which joint majors would fit

the student population.

Page 18: 1 Powerset Explorer: A Datamining Application Jordan Lee.

18

Why? (cont’d)

Dataset sizes growing exponentially

Page 19: 1 Powerset Explorer: A Datamining Application Jordan Lee.

19

Why? (cont’d)

Dataset sizes growing exponentially– Human intuition has reached its limits

Page 20: 1 Powerset Explorer: A Datamining Application Jordan Lee.

20

Why? (cont’d)

Dataset sizes growing exponentially– Human intuition has reached its limits– Require computers and AI (expensive)

Page 21: 1 Powerset Explorer: A Datamining Application Jordan Lee.

21

Why? (cont’d)

Dataset sizes growing exponentially– Human intuition has reached its limits– Require computers and AI (expensive)– Information visualization can scale the power of

human intuition

Page 22: 1 Powerset Explorer: A Datamining Application Jordan Lee.

22

Powerset Explorer

Code base from TreeJuxtaposer (Munzner)– AccordianDrawer package

Page 23: 1 Powerset Explorer: A Datamining Application Jordan Lee.

TreeJuxtaposer

Page 24: 1 Powerset Explorer: A Datamining Application Jordan Lee.

24

Powerset Explorer

Code base from TreeJuxtaposer (Munzner)– AccordianDrawer package

Goals

Page 25: 1 Powerset Explorer: A Datamining Application Jordan Lee.

25

Powerset Explorer

Code base from TreeJuxtaposer (Munzner)– AccordianDrawer package

Goals– Focus + context exploration using grids

Page 26: 1 Powerset Explorer: A Datamining Application Jordan Lee.

26

Powerset Explorer

Code base from TreeJuxtaposer (Munzner)– AccordianDrawer package

Goals– Focus + context exploration using grids– Guaranteed visibility

Page 27: 1 Powerset Explorer: A Datamining Application Jordan Lee.

27

Powerset Explorer

Code base from TreeJuxtaposer (Munzner)– AccordianDrawer package

Goals– Focus + context exploration using grids– Guaranteed visibility– Marking of groups

Page 28: 1 Powerset Explorer: A Datamining Application Jordan Lee.

28

Milestones Status Update

Page 29: 1 Powerset Explorer: A Datamining Application Jordan Lee.

29

Milestones Status Update

#1 Completion of the basic visualization of a randomized database of small set size (~10)

Page 30: 1 Powerset Explorer: A Datamining Application Jordan Lee.

30

Milestones Status Update

#1 Completion of the basic visualization of a randomized database of small set size (~10)

#2 Addition of a single level of “marking”.

Page 31: 1 Powerset Explorer: A Datamining Application Jordan Lee.

31

Milestones Status Update

#1 Completion of the basic visualization of a randomized database of small set size (~10)

#2 Addition of a single level of “marking”. #3 Addition of multiple levels of “marking” (6)

Page 32: 1 Powerset Explorer: A Datamining Application Jordan Lee.

32

Milestones Status Update

#1 Completion of the basic visualization of a randomized database of small set size (~10)

#2 Addition of a single level of “marking”. #3 Addition of multiple levels of “marking” (6) #4 Addition of background marking to demarcate

areas of sets containing different amounts of items.

Page 33: 1 Powerset Explorer: A Datamining Application Jordan Lee.

33

Milestones Status Update

#1 Completion of the basic visualization of a randomized database of small set size (~10)

#2 Addition of a single level of “marking”. #3 Addition of multiple levels of “marking” (6) #4 Addition of background marking to demarcate

areas of sets containing different amounts of items. #5 Implement multiple constraints

Page 34: 1 Powerset Explorer: A Datamining Application Jordan Lee.

34

Milestones Status Update

#1 Completion of the basic visualization of a randomized database of small set size (~10)

#2 Addition of a single level of “marking”. #3 Addition of multiple levels of “marking” (6) #4 Addition of background marking to demarcate

areas of sets containing different amounts of items. #5 Implement multiple constraints #6 Increase maximum possible dataset size to at

least 100.

Page 35: 1 Powerset Explorer: A Datamining Application Jordan Lee.

35

Difficulties

Page 36: 1 Powerset Explorer: A Datamining Application Jordan Lee.

36

Difficulties

Multiple constraints difficult to implement on current server-side dataminer

Page 37: 1 Powerset Explorer: A Datamining Application Jordan Lee.

37

Difficulties

Multiple constraints difficult to implement on current server-side dataminer

Can not enumerate a powerset of alphabet size greater than 14 elements (integer = 32 bits)– Solution: use java class BigInteger

Page 38: 1 Powerset Explorer: A Datamining Application Jordan Lee.

38

Difficulties

Multiple constraints difficult to implement on current server-side dataminer

Can not enumerate a powerset of alphabet size greater than 14 elements (integer = 32 bits)– Solution: use java class BigInteger

High CPU and memory usage– Solultion: upgrade computer! hack

Page 39: 1 Powerset Explorer: A Datamining Application Jordan Lee.

39

Current Status

Reduced database8680433 3 0 7 5 2768129 2 6 4 6385608 5 1 9 10 9 11 147924 5 5 2 9 5 2 234140 3 11 4 8 4331093 4 4 6 0 0 3158394 5 12 1 12 5 4 5797538 6 11 4 3 13 12 4 6243191 1 5 5872060 4 3 8 9 6

Page 40: 1 Powerset Explorer: A Datamining Application Jordan Lee.

40

Current Status

Property file– 0 CPSC 325 75.0 3

1 PHIL 327 84.0 1 2 ANTH 329 45.0 2 3 MATH 327 0.0 3 4 PSYC 328 0.0 1 5 ENGL 329 0.0 2 6 APSC 540 0.0 1 7 MECH 541 0.0 1 8 STAT 543 0.0 1 9 SPAN 201 71.0 1 10 FREN 258 76.0 2 11 ECON 260 84.0 1 12 LING 295 42.0 1 13 EECE 302 73.0 1

Page 41: 1 Powerset Explorer: A Datamining Application Jordan Lee.

41

Page 42: 1 Powerset Explorer: A Datamining Application Jordan Lee.

42

Page 43: 1 Powerset Explorer: A Datamining Application Jordan Lee.

43

Page 44: 1 Powerset Explorer: A Datamining Application Jordan Lee.

44

Page 45: 1 Powerset Explorer: A Datamining Application Jordan Lee.

45

Page 46: 1 Powerset Explorer: A Datamining Application Jordan Lee.

46

Page 47: 1 Powerset Explorer: A Datamining Application Jordan Lee.

47

Page 48: 1 Powerset Explorer: A Datamining Application Jordan Lee.

48

Page 49: 1 Powerset Explorer: A Datamining Application Jordan Lee.

49

Page 50: 1 Powerset Explorer: A Datamining Application Jordan Lee.

50

Page 51: 1 Powerset Explorer: A Datamining Application Jordan Lee.

51

Page 52: 1 Powerset Explorer: A Datamining Application Jordan Lee.

52

Page 53: 1 Powerset Explorer: A Datamining Application Jordan Lee.

53

Page 54: 1 Powerset Explorer: A Datamining Application Jordan Lee.

54

Page 55: 1 Powerset Explorer: A Datamining Application Jordan Lee.

55

Page 56: 1 Powerset Explorer: A Datamining Application Jordan Lee.

56

Page 57: 1 Powerset Explorer: A Datamining Application Jordan Lee.

57

Page 58: 1 Powerset Explorer: A Datamining Application Jordan Lee.

58

Page 59: 1 Powerset Explorer: A Datamining Application Jordan Lee.

59

Page 60: 1 Powerset Explorer: A Datamining Application Jordan Lee.

60

Page 61: 1 Powerset Explorer: A Datamining Application Jordan Lee.

61

Page 62: 1 Powerset Explorer: A Datamining Application Jordan Lee.

62

Page 63: 1 Powerset Explorer: A Datamining Application Jordan Lee.

63

Page 64: 1 Powerset Explorer: A Datamining Application Jordan Lee.

64

Page 65: 1 Powerset Explorer: A Datamining Application Jordan Lee.

65

Page 66: 1 Powerset Explorer: A Datamining Application Jordan Lee.

66

Page 67: 1 Powerset Explorer: A Datamining Application Jordan Lee.

67

Questions?