EINDHOVEN UNIVERSITY OF TECHNOLOGY

Table 1.1: Goals of the data mining on educational data.

Table 3.1: Distribution of students with their course count.

Table 3.2: Validation and cleaning with validation window (2* standard deviation).

Figure 3.5: Distribution of students over study time.

Table 3.3: Validation and cleaning with 3* standard deviation window.

Figure 3.6: Distribution of students over study time.

Table 3.4: Validation and cleaning with 3* standard deviation window on bootstrapped data.

(caption truncated) ... study time.

Table 3.5: Validation and cleaning with 2* standard deviation window on bootstrapped data.
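The validation windows named in Tables 3.2 to 3.5 can be illustrated with a small sketch. It assumes that cleaning means keeping study times within the mean plus or minus k standard deviations (k = 2 or 3), and that "bootstrapped" means the mean and standard deviation are averaged over resamples drawn with replacement. Both readings, the function names `window_filter` and `bootstrap_mean_std`, and the sample data are assumptions for illustration, not taken from the thesis.

```python
import random
import statistics

def window_filter(values, k, mean=None, std=None):
    """Keep values inside [mean - k*std, mean + k*std]."""
    mean = statistics.mean(values) if mean is None else mean
    std = statistics.stdev(values) if std is None else std
    low, high = mean - k * std, mean + k * std
    return [v for v in values if low <= v <= high]

def bootstrap_mean_std(values, rounds=1000, seed=1):
    """Average the mean and standard deviation over bootstrap resamples."""
    rng = random.Random(seed)
    means, stds = [], []
    for _ in range(rounds):
        sample = rng.choices(values, k=len(values))  # draw with replacement
        means.append(statistics.mean(sample))
        stds.append(statistics.stdev(sample))
    return statistics.mean(means), statistics.mean(stds)

# Hypothetical study times in years, with one obvious outlier.
study_times = [4.5, 5.0, 5.2, 5.5, 5.8, 6.0, 6.1, 6.5, 7.0, 19.0]

kept_2std = window_filter(study_times, k=2)
kept_3std = window_filter(study_times, k=3)
b_mean, b_std = bootstrap_mean_std(study_times)
kept_boot = window_filter(study_times, k=2, mean=b_mean, std=b_std)
```

On this sample the 2* window drops the outlier of 19.0 years while the 3* window keeps it, which shows why the choice of window changes the cleaned data set.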

Figure 3.8: Distribution of students over study time.

[Further plots with x-axis "study time"; tick labels omitted.]

Figure 3.13: Determination of short, normal and long classes

[Plot annotations: 4.767096, 5.832469; 5.107548, 6.34147; 4.554752, 6.374105; 5.109692, 7.164413]

Figure 3.19: Courses with number of students

Figure 3.20: Students with number of courses

Figure 3.25: Distribution of results over the years

Figure 3.26: Number of new students over the years.

Figure 3.27: Distribution of results over the years

Figure 3.28: Number of new students over the years.

Table 3.10: Attributes with possible values

Table 3.11: Data with categorized attributes

Table 3.12: Data with ordinal attributes

$\text{precision} = \frac{TP}{TP + FP}$

$\text{recall} = \frac{TP}{TP + FN}$

Definitions of accuracy metrics

Table 4.1: JRIP results on 2std, categorical attributes

Table 4.2: JRIP results on 3std, categorical attributes

Table 4.3: JRIP results on 2std, ordinal attributes

Table 4.4: JRIP results on 3std, ordinal attributes

Table 4.5: Class distribution over instances of 3std, ordinal attributes

Table 4.6: Rules with cost sensitive learning.

Table 4.7: Rules with cost sensitive learning.

Figure 4.1: ROC curve of long class, table A15

Figure 4.2: ROC curve of long class, table A16

Table 4.8: JRIP results on binary class with SMOTE.

Figure 4.3: ROC curve of class long from JRIP results on binary class with SMOTE.

Table 4.9: Ridor results on binary class with SMOTE.
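Since several of the results above rely on SMOTE, a minimal sketch of the oversampling idea may help: each synthetic instance is placed at a random position on the line segment between a minority instance and one of its k nearest minority neighbours. The defaults `k=5` and `percentage=400.0` mirror the `K5` and `P400.0` parts of the Weka relation name in the appendix output. The function name `smote`, the numeric toy data, and the restriction to numeric attributes are assumptions for illustration. This is not the Weka filter itself.

```python
import math
import random

def smote(minority, k=5, percentage=400.0, seed=1):
    """Return synthetic points interpolated between minority neighbours."""
    rng = random.Random(seed)
    per_point = int(percentage // 100)  # whole synthetic points per original
    synthetic = []
    for i, point in enumerate(minority):
        # k nearest minority neighbours of `point`, excluding the point itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(point, p))[:k]
        for _ in range(per_point):
            neighbour = rng.choice(neighbours)
            gap = rng.random()  # position on the segment point -> neighbour
            synthetic.append(
                tuple(a + gap * (b - a) for a, b in zip(point, neighbour))
            )
    return synthetic

# Hypothetical minority-class instances with two numeric attributes.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5), (1.8, 1.5), (2.2, 2.0)]
new_points = smote(minority)
```

With six minority instances and a percentage of 400, the sketch yields 24 synthetic instances, all of which lie inside the bounding box of the original minority points.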

Wiskunde 2 | Wiskunde 1 | 0.699 | 232 | 5.08
Wiskunde 1 | Wiskunde 2 | 0.718 | 232 | 5.08
Inleiding functioneel programmeren | Systeemmodelleren 1 | 0.682 | 227 | 3.565
Wiskunde 2 | Databases 1 | 0.81 | 269 | 3.542
Wiskunde 1 | Databases 1 | 0.724 | 234 | 3.168
Systeemmodelleren 1 | Databases 1 | 0.639 | 287 | 2.795
Operating systems | Compilers | 0.609 | 255 | 2.775
Compilers /\ Programmeren 1 | Programmeren 2 | 0.769 | 250 | 2.334
Wiskunde 2 | Programmeren 1 | 0.798 | 265 | 2.172
Automatentheorie en formele talen /\ Programmeren 2 | Programmeren 1 | 0.763 | 267 | 2.076
Inleiding functioneel programmeren | Programmeren 2 | 0.679 | 226 | 2.059
Systeemmodelleren 1 | Programmeren 2 | 0.675 | 303 | 2.047
Wiskunde 1 | Programmeren 1 | 0.749 | 242 | 2.038
Compilers | Programmeren 2 | 0.67 | 345 | 2.032
Basiswiskunde 3 | Programmeren 2 | 0.654 | 231 | 1.985
Compilers /\ Programmeren 2 | Programmeren 1 | 0.725 | 250 | 1.972
Automatentheorie en formele talen
Automatentheorie en formele talen /\ Programmeren 1
Implementatie | Programmeren 2 | 0.628 | 245 | 1.906
Operating systems | Programmeren 2 | 0.628 | 263 | 1.904
Databases 1 | Programmeren 1 | 0.657 | 353 | 1.788
Programmeren 2 | Programmeren 1 | 0.65 | 502 | 1.768
Compilers | Programmeren 1 | 0.627 | 323 | 1.706
Systeemmodelleren 1 | Programmeren 1 | 0.619 | 278 | 1.685

Table 4.10: Association rules found in the results of students who scored insufficient on their first attempt to pass the course.
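How such rule measures are usually computed can be sketched as follows, assuming the three numeric columns of Table 4.10 are confidence, support count and lift (the column headers did not survive, so this reading is an assumption). The function name `rule_metrics` and the transactions are hypothetical; only the course names are borrowed from the table.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support count, confidence, lift) for antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    confidence = n_both / n_a          # P(consequent | antecedent)
    lift = confidence / (n_c / n)      # confidence relative to P(consequent)
    return n_both, confidence, lift

# Hypothetical data: each set holds the courses one student failed at the first attempt.
failed_first_try = [
    {"Wiskunde 1", "Wiskunde 2"},
    {"Wiskunde 1", "Wiskunde 2", "Programmeren 1"},
    {"Wiskunde 1"},
    {"Programmeren 1"},
    {"Programmeren 1", "Programmeren 2"},
    {"Databases 1"},
    {"Wiskunde 2", "Databases 1"},
    {"Compilers"},
]
support, confidence, lift = rule_metrics(
    failed_first_try, {"Wiskunde 2"}, {"Wiskunde 1"}
)
```

Support count and lift are symmetric in antecedent and consequent while confidence is not, which is consistent with the first two rows of the table sharing 232 and 5.08 but differing in the first numeric column.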

Figure 4.4: Clustergram with study length and courses.

Table 4.11: Courses in which almost all students obtain a good result.

Table 4.12: Course of the blue rectangle in figure 4.4.

Figure 4.5: JRIP classification rule used for emerging patterns.

Table 4.13: Support per year for the rule of Figure 4.5.

Figure 5.1: Process extracted from the first year of all students.

Figure 5.2: Process extracted from the second year of all students.

Figure 5.3: Process extracted from the third year of all students.

Figure 5.4: Process extracted from the fourth year of all students.

Figure 5.5: Process extracted from the fifth year of all students.

Figure 5.6: Process extracted from the sixth year of all students.

Algebra 2 (1.3 and 2.1)

Basiswiskunde 3 (1.3)

Algebra 1 (1.2)

Implementatie (1.3 and 2.1)

Programmeren 3 (2.1)

Operating systems (2.3)

Compilers (2.2)

Automatentheorie en formele talen (1.3)

Table 5.1: Result of Frequent Itemset Mining on all courses

Node filter:

Significance cutoff: 0.430

Edge filter:

Cutoff: 0.042

Utility rt: 0.582

Node filter:

Edge filter:

Cutoff: 0.042

Utility rt: 0.583

Node filter:

Edge filter:

Cutoff: 0.032

Utility rt: 0.217583

Figure 5.7: Fuzzy models of the 3 different study times on the most frequent courses.

Heuristic model of students with a short study time.

Relative to best threshold: 0.05

Positive observations: 10

Dependency threshold: 0.9

Connected: yes

Heuristic model of students with a short study time.

Connected: no

Figure 5.8: Heuristic model of students with a short study time.
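The dependency threshold (0.9) and positive observations (10) listed for the heuristic models can be illustrated with the dependency measure commonly used in heuristics mining, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), where |a>b| counts how often a is directly followed by b. The sketch below is a simplified reading that keeps an edge when both thresholds are met; the function names, the traces, and this way of combining the thresholds are assumptions for illustration, not the tool's actual procedure.

```python
from collections import Counter

def direct_follows(traces):
    """Count how often activity a is directly followed by activity b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

def dependency(counts, a, b):
    """Dependency measure (|a>b| - |b>a|) / (|a>b| + |b>a| + 1) for a != b."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)

# Hypothetical traces: the order in which students passed three courses.
traces = (
    [["Programmeren 1", "Programmeren 2", "Compilers"]] * 30
    + [["Programmeren 2", "Programmeren 1", "Compilers"]] * 1
)
counts = direct_follows(traces)
d = dependency(counts, "Programmeren 1", "Programmeren 2")

# Simplified filter: dependency threshold 0.9 and at least 10 positive observations.
keep_edge = d >= 0.9 and counts[("Programmeren 1", "Programmeren 2")] >= 10
```

Here the measure is 29/32, so the edge from Programmeren 1 to Programmeren 2 passes both thresholds, while the reverse edge has a negative dependency and is dropped.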

Heuristic model of students with a normal study time.

Figure 5.9: Heuristic model of students with a normal study time.

Heuristic model of students with a long study time.

Connected: yes

Heuristic model of students with a long study time.

Connected: no

Figure 5.10: Heuristic model of students with a long study time

Figure 5.11: Petri net of the courses given to students with the start year

Figure 5.12: Result of conformance checking the students that start in 2004.

Figure 5.13: Sequence mining results for short study time students on the courses of table “Result of Frequent Itemset Mining on all courses”.

Figure 5.14: Sequence mining results for normal study time students on the

Figure 5.15: Sequence mining results for long study time students on the

Table A1: Table names of the data with English translation.

Table A2: Fields of table Address with English translation.

Table A3: Fields of table exams with English translation.

Table A4: Fields of table personal details with English translation.

Table A5: Fields of table results with English translation.

Table A6: Fields of table study packages with English translation.

Table A7: Fields of table study package participants with English translation.

Table A8: Fields of table preparatory educations with English translation.

Table A9: Fields of the table preparatory education courses with English translation.

Figure A2: The number of exams per year.

Table A10: Different exam assessments with English translation.

Table A11: JRIP statistics after mining on 2std, categorical attributes

Table A12: JRIP statistics after mining on 3std, categorical attributes

Table A13: JRIP statistics after mining on 2std, ordinal attributes

Table A14: JRIP statistics after mining on 3std, ordinal attributes

Table A15: JRIP statistics after cost sensitive mining on 3std, ordinal attributes, v1

Table A16: JRIP statistics after cost sensitive mining on 3std, ordinal attributes, v2

Table A17: Statistics of JRIP results on binary class with SMOTE.

Table A18: Statistics of Ridor results on binary class with SMOTE.

=== Run information ===

Scheme: weka.classifiers.trees.Id3

Relation: test-weka.filters.supervised.instance.SMOTE-C0-K5-P400.0-S1

Instances: 974

Attributes: 5745

[list of attributes omitted]

Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

startYear = 1980: null

startYear = 1983: short-normal

startYear = 1984: long

startYear = 1985: long

startYear = 1986

| 2L500_firstResult<4 = -: long

| 2L500_firstResult<4 = 0

| | 1B053_firstResult<7 = -: null

| | 1B053_firstResult<7 = 0: long

| | 1B053_firstResult<7 = 1: short-normal

| | 0Z060_trialCount>1 = -: long

| | 0Z060_trialCount>1 = 0: short-normal

| | 0Z060_trialCount>1 = 1: long

startYear = 1987

| 5F040_firstResult<7 = -

| | 0K060_trialCount>0 = -: long

| | 0K060_trialCount>0 = 0: null

| | 0K060_trialCount>0 = 1: short-normal

| 5F040_firstResult<7 = 0

| | 2K700_highestResult<8 = -: null

| | 2K700_highestResult<8 = 0: short-normal

| | 2K700_highestResult<8 = 1

| | | 2N010_highestResult<8 = -: null

| | | 2N010_highestResult<8 = 0: short-normal

| | | 2N010_highestResult<8 = 1

| | | | 1A060_trialCount>0 = -: short-normal

| | | | 1A060_trialCount>0 = 0: null

| | | | 1A060_trialCount>0 = 1: long

| 5F040_firstResult<7 = 1

| | 2F550_firstResult<3 = -: null

| | 2F550_firstResult<3 = 0: short-normal

| | 2F550_firstResult<3 = 1: long

startYear = 1988

| 2L711_firstResult<4 = -: short-normal

| | 2M240_firstResult<8 = -: null

| | 2M240_firstResult<8 = 0

| | | 2M227_trialCount>1 = -: long

| | | 2M227_trialCount>1 = 0: short-normal

| | | 2M227_trialCount>1 = 1: long

| | 2M240_firstResult<8 = 1: short-normal

| | 1B040_trialCount_np = -: long

| | 1B040_trialCount_np = 0: null

| | 1B040_trialCount_np = 1: short-normal

startYear = 1989

| | 2L140_trialCount>0 = -: long

| | 2L140_trialCount>0 = 0: null

| | 2L140_trialCount>0 = 1

| | | 0K060_trialCount>2 = -: null

| | | 0K060_trialCount>2 = 0: short-normal

| | | 0K060_trialCount>2 = 1: long

| 2L670_firstResult<7 = 1: short-normal

startYear = 1990

| 2L530_trialCount>0 = -

| | 2R707_firstResult<7 = -: short-normal

| | 2R707_firstResult<7 = 0

| | | 0L800_trialCount>0 = -: short-normal

| | | 0L800_trialCount>0 = 0: null

| | | 0L800_trialCount>0 = 1: long

| | 2R707_firstResult<7 = 1

| | | 2WS13_trialCount>1 = -: long

| | | 2WS13_trialCount>1 = 0: short-normal

| | | 2WS13_trialCount>1 = 1: long

| 2L530_trialCount>0 = 0: null

| 2L530_trialCount>0 = 1: short-normal

startYear = 1991

| 2Y420_trialCount>1 = -: short-normal

| 2Y420_trialCount>1 = 0

| | 2L340_firstResult<8 = -: short-normal

| | 2L340_firstResult<8 = 0: short-normal

| | 2L340_firstResult<8 = 1

| | | 2L500_trialCount>1 = -: null

| | | 2L500_trialCount>1 = 1: short-normal

| 2Y420_trialCount>1 = 1

| | 2L085_firstResult<8 = -: null

| | 2L085_firstResult<8 = 0: short-normal

| | 2L085_firstResult<8 = 1

| | | 2L060_trialCount>2 = -: null

| | | 2L060_trialCount>2 = 1: short-normal

startYear = 1992

| 2L060_trialCount>3 = -: null

| 2L060_trialCount>3 = 0: short-normal

| 2L060_trialCount>3 = 1: long

startYear = 1993

| 1B170_trialCount>6 = -: null

| 1B170_trialCount>6 = 0

| | 1B050_trialCount_np = -: short-normal

| | 1B050_trialCount_np = 0: null

| | 1B050_trialCount_np = 1: long

| 1B170_trialCount>6 = 1: long

startYear = 1994

| | 0K060_trialCount>1 = -: null

| | 0K060_trialCount>1 = 0: short-normal

| | 0K060_trialCount>1 = 1: long

| 2L711_firstResult<3 = 1: long

startYear = 1995

| 1J210_firstResult<6 = -: short-normal

| 1J210_firstResult<6 = 0

| | 1Z340_trialCount>2 = -: short-normal

| | 1Z340_trialCount>2 = 0: short-normal

| | 1Z340_trialCount>2 = 1: long

| 1J210_firstResult<6 = 1: long

startYear = 1996

| 2R237_highestResult<8 = -: null

| 2R237_highestResult<8 = 0

| | 2M004_trialCount>0 = -: long

| | 2M004_trialCount>0 = 0: null

| | 2M004_trialCount>0 = 1: short-normal

| 2R237_highestResult<8 = 1

| | 2Y380_highestResult<8 = -: short-normal

| | 2Y380_highestResult<8 = 0

| | | 2IN40_trialCount_np = -: short-normal

| | | 2IN40_trialCount_np = 0: null

| | | 2IN40_trialCount_np = 1: long

| | 2Y380_highestResult<8 = 1

| | | 2R077_firstResult<7 = -

| | | | 1A350_trialCount>1 = -: null

| | | | 1A350_trialCount>1 = 0: long

| | | | 1A350_trialCount>1 = 1: short-normal

| | | 2R077_firstResult<7 = 0

| | | | 1C200_trialCount_np = -: short-normal

| | | | 1C200_trialCount_np = 0: null

| | | | 1C200_trialCount_np = 1: long

| | | 2R077_firstResult<7 = 1: long

startYear = 1997

| 2io60_trialCount>0 = -

| | 1B170_firstResult<10 = -: null

| | 1B170_firstResult<10 = 0: long

| | 1B170_firstResult<10 = 1: short-normal

| 2io60_trialCount>0 = 0: null

| 2io60_trialCount>0 = 1: long

startYear = 1998

| 2IH20_trialCount>2 = -: null

| 2IH20_trialCount>2 = 0

| | 2M204_trialCount>0 = -

| | | 2M090_trialCount>1 = -: short-normal

| | | 2M090_trialCount>1 = 0: long

| | | 2M090_trialCount>1 = 1: short-normal

| | 2M204_trialCount>0 = 0: null

| | 2M204_trialCount>0 = 1: short-normal

| 2IH20_trialCount>2 = 1

| | 2M980_firstResult<7 = -: long

| | 2M980_firstResult<7 = 0

| | | 2M927_firstResult<6 = -: short-normal

| | | 2M927_firstResult<6 = 0: short-normal

| | | 2M927_firstResult<6 = 1: long

| | 2M980_firstResult<7 = 1: long

startYear = 1999

| 1C200_firstResult<5 = -: short-normal

| 1C200_firstResult<5 = 0: short-normal

| 1C200_firstResult<5 = 1

| | 2Y345_firstResult<8 = -: long

| | 2Y345_firstResult<8 = 0: short-normal

| | 2Y345_firstResult<8 = 1

| | | 2F540_trialCount>2 = -: long

| | | 2F540_trialCount>2 = 0: long

| | | 2F540_trialCount>2 = 1: short-normal

startYear = 2000

| 2M227_highestResult<7 = -: null

| 2M227_highestResult<7 = 0: short-normal

| 2M227_highestResult<7 = 1

| | 0L800_trialCount>0 = -: long

| | 0L800_trialCount>0 = 0: null

| | 0L800_trialCount>0 = 1: short-normal

startYear = 2001: short-normal

Time taken to build model: 11.38 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         858               88.0903 %
Incorrectly Classified Instances       112               11.499  %
Kappa statistic                          0.769
Mean absolute error                      0.1155
Root mean squared error                  0.3398
Relative absolute error                 23.2354 %
Root relative squared error             68.1697 %
UnClassified Instances                   4                0.4107 %
Total Number of Instances              974

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 0.87      0.099     0.906       0.87     0.887       0.884      short-normal
                 0.901     0.13      0.863       0.901    0.882       0.884      long
Weighted Avg.    0.885     0.114     0.885       0.885    0.885       0.884

=== Confusion Matrix ===

   a   b   <-- classified as
 441  66 |   a = short-normal
  46 417 |   b = long

Table A19: Statistics of ID3 results on binary class with SMOTE.
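The per-class figures and the kappa statistic in the ID3 output can be recomputed from its confusion matrix, which covers the 970 classified instances (the 4 unclassified ones are excluded). The sketch below is a plain recomputation under the usual definitions of precision, recall and Cohen's kappa; the function names are illustrative and not part of Weka.

```python
def per_class_metrics(matrix, index):
    """Precision and recall for the class at `index` (rows = actual class)."""
    tp = matrix[index][index]
    actual = sum(matrix[index])                     # row total: TP + FN
    predicted = sum(row[index] for row in matrix)   # column total: TP + FP
    return tp / predicted, tp / actual

def cohen_kappa(matrix):
    """Agreement between actual and predicted classes, corrected for chance."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(size)) / total
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1 - expected)

# Confusion matrix from the ID3 output: rows are actual classes, columns predictions.
matrix = [[441, 66], [46, 417]]
precision_sn, recall_sn = per_class_metrics(matrix, 0)      # short-normal
precision_long, recall_long = per_class_metrics(matrix, 1)  # long
kappa = cohen_kappa(matrix)
```

Rounded to three decimals, this gives precision 0.906 and recall 0.87 for short-normal, precision 0.863 and recall 0.901 for long, and a kappa of 0.769, matching the reported output.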

Table A20: Statistics of NaïveBayes results on binary class with SMOTE.
