Predicting Students Drop Out: A Case Study

Gerben Dekker, Mykola Pechenizkiy and Jan Vleeshouwers

The Case Study

• Educational Data Mining in a practical setting

• Directed to a student advice procedure

• Eindhoven University of Technology, Electrical Engineering department

The Case Study: advice procedure

[Figure: timeline of the advice procedure, September through January. Pre-university student information and the exam results from three examination periods (with a holiday in between) feed into the advice given to students (a 30% / 70% split) before the deadline, followed by talks with students, etc.]

Outline

• CRISP-DM Framework

• Understanding of context

• Data understanding

• Data preparation

• Modeling

• Evaluation

• Deployment

• Conclusions and further work


CRISP-DM Framework

• Understanding of context

• Data understanding

• Data preparation

• Modeling

• Evaluation

• Deployment


Understanding of context

• Situation at Electrical Engineering, Eindhoven University of Technology

• 40% dropout rate, small inflow

• Decision to drop out preferably before the end of January

• Study advice by student counselor

• Objective for the department:

• More robust and objective advice


Understanding of context

• In data mining terms:

• Build a model for academic success of a student

• Based on the currently available information

• Only information available until December of the year of enrollment.

• Objective for research:

• Try out the applicability of EDM in this context:

− Enough data (amount)?

− Enough data (type)?


Data understanding

• Data source:

• Institution's database

− Pre-university data

− University data

• Resulting data:

• Data from 648 students, from 2001-2009 (see the joining sketch below)
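The pre-university and university records must first be combined into one table per student. A minimal sketch of such a join with pandas, assuming hypothetical file and column names (student_id and the file names are illustrative, not the institution's actual schema):

```python
import pandas as pd

# Hypothetical extracts from the institution's database; file and column
# names are illustrative only, not the schema used in the case study.
pre_university = pd.read_csv("pre_university.csv")    # one row per student
university = pd.read_csv("university_results.csv")    # grades, attempts, ...

# Combine both sources into a single table keyed on a shared student id.
dataset = pre_university.merge(university, on="student_id", how="inner")

# The case study ended up with 648 students enrolled between 2001 and 2009.
print(dataset.shape)
```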


Data preparation (pre-university data)

• Standard preparatory education (see the feature sketch below):

• # courses

• Type of courses taken

• Average grades for total, science, and math

• Non-standard previous education:

• Type

• Grade
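A minimal sketch of deriving the pre-university features with pandas, assuming a hypothetical per-course grade table (student_id, course, subject_area, and grade are illustrative column names, not the actual ones used in the study):

```python
import pandas as pd

# Hypothetical per-course results from standard preparatory education.
vwo = pd.read_csv("vwo_grades.csv")   # student_id, course, subject_area, grade

per_student = vwo.groupby("student_id")
features = pd.DataFrame({
    "n_courses": per_student["course"].nunique(),   # number of courses taken
    "total_mean": per_student["grade"].mean(),      # average grade overall
})

# Average grades restricted to science and math courses.
science = vwo[vwo["subject_area"] == "science"]
math = vwo[vwo["subject_area"] == "math"]
features["science_mean"] = science.groupby("student_id")["grade"].mean()
features["math_mean"] = math.groupby("student_id")["grade"].mean()
```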


Data preparation (university data)

• Courses, grades, # attempts

• Many transformations needed:

• Reorganizations

• Partial exams

• Example: Calculus (see the transformation sketch below)

• 2000-2001: 1 examination

• 2001-2006: 2 partial examinations

• 2007-2008: 5 partial examinations, or 1 examination.
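One way to deal with the changing examination regimes is to map whatever results a student has onto a single course grade. A minimal, hypothetical sketch for Calculus (column names and the averaging rule are assumptions, not the exact transformation used in the study):

```python
import pandas as pd

def calculus_grade(row: pd.Series) -> float:
    """Collapse one student's Calculus results into a single grade.

    Depending on the year of enrollment a student has either one final
    examination grade or several partial examination grades; here we take
    the final grade when present and otherwise average the partial grades.
    """
    if pd.notna(row.get("calc_exam")):                  # single examination
        return float(row["calc_exam"])
    partials = row.filter(like="calc_part_").dropna()   # up to 5 partial exams
    return float(partials.mean()) if len(partials) else float("nan")

# Assuming df has columns calc_exam, calc_part_1, ..., calc_part_5:
# df["calculus"] = df.apply(calculus_grade, axis=1)
```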


Modeling (general)

• Classification task

• Two-class classification

• Criterion: finish all courses of the first year within three years

• Several mining techniques applied (sketched below):

• Decision trees (+ ensembles), Bayesian classifiers, association rules

• Separate university/pre-university data first
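A minimal sketch of the two-class setup and a comparison of several classifier families with scikit-learn; the study's actual tooling and parameter settings are not shown here, and the generated data below is only a stand-in:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the real feature matrix and labels; in the study the label
# is 1 if the student finished all first-year courses within three years.
X, y = make_classification(n_samples=648, n_features=10, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "tree ensemble": RandomForestClassifier(n_estimators=100, random_state=0),
    "naive Bayes": GaussianNB(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```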


Modeling (pre-university data)

• Baseline model

• One-rule classifier (see the sketch below)

• 68% accuracy using Science_mean

• No significant improvement using other classification techniques
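A one-rule classifier picks the single most predictive attribute and splits on it. A rough scikit-learn equivalent is a depth-1 decision tree (a "decision stump") trained on that one attribute; a minimal sketch with placeholder data (the Science_mean values are simulated, not the study's):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Simulated stand-in for the Science_mean attribute and the success label.
rng = np.random.default_rng(0)
science_mean = rng.normal(6.5, 1.0, size=648).reshape(-1, 1)
y = (science_mean.ravel() + rng.normal(0.0, 1.0, size=648) > 6.5).astype(int)

# A depth-1 tree learns exactly one threshold on one attribute, mimicking
# a one-rule classifier on a numeric feature.
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
print(cross_val_score(stump, science_mean, y, cv=10, scoring="accuracy").mean())
```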


Modeling (university data)

• Baseline model

• One-rule classifier

• 75% accuracy using Linear Algebra AB

• Significant improvements using other models (80%)

• Decision trees slightly better than other models


Modeling (total set)

• Accuracies of 80%, using attributes from both subsets

• Improvements using cost matrices (see the cost-sensitive sketch below)

• Shape the types of misclassification made

• Small trade-offs between accuracy and misclassification type:

• Accuracy 79%, 52% of errors FP

• Accuracy 76%, 41% of errors FP

• Similarities between models

• Linear Algebra AB always the root node

• Science Mean always high in the tree
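A cost matrix makes one kind of error more expensive than the other, which shifts the balance between false positives and false negatives at a small cost in overall accuracy. A minimal sketch of the same idea via class weights in scikit-learn (the study's actual cost matrices are not reproduced here, and the data is a stand-in):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the combined pre-university/university feature set.
X, y = make_classification(n_samples=648, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The second setting penalizes mistakes on class 1 three times as heavily,
# trading a little accuracy for a different error profile.
for weights in (None, {0: 1.0, 1: 3.0}):
    tree = DecisionTreeClassifier(class_weight=weights, random_state=0)
    tree.fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, tree.predict(X_te)).ravel()
    print(f"weights={weights}: accuracy={(tn + tp) / len(y_te):.2f}, FP={fp}, FN={fn}")
```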


Modeling (decision tree)

[Figure: decision tree, 79% accuracy. The root splits on LinAlgAB at 5.5, a second node splits on CalcA at 5.15, and a third node splits VWO_Sc_mean into {good, excellent} versus {n/a, poor, avg, above avg}; the leaves carry the class labels 0 and 1.]


Evaluation

• Detailed manual analysis by the student counselor (a sketch for extracting the cases to review follows below):

• Review the classification measure:

− 25% of False Negatives should be true negatives

− How to classify skilled people who leave?

• Improve data transformations
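The manual review boils down to pulling out the students the model predicted as negative while the recorded outcome was positive and examining them case by case. A minimal sketch with made-up values (the ids and labels are illustrative, and "1 = positive class" is an assumption about the labeling):

```python
import numpy as np

def cases_to_review(ids, y_true, y_pred):
    """Return ids of students predicted negative but recorded as positive
    (false negatives), i.e. the misclassified cases a counselor would
    re-examine by hand."""
    ids, y_true, y_pred = map(np.asarray, (ids, y_true, y_pred))
    return ids[(y_pred == 0) & (y_true == 1)]

# Made-up example values:
ids = [101, 102, 103, 104]
y_true = [1, 0, 1, 1]
y_pred = [0, 0, 1, 1]
print(cases_to_review(ids, y_true, y_pred))   # -> [101]
```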


Deployment

• Objectives

• More robust and objective advice:

− 80% accuracy is possible, with clear directions for improvement.

• Try out the applicability of EDM in this context:

− Enough data (amount)?

− Yes, and more is not easily obtainable

− Enough data (type)?

− More data types would probably be very useful, but costly to obtain.

• Deployment possible after improvements


Conclusions and further work

• EDM can help in a study advice process: 80% accuracy is possible, with clear directions for improvement.

• EDM can work using small datasets and a limited number of data categories

• Further work:

• Improve data transformations

• Improve classification measure: better two-class, move to three-class

• Review use of additional data


Questions?