Predicting Students Drop Out: A Case Study
Gerben Dekker, Mykola Pechenizkiy and Jan Vleeshouwers
The Case Study
• Educational Data Mining in a practical setting
• Directed to a student advice procedure
• Eindhoven University of Technology, Electrical Engineering department
The Case Study: advice procedure
[Figure: timeline of the advice procedure from September to January. Labels: pre-university student information, exam results, EXAMS (three periods), HOLIDAY, talks with students etc., ADVICE, STUDENTS (30% / 70%), DEADLINE.]
July 2009
Outline
• CRISP-DM Framework
• Understanding of context
• Data understanding
• Data preparation
• Modeling
• Evaluation
• Deployment
• Conclusions and further work
CRISP-DM Framework
• Understanding of context
• Data understanding
• Data preparation
• Modeling
• Evaluation
• Deployment
Understanding of context
• Situation at Electrical Engineering, Eindhoven University of Technology
• 40% dropout rate, small inflow
• Decision to drop out preferably made before the end of January
• Study advice by student counselor
• Objective for the department:
• More robust and objective advice
Understanding of context
• In data mining terms:
• Build a model for the academic success of a student
• Based on the currently available information
• Only information until December of year of enrollment.
• Objective for research:
• Test the applicability of EDM in this context:
− Enough data (amount)?
− Enough data (type)?
Data understanding
• Data source
• Institution's database
− Pre-university data
− University data
• Resulting data
• Data from 648 students, 2001-2009
Data preparation (pre-university data)
• Standard preparatory education:
• Number of courses
• Type of courses taken
• Average grades for total, science, and math
• Non-standard previous education:
• Type
• Grade
Data preparation (university data)
• Courses, grades, number of attempts
• Many transformations needed:
• Reorganizations
• Partial exams
• Example: Calculus
• 2000-2001: 1 examination
• 2001-2006: 2 partial examinations
• 2007-2008: 5 partial examinations, or 1 examination.
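The Calculus example shows why grades from different years cannot be compared directly. One way such a transformation could look is sketched below. The slides do not say how partial examinations were combined, so the equal-weight average, the function name `course_grade`, and all grades are assumptions for illustration only.

```python
from statistics import mean
from typing import Optional, Sequence


def course_grade(final_exam: Optional[float],
                 partials: Sequence[float]) -> Optional[float]:
    """Return one comparable grade per student and course.

    Assumption (not from the slides): a single examination grade is used
    when present; otherwise the partial grades are averaged with equal weight.
    """
    if final_exam is not None:
        return final_exam
    if partials:
        return round(mean(partials), 2)
    return None  # no recorded result for this course


# Made-up grades, one call per examination format named on the slide.
single = course_grade(6.5, [])                            # 1 examination
two_parts = course_grade(None, [5.0, 7.0])                # 2 partial examinations
five_parts = course_grade(None, [6.0, 7.0, 5.0, 8.0, 4.0])  # 5 partial examinations
print(single, two_parts, five_parts)
```

Mapping every format onto one grade per course keeps the attribute meaningful across cohorts, at the price of hiding how the grade was earned.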
Modeling (general)
• Classification task
• Two-class classification
• Criterion: finishing all first-year courses within three years
• Several mining techniques applied
• Decision trees (+ ensembles), Bayesian classifiers, association rules
• University and pre-university data modeled separately first
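The class label follows from the criterion above. A minimal sketch of how it could be derived is given here, assuming each student record maps a first-year course to the academic year in which it was passed. The record layout, the function name `is_successful`, and the course names are hypothetical and not taken from the slides.

```python
from typing import Dict, Optional


def is_successful(passed_in_year: Dict[str, Optional[int]],
                  enrollment_year: int,
                  max_years: int = 3) -> bool:
    """True if every first-year course was passed within max_years.

    passed_in_year maps a course name to the starting year of the academic
    year in which it was passed, or None if it was never passed.
    """
    deadline = enrollment_year + max_years  # first year that is too late
    return all(year is not None and year < deadline
               for year in passed_in_year.values())


# Hypothetical students who enrolled in 2004.
on_time = is_successful({"LinAlgAB": 2004, "CalcA": 2006}, 2004)
too_late = is_successful({"LinAlgAB": 2004, "CalcA": 2007}, 2004)
never_passed = is_successful({"LinAlgAB": 2004, "CalcA": None}, 2004)
print(on_time, too_late, never_passed)
```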
Modeling (pre-university data)
• Baseline model
• One Rule classifier
• 68% accuracy using Science_mean
• No significant improvement using other classification techniques
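A One Rule classifier picks the single attribute whose values best predict the class and then predicts the majority class for each value. The sketch below illustrates the idea on nominal attributes only; numeric attributes would need to be discretized first. The toy rows, the `math_level` attribute, and the label name `success` are made up, and this is not the authors' implementation.

```python
from collections import Counter, defaultdict


def one_rule(rows, label_key):
    """Fit a One Rule classifier on nominal attributes.

    Returns (best_attribute, rule, training_accuracy), where rule maps each
    value of the chosen attribute to its majority class.
    """
    best = None
    attributes = [key for key in rows[0] if key != label_key]
    for attr in attributes:
        # Count class frequencies for every value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[label_key]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        correct = sum(c.most_common(1)[0][1] for c in counts.values())
        accuracy = correct / len(rows)
        if best is None or accuracy > best[2]:
            best = (attr, rule, accuracy)
    return best


# Made-up training data; 1 = successful, 0 = not successful.
rows = [
    {"science_mean": "good", "math_level": "B", "success": 1},
    {"science_mean": "good", "math_level": "A", "success": 1},
    {"science_mean": "good", "math_level": "B", "success": 1},
    {"science_mean": "poor", "math_level": "A", "success": 0},
    {"science_mean": "poor", "math_level": "B", "success": 0},
    {"science_mean": "poor", "math_level": "A", "success": 1},
    {"science_mean": "avg", "math_level": "A", "success": 0},
    {"science_mean": "avg", "math_level": "A", "success": 0},
]
attribute, rule, accuracy = one_rule(rows, "success")
print(attribute, rule, accuracy)
```

Because the rule uses one attribute only, it makes a cheap baseline: a more complex model is worth its cost only if it clearly beats this accuracy.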
Modeling (university data)
• Baseline model
• One Rule classifier
• 75% accuracy using Linear Algebra AB
• Significant improvements using other models (80%)
• Decision trees slightly better than other models
Modeling (total set)
• Accuracies of 80%, using attributes from both subsets
• Improvements using cost matrices
• Shape misclassification
• Small trade-offs between accuracy and misclassification:
• Accuracy 79%, 52% of errors are false positives
• Accuracy 76%, 41% of errors are false positives
• Similarities between models
• Linear Algebra AB always the root node
• Science Mean always high in the tree
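A cost matrix shapes misclassification by making one kind of error more expensive than the other, so borderline cases move to the cheaper side. The sketch below shows the minimum-expected-cost decision for two classes and the two figures reported on the slide, accuracy and the share of errors that are false positives. The probabilities, costs, and confusion counts are made up, and the slides do not state which class counts as positive, so that choice is an assumption here.

```python
def min_cost_class(p_positive, cost):
    """Pick the class with the lowest expected cost.

    cost[actual][predicted] is the cost of predicting `predicted` when the
    true class is `actual`; classes are 0 and 1.
    """
    prob = {0: 1.0 - p_positive, 1: p_positive}
    expected = {
        pred: sum(prob[actual] * cost[actual][pred] for actual in (0, 1))
        for pred in (0, 1)
    }
    return min(expected, key=expected.get)


def error_profile(tp, fp, tn, fn):
    """Return (accuracy, share of errors that are false positives)."""
    total = tp + fp + tn + fn
    errors = fp + fn
    return (tp + tn) / total, fp / errors


uniform = [[0, 1], [1, 0]]     # both errors cost the same
fn_costly = [[0, 1], [2, 0]]   # a false negative costs twice as much

plain = min_cost_class(0.4, uniform)      # 0.4 vs 0.6 expected cost
shifted = min_cost_class(0.4, fn_costly)  # 0.8 vs 0.6 expected cost
acc, fp_share = error_profile(tp=40, fp=6, tn=40, fn=14)
print(plain, shifted, acc, fp_share)
```

With uniform costs the student with probability 0.4 is assigned class 0; doubling the false-negative cost flips the prediction to class 1, which trades a little accuracy for a different mix of errors.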
Modeling (decision tree)
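[Figure: decision tree with 79% accuracy. The root node LinAlgAB splits at 5.5: the < 5.5 branch is a leaf labeled 1, the > 5.5 branch leads to CalcA. CalcA splits at 5.15: the < 5.15 branch is a leaf labeled 1, the > 5.15 branch leads to VWO_Sc_mean. VWO_Sc_mean splits into {good, excellent} and {n/a, poor, avg, above avg}, ending in leaves labeled 1 and 0.]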
Evaluation
• Detailed manual analysis by student counselor:
• Review the classification measure:
− 25% of False Negatives should be true negatives
− How to classify skilled people who leave?
• Improve data transformations
Deployment
• Objectives
• More robust and objective advice:
− 80% accuracy is possible, clear directions for improvements.
• Test the applicability of EDM in this context:
− Enough data (amount)?
− Yes, and more is not easily obtainable
− Enough data (type)?
− More data types would probably be very useful, but costly.
• Deployment possible after improvements
Conclusions and further work
• EDM can help in a study advice process: 80% accuracy is possible, clear directions for improvements.
• EDM can work using small datasets and a limited number of data categories
• Further work:
• Improve data transformations
• Improve classification measure: better two-class, move to three-class
• Review use of additional data