Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.
Transcript of Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.
![Page 1: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/1.jpg)
Educational Data Mining
March 3, 2010
![Page 2: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/2.jpg)
Today’s Class
• EDM• Assignment#5• Mega-Survey
![Page 3: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/3.jpg)
Educational Data Mining
• “Educational Data Mining is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the settings which they learn in.” – www.educationaldatamining.org
![Page 4: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/4.jpg)
Classes of EDM Method(Romero & Ventura, 2007)
• Information Visualization• Web mining– Clustering, Classification, Outlier Detection– Association Rule Mining/Sequential Pattern
Mining– Text Mining
![Page 5: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/5.jpg)
Classes of EDM Method(Baker & Yacef, 2009)
• Prediction• Clustering• Relationship Mining• Discovery with Models• Distillation of Data For Human Judgment
![Page 6: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/6.jpg)
Prediction
• Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other aspects of the data (predictor variables)
• Which students are using CVS?• Which students will fail the class?
![Page 7: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/7.jpg)
Clustering
• Find points that naturally group together, splitting full data set into set of clusters
• Usually used when nothing is known about the structure of the data– What behaviors are prominent in domain?– What are the main groups of students?
![Page 8: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/8.jpg)
Relationship Mining
• Discover relationships between variables in a data set with many variables– Association rule mining– Correlation mining– Sequential pattern mining– Causal data mining
• Beck & Mostow (2008) article is a great example of this
![Page 9: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/9.jpg)
Discovery with Models
• Pre-existing model (developed with EDM prediction methods… or clustering… or knowledge engineering)
• Applied to data and used as a component in another analysis
![Page 10: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/10.jpg)
Distillation of Data for Human Judgment
• Making complex data understandable by humans to leverage their judgment
• Text replays are a simple example of this
![Page 11: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/11.jpg)
Focus of today’s class
• Prediction• Clustering• Relationship Mining• Discovery with Models• Distillation of Data For Human Judgment
• There will be a term-long class on this, taught by Joe Beck, in coordination with Carolina Ruiz’s Data Mining class, in a future year– Strongly recommended
![Page 12: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/12.jpg)
Prediction
• Pretty much what it says
• A student is using a tutor right now.Is he gaming the system or not?
• A student has used the tutor for the last half hour.How likely is it that she knows the knowledge component in the next step?
• A student has completed three years of high school.What will be her score on the SAT-Math exam?
![Page 13: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/13.jpg)
Two Key Types of Prediction
This slide adapted from slide by Andrew W. Moore, Google http://www.cs.cmu.edu/~awm/tutorials
![Page 14: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/14.jpg)
Classification
• General Idea• Canonical Methods• Assessment• Ways to do assessment wrong
![Page 15: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/15.jpg)
Classification
• There is something you want to predict (“the label”)
• The thing you want to predict is categorical– The answer is one of a set of categories, not a number
– CORRECT/WRONG (sometimes expressed as 0,1)– HELP REQUEST/WORKED EXAMPLE
REQUEST/ATTEMPT TO SOLVE– WILL DROP OUT/WON’T DROP OUT– WILL SELECT PROBLEM A,B,C,D,E,F, or G
![Page 16: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/16.jpg)
Classification
• Associated with each label are a set of “features”, which maybe you can use to predict the label
Skill pknow time totalactions rightENTERINGGIVEN 0.704 9 1 WRONGENTERINGGIVEN 0.502 10 2 RIGHTUSEDIFFNUM 0.049 6 1 WRONGENTERINGGIVEN 0.967 7 3 RIGHTREMOVECOEFF 0.792 16 1 WRONGREMOVECOEFF 0.792 13 2 RIGHTUSEDIFFNUM 0.073 5 2 RIGHT….
![Page 17: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/17.jpg)
Classification
• The basic idea of a classifier is to determine which features, in which combination, can predict the label
Skill pknow time totalactions rightENTERINGGIVEN 0.704 9 1 WRONGENTERINGGIVEN 0.502 10 2 RIGHTUSEDIFFNUM 0.049 6 1 WRONGENTERINGGIVEN 0.967 7 3 RIGHTREMOVECOEFF 0.792 16 1 WRONGREMOVECOEFF 0.792 13 2 RIGHTUSEDIFFNUM 0.073 5 2 RIGHT….
![Page 18: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/18.jpg)
Classification
• Of course, usually there are more than 4 features
• And more than 7 actions/data points
• I’ve recently done analyses with 800,000 student actions, and 26 features
![Page 19: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/19.jpg)
Classification
• Of course, usually there are more than 4 features
• And more than 7 actions/data points
• I’ve recently done analyses with 800,000 student actions, and 26 features
• 5 years ago that would’ve been a lot of data• These days, in the EDM world, it’s just a
medium-sized data set
![Page 20: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/20.jpg)
Classification
• One way to classify is with a Decision Tree (like J48)
PKNOW
TIME TOTALACTIONS
RIGHT RIGHTWRONG WRONG
<0.5 >=0.5
<6s. >=6s. <4 >=4
![Page 21: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/21.jpg)
Classification
• One way to classify is with a Decision Tree (like J48)
PKNOW
TIME TOTALACTIONS
RIGHT RIGHTWRONG WRONG
<0.5 >=0.5
<6s. >=6s. <4 >=4
Skill pknow time totalactions rightCOMPUTESLOPE 0.544 9 1 ?
![Page 22: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/22.jpg)
Classification
• Another way to classify is with step regression
• Linear regression (discussed later), with a cut-off
![Page 23: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/23.jpg)
And of course…
• There are lots of other classification algorithms you can use...
• SMO (support vector machine)
• In your favorite Machine Learning package
![Page 24: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/24.jpg)
And of course…
• There are lots of other classification algorithms you can use...
• SMO (support vector machine)
• In your favorite Machine Learning package– WEKA
![Page 25: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/25.jpg)
And of course…
• There are lots of other classification algorithms you can use...
• SMO (support vector machine)
• In your favorite Machine Learning package– WEKA– RapidMiner
![Page 26: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/26.jpg)
And of course…
• There are lots of other classification algorithms you can use...
• SMO (support vector machine)
• In your favorite Machine Learning package– WEKA– RapidMiner– KEEL
![Page 27: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/27.jpg)
And of course…
• There are lots of other classification algorithms you can use...
• SMO (support vector machine)
• In your favorite Machine Learning package– WEKA– RapidMiner– KEEL– RapidMiner
![Page 28: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/28.jpg)
And of course…
• There are lots of other classification algorithms you can use...
• SMO (support vector machine)
• In your favorite Machine Learning package– WEKA– RapidMiner– KEEL– RapidMiner– RapidMiner
![Page 29: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/29.jpg)
And of course…
• There are lots of other classification algorithms you can use...
• SMO (support vector machine)
• In your favorite Machine Learning package– WEKA– RapidMiner– KEEL– RapidMiner– RapidMiner– RapidMiner
![Page 30: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/30.jpg)
Comments? Questions?
![Page 31: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/31.jpg)
How can you tell if a classifier is any good?
![Page 32: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/32.jpg)
How can you tell if a classifier is any good?
• What about accuracy?
• # correct classifications total number of classifications
• 9200 actions were classified correctly, out of 10000 actions = 92% accuracy, and we declare victory.
![Page 33: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/33.jpg)
What are some limitations of accuracy?
![Page 34: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/34.jpg)
Biased training set
• What if the underlying distribution that you were trying to predict was:
• 9200 correct actions, 800 wrong actions• And your model predicts that every action is
correct
• Your model will have an accuracy of 92%• Is the model actually any good?
![Page 35: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/35.jpg)
What are some alternate metrics you could use?
![Page 36: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/36.jpg)
What are some alternate metrics you could use?
• Kappa
(Accuracy – Expected Accuracy) (1 – Expected Accuracy)
![Page 37: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/37.jpg)
What are some alternate metrics you could use?
• A’
• The probability that if the model is given an example from each category, it will accurately identify which is which
![Page 38: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/38.jpg)
Comparison
• Kappa– easier to compute– works for an unlimited number of categories– wacky behavior when things are worse than
chance– difficult to compare two kappas in different data
sets (K=0.6 is not always better than K=0.5)
![Page 39: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/39.jpg)
Comparison
• A’– more difficult to compute– only works for two categories (without
complicated extensions)– meaning is invariant across data sets (A’=0.6 is
always better than A’=0.55)– very easy to interpret statistically
![Page 40: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/40.jpg)
Comments? Questions?
![Page 41: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/41.jpg)
What data set should you generally test on?
• A vote…– Raise your hands as many times as you like
![Page 42: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/42.jpg)
What data set should you generally test on?
• The data set you trained your classifier on• A data set from a different tutor• Split your data set in half (by students), train on
one half, test on the other half• Split your data set in ten (by actions). Train on
each set of 9 sets, test on the tenth. Do this ten times.
• Votes?
![Page 43: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/43.jpg)
What data set should you generally test on?
• The data set you trained your classifier on• A data set from a different tutor• Split your data set in half (by students), train on
one half, test on the other half• Split your data set in ten (by actions). Train on
each set of 9 sets, test on the tenth. Do this ten times.
• What are the benefits and drawbacks of each?
![Page 44: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/44.jpg)
The dangerous one(though still sometimes OK)
• The data set you trained your classifier on
• If you do this, there is serious danger of over-fitting
![Page 45: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/45.jpg)
The dangerous one(though still sometimes OK)
• You have ten thousand data points. • You fit a parameter for each data point.• “If data point 1, RIGHT. If data point 78,
WRONG…”• Your accuracy is 100%• Your kappa is 1
• Your model will neither work on new data, nor will it tell you anything.
![Page 46: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/46.jpg)
The dangerous one(though still sometimes OK)
• The data set you trained your classifier on
• When might this one still be OK?
![Page 47: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/47.jpg)
K-fold cross validation (standard)
• Split your data set in ten (by action). Train on each set of 9 sets, test on the tenth. Do this ten times.
• What can you infer from this?
![Page 48: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/48.jpg)
K-fold cross validation (standard)
• Split your data set in ten (by action). Train on each set of 9 sets, test on the tenth. Do this ten times.
• What can you infer from this?– Your detector will work with new data from the
same students
![Page 49: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/49.jpg)
K-fold cross validation (student-level)
• Split your data set in half (by student), train on one half, test on the other half
• What can you infer from this?
![Page 50: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/50.jpg)
K-fold cross validation (student-level)
• Split your data set in half (by student), train on one half, test on the other half
• What can you infer from this?– Your detector will work with data from new
students from the same population (whatever it was)
![Page 51: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/51.jpg)
A data set from a different tutor
• The most stringent test• When your model succeeds at this test, you
know you have a good/general model• When it fails, it’s sometimes hard to know why
![Page 52: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/52.jpg)
An interesting alternative
• Leave-out-one-tutor-cross-validation (cf. Baker, Corbett, & Koedinger, 2006)– Train on data from 3 or more tutors– Test on data from a different tutor– (Repeat for all possible combinations)
– Good for giving a picture of how well your model will perform in new lessons
![Page 53: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/53.jpg)
Comments? Questions?
![Page 54: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/54.jpg)
Regression
![Page 55: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/55.jpg)
Regression
• There is something you want to predict (“the label”)
• The thing you want to predict is numerical
– Number of hints student requests– How long student takes to answer– What will the student’s test score be
![Page 56: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/56.jpg)
Regression
• Associated with each label are a set of “features”, which maybe you can use to predict the label
Skill pknow time totalactions numhintsENTERINGGIVEN 0.704 9 1 0ENTERINGGIVEN 0.502 10 2 0USEDIFFNUM 0.049 6 1 3ENTERINGGIVEN 0.967 7 3 0REMOVECOEFF 0.792 16 1 1REMOVECOEFF 0.792 13 2 0USEDIFFNUM 0.073 5 2 0….
![Page 57: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/57.jpg)
Regression
• The basic idea of regression is to determine which features, in which combination, can predict the label’s value
Skill pknow time totalactions numhintsENTERINGGIVEN 0.704 9 1 0ENTERINGGIVEN 0.502 10 2 0USEDIFFNUM 0.049 6 1 3ENTERINGGIVEN 0.967 7 3 0REMOVECOEFF 0.792 16 1 1REMOVECOEFF 0.792 13 2 0USEDIFFNUM 0.073 5 2 0….
![Page 58: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/58.jpg)
Linear Regression
• The most classic form of regression is linear regression– Alternatives include Poisson regression, Neural
Networks...
![Page 59: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/59.jpg)
Linear Regression
• The most classic form of regression is linear regression
• Numhints = 0.12*Pknow + 0.932*Time – 0.11*Totalactions
Skill pknow time totalactions numhintsCOMPUTESLOPE 0.544 9 1 ?
![Page 60: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/60.jpg)
Linear Regression
• Linear regression only fits linear functions (except when you apply transforms to the input variables, which RapidMiner can do for you…)
![Page 61: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/61.jpg)
Linear Regression
• However…
• It is blazing fast
• It is often more accurate than more complex models, particularly once you cross-validate– Data Mining’s “Dirty Little Secret”
• It is feasible to understand your model(with the caveat that the second feature in your model is in the context of the first feature, and so on)
![Page 62: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/62.jpg)
Example of Caveat
• Let’s study a classic example
![Page 63: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/63.jpg)
Example of Caveat
• Let’s study a classic example• Drinking too much prune nog at a party, and
having an emergency trip to the Little Researcher’s Room
![Page 64: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/64.jpg)
Data
0 1 2 3 4 5 6 7 8 9 100
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Number of drinks of prune nog
Num
ber o
f em
erge
ncie
s
![Page 65: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/65.jpg)
Data
0 1 2 3 4 5 6 7 8 9 100
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Number of drinks of prune nog
Num
ber o
f em
erge
ncie
s Some people are resistent to the deletrious effects of prunes and can safely enjoy high quantities of prune nog!
![Page 66: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/66.jpg)
Learned Function
• Probability of “emergency”=0.25 * # Drinks of nog last 3 hours- 0.018 * (Drinks of nog last 3 hours)2
• But does that actually mean that (Drinks of nog last 3 hours)2 is associated with less “emergencies”?
![Page 67: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/67.jpg)
Learned Function
• Probability of “emergency”=0.25 * # Drinks of nog last 3 hours- 0.018 * (Drinks of nog last 3 hours)2
• But does that actually mean that (Drinks of nog last 3 hours)2 is associated with less “emergencies”?
• No!
![Page 68: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/68.jpg)
Example of Caveat
• (Drinks of nog last 3 hours)2 is actually positively correlated with emergencies!– r=0.59
0 1 2 3 4 5 6 7 8 9 100
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Number of drinks of prune nog
Num
ber o
f em
erge
ncie
s
![Page 69: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/69.jpg)
Example of Caveat
• The relationship is only in the negative direction when (Drinks of nog last 3 hours) is already in the model…
0 1 2 3 4 5 6 7 8 9 100
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Number of drinks of prune nog
Num
ber o
f em
erge
ncie
s
![Page 70: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/70.jpg)
Example of Caveat
• So be careful when interpreting linear regression models (or almost any other type of model)
![Page 71: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/71.jpg)
Comments? Questions?
![Page 72: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/72.jpg)
Discovery with Models
![Page 73: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/73.jpg)
Why do Discovery with Models?
• Let’s say you have a model of some construct of interest or importance– Knowledge– Meta-Cognition– Motivation– Affect– Inquiry Skill– Collaborative Behavior– Etc.
![Page 74: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/74.jpg)
Why do Discovery with Models?• You can use that model to– Find outliers of interest by finding out where the model
makes extreme predictions– Inspect the model to learn what factors are involved in
predicting the construct– Find out the construct’s relationship to other constructs
of interest, by studying its correlations/associations/causal relationships with data/models on the other constructs
– Study the construct across contexts or students, by applying the model within data from those contexts or students
– And more…
![Page 75: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/75.jpg)
Most frequently
• Done using prediction models
• Though other types of models (in particular knowledge engineering models) are amenable to this as well!
![Page 76: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/76.jpg)
Boosting
![Page 77: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/77.jpg)
Boosting• Let’s say that you have 300 labeled actions randomly sampled
from 600,000 overall actions – Not a terribly unusual case, in these days of massive data sets, like
those in the PSLC DataShop
• You can train the model on the 300, cross-validate it, and then apply it to all 600,000
• And then analyze the model across all actions– Makes it possible to study larger-scale problems than a human could
do without computer assistance– Especially nice if you have some unlabeled data set with nice
properties• For example, additional data such as questionnaire data
(cf. Baker, Walonoski, Heffernan, Roll, Corbett, & Koedinger, 2008)
![Page 78: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/78.jpg)
However…
• To do this and trust the result,• You should validate that the model can
transfer across students, populations, and to the learning software you’re using– As discussed earlier
![Page 79: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/79.jpg)
A few examples…
![Page 80: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/80.jpg)
Middle School Gaming Detector
HARDEST SKILLS (pknow<20%)
EASIEST SKILLS (pknow>90%)
GAMEDHURT
12% of the time
2% of the time
GAMED NOT HURT
2% of the time
4% of the time
![Page 81: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/81.jpg)
Skills from the Algebra Tutor
skill L0 T
AddSubtractTypeinSkillIsolatepositiveIso 0.01 0.01
ApplyExponentExpandExponentsevalradicalE 0.333 0.497
CalculateEliminateParensTypeinSkillElimi 0.979 0.001
CalculatenegativecoefficientTypeinSkillM 0.953 0.001
Changingaxisbounds 0.01 0.01
Changingaxisintervals 0.01 0.01
ChooseGraphicala 0.001 0.306
combineliketermssp 0.943 0.001
Initial probability of knowing skill
Probability of learning skill at each opportunity
![Page 82: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/82.jpg)
Which skills could probably be removed from the tutor?
skill L0 T
AddSubtractTypeinSkillIsolatepositiveIso 0.01 0.01
ApplyExponentExpandExponentsevalradicalE 0.333 0.497
CalculateEliminateParensTypeinSkillElimi 0.979 0.001
CalculatenegativecoefficientTypeinSkillM 0.953 0.001
Changingaxisbounds 0.01 0.01
Changingaxisintervals 0.01 0.01
ChooseGraphicala 0.001 0.306
combineliketermssp 0.943 0.001
![Page 83: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/83.jpg)
Which skills could use better instruction?
skill L0 T
AddSubtractTypeinSkillIsolatepositiveIso 0.01 0.01
ApplyExponentExpandExponentsevalradicalE 0.333 0.497
CalculateEliminateParensTypeinSkillElimi 0.979 0.001
CalculatenegativecoefficientTypeinSkillM 0.953 0.001
Changingaxisbounds 0.01 0.01
Changingaxisintervals 0.01 0.01
ChooseGraphicala 0.001 0.306
combineliketermssp 0.943 0.001
![Page 84: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/84.jpg)
Comments? Questions?
![Page 85: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/85.jpg)
A lengthier example (if there’s time)
• Applying Baker et al’s (2008) gaming detector across contexts
![Page 86: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/86.jpg)
Research Question• Do students game the system because of state or trait
factors?
• If trait factors are the main explanation, differences between students will explain much of the variance in gaming
• If state factors are the main explanation, differences between lessons could account for many (but not all) state factors, and explain much of the variance in gaming
• So: is the student or the lesson a better predictor of gaming?
![Page 87: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/87.jpg)
Application of Detector
• After validating its transfer
• We applied the gaming detector across 35 lessons, used by 240 students, from a single Cognitive Tutor
• Giving us, for each student in each lesson, a gaming frequency
![Page 88: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/88.jpg)
Model
• Linear Regression models
• Gaming frequency = Lesson + a0
• Gaming frequency = Student + a0
![Page 89: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/89.jpg)
Model• Categorical variables transformed to a set of
binaries
• i.e. Lesson = Scatterplot becomes• 3DGeometry = 0• Percents = 0• Probability = 0• Scatterplot = 1• Boxplot = 0• Etc…
![Page 90: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/90.jpg)
Metrics
![Page 91: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/91.jpg)
r2
• The correlation, squared• The proportion of variability in the data set
that is accounted for by a statistical model
![Page 92: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/92.jpg)
r2
• The correlation, squared• The proportion of variability in the data set
that is accounted for by a statistical model
![Page 93: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/93.jpg)
r2
• However, a limitation
• The more variables you have, the more variance you should be expected to predict, just by chance
![Page 94: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/94.jpg)
r2
• We should expect• 240 students • To predict gaming better than• 35 lessons
• Just by overfitting
![Page 95: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/95.jpg)
So what can we do?
![Page 96: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/96.jpg)
BiC
• Bayesian Information Criterion(Raftery, 1995)
• Makes trade-off between goodness of fit and flexibility of fit (number of parameters)
![Page 97: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/97.jpg)
Predictors
![Page 98: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/98.jpg)
The Lesson
• Gaming frequency = Lesson + a0
• 35 parameters
• r2 = 0.55• BiC’ = -2370– Model is significantly better than chance would
predict given model size & data set size
![Page 99: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/99.jpg)
The Student
• Gaming frequency = Student + a0
• 240 parameters
• r2 = 0.16• BiC’ = 1382– Model is worse than chance would predict given
model size & data set size!
![Page 100: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/100.jpg)
Standard deviation bars, not standard error bars
![Page 101: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/101.jpg)
Comments? Questions?
![Page 102: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/102.jpg)
EDM – where?Holistic
Entitative
Existentialist Essentialist
![Page 103: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/103.jpg)
Today’s Class
• EDM• Assignment#5• Mega-Survey
![Page 104: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/104.jpg)
Any questions?
![Page 105: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/105.jpg)
Today’s Class
• EDM• Assignment#5• Mega-Survey
![Page 106: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/106.jpg)
Mega-Survey
• I need a volunteer to bring these surveys to Jim Doyle after class
• *NOT THE REGISTRAR*
![Page 107: Educational Data Mining March 3, 2010. Today’s Class EDM Assignment#5 Mega-Survey.](https://reader035.fdocuments.in/reader035/viewer/2022062421/56649ce15503460f949abe35/html5/thumbnails/107.jpg)
Mega-Survey Additional Questions(See back)
• #1: In future years, should this class be given 1: In half a semester, as part of a unified semester class, along with Professor Skorinko’s Research Methods class3: Unsure/neutral5: As a full-semester class, with Professor Skorinko’s class as a prerequisite
#2: Are there any topics you think should be dropped from this class? [write your answer in the space to the right]
#3: Are there any topics you think should be added to this class? [write your answer in the space to the right]