2.3 Methods for Big Data What is “Big Data”? Summarizing Big Data.
Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline....
Transcript of Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline....
![Page 1: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/1.jpg)
Big Data in Education
Alex J. Bowers, Ph.D.
Associate Professor of Education Leadership
Teachers College, Columbia University
April 24, 2014
![Page 2: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/2.jpg)
![Page 3: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/3.jpg)
Schutt, R., & O'Neil, C. (2013). Doing Data Science:
Straight Talk from the Frontline. Cambridge, MA:
O'Reilly.
![Page 4: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/4.jpg)
Schutt, R., & O'Neil, C. (2013). Doing Data Science:
Straight Talk from the Frontline. Cambridge, MA:
O'Reilly.
You’re dealing with Big Data when you’re
working with data that doesn’t fit into your
computer unit. Note that makes it an evolving
definition: Big Data has been around for a
long time. . . . Today, Big Data means
working with data that doesn’t fit in one
computer. p.24
![Page 5: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/5.jpg)
“Big” Analysis in Education Leadership
![Page 6: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/6.jpg)
Schutt, R., & O'Neil, C. (2013). Doing Data Science:
Straight Talk from the Frontline. Cambridge, MA:
O'Reilly.
You’re dealing with Big Data when you’re
working with data that doesn’t fit into your
computer unit. Note that makes it an evolving
definition: Big Data has been around for a
long time. . . . Today, Big Data means
working with data that doesn’t fit in one
computer. p.24
A data scientist is someone who knows how
to extract meaning from and interpret data,
which requires both tools and methods from
statistics and machine learning, as well as
being human… Once she gets the data into
shape, a crucial part is exploratory data
analysis, which combines visualization and
data sense… She’ll communicate with
team members, engineers, and leadership in
clear language and with data visualizations
so that even if her colleagues are not
immersed in the data themselves, they will
understand the implications. p.16
![Page 7: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/7.jpg)
Statistician
Data Scientist
http://flowingdata.com/2013/12/18/data-scientist-surpasses-statistician-on-google-trends/
December 18, 2013
Data scientist surpasses statistician on Google
Trends
![Page 8: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/8.jpg)
Mike Bostock & D3:
github.com/mbostock
github.com/mbostock/d3/wiki/Gallery
![Page 9: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/9.jpg)
Kris Erickson http://ericksondata.com/wp/2012/venn-diagram-of-data-science/
![Page 10: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/10.jpg)
But what do school practitioners (the users)
actually say that they need?
![Page 11: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/11.jpg)
But what do school practitioners (the users)
actually say that they need?
Brocato, K., Willis, C., Dechert, K. (2014) Longitudinal Data Use: Ideas for
District, Building and Classroom Leaders. In A.J. Bowers, A.R. Shoho, B. G.
Barnett (Eds.) Using Data in Schools to Inform Leadership and Decision
Making (forthcoming). Charlotte, NC: Information Age Publishing Inc.
![Page 12: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/12.jpg)
Brocato, K., Willis, C., Dechert, K. (2014) Longitudinal
Data Use: Ideas for District, Building and Classroom
Leaders. In A.J. Bowers, A.R. Shoho, B. G. Barnett
(Eds.) Using Data in Schools to Inform Leadership and
Decision Making (forthcoming). Charlotte, NC:
Information Age Publishing Inc.
![Page 13: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/13.jpg)
Brocato, K., Willis, C., Dechert, K. (2014) Longitudinal
Data Use: Ideas for District, Building and Classroom
Leaders. In A.J. Bowers, A.R. Shoho, B. G. Barnett
(Eds.) Using Data in Schools to Inform Leadership and
Decision Making (forthcoming). Charlotte, NC:
Information Age Publishing Inc.
What components of a statewide longitudinal
data system are needed for that system to best
meet the respective needs of superintendent,
principal and teacher leaders?
![Page 14: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/14.jpg)
Superintendents
Teachers Principals
Longitudinal Data System
What components of a longitudinal data system are needed for that system
to best meet the respective needs of superintendent, principal, and teacher
leaders?
![Page 15: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/15.jpg)
Superintendents (n=55)
Individual Student/Teacher
• Teacher data: performance, attendance, past experience
• Who rides the bus
• Record of parent conferences
• Test scores for students as well as whether or not they have met growth
Comparative
• All district’s comparative test data, district demographics
• Access test data, budget information, comparing per pupil expenditures, demographic data for the local area as compared to the region and state
• Student growth from year to year, demographic group growth
![Page 16: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/16.jpg)
Principals (n=45)
Teacher Data
• Teachers’ past performance and years of experience
• Background of teachers and records showing student test scores broken down by teacher
• Student and teacher achievement and growth data, teacher evaluation information, student attendance and dropout data, staff attendance, accountability information
Student Data
• Achievement information on teachers based on student performance
• Background of teachers being hired; teacher performance data
• Are we able to keep high performing teachers in our district?
![Page 17: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/17.jpg)
Teachers (n=193)
Brocato, Willis & Dechert (2014) p.5
![Page 18: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/18.jpg)
Brocato, Willis & Dechert (2014) p.6
Teachers continued
![Page 19: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/19.jpg)
Figure 1. Summary of eight themes and
corresponding frequency percentage in the
“more frequent” level of category
What components of a longitudinal data system are needed for that system
to best meet the respective needs of superintendent, principal, and teacher
leaders?
![Page 20: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/20.jpg)
Which stages of the Data Science Process do we do well in education?
Where are the challenges?
Schutt, R., & O'Neil, C. (2013). Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly.
![Page 21: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/21.jpg)
K-12 Schooling Outcomes
• Dropout risk
– Accuracy, precision, sensitivity & specificity
• Longitudinal patterns in teacher assigned grades
• Data driven decision making (3DM)
• Targeted and data-informed resource allocation
to improve schooling outcomes
Bowers, A.J., Sprott, R., Taff, S.A. (2013) Do we Know Who Will Drop Out? A Review of
the Predictors of Dropping out of High School: Precision, Sensitivity and Specificity. The
High School Journal.96(2), 77-100. http://hdl.handle.net/10022/AC:P:21258
![Page 22: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/22.jpg)
Issues of Accuracy in Predicting High School Dropout
• Dropping out of high school in the U.S. is associated with a multitude of negative outcomes.
• However, other than a select number of demographic and background variables, we know little about the accuracy of current dropout predictors that are school-related.
• Current predictions accurately predict only about 50-60% of students who actually dropout.
• According to Gleason & Dynarski (2002), accurate prediction of who will dropout is a resource and efficiency issue:
– A large percentage of students are mis-identified as at-risk.
– A large percentage of students who are at-risk are never identified.
• Current reporting of dropout “flags” across the literature is haphazard. – Almost none report accuracy
– Many report specificity or sensitivity, but rarely both
• A dropout flag may be highly precise, in that almost all of the students with the flag dropout, but may not be accurate since the flag may identify only a small proportion of the dropouts.
Balfanz et al. (2007); Bowers (2010); Gleason & Dynarski (2002); Pallas (2003); Rumberger (2004)
![Page 23: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/23.jpg)
Re-analyzing Past Dropout Flags for Accuracy,
Precision, Sensitivity and Specificity
• Literature search. • Queried multiple databases:
– JSTOR, Google Scholar, EBSCO, Educational Full Text Wilson Web
• Studies were included that: – Were published since 1979 – Examined High School dropout – Examined school-wide characteristics and included all students – A focus on the student level – Reported frequencies for re-analysis
• Initially yield 6,434 overlapping studies – 301 studies were read in full – 140 provided school-wide samples and quantifiable data – 36 articles provided enough data for accuracy re-calculations – Yield 110 separate dropout flags
• Relative Operating Characteristic (ROC) – Hits versus False-Alarms
![Page 24: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/24.jpg)
Dropout Graduate
Dropout a
True-positive
(TP)
Correct
b
False-positive
(FP)
Type I Error
a+b
Graduate c
False-negative
(FN)
Type II Error
d
True-negative
(TN)
Correct
c+d
a+c b+d a+b+c+d=N
Event
Pre
dic
tor
Precision = a/(a + b) Positive Predictive Value
True-Positive Proportion = a/(a + c) Sensitivity
True-Negative Proportion = d/(b + d) Specificity
False-Positive Proportion = b/(b + d) 1-Specificity
Event table for calculating dropout contingency proportions
Fawcett (2004); Hanley & McNeil (1982); Swets (1988); Zwieg & Campbell (1993)
“Hits”
“False Alarms”
![Page 25: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/25.jpg)
0
0.2
0.4
0.6
0.8
1.0
Tru
e-p
ositiv
e p
roport
ion (
Sensitiv
ity)
0 0.2 0.4 0.6 0.8 1.0
False-positive proportion (1-Specificity)
Perfect prediction
Better prediction
Worse prediction
Low attendance
An example of the true-positive proportion plotted against the false-positive
proportion for Balfanz et al. (2007) comparing the relative operating characteristics
(ROC) of each dropout flag.
Bowers, A.J., Sprott, R., Taff, S.A. (2013)
![Page 26: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/26.jpg)
0
0.2
0.4
0.6
0.8
1.0
Tru
e-p
ositiv
e p
roport
ion (
Sensitiv
ity)
0 0.2 0.4 0.6 0.8 1.0
False-positive proportion (1-Specificity)
Relative operating characteristics (ROC) of all dropout flags reviewed, plotted
as the true-positive proportion against the false-positive proportion. Numbers
refer to dropout indicator IDs.
Bowers, A.J., Sprott, R., Taff, S.A. (2013)
![Page 27: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/27.jpg)
![Page 28: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/28.jpg)
0
0.2
0.4
0.6
0.8
1.0
Tru
e-p
ositiv
e p
roport
ion (
Sensitiv
ity)
0 0.2 0.4 0.6 0.8 1.0
False-positive proportion (1-Specificity)
Relative operating characteristics (ROC) of all dropout flags reviewed, plotted
as the true-positive proportion against the false-positive proportion. Numbers
refer to dropout indicator IDs.
Bowers, A.J., Sprott, R., Taff, S.A. (2013)
![Page 29: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/29.jpg)
0
0.2
0.4
0.6
0.8
1.0
Tru
e-p
ositiv
e p
roport
ion (
Sensitiv
ity)
0 0.2 0.4 0.6 0.8 1.0
False-positive proportion (1-Specificity)
Relative operating characteristics (ROC) of all dropout flags reviewed, plotted
as the true-positive proportion against the false-positive proportion. Numbers
refer to dropout indicator IDs.
Bowers, A.J., Sprott, R., Taff, S.A. (2013)
![Page 30: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/30.jpg)
http://itp.wceruw.org/documents/KnowlesDropoutEarlyWarningSystemforITPJan2013.pdf
Knowles, J. (2014)
![Page 31: Big Data in Education - Harvard University · Doing Data Science: Straight Talk from the Frontline. Cambridge, MA: O'Reilly. You’re dealing with Big Data when you’re working with](https://reader034.fdocuments.in/reader034/viewer/2022050323/5f7d07c4a1492c42244637b1/html5/thumbnails/31.jpg)
Knowles, J. (2014)
http://itp.wceruw.org/documents/KnowlesDropoutEarlyWarningSystemforITPJan2013.pdf