Concept Graph Learning from Educational Datahanxiaol/slides/wsdm2015-liu.pdf · 2016. 6. 8. ·...
Transcript of Concept Graph Learning from Educational Datahanxiaol/slides/wsdm2015-liu.pdf · 2016. 6. 8. ·...
Concept Graph Learning from Educational Data
Yiming Yang, Hanxiao Liu, Jaime Carbonell and Wanli Ma
School of Computer ScienceCarnegie Mellon University
February 3, 2015
WSDM 2015 Concept Graph Learning from Educational Data 1
Outline of the Talk
MotivationConcept Representation SchemesConcept Graph LearningExperiments & Empirical ResultsFuture Work
WSDM 2015 Concept Graph Learning from Educational Data 2
IntroductionMotivation
Scenario: Massive course materials are online available fromdifferent course providers
Universities, Coursera, Edx, MIT OpenCourseWare ...Challenge: How to integrate the scattered information?
A CMU graduate: “After completing courses A, B on Coursera,what course shall I take next at CMU?”
Lack a method to measure the course overlapping and thecourse prerequisite relations across institutions.
We address this by putting cross-institutional courses under acanonical language—concept.
WSDM 2015 Concept Graph Learning from Educational Data 3
IntroductionConcept Graph Learning
E&M
Differential Eq
Algorithms
Num Analysis
Matrix AQuantum
Calculus
Mechanics
Java Prog
Matrix A Topology
Scalable Algs
Courses in University 1 Courses in University 2
Universal Concepts (e.g. Wikipedia Topics)
Goal: Learning a universal graph of concepts based on1 Course-level prerequisite relations2 Concept representation of courses
WSDM 2015 Concept Graph Learning from Educational Data 4
Outline of the Talk
MotivationConcept Representation SchemesLearning the Concept GraphExperiments & Empirical ResultsFuture Work
WSDM 2015 Concept Graph Learning from Educational Data 5
Representation SchemesWord-based Representation
⇓ crawling and parsing<course><id>6.080</id><name>Great Ideas in Theoretical Computer Science</name><tag>Electrical Engineering and Computer Science</tag><description>This course provides a challenging introduction to some of the central ideas of
theoretical computer science. It attempts ... </description><keywords>computer science,theoretical computer science,logic,turing machines,computability,
finite automata,godel,complexity,polynomial time,efficient algorithms ... </keywords><calendar>Introduction # Logic # Circuits and finite automata # Turing machines # Reducibility
and Godel # Minds and machines # Complexity # Polynomial time # P and NP # NP-completeness # NP-completeness in practice # Space complexity and more ... </calendar>
</course>
Concepts def= WordsVocabularies not controlled
CMU 10-715: shattering coefficient; MIT 15.097: growth function
Words are in multiple granularities =⇒ interpretability ↓WSDM 2015 Concept Graph Learning from Educational Data 6
Representation SchemesCategory-based Representation
Bag of Words Bag of Wikipedia Categories
Wikipedia Classifier=⇒
Concepts def= Wikipedia CategoriesImproved interpretabilityControlled vocabulary at the right granularity
Shattering number, Growth function Wikipedia Classifier=⇒Computational Learning Theory
Leverage oceans of knowledge in Wikipedia
WSDM 2015 Concept Graph Learning from Educational Data 7
Representation SchemesLatent Space Representation
Schemes based on dimensionality reductionSparse Coding of Words
Trained on the given courses—purely unsupervisedDistributed Word Embedding
Trained on Wikipedia articles—leverages exterior info
Concepts def= Dimensionality-reduced vectorsControlled “vocabulary”
Words are mapped onto a unified latent spaceConcept granularity can be controlled by latent dimensionalityHard to interpret
WSDM 2015 Concept Graph Learning from Educational Data 8
Outline of the Talk
MotivationConcept Representation SchemesLearning the Concept GraphExperiments & Empirical ResultsFuture Work
WSDM 2015 Concept Graph Learning from Educational Data 9
Problem Formulation
E&M
Differential Eq
Algorithms
Num Analysis
Matrix AQuantum
Calculus
Mechanics
Java Prog
Matrix A Topology
Scalable Algs
Courses in University 1 Courses in University 2
Universal Concepts (e.g. Wikipedia Topics)
Observed course-level relations OConcept representation of courses X n by p
Concept graph A p by p
How to evaluate A?1 Map A to an estimated course graph Θ (n by n) via X .2 Then, evaluate the quality of Θ with O.
WSDM 2015 Concept Graph Learning from Educational Data 10
Problem Formulation
How to map the concept graph A to an estimated course graph Θ?
—through a bilinear form:
Θ def= XAX>
Explanation:θij
def=∑u,v
auvxiuxjv
θij : strength from course j to course iauv : strength from concept v to concept u
Each course-level prerequisite θij is defined as the cumulative effectof multiple concept-level prerequisites
∑u,v auvxiuxjv
WSDM 2015 Concept Graph Learning from Educational Data 11
Problem Formulation
How to evaluate the estimated course graph Θ with our observedcourse-level relations O?
i.e. how to define the loss function over Θ w.r.t. O?
ProblemOnly positive examples are availableTreating unobserved course relations as negative examplesleads to highly skewed label set
Solution: rankingWe hope θij > θik if j ∈ prereq (i) and k 6∈ prereq (i)
WSDM 2015 Concept Graph Learning from Educational Data 12
OptimizationOptimization Objective
CGL objective with p2 variables
minA∈Rp×p
C∑
(i,j,k)` (θij − θik) + 1
2‖A‖2F
s.t. Θ = XAX>
ProblemA can be a huge dense matrix (e.g. is 15,396 by 15,396 forwords-based concept representation)Dual space? #dual variables = O
(n3)
Solution: derive an equivalent optimization problem with only n2
(n2 � p2) variables.
WSDM 2015 Concept Graph Learning from Educational Data 13
OptimizationVariable Reduction
Introduce slack variable S ∈ Rn×n for constraint Θ = XAX>.The Lagrangian is
L = C∑
(i,j,k)` (θij − θik) + 1
2‖A‖2F +
⟨S,Θ−XAX>
⟩∂L∂A should vanish at the stationary point
=⇒ A∗ = X>S∗>XA∗ ∈ Rp×p only has n2 (n2 � p2) degrees of freedom!
Equivalent CGL objective with n2 variables
minS∈Rn×n
C∑
(i,j,k)` (θij − θik) + 1
2tr(ΘS>
)s.t. Θ = KSK
WSDM 2015 Concept Graph Learning from Educational Data 14
OptimizationLoss Function & Optimization Solver
We choose the squared hinge loss ` (x) = (max (1− x, 0))2
large-margin property: strong generalization abilitysmoothness: allows Nesterov’s accelerated GD
GD: 37.3min & 1490 iterations on MITaccelerated GD: 3.08 min & 103 iterations on MIT
An alternative of GD: Inexact Newton MethodTo avoid the huge Hessian—use a matrix-free ConjugateGradient to compute the Newton direction
WSDM 2015 Concept Graph Learning from Educational Data 15
Outline of the Talk
MotivationConcept Representation SchemesLearning the Concept GraphExperiments & Empirical ResultsFuture Work
WSDM 2015 Concept Graph Learning from Educational Data 16
ExperimentsDatasets & Evaluation
Table : Datasets Statistics1
University Department # Courses # Prerequisites # WordsMIT2 * 2322 1173 15396
Caltech * 1048 761 5617CMU CS, STATS 83 150 1955
Princeton MATH 56 90 454
Metrics for evaluation: MAP and AUC
1available at http://nyc.lti.cs.cmu.edu/teacher/dataset/2MIT OpenCourseWare http://ocw.mit.edu/index.htm
WSDM 2015 Concept Graph Learning from Educational Data 17
ExperimentsComparison among Concept Representation Schemes
0.00
0.20
0.40
0.60
0.80
1.00
Word Cat. SCW DWE
(a) CGL.Rank on MIT Data
AUC MAP
0.00
0.20
0.40
0.60
0.80
1.00
Word Cat. SCW DWE
(b) CGL.Rank on Caltech Data
AUC MAP
0.00
0.20
0.40
0.60
0.80
1.00
Word Cat. SCW DWE
(c) CGL.Rank on CMU Data
AUC MAP
0.00
0.20
0.40
0.60
0.80
1.00
Word Cat. SCW DWE
(d) CGL.Rank on Princeton Data
AUC MAP
Words � Categories � Sparse Coding � Distributed Word Embedding
WSDM 2015 Concept Graph Learning from Educational Data 18
ExperimentsCross-institutional Prerequisite Prediction
A good concept graph should be universal, thus should betransferable across different institutions
0.00
0.20
0.40
0.60
0.80
1.00
MIT
.MIT
Calte
ch.M
IT
Calte
ch.C
alte
ch
MIT
.Calte
ch
CMU.C
MU
MIT
.CMU
Calte
ch.C
MU
Prin
ceto
n.Pr
ince
ton
MIT
.Prin
ceto
n
Calte
ch.P
rince
ton
MA
P
<Traning Set>.<Test Set>
There is always a performance loss if we go across institutions.We do get good transfer.
WSDM 2015 Concept Graph Learning from Educational Data 19
Empirical ResultsConcept Graph for MIT
Randomness
Integral_calculus
Functional_analysis
Probability_theoryLinear_algebra
Numerical_analysis
Applied_mathematics_stubs
Cybernetics
Mathematical_analysis_stubs
Mathematical_optimization
Differential_geometry
Computational_scienceDynamical_systems
Signal_processing
Fourier_analysis
Operations_research
WSDM 2015 Concept Graph Learning from Educational Data 20
Outline of the Talk
MotivationConcept Representation SchemesLearning the Concept GraphExperiments & Empirical ResultsFuture Work
WSDM 2015 Concept Graph Learning from Educational Data 21
Future Work
Deploying the induced concept graph for personalizedcurriculum planning (on-going work)
Student’s academic background/goal def= bag-of-conceptsFind an optimal sequence of courses?
Cross-language transfer learning by using Wikipedia categories(concepts) as the interlingua.
WSDM 2015 Concept Graph Learning from Educational Data 22