Visualizing Topic Flow in Students’ Essays
Intelligent Database Systems Lab
Presenter: WU, MIN-CONG
Authors: Stephen T. O’Rourke, Rafael A. Calvo, and Danielle S. McNamara
2011, EST
Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• Writing is an important learning activity, and visualizing essays can help people assess and improve their quality.
Objectives
• This paper presents a novel document visualization technique and a measure of quality based on the average semantic distance between parts of a document.
Methodology - Mathematical Framework
• To visualize a document, its dimensionality must first be reduced.
• Preprocessing: stop-words and low-frequency words are removed, and stemming is applied.
• Visualizing Topic Flow:
  1. A term-by-paragraph matrix is built.
  2. A topic model is created (using NMF).
  3. The topic model is projected into 2-dimensional space.
  4. The document’s paragraphs are visualized to identify features in the topic model of the document.
• Quantifying Topic Flow:
  1. A term-by-sentence matrix is built.
  2. A topic model is created.
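Methodology - Visualizing Topic Flow (term-by-paragraph matrix)
• Rows i are terms and columns j are paragraphs (p1 … pn).
• Each entry is a log-entropy weight that combines the term’s frequency in the paragraph with the term’s entropy in the document.
• If the log-entropy weight is large, the word is more important.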
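The slide's weighting formula did not survive in the transcript. As a minimal sketch, the commonly used log-entropy formulation is shown below. Treating it as the formula used in the paper is an assumption, and the function name `log_entropy` is hypothetical. It needs at least two paragraphs, because it divides by log(n).

```python
import numpy as np

def log_entropy(F):
    """Weight a raw term-by-paragraph count matrix F (m terms x n paragraphs).

    Assumed formulation: local weight log(1 + f_ij) times the global weight
    1 + sum_j p_ij * log(p_ij) / log(n), where p_ij = f_ij / sum_j f_ij.
    """
    F = np.asarray(F, dtype=float)
    n = F.shape[1]
    row_totals = F.sum(axis=1, keepdims=True)
    # p[i, j]: share of term i's occurrences that fall in paragraph j
    p = np.divide(F, row_totals, out=np.zeros_like(F), where=row_totals > 0)
    # p * log(p), using the convention 0 * log(0) = 0
    plogp = np.zeros_like(p)
    mask = p > 0
    plogp[mask] = p[mask] * np.log(p[mask])
    global_weight = 1.0 + plogp.sum(axis=1, keepdims=True) / np.log(n)
    return np.log1p(F) * global_weight
```

Under this formulation, a term concentrated in one paragraph keeps a global weight near 1, while a term spread evenly over all paragraphs gets a global weight near 0. This matches the slide's remark that a large weight marks a more important word.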
Methodology - Visualizing Topic Flow (NMF dimensionality reduction technique)
• X ≈ WH, where X is the term-by-paragraph matrix (m×n), W is the term-by-topic matrix (m×r), H is the topic-by-paragraph matrix (r×n), and r is the number of latent topics.
• Example: X (6×2) ≈ W (6×3) · H (3×2)
• W and H are approximated by minimizing the squared error of the Frobenius norm of X − WH.
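The factorization X ≈ WH can be sketched in a few lines. The slide does not say which solver was used, so the multiplicative update rule below is an assumption. The names `nmf` and `reconstruction_error` are hypothetical.

```python
import numpy as np

def nmf(X, r, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative X (m x n) into W (m x r) and H (r x n).

    Uses multiplicative updates, which keep W and H non-negative and
    reduce the Frobenius norm of X - WH (an assumed choice of solver).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update topic-by-paragraph
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update term-by-topic
    return W, H

def reconstruction_error(X, W, H):
    """Frobenius norm of X - WH."""
    return np.linalg.norm(np.asarray(X, dtype=float) - W @ H)
```

For the slide's example shapes, `nmf(X, r=3)` on a 6×2 matrix returns a 6×3 `W` and a 3×2 `H`. Each column of `H` is then the topic vector of one paragraph.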
Methodology - Visualizing Topic Flow (2-dimensional representation)
• A paragraph-by-paragraph triangular distance table is computed.
• Multidimensional Scaling (MDS) is used for similarity comparison.
• An iterative majorization algorithm (least-squares) minimizes a loss function (Stress) between the vector dissimilarities and the approximated distances in the low-dimensional space.
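Stress minimization by iterative majorization can be sketched as follows. The unit-weight Guttman transform (SMACOF-style) used here is an assumption about the variant, and Euclidean distance is also assumed. The function names are hypothetical.

```python
import numpy as np

def pairwise_distances(P):
    """Euclidean distance between every pair of rows of P."""
    P = np.asarray(P, dtype=float)
    diff = P[:, None, :] - P[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stress(D, Z):
    """Raw Stress: squared gap between dissimilarities D and distances in Z."""
    d = pairwise_distances(Z)
    upper = np.triu_indices(d.shape[0], k=1)
    return ((np.asarray(D, dtype=float) - d)[upper] ** 2).sum()

def smacof(D, dim=2, n_iter=300, init=None, seed=0):
    """Embed n items in `dim` dimensions by iterative majorization."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    if init is None:
        Z = np.random.default_rng(seed).standard_normal((n, dim))
    else:
        Z = np.array(init, dtype=float)
    for _ in range(n_iter):
        d = pairwise_distances(Z)
        # ratio[i, j] = D[i, j] / d[i, j], taken as 0 where d[i, j] == 0
        ratio = np.divide(D, d, out=np.zeros_like(D), where=d > 0)
        B = -ratio
        B[np.diag_indices(n)] = ratio.sum(axis=1)
        Z = B @ Z / n   # Guttman transform; Stress does not increase
    return Z
```

Here `D` would be the full symmetric form of the paragraph-by-paragraph distance table. The rows of the returned array are the 2-dimensional coordinates used to draw the paragraphs.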
Methodology - Visualizing Topic Flow (Visualizing Flow)
• The diameter of the grid equals the maximum possible distance between any two paragraphs.
• Node-link diagram: each node is a paragraph, and each link leads to the next paragraph, from the introduction to the conclusion.
[Figure: topic flow maps of a low-grade essay and a high-grade essay]
• Low grade vs. high grade. Why? Because: 1. the paragraphs appear close together; 2. the introduction and conclusion are similar.
• The degree of deviation from a circle.
Methodology - Quantifying Topic Flow
• DI is based on two quantities: the semantic distances between consecutive pairs of sentences or paragraphs, and the double average over all pairs of sentences or paragraphs.
• DI ≤ 0 indicates a random topic flow; DI > 0 indicates the presence of topic flow.
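The DI formula itself is missing from the transcript. The sketch below is one formulation that is consistent with the sign convention on this slide: the average distance over all pairs minus the average distance between consecutive pairs, normalized by the all-pairs average. The normalization, the use of Euclidean distance, and the name `distance_index` are all assumptions.

```python
import numpy as np

def distance_index(V):
    """Illustrative DI for topic vectors V (one row per sentence or paragraph).

    Positive when consecutive units are closer to each other than
    units are on average; zero or negative otherwise.
    """
    V = np.asarray(V, dtype=float)
    n = V.shape[0]
    diff = V[:, None, :] - V[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # Mean distance between each unit and the one that follows it
    consecutive = D[np.arange(n - 1), np.arange(1, n)].mean()
    # Mean distance over all distinct pairs (the "double average")
    all_pairs = D[np.triu_indices(n, k=1)].mean()
    return (all_pairs - consecutive) / all_pairs
```

For five units placed in order along a line, `distance_index([[0], [1], [2], [3], [4]])` gives 0.5. Visiting the same points in the zigzag order 0, 4, 1, 3, 2 gives -0.25.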
Experiment - Evaluation 1: Flow and Grades (Experiment Dataset)
• Dataset: 120 essays written for assignments by undergraduate students at Mississippi State University
• Essay grades: levels 1-6
• Subsets: High: 67 (1-3); Low: 53 (3.2-6)
• k (number of topics): 5
              Average words     Average sentences   Average paragraphs
Each essay    726.60 (114.37)   40.03 (8.29)        5.55 (1.32)
Experiment - Evaluation 1: Flow and Grades (Measuring Topic Flow)
[Results table not preserved. Surviving annotations: "less present using either of the dimensionality reduction techniques"; p < 0.05; p > 0.05]
Experiment - Evaluation 1: Flow and Grades (Measuring Topic Flow)
• Measure the correlation between topic flow and grades.
Experiment - Evaluation 2: Supporting Assessment (Methodology)
1. Measure the inter-rater agreement that the tutors had with two expert raters.
2. The two tutors independently marked assignments with and without the map.
Hypothesis: essays can be subjectively assessed faster, more accurately, and more consistently with the map.
Experiment - Evaluation 2: Supporting Assessment (Essay Subset Preparation)
• The remaining 40 essays were divided into two subsets of 20 essays each, to be assessed according to the MASUS procedure.
[Figure: subset 1 and subset 2]
Experiment - Evaluation 2: Supporting Assessment (Results)
• Rater 1: native English speaker
• Rater 2: non-native English speaker
• In order to eliminate the effect of essay length
Conclusions
• Tutors assess the essays faster, more accurately, and more consistently with the aid of topic flow visualization.
Comments
• Advantages
  – Effectively discover market intelligence (MI) for supporting decision-makers.
• Applications
  – Document visualizations.