Automated Writing Evaluation: Enough about reliability! What really matters for students and
teachers?
Jooyoung Lee, Zhi Li, Stephanie Link, Hyejin Yang, Volker Hegelheimer
Saturday, September 22, 2012
Beyond reliability
Automated essay scoring (AES) systems: PEG, IEA, IntelliMetric, E-rater
Correspondence with human rating: 61-87%, 77-89%, 70-85%, 85-91%, 96-98%, 87-97% (agreement), 45-59% (exact agreement)
User-centric evaluation
• 1966: Development of NLP tools for writing (AES)
• Late 1990s: AES for testing contexts
• 2003: AWE for classroom use
System-centric evaluation
Literature Review: Needs Analysis
• Academic writing for international graduate students (U of Hawaii)
• Process skills
  • Pre-writing, editing/revising
  • Computer skills
  • Getting help, finding & using resources
• Discourse/rhetorical skills
  • Field-specific research paper (sections of the paper)
  • Posing research questions (finding niche)
  • Grammar “patches”, hedges, connectors
  • Style/appropriacy
  • Bibliographies/citing/plagiarism (Negretti, 2001)
What research says about AWE
• Automated Writing Evaluation tools provide both numerical scores and formative feedback
• Positive findings:
  • Motivation (Grimes & Warschauer, 2006)
  • Grammar (Chodorow et al., 2010)
  • Rhetorical development (Cotos, 2011)
• Negative findings:
  • Heavy focus on grammatical and mechanical aspects
  • Losing sense of audience (CCCC, 2004)
Motivation (Gap) for Study
• Lack of previous studies that investigated stakeholders’ actual needs
• Inconsistent opinions among AWE users
• Goal: to investigate what students and teachers actually need in ESL writing classes and how AWE can meet those needs
Research Questions
1. What are the needs of students and teachers in the ESL writing curriculum?
2. What are stakeholders’ views of the current status of AWE?
3. In what ways can AWE improve to meet the needs of students and teachers?
Methodology: setting
§ ESL curriculum at Iowa State University
§ The purpose of the English 101 curriculum is:
  • To prepare undergraduate non-native speakers of English for success in various written assignments in academic contexts
  • To prepare them for English 150: first-year composition
Methodology: participants
• Coordinators (N = 3)
• One coordinator was also a 101 teacher
• Teachers (N = 6)
• Experienced & inexperienced users of AWE
• Students (N = 167)
• Experienced: 72 Inexperienced: 95
Methodology: data collection and analysis
§ Diverse participants
§ Questionnaires (Descriptive statistics)
  • 1 questionnaire for experienced students
  • 1 questionnaire for new students
  • 1 questionnaire for teachers
§ Interviews (A priori → inductive coding)
  • 3 interviews w/ coordinators, 30-60 minutes each
§ Feedback tool analysis
(Long, 2005)
RQ1: Needs for students
Students’ view: local features
• Expressions
• Grammar
• Organization
• Content
• Process writing
Teachers’ views: global features
• Learner autonomy
• Skills in applying feedback
• Skills for writing improvement (esp. in content development and organization)
Coordinators’ view: global features
• Learner autonomy
• Skills in applying feedback
• Strategies for process writing
• Access to ample amount of feedback
• Genre awareness
• Focused feedback
“Using reference material, reading for information, synthesizing it to their own essay, and then making their judgment, that is independent learning and if they can do that on their own [that would be the ultimate goal]” (Teacher 2)
“I would say that [students] need little pieces that they need to learn and to not learn everything at once.” (Coordinator 1)
RQ1: Needs for teachers
Teachers need help with:
Teachers’ views
• Evaluating writing
• Helping students with grammar
• Assisting students in becoming independent learners
• Other practical needs (platform for learner community and peer review, integration w/ course management system)
Coordinators’ view
• Reducing workload
• Understanding how to provide feedback
• Knowing what feedback to provide
• Providing feedback in a manageable fashion
• Effectively implementing and integrating technology into classrooms
“That fit it really nicely with my preconception of Criterion removing some of the workload of the teachers, so that teachers end up reading better papers. That’s how I envisioned it initially.” (Coordinator 3)
RQ2: Student Views of Criterion
Q: Do you think Criterion helped you write your argumentative paper?
N (=72) %
Yes 63 87.5
No 9 12.5
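The percentages in the table above follow directly from the reported counts. A minimal Python sketch of that arithmetic is given here as a check; the function name `percentages` is illustrative and does not come from the study.

```python
def percentages(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

# Counts reported on the slide for experienced students (N = 72).
responses = {"Yes": 63, "No": 9}
shares = percentages(responses)
print(shares)  # {'Yes': 87.5, 'No': 12.5}
```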
“I think Criterion cannot really give me some suggestion, so I hope my instructor can give me more suggestions after he/she finish reading my paper.”
RQ2: Student-Teacher Views of Criterion
Ratings of major functions on Criterion, M (SD)
Function                                   Experienced Teacher (N=3)   Experienced Student (N=72)   Inexperienced Student (N=95)
Feedback on Grammar                        5 (1)                       4.66 (1.00)                  4.74 (1.03)
Feedback on Usage                          5 (1)                       4.40 (0.87)                  4.59 (1.09)
Feedback on Mechanics                      5.33 (0.58)                 4.19 (0.87)                  4.76 (1.08)
Feedback on Style                          4 (1)                       3.76 (1.20)                  4.38 (1.19)
Feedback on Organization and Development   2.67 (0.58)                 3.63 (1.22)                  4.52 (1.08)
RQ2: Teacher Views of AWE
Overall positive view with some inconveniences
§ Positive
  • “The reason that I give high ratings to Grammar, Usage, and Mechanics is that I believe students can benefit from them if they pay attention to them.” (teacher 1)
  • “As Criterion provides feedback repetitively, I hope it can help students learn how to improve their writing skills by themselves.” (teacher 3)
§ Negative
  • “Although Criterion enables students to save their drafts, one pitfall is that students can only save the very first and the last drafts, which are not good for students and teachers.” (teacher 4)
RQ2: Teacher Coordinators’ Views of AWE
Likert scale items (1 = not useful, 6 = very useful)
Function                                                     Coordinator and Teacher 1   Coordinator 2   Coordinator 3
Grading students' essay                                      1                           1               5
Giving feedback to students                                  4                           5               3
Setting up assignments                                       1                           3               5
Receiving and collecting students' essay                     1                           3               5
Tracking students' progress                                  1                           4               2
Reducing workload in terms of grading and feedback giving    1.5                         3               2
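One way to see how far the three coordinators diverge is to compute a mean and a range for each function. The sketch below is illustrative only: the ratings are copied from the slide, but the summary itself and the helper name `summarize` are assumptions added here, not an analysis the authors report. With only three raters, the numbers describe disagreement and nothing more.

```python
from statistics import mean

# Ratings copied from the slide (1 = not useful, 6 = very useful),
# ordered as Coordinator and Teacher 1, Coordinator 2, Coordinator 3.
ratings = {
    "Grading students' essay": [1, 1, 5],
    "Giving feedback to students": [4, 5, 3],
    "Setting up assignments": [1, 3, 5],
    "Receiving and collecting students' essay": [1, 3, 5],
    "Tracking students' progress": [1, 4, 2],
    "Reducing workload in terms of grading and feedback giving": [1.5, 3, 2],
}

def summarize(scores):
    """Return (mean rounded to two decimals, max minus min) for one function."""
    return round(mean(scores), 2), max(scores) - min(scores)

# Hypothetical summary table: one (mean, range) pair per function.
summary = {task: summarize(scores) for task, scores in ratings.items()}
for task, (avg, spread) in summary.items():
    print(f"{task}: mean={avg}, range={spread}")
```

A large range with a middling mean (for example, grading) signals that the coordinators disagree rather than that they find the function moderately useful.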
RQ2: Coordinator Views of AWE
• Changes in his attitudes
“My thought was Criterion should be able to alleviate some of the pressure on teachers…hoping that it would remove or take away some…of the grading burden on the side of teachers. Based on some of the things we’ve looked at, some of the problems that students had with, or teachers had with Criterion…some of the inconsistency in terms of grading, recognizing some mistakes, and some of your recent findings…I’m beginning to doubt as to whether or not it really helps instructors. I don’t know yet. I’d like to learn more about it. I’m not convinced as I once was about utility of it. I still think there is…but I think I have to take a deeper look at it.” (Coordinator 3)
RQ3: Suggestions for future AWE
Based on current needs and stakeholders’ views
Student Teachers Coordinators
Suggestions
• Organization/style
• Feedback is not comprehensible
• Utility (e.g. save draft / feedback; pop-up notes)
• Learner / Teacher training (tech support / material support)
• Focus more on focused feedback (treatable errors)
“I wish they could see the submissions of other students. I wish they had a feature for peer review.” (Coordinator 1)
RQ3: Suggestions for future AWE
“[Students] need other entity to tell them about their writing to make them look at their writing again; there’s some good feedback that ESL students can benefit from; it’s not perfect but pretty good at it.” (Coordinator 2)
Implications
• Feedback Categories (Ferris, 2001)
AWE Feedback                                                                     Treatability
fragment, missing comma                                                          treatable
run-on sentences                                                                 treatable
garbled sentences                                                                treatable
SV agreement                                                                     treatable
ill-formed verb                                                                  treatable
pronoun errors, possessive errors
article errors                                                                   less treatable
confused words, wrong/missing words, wrong form of word                          treatable
faulty comparison, nonstandard word form, negation error, preposition error      less treatable
Stakeholders’ needs and AWE
Students’ needs
• Expressions
• Grammar
• Organization
• Content
• Process writing
• Learner autonomy
• Skills in applying feedback
• Skills for writing improvement (esp. in content development and organization)
• Access to ample amount of feedback
• Genre awareness
• Focused feedback
Teachers’ Needs
• Evaluating writing
• Helping students with grammar
• Assisting students in becoming independent learners
• Other practical needs
• Reducing workload
• Understanding how to provide feedback
• Knowing what feedback to provide
• Providing feedback in a manageable fashion
• Effectively implementing and integrating technology into classrooms
Implications
• “I think [AWE] should be used and we should figure out how to best use it. It may not be perfect for everybody but there is a better way of using it. We just have to find out...so I’m not ready to give up on it.” (Coordinator 3)
References
• CCCC. (2004). Position statement on teaching, learning, and assessing writing in digital environments. Retrieved from http://www.ncte.org/cccc/resources/positions
• Chodorow, M., Gamon, M., & Tetreault, J. (2010). The utility of article and preposition error correction systems for English language learners: Feedback and assessment. Language Testing, 27(3), 419-436.
• Cotos, E. (2011). Potential of automated writing evaluation feedback. CALICO Journal, 28(2), 420-459.
• Grimes, D., & Warschauer, M. (2006, April). Automated essay scoring in the classroom. Paper presented at the American Educational Research Association, San Francisco, California.
• Grimes, D., & Warschauer, M. (2010). Utility in a fallible tool: A multi-site case study of automated writing evaluation. Journal of Technology, Learning, and Assessment, 8(6). Retrieved [date] from http://www.jtla.org
• James, C. L. (2006). Validating a computerized scoring system for assessing writing and placing students in composition courses. Assessing Writing, 11(3), 167-178.
• Long, M. (2005). Methodological issues in learner needs analysis research. In M. H. Long (Ed.), Second Language Needs Analysis. Cambridge: Cambridge University Press.
• Vann, R. J., Meyer, D. E., & Lorenz, F. O. (1984). Error gravity: A study of faculty opinion of ESL errors. TESOL Quarterly, 18(3), 427-440.