Language Engineering for Human-Computer Collaborative Assessment
Combining the strengths of UMIST and The Victoria University of Manchester
Mary McGee Wood
John Sargeant, Phil Reed, Craig Jones
School of Computer Science
The Assess by Computer (ABC) project
• Tools for setting, taking, and marking exams and for admin tasks
• Internally funded by the University of Manchester
• In use for diagnostic, formative, and “high stakes” summative tests, locally and remotely
• Human-Computer Collaborative Assessment (HCCA) philosophy throughout
• Started as a pragmatic development; gradually turning into a research project.
The problem
• “Every hour of marking is an hour less of life”
• But: we mostly want students’ answers to be constructions, not selections…
• … and accurate autonomous marking of constructed answers (for content) is infeasible.
• And… we also need to improve the quality and accountability of assessment.
Current systems
• Commercially available tools (e.g. QMP) have little or no support for constructed answers (and have other disadvantages)
• Substantial work on Automated Essay Scoring in the USA, especially at Educational Testing Service (ETS), Princeton
• E-rater – “Essay rater” – concentrates on style and language use.
• C-rater – “Concept rater” – looks at the factual content of answers (85% computer-human agreement, against 92% human-human agreement).
HCC – the key idea
• Fully Automatic High Quality Machine Translation (FAHQMT) was never realistic
• FAHQM Anything is probably neither possible nor reasonable
• Aim to exploit the complementary strengths of the system and the user
HCC assessment
• Assessment is a collaborative process where human and program each do what they are good at.
• Answer Representation (AR) grows dynamically during the marking process.
• Aim is to improve both speed and quality of assessment.
HCC software development
• Software development can be a collaborative process where developers and users each do what they are good at.
• System functionality grows dynamically during the use-and-development process.
• Initial aim is to optimise the suitability and habitability of the system.
• Real aim is to improve both speed and quality of carrying out the task in hand.
A marking tool
Answer types
• Multiple choice - useful where appropriate.
• Text – from a single word to an essay. The most common type; can include structured text, e.g. programs, simple maths.
• Slots/fill-in-the-blanks.
• Simple diagrams (experimental).
• Formatted maths – next phase.
Can be used in any combination, structured using composite questions.
Text answer types
• Traditionally: “short answers” vs. “essays”
• Maybe better: “factual” vs. “discursive”…
• … or “objective” vs. “subjective”
Hypothesis: objective answers can usefully be marked semi-automatically using simple statistical clustering and matching techniques, while subjective answers require some degree of “natural language understanding”.
What students really say
• Spelling mistakes
• Word variants
• Context-dependent synonyms
• Original answers
Spelling mistakes
• interpretor, interperetor, interaper (not sure about spelling), …
• hierarchial, hierachical, hirarachical, …
• defieciency, deficency, defiency, defficiency, definciency, dificiency, defciency, defficiency, dfficiency, …
• But some mistakes produce real words: modal / model, casual / causal
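Fuzzy matching of mis-spellings like these is commonly built on edit distance. The sketch below uses plain Levenshtein distance with a tolerance of 2 edits; both the measure and the tolerance are our assumptions for illustration, not necessarily what ABC uses. It also shows why the last bullet matters: modal/model and casual/causal fall within the same tolerance, so a fuzzy matcher would silently treat them as the same word.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters are equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(word: str, keyword: str, max_edits: int = 2) -> bool:
    """Case-insensitive match allowing up to max_edits edits (hypothetical tolerance)."""
    return levenshtein(word.lower(), keyword.lower()) <= max_edits


# Student mis-spellings from the slide are caught...
for typo in ["interpretor", "interperetor", "hierachical"]:
    print(typo, fuzzy_match(typo, "interpreter") or fuzzy_match(typo, "hierarchical"))
# ...but so are real-word confusions.
print(fuzzy_match("modal", "model"), fuzzy_match("casual", "causal"))
```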
Word variants
• “rhesus positive”: 281 students produced 52 forms, with six parameters of variation:
• Upper / lower case: rhesus / Rhesus / RHESUS
• Hyphenation: RH-positive / Rh positive
• Spacing: Rh +ve / RH+ve
• Parentheses: +ve / (+ve)
• "D": Rh positive / Rh D positive
• "positive": positive / pos. / pos / + / +ve / +ive
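Most of these six parameters can be collapsed by simple pre-processing before any matching is attempted. A minimal sketch, assuming a hand-written table of the "positive" forms listed above; the function name and rules are illustrative, not ABC's actual pre-processor.

```python
import re

# Illustrative table of the "positive" forms seen in the answers.
POSITIVE_FORMS = {"positive", "pos", "+", "+ve", "+ive"}


def normalise(answer: str) -> str:
    text = answer.lower()                        # case: Rhesus / RHESUS -> rhesus
    text = re.sub(r"[()]", " ", text)            # parentheses: (+ve) -> +ve
    text = re.sub(r"(\w)-(\w)", r"\1 \2", text)  # hyphenation: rh-positive -> rh positive
    text = re.sub(r"(\w)\+", r"\1 +", text)      # spacing: rh+ve -> rh +ve
    text = text.replace(".", " ")                # abbreviation dot: pos. -> pos
    tokens = []
    for tok in text.split():
        if tok == "d":                           # optional "D": rh d positive -> rh positive
            continue
        tokens.append("positive" if tok in POSITIVE_FORMS else tok)
    return " ".join(tokens)


variants = ["RH-positive", "Rh positive", "Rh +ve", "RH+ve", "Rh (+ve)",
            "Rh D positive", "Rh pos.", "Rh +ive"]
print({normalise(v) for v in variants})
```

Note that "Rhesus" and "Rh" stay distinct after this normalisation; merging them would need a synonym table.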
Context-dependent synonyms
• Working memory, Rule memory, Inference Engine
• Rule Memmory - which rules are avilable, Main Memory - the current state of the world , Interpretor - decides which rule fires
• The knowledge, the rules that operate on the knowledge, and the Intepreter that links the these two.
Original answers
• “Give an original example of an exception to default inheritance.”
• 9 penguins, 6 ostriches; 20 non-flying birds in total
• 8 non-walking mammals, 30 other anomalous animals, 31 disabled animals
• 5 plants, 28 artefacts
Really original answers
I am a fish. I am a fish. I am a fish. I am a fish. I am a fish. I am a fish. I am a fish. I am a fish. I am a fish. I am a fish. I am a fish. I am a fish.
In other words i dont know the answer, sorry, hope u can have a good laugh at my expense though!!!! :) p.s. If you havent seen red dwarf then you'll think im odd for the i am a fishj bit, but if you have seen it dont you think its cool!!!
Simple tools can do a lot
• General problem is hard
• Simple tools can save significant marking time compared to paper exams
• Can display all answers to one part-question together
• Can order answers (e.g. by length) and highlight keywords (with optional fuzzy matching), etc.
• …and you don’t have to read their handwriting!
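These aids need very little machinery. The sketch below lists one part-question's answers shortest first and brackets words that fuzzily match a keyword. It uses Python's standard `difflib.get_close_matches`; the similarity cutoff of 0.8 and the bracket markup are our assumptions, not ABC's settings.

```python
import difflib
import re


def highlight(answer: str, keywords: list[str], cutoff: float = 0.8) -> str:
    """Wrap each word that fuzzily matches a (lower-case) keyword in [brackets]."""
    def mark(m: re.Match) -> str:
        word = m.group(0)
        if difflib.get_close_matches(word.lower(), keywords, n=1, cutoff=cutoff):
            return f"[{word}]"
        return word
    return re.sub(r"[A-Za-z]+", mark, answer)


def marking_view(answers: list[str], keywords: list[str]) -> list[str]:
    """All answers to one part-question, ordered by length, keywords highlighted."""
    return [highlight(a, keywords) for a in sorted(answers, key=len)]


answers = [
    "Rule memory, working memory and interperetor",
    "working memory, rule memory, interpreter (inference engine)",
    "I am a fish.",
]
for line in marking_view(answers, ["memory", "interpreter", "rule"]):
    print(line)
```

The mis-spelt "interperetor" is still highlighted, because its `difflib` similarity to "interpreter" is above the cutoff.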
Clustering
• Each answer (including the model answer) abstracted into a numerical form to enable measurement of similarity with other answers
• Similarity of each answer with each other answer measured and stored in an answer-by-answer similarity matrix
• Clustering algorithm applied to the matrix
Abstraction
• Vector Space Model
• Vectors refined by:
Spelling correction
Stoplist removal
Stemming
Term weighting
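The four refinements can be chained into a small pipeline that turns each answer into a weighted term vector. This is a sketch under several assumptions. The stoplist is a toy one. Spelling correction is a lookup in a hand-supplied table. The stemmer is a crude suffix stripper standing in for a real one such as Porter's. Term weighting is TF-IDF with the idf kept above zero, so that terms shared by every answer to a question still count.

```python
import math
import re
from collections import Counter

STOPLIST = {"a", "an", "and", "the", "of", "to", "is", "in", "that", "which"}  # toy stoplist


def stem(word: str) -> str:
    """Crude suffix stripper (stand-in for a real stemmer); keeps stems of >= 4 letters."""
    for suffix in ("ing", "es", "ed", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def terms(answer: str, corrections: dict[str, str]) -> list[str]:
    words = re.findall(r"[a-z]+", answer.lower())
    words = [corrections.get(w, w) for w in words]   # 1. spelling correction
    words = [w for w in words if w not in STOPLIST]  # 2. stoplist removal
    return [stem(w) for w in words]                  # 3. stemming


def tfidf_vectors(answers: list[str], corrections: dict[str, str]) -> list[dict[str, float]]:
    docs = [Counter(terms(a, corrections)) for a in answers]
    n = len(docs)
    df = Counter(t for doc in docs for t in doc)     # number of answers containing each term
    # 4. term weighting: tf * (1 + log(n / df)), so no weight drops to zero
    return [{t: tf * (1 + math.log(n / df[t])) for t, tf in doc.items()} for doc in docs]


fixes = {"interpretor": "interpreter", "memmory": "memory"}  # hypothetical correction table
vecs = tfidf_vectors(["Rule memmory and interpretor",
                      "the real world changes",
                      "a changing world"], fixes)
print(vecs[1])
```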
Similarity measurement
• Cosine similarity (standard)
• Stored in an answer-by-answer similarity matrix
• Generic: can handle many other question types, e.g. diagrams
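Given term vectors, the similarity matrix takes only a few lines. The sketch assumes answers arrive as sparse vectors (dicts from term to weight) and uses the standard cosine measure. Everything downstream sees only the matrix, so presumably another answer type, such as a diagram, could be handled by supplying its own pairwise similarity function; this is our reading of "generic".

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(vectors: list[dict[str, float]]) -> list[list[float]]:
    """Symmetric answer-by-answer matrix; each answer is identical to itself (1.0)."""
    n = len(vectors)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = cosine(vectors[i], vectors[j])
    return sim


m = similarity_matrix([{"world": 1.0, "chang": 1.0},
                       {"world": 2.0, "chang": 2.0},
                       {"frame": 1.0}])
print(m)
```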
Clustering algorithm
• Agglomerative Hierarchical Clustering
• Number of clusters not known in advance
• “Average Within-Cluster Similarity” gives a clue to a cluster’s reliability as a basis for marking
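One way to realise this is greedy agglomeration controlled by an Average Within-Cluster Similarity (AWCS) threshold. In the sketch below, which is our reading rather than ABC's published algorithm, AWCS is assumed to be the mean pairwise similarity within a cluster. At each step the merge giving the highest AWCS is taken, and merging stops when no merge reaches the threshold. The number of clusters therefore falls out of the threshold rather than being fixed in advance, and answers that never join a cluster are reported as outliers. The search is deliberately naive: it re-scores every candidate merge on each pass.

```python
from itertools import combinations


def awcs(cluster: list[int], sim: list[list[float]]) -> float:
    """Average Within-Cluster Similarity, assumed here to be the mean over member pairs."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 1.0  # a singleton is trivially self-similar
    return sum(sim[i][j] for i, j in pairs) / len(pairs)


def cluster_answers(sim: list[list[float]], threshold: float):
    """Greedy agglomerative clustering; returns (clusters largest first, outliers)."""
    clusters = [[i] for i in range(len(sim))]  # start: every answer on its own
    while True:
        best, best_score = None, threshold
        for a, b in combinations(range(len(clusters)), 2):
            score = awcs(clusters[a] + clusters[b], sim)
            if score >= best_score:  # most cohesive merge that meets the threshold
                best, best_score = (a, b), score
        if best is None:
            break  # no merge keeps AWCS >= threshold
        a, b = best  # a < b, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    groups = sorted((c for c in clusters if len(c) > 1), key=len, reverse=True)
    outliers = [c[0] for c in clusters if len(c) == 1]
    return groups, outliers


sim = [[1.00, 0.97, 0.95, 0.10],
       [0.97, 1.00, 0.96, 0.20],
       [0.95, 0.96, 1.00, 0.10],
       [0.10, 0.20, 0.10, 1.00]]
print(cluster_answers(sim, threshold=0.90))
```

Lowering the threshold admits looser clusters and leaves fewer outliers; raising it does the opposite.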
Example 1: Production systems
• “What are the three components of a production system?”
• 151 student answers
Cluster 1 (50 answers): working memory, rule memory, interpreter
Cluster 2 (8 answers): 1. working memory, 2. rule memory, 3. Interpreter
Cluster 3 (6 answers): working memory, rule memory, interpreter (inference engine)
Cluster 4 (5 answers): working memory, rule memory, interpretor
Cluster 5 (3 answers): working memory, rule memory, interpretter
Cluster 6 (3 answers): include the phrase “the three components”
Cluster 7 (3 answers): working memory, rule memory, inference engine
…
Outliers: 65 answers
Outliers
• Unique mis-spellings:
Rule memory, working memory and interperetor
• Correct answers uniquely expressed:
Working memory - contains state; Rule memory - contains rules; Interpreter - decides which to fire
• Unique wrong answers:
I am a fish. I am a fish. I am a fish.
Example 2: Iron deficiency
• “Name one deficiency which would give rise to a microcytic anaemia.”
• 279 student answers
• 17 clusters, 79 outliers
Cluster 1 (132 answers): iron deficiency and minor variants collapsed by pre-processing
Cluster 2 (15 answers): iron
Cluster 3 (8 answers): iron deficiency, diet
Cluster 4 (7 answers): iron deficiency anemia
&c
• “What single measurement would you make to confirm that an individual is anaemic?”
• 22 clusters, 85 outliers
Cluster 1 (67 answers): haemoglobin concentration
Cluster 2 (42 answers): red blood cell count
Cluster 3 (15 answers): packed cell volume
Cluster 4 (13 answers): minor variants on haemoglobin concentration in the blood
&c
Example 3: The frame problem
• “What, in Artificial Intelligence, is the Frame Problem?”
• 104 student answers
• Average Within-Cluster Similarity (AWCS) threshold relaxed to 0.90, giving 12 clusters and 39 outliers
Cluster 1 (19 answers): real world, chang- (change, changes, changing, &c)
Cluster 2 (14 answers): world, chang-
Cluster 3 (8 answers): frame
Cluster 4 (6 answers): exceptions, inheritance
Cluster 5 (4 answers): chang-, repres-
&c
Benefits
• Marking time reduced by a factor of 2-3 compared to paper scripts
• Can include MCQs where appropriate - they’re not always bad
• Answers genuinely anonymous
• Consistency likely to improve
• Clerical checking eliminated
• Detailed analysis of results possible – good for “drilling down”.
• Lots of data generated for further research
Collaboration: software development
• Initial system did little more than replace paper exam books
• Gradually extending real (diagnostic, formative, and summative) use
• Gradually extending functionality
• Priorities for development influenced by users and would-be users, e.g. current top priority is formatted maths…
• …and by real student answers.
Conclusions: assessment
In decreasing order of confidence:
• HCCA, even with very simple tools, is very effective, at least in some cases.
• There are many issues of usability, procedures, education…
• Simple keyword-based answers are easy for HCCA but hard for machines alone.
• Discursive / subjective answers probably require a range of NLE techniques.
Conclusions: NLE
• Applications of NLE don’t have to be “all or nothing” …
• … which is just as well, because even “simple” real data is complicated.
• HCC gets the best from both machine and user…
• … and means that very simple techniques can be Really Useful.