2015-05-29 Scientific Meeting, Centrum Wiskunde & Informatica
Tool Criticism
Myriam C. Traub, Jacco van Ossenbruggen, Lynda Hardman
Information Access
First mention of “Amsterdam” in the OCRed newspaper archive of the KB?
✤ earliest document in the archive: 1618
✤ “Amsterdam”: 1642
✤ “Amfterdam”: 1624
✤ “Amslerdam”: 1673
✤ “?”: 16xx?
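The question has no single answer because each OCR misreading of the name hides a different set of hits. “Amfterdam” presumably comes from the long s (ſ) being read as “f”, and “Amslerdam” from “t” being read as “l”. The following is a minimal sketch that assumes search hits are available as (token, year) pairs. It shows that tolerating a small edit distance moves the “first mention” earlier. The helpers `levenshtein` and `earliest_mention` are hypothetical, not the KB's search interface. The toy input is simply the spellings and years shown on the slide.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def earliest_mention(hits, query, max_distance=0):
    """Earliest year of a token within max_distance edits of query, or None."""
    years = [year for token, year in hits
             if levenshtein(token.lower(), query.lower()) <= max_distance]
    return min(years) if years else None


# Toy input: the spellings and years shown on the slide, not real archive output.
hits = [("Amsterdam", 1642), ("Amfterdam", 1624), ("Amslerdam", 1673)]

print(earliest_mention(hits, "Amsterdam"))                  # exact match only: 1642
print(earliest_mention(hits, "Amsterdam", max_distance=1))  # one OCR error allowed: 1624
```

Allowing more edits finds more variants but also more false matches. It still cannot recover spellings that are further away, such as the unknown “?” variant on the slide.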
Digital Humanities
✤ digital archives / large data collections
✤ collection bias / digitization policy
✤ digital representation ≠ physical object
✤ tools are imperfect
✤ the source cannot be considered independently of the tools
✤ we need methods to detect tool-induced bias
Source criticism
✤ well-established method in the humanities to detect bias in a source:
  ✤ Who is the author?
  ✤ Is the information current?
  ✤ Is the information objective and credible? …
✤ we need a similar approach for tool-induced bias: tool criticism
Methodology
✤ interviews with humanities scholars
✤ classification of common research tasks
✤ lack of trust blocks progress
✤ use case: digital newspaper archive of the KB
✤ no formal OCR evaluation
✤ useful for scholars?
✤ mismatch between two perspectives
Two different perspectives on quality evaluation
“We care about average performance on representative subsets for generic cases.”
“I care about actual performance on my non-representative subset for my specific query.”
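A small calculation makes the two quotes concrete. The sketch below assumes that per-document OCR word error rates are already known. The document ids and values are invented for illustration and are not figures from the KB archive. It compares the collection-wide average with the average over the documents that a hypothetical query returns. When a scholar's subset is dominated by hard-to-read documents, the collection average says little about that subset.

```python
from statistics import mean

# Hypothetical per-document OCR word error rates (fraction of misrecognised words).
# Ids and values are invented for illustration only.
error_rate = {
    "doc1": 0.05, "doc2": 0.08, "doc3": 0.04, "doc4": 0.06,
    "doc5": 0.35, "doc6": 0.40,  # e.g. older prints that are harder to read
}


def average_error(doc_ids):
    """Unweighted mean error rate over the given document ids."""
    return mean(error_rate[d] for d in doc_ids)


collection_view = average_error(error_rate)     # "we": the whole (sample) collection
scholar_view = average_error(["doc5", "doc6"])  # "I": the documents my query returns

print(f"collection: {collection_view:.2f}, my subset: {scholar_view:.2f}")
```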
No silver bullet
✤ we propose novel strategies that solve part of the problem:
  ✤ critical attitude (awareness and better support)
  ✤ transparency (provenance, open source, documentation, …)
  ✤ alternative quality metrics (taking research context into account)
M. Traub, J. van Ossenbruggen, L. Hardman. Impact Analysis of the OCR Quality Problem in Digital Archives. TPDL 2015 (under review).
Context-aware quality indicators
✤ query (e.g. “Amsterdam”)
✤ data
✤ task
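One reading of this slide is that a quality indicator should be computed for a particular query, over the data that query touches, and for a particular task. The sketch below is minimal and rests on two assumptions: the task is tracking mentions over time, and hits arrive as (token, year) pairs. For each decade, it reports the share of candidate hits that an exact-match query for “Amsterdam” would miss. Several parts are hypothetical choices and not the indicators proposed by the authors: the similarity test (`difflib.SequenceMatcher.ratio` with a threshold of 0.85), the function names, and the toy hits.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def is_variant(token, query, threshold=0.85):
    """Hypothetical test: similar but not identical tokens count as likely OCR variants."""
    token, query = token.lower(), query.lower()
    return token != query and SequenceMatcher(None, token, query).ratio() >= threshold


def missed_share_per_decade(hits, query):
    """Per decade: share of candidate hits (exact + likely variants) an exact query misses."""
    exact = defaultdict(int)
    variant = defaultdict(int)
    for token, year in hits:
        decade = year - year % 10
        if token.lower() == query.lower():
            exact[decade] += 1
        elif is_variant(token, query):
            variant[decade] += 1
    decades = sorted(set(exact) | set(variant))
    return {d: variant[d] / (exact[d] + variant[d]) for d in decades}


# Toy hits (token, year), invented for illustration only.
hits = [("Amsterdam", 1642), ("Amfterdam", 1624), ("Amfterdam", 1628),
        ("Amsterdam", 1645), ("Amslerdam", 1647), ("Rotterdam", 1649)]

print(missed_share_per_decade(hits, "Amsterdam"))
# 1620s: every candidate is a variant (1.0); 1640s: one in three; "Rotterdam" is not a variant
```

The “likely variants” are only candidates. A more realistic indicator would also have to account for false matches and for variants that fall below the threshold.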
Future work
✤ What strategies should a tool support to help scholars discover and deal with bias?
✤ What is a good way of estimating uncertainty for a specific task?
✤ Can we crowdsource (part of the data for) better estimates?
✤ What is a good way of conveying the estimated impact to scholars?
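These open questions leave the method open. The following describes only one possible combination of task-specific uncertainty and crowdsourcing, and it is an assumption rather than the authors' approach. Crowd workers would verify a random sample of a query's hits. The tool would then report the share confirmed as genuine mentions, together with a Wilson score confidence interval. The function name, the sample size and the counts are all hypothetical.

```python
from math import sqrt


def wilson_interval(confirmed: int, checked: int, z: float = 1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if checked <= 0:
        raise ValueError("need at least one checked hit")
    p = confirmed / checked
    denom = 1 + z * z / checked
    centre = (p + z * z / (2 * checked)) / denom
    half = z * sqrt(p * (1 - p) / checked + z * z / (4 * checked * checked)) / denom
    return centre - half, centre + half


# Hypothetical crowdsourcing result: 40 sampled hits checked, 31 confirmed.
low, high = wilson_interval(31, 40)
print(f"confirmed share: {31 / 40:.2f} (95% interval {low:.2f} to {high:.2f})")
```

The Wilson interval behaves better than the simple normal approximation for small samples and for proportions near 0 or 1. Both situations are likely when only a few hits per query can be checked by hand.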
… questions?