
New Tools in Digital Humanities UDHIG June 13 2006 Zoe Borovsky


text analysis tools


Page 1: Udhig0613

New Tools in Digital Humanities

UDHIG June 13 2006

Zoe Borovsky

Page 2: Udhig0613

New tools

Text: Juxta, TAPoR, HyperPo, WordHoard

Images:

Image Markup Tool

Page 3: Udhig0613

Why digitize text?

Text analysis: discovering new knowledge by linking information together in interesting ways, not just showing overall trends.

“I think discovering new knowledge vs. showing trends is like the difference between a detective following clues to find the criminal vs. analysts looking at crime statistics to assess overall trends in car theft.” (Marti Hearst, 2003)

Page 4: Udhig0613

The verb “look” occurs more often near words & names of giantesses than giants.

Three volumes of sagas:

Hundreds of giants and giantesses

Page 5: Udhig0613

Types of tools

Concordance, comparison, corpus, critical editions (Juxta)

Search (TAPoR, HyperPo, WordHoard)

Key words in context (KWIC)

Collocates (associations)

Markup: lemma, parts of speech


Page 6: Udhig0613


Produces critical editions, comparing and collating multiple witnesses of a single work
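
What collating two witnesses involves can be illustrated with a minimal sketch. It uses Python's standard `difflib` module, not Juxta's own algorithm, and the two witness strings and the function name `collate` are invented for illustration.

```python
import difflib

def collate(base, witness):
    """Return (base reading, witness reading) pairs where the witness departs from the base."""
    a, b = base.split(), witness.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on either side means the words are absent in that text.
            variants.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants

# Hypothetical witnesses of the same line.
base = "the quick brown fox jumps over the lazy dog"
witness = "the quick red fox leaps over the dog"

for lemma, reading in collate(base, witness):
    print(f"{lemma or '[absent]'} ] {reading or '[absent]'}")
```

Each printed line pairs a reading from the base text with the corresponding reading in the witness, which is roughly the information a critical apparatus records.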


Page 7: Udhig0613


Desktop Application: Mac, Windows and Unix/Linux (open source)

Input: plain text (UTF-8) or XML

Output: HTML critical apparatus

Page 8: Udhig0613

The darker the color, the more variants differ

Page 9: Udhig0613

Toggle between texts

Page 10: Udhig0613

Generate HTML

Page 11: Udhig0613
Page 12: Udhig0613

TAPoR

Web-based text analysis portal

Search and display using online tools


Input: XML, HTML, TEI, plain text

Page 13: Udhig0613


Mostly English, some western European languages

Word lists

KWIC (key word in context)

Collocates/co-occurrences: words that occur in the proximity of the search word

Page 14: Udhig0613

Word list, HyperPo
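
A word list is essentially a frequency count of the tokens in a text. The sketch below is not HyperPo's implementation; it assumes a simple regular-expression tokenizer, and the sample sentence and the name `word_list` are made up for the example.

```python
import re
from collections import Counter

def word_list(text):
    """Return (word, count) pairs, most frequent first."""
    # Assumption: a token is any run of word characters, lowercased.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens).most_common()

sample = "The whale was white. The white whale swam."
for word, count in word_list(sample):
    print(f"{count:3d}  {word}")
```

Real tools usually add options such as stop-word filtering or lemmatization on top of a count like this.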

Page 15: Udhig0613

Key word in context, HyperPo
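
A key-word-in-context display lists every occurrence of a word together with a few words on either side. The following is a rough sketch under simple assumptions (whitespace tokenization, a fixed context width); the function name `kwic` and the sample text are hypothetical and do not come from HyperPo.

```python
def kwic(text, keyword, width=2):
    """Return (left context, keyword as found, right context) for each hit."""
    tokens = text.split()
    hits = []
    for i, tok in enumerate(tokens):
        # Ignore surrounding punctuation and case when matching.
        if tok.strip('.,;:!?"').lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

sample = "The white whale swam past the ship. Nobody saw the white sails again."
for left, word, right in kwic(sample, "white"):
    print(f"{left:>12}  {word}  {right}")
```

Aligning the keyword in a central column, as the print statement does, is what makes patterns in the surrounding words easy to scan.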

Page 16: Udhig0613

Co-occurrences: “white”

Add secondary corpus
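
Counting co-occurrences means tallying the words that fall within a small window around each occurrence of the search word. This is a minimal sketch of that idea, not the method HyperPo uses; the window size, the tokenizer, the sample text, and the name `collocates` are all assumptions.

```python
import re
from collections import Counter

def collocates(text, keyword, window=2):
    """Count words appearing within `window` tokens of each occurrence of `keyword`."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

sample = "The white whale swam past the ship. Nobody saw the white sails again."
for word, count in collocates(sample, "white").most_common():
    print(f"{count:3d}  {word}")
```

Running the same function over a second corpus and comparing the two tallies gives a crude version of the comparison that adding a secondary corpus allows.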

Page 17: Udhig0613

WordHoard

Desktop application/server version

Texts are annotated or tagged by morphological, lexical, semantic, prosodic, and narratological criteria.


Page 18: Udhig0613

The downloadable version comes with texts

Open source version can be installed on your own server with your texts

Page 19: Udhig0613

Sample WordHoard query

Shakespeare’s use of the word “love” over time

Page 20: Udhig0613


Page 21: Udhig0613

Image Markup Tool


Windows only

Page 22: Udhig0613

Image Markup Tool

Input: an image that you want to make available on a web page with annotations directly on the image

Example: Robert Watson’s Back to Nature

Page 23: Udhig0613
Page 24: Udhig0613

Image Markup Tool

Output: sample

A copy of your XML data file with an added XSL stylesheet declaration

A copy of the image file you're marking up (usually reduced to a size suitable for a Web page; you can control this size in the Options / Web view preferences window)

An XSLT file (copied from the web_view folder in the program folder, with some variables modified to suit your data)

A JavaScript file (copied from the web_view folder in the program folder)

A CSS stylesheet file (copied from the web_view folder in the program folder)
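
An XSL stylesheet declaration is a processing instruction placed near the top of an XML file so that a browser applies the named XSLT when displaying it. The sketch below shows the general mechanism only; it is not the Image Markup Tool's code, and the file name `annotations.xsl`, the `<doc/>` root element, and the function name are hypothetical.

```python
def add_stylesheet_declaration(xml_text, href):
    """Insert an xml-stylesheet processing instruction after the XML declaration, if any."""
    pi = f'<?xml-stylesheet type="text/xsl" href="{href}"?>'
    if xml_text.startswith("<?xml"):
        # Keep the XML declaration first; the stylesheet instruction follows it.
        end = xml_text.index("?>") + 2
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text

# Hypothetical data file contents and stylesheet name.
data = '<?xml version="1.0" encoding="UTF-8"?>\n<doc/>'
print(add_stylesheet_declaration(data, "annotations.xsl"))
```

The stylesheet, JavaScript, and CSS files listed above would then sit alongside the XML file so that the relative reference resolves.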