Kirrkirr: A Java-based visualisation tool for XML dictionaries of Australian Languages

Kevin Jansz
Department of Computer Science, University of Sydney, Australia

Dr. Christopher Manning
Computer Science and Linguistics, Stanford University, USA

Dr. Nitin Indurkhya
School of Applied Science, Nanyang Technological University, Singapore

Project Objectives
Aims of the project:
– examining the richness of lexical structure, in particular the connotational and figurative use of words
– providing innovative ways of representing a dictionary, through creative use of the medium of computers
– augmenting dictionaries from corpora to be able to provide practical, educationally useful programs as a result (at low labor cost)
Main initial target: an interactive front end for exploring or using the Warlpiri dictionary.

Talk Outline
The research agendas
Kirrkirr: A Warlpiri dictionary browser
The Lexical Database
– exploiting the strengths of XML
– indexing XML data
User interface and visualization
User studies

Research Program: Lexicon
A lexicon is not just words but a vast network of associations between words and within and across the concepts represented by words
The aim of this work is to provide people with a better understanding of this conceptual map.
Traditional paper dictionaries offer very limited ways of making such networks visible
On a computer, one can imagine all sorts of ways of bringing out such relationships

MRD Structure
The internal structures of current Machine Readable Dictionaries (MRDs) usually merely mimic the structure of the printed form (Boguraev 1990)
Some work, notably WordNet (Miller 1995), has involved a fundamental rethinking of dictionary content and organization (here, organization via “synsets”, which are related via links of part, subkind, opposite)
There has been little in the way of software to make these dictionaries truly usable by different communities of users.

Initial focus
Kirrkirr: a Warlpiri browser
Warlpiri is an Australian Aboriginal language spoken in the Tanami desert (NW of Alice Springs)
A computer interface for browsing the Warlpiri dictionary.
Rich lexical materials have been collected by linguists over decades (Hale’s fieldwork from 1959 on, MIT Lexicon Project in the 1980s)
Before Kirrkirr, the results had not been produced in a format usable by the community (only printouts)

Our educational goals
There are a number of reasons for focusing on Warlpiri for this electronic bilingual dictionary
– There has been a large amount of research on Warlpiri, creating one of the most comprehensive lexical databases for any Australian language
– There is a relatively large community of people interested in learning their traditional language
– The low level of literacy in the region makes an e-dictionary potentially more useful than a paper edition, as it is less dependent on good knowledge of spelling and alphabetical order. Making it fun and easy to use is a considerable help as well.
Features such as being able to point and click, and to hear the words, take the emphasis away from knowing the written form of the word before the system is used

Target user community

Kirrkirr: A Warlpiri dictionary browser
(Jansz 1998; Jansz, Manning and Indurkhya 1999)
An environment for the interactive exploration of dictionaries.
Although our current work has just been with Warlpiri, the design is general (Arrernte coming soon!)
Attempts to more fully utilize graphical interfaces, hypertext, multimedia, and different ways of indexing and accessing information
Written in Java, it can either be run over the web [high bandwidth] or run locally (here Java’s main advantage is cross-platform support).

Specific goals
An interactive environment that encouraged exploration: easy and fun to use
Reduction of the dependence on alphabetical order
Catering to the needs of different user groups (kids, teachers, professionals)
Flexible enough to display appropriate information in appropriate ways depending on user level

Overview
Kirrkirr provides various modules
– Graph layout of word relationships
– Formatted dictionary entries
– Semantic domain browsing
– A notes facility for ‘jotting in the margin’
– Multimedia: audio, pictures
– Advanced searching interfaces
– Others in planning: formatting (XSL) editing, figuration patterns
These attempt to cater to users with different competence levels

(Kirrkirr screen shot)

The lexical database
Existing materials are stored in an ad hoc markup format using backslash codes, with some (rather odd) nesting of structural tags
These were converted to XML using an error-correcting stack-based parser (written in Perl).
– The inconsistency and flexibility of dictionary entries actually made this a surprisingly difficult task.
– But the parser tries to impose data integrity
Use of XML gives a clear structure to the data, and makes available many (free) tools
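
The conversion step can be illustrated with a minimal sketch. It is written in Python rather than the Perl of the original parser, and the backslash codes (`\me`, `\sense`, `\hw`, `\gl`, with `\e<code>` as a closing code) are hypothetical, since the slides do not show the real markup. The stack is what allows the repair of bad nesting: a closing code also closes anything left open inside it, and a closing code with no matching opener is dropped.

```python
from xml.sax.saxutils import escape

# Hypothetical container codes; the real Warlpiri markup is not shown in the slides.
CONTAINERS = {"me", "sense"}  # opened by \<code>, closed by \e<code>

def backslash_to_xml(lines):
    """Convert backslash-coded lines to XML, repairing bad nesting with a stack."""
    out, stack = [], []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("\\"):
            continue  # skip blank or uncoded lines
        code, _, text = line[1:].partition(" ")
        if code in CONTAINERS:
            stack.append(code)
            out.append(f"<{code}>")
        elif code.startswith("e") and code[1:] in CONTAINERS:
            name = code[1:]
            if name not in stack:
                continue  # stray closing code: drop it
            while stack:  # also close anything left open inside this block
                top = stack.pop()
                out.append(f"</{top}>")
                if top == name:
                    break
        else:
            # Any other code is treated as a simple field with text content.
            out.append(f"<{code}>{escape(text.strip())}</{code}>")
    while stack:  # close whatever the input never closed
        out.append(f"</{stack.pop()}>")
    return "".join(out)

sample = [r"\me", r"\hw wordA", r"\sense", r"\gl a < b", r"\eme", r"\esense"]
print(backslash_to_xml(sample))
```

In the sample, `\sense` is never closed before `\eme`, and the trailing `\esense` has nothing to close; the sketch repairs both cases and still emits well-formed XML.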

XML
XML separates the structure of the data from its presentation
Much of the recent enthusiasm for XML has centered around representing simple and rigid structures such as database records
The rich hierarchical and variable structure of dictionary entries is really more what something like XML excels at!
Result remains a portable, tangible text file

Alternative: a database
The obvious thing for storing a lot of data
Has clear advantages: structure, indexing, query language, relationships, integrity.
Many people have suggested using a database for lexical data and some have actually done it (IITLEX, Austin and Nathan)
But in general lexicographers oppose the rigidity, and, in practice, standard relational databases are quite ill-suited to dictionaries

Problems with Relational Database
Dictionary entries vary enormously
Data is fragmented
Dictionaries are only loosely structured
Same element can appear at many levels (dialect, cross-reference, …)
Database model is inflexible to extending the dictionary structure
Lessens portability

XML indexing - challenges
Despite the various XML parsers available, it is surprising that there has been little consideration of making single entries retrievable from the file
Present XML parsers tend to put the entire XML document (or some form of it) in memory before the data extraction process begins
This is not practical when parsing significant XML databases (e.g. the Warlpiri dictionary is approx. 10 MB).

XML Indexing - solutions
The hierarchical structure of XML lends itself to indexing, as each separate entry in the XML file can be considered as a separate entity
To make the Warlpiri dictionary usable for Kirrkirr an ad hoc indexing system was developed
– Uses a slightly modified Ælfred parser
– Entries are indexed by headword
The system returns an XML document object containing the single dictionary entry, facilitating
– processing for related words (Graph layout)
– XSL processing to HTML
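
A rough sketch of headword indexing follows, in Python rather than Kirrkirr's Java and without the Ælfred parser. It assumes hypothetical element names (`<ENTRY>`, `<HW>`) and that each `<ENTRY>` and `</ENTRY>` tag sits on its own line; neither detail comes from the slides. The index maps a headword to the byte range of its entry, so a lookup reads and parses only that range.

```python
import re
import xml.etree.ElementTree as ET

HW_RE = re.compile(rb"<HW>(.*?)</HW>")  # hypothetical headword element

def build_index(path):
    """Map each headword to the (byte offset, byte length) of its entry."""
    index, offset, start, headword = {}, 0, None, None
    with open(path, "rb") as f:
        for line in f:
            if b"<ENTRY>" in line:
                start, headword = offset, None  # a new entry begins on this line
            if start is not None and headword is None:
                m = HW_RE.search(line)
                if m:
                    headword = m.group(1).decode("utf-8")
            offset += len(line)
            if b"</ENTRY>" in line and start is not None:
                if headword is not None:
                    index[headword] = (start, offset - start)
                start = None
    return index

def fetch_entry(path, index, headword):
    """Seek to one entry and parse only that slice of the file."""
    start, length = index[headword]
    with open(path, "rb") as f:
        f.seek(start)
        return ET.fromstring(f.read(length))
```

Only the index (headwords and offsets) stays in memory; the returned element plays the role of the single-entry XML document object described above.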

XML Indexing - solutions (2)
The use of the XML indexing process considerably improves efficiency as only requested entries are parsed, hence conserving time and bandwidth
– Once whole entries are parsed, they are kept temporarily in a cache
Thus the system uses indexed XML as a middle ground, combining the structure and indexing of a relational database with the freedom and functionality of XML.
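
The slides do not say how the cache decides what to keep. As one possible reading, here is a small least-recently-used cache in Python; the eviction policy, the class name `EntryCache`, and the `loader` callback are all assumptions made for illustration.

```python
from collections import OrderedDict

class EntryCache:
    """Keep the most recently used parsed entries, evicting the oldest."""

    def __init__(self, loader, capacity=100):
        self.loader = loader      # function: headword -> parsed entry
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, headword):
        if headword in self._items:
            self._items.move_to_end(headword)  # mark as recently used
            return self._items[headword]
        entry = self.loader(headword)          # parse only on a cache miss
        self._items[headword] = entry
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)    # drop the least recently used
        return entry
```

A caller would pass in whatever function parses a single entry, so repeated visits to the same word skip the parse.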

Kirrkirr’s XML Index Process
(Diagram; labels recoverable from the slide: Index in Memory, Kirrkirr, XML document object)

XQL - Potential

Visualization of dictionary information
For applications with simple textual content behind them, there is little that can be done but an on-line reflection of a printed page
But we want more than just definitions of words: we want to know their relationships to other words, and the patterning in these relationships
A computational approach can mediate between the lexical data and the user
The interface can select from and choose how to present information (according to the user’s preferences) – in many different ways

Previous work
Current systems present the search-dominated interface of classic Information Retrieval systems: you type a word in a search box
Results try to mimic, but are generally inferior to, the printed version of the dictionary
Good feature: rapid searching
These systems do little to utilize the captivating qualities of computers: interactivity, user control and adaptability (Brown 1985).

Previous work (2)
Only effective when user has a clearly specified information need
– even here, we are ignoring the distinction between information gained and knowledge sought (Sharpe 1995)
Lack browsing, and chances for incidental or curiosity-driven learning
Lack tangibility and situatedness of paper: ineffective for getting an idea of a collection
We wish to exploit the essence of hypertext, which is “click to explore” browsing

Previous work (3)
Little research work (in corpus linguistics, visualization etc.) on dictionary visualization
WordNet built a rich network of relationships, which fundamentally departed from the paper dictionary tradition, and has been used in many computational projects
However, very little has been done in the way of interfaces that make these relationships visible and intelligible to users.
Graphical representations seem particularly important given our target users.

Graph-based visualization
There is a little previous work on graphical representations of dictionaries
For instance, the visual-thesaurus by plumbdesign derived from WordNet
But it is also a good demonstration of how chaotic and confusing graphical interfaces can become.

Graph-based visualization
(Jansz 1998; Jansz, Manning and Indurkhya 1999)
Classic graph layout problem
Adapts work by Eades et al. (1998) and Huang et al. (1998) on visualization and navigation of WWW document linkages
Uses the spring algorithm. Big advantage is that it is an iterative updating algorithm, and so gives an easy interactivity:
– it wiggles and people can play with it.
Clarity and simplicity of graph: Software maintains a set of focus nodes to prevent overcrowding
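
To make the iterative character of a spring layout concrete, here is a generic, simplified sketch in Python. It is not the formulation of Eades et al. or Huang et al., nor Kirrkirr's Java code: the Hooke-style spring, the inverse-square repulsion, and all constants are assumptions chosen for illustration. Each call moves every node a small step, so redrawing between calls would produce the "wiggle".

```python
import math

def spring_step(pos, edges, rest=1.0, k_spring=0.1, k_repel=0.05):
    """One iteration: linked nodes are pulled toward `rest` length, all pairs repel."""
    force = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            d = math.hypot(dx, dy) or 1e-9       # avoid division by zero
            ux, uy = dx / d, dy / d              # unit vector from a to b
            f = -k_repel / (d * d)               # repulsion pushes a away from b
            if (a, b) in edges or (b, a) in edges:
                f += k_spring * (d - rest)       # spring pulls a stretched edge together
            force[a][0] += f * ux
            force[a][1] += f * uy
            force[b][0] -= f * ux
            force[b][1] -= f * uy
    # Move each node by its net force and return the new positions.
    return {n: (pos[n][0] + force[n][0], pos[n][1] + force[n][1]) for n in pos}

# Two linked words starting too far apart drift toward a stable distance.
layout = {"wordA": (0.0, 0.0), "wordB": (3.0, 0.0)}
for _ in range(200):
    layout = spring_step(layout, {("wordA", "wordB")})
```

Because the state is just a dictionary of positions, a user dragging a node between iterations simply changes one entry and the layout relaxes again from there.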

Educational advantages
Alphabetical order is important, but a web of words offers other effective opportunities for learning
A student can opportunistically explore words that are related in various ways
Important semantic relationships can be understood

Kirrkirr network display

Formatted dictionary entries
Are produced automatically from the XML by using XSL (James Clark’s XT)
XSL allows easy modeling of some user preferences.
Most trivially, one can leave out information such as part of speech, or detailed definitions
This is useful as many users find information overload quite confusing and demotivating
Can produce bilingual or monolingual dictionary
Opportunities for various output styles, and formats such as RTF or TeX for printing.
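
The idea of leaving out fields according to user preference can be shown with a small stand-in. This is plain Python, not XSL and not XT, and the element names (`ENTRY`, `HW`, `POS`, `GL`) are hypothetical; it only illustrates the kind of selection a stylesheet would perform.

```python
import xml.etree.ElementTree as ET
from html import escape

def render_entry(entry_xml, hide=()):
    """Render one entry as an HTML fragment, skipping element names in `hide`."""
    entry = ET.fromstring(entry_xml)
    parts = []
    for child in entry:
        if child.tag in hide:
            continue  # user preference: leave this field out
        text = escape((child.text or "").strip())
        parts.append(f'<span class="{child.tag.lower()}">{text}</span>')
    return "<p>" + " ".join(parts) + "</p>"

src = "<ENTRY><HW>wordA</HW><POS>N</POS><GL>gloss A</GL></ENTRY>"
print(render_entry(src))                # full entry
print(render_entry(src, hide={"POS"}))  # simplified view without part of speech
```

The same entry yields different views from different preference sets, which is the separation of structure from presentation that the XML representation allows.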

Formatted dictionary entries

Rich typology of link types
The semantically rich types of linkages present in a dictionary (synonym, antonym, hyponym, subheadword, variant, coverbs, …) solve one of the major problems of the web: we have many link types with a clear semantic interpretation
Use consistent color-coded text and edges to show these link types
Gives a richer browsing experience
Can tell where you are going before clicking

Browsing
Work (at PARC and elsewhere: Pirolli et al. 1996) has stressed the role of browsing as well as searching in information access
It provides a context for learning
We provide browsing in several ways:
– conventional hypertext
  • but with rich semantically-interpreted links
  • their color-coding matches network edges
– network-based display of words
– browsing through semantic domains
Such cultural information is hard to learn, and not normally in dictionaries or thesauri
Question: can terminology sets be derived automatically from appropriate corpora?

User study problems
With no dictionary currently available except the printed-out ‘database’ [complete with markup codes], many people found it hard to judge the usefulness of the interface: there was no point of comparison.
First impressions only: It would have been good to let people try it out at their leisure, but unfortunately this was not possible (NT Ed all Macintosh, MRJ 2.1 shipping deadlines slipped past our study date…)

User study
Mim Corris (Yuendumu, Willowra)
User testing with primary and (lower) secondary students
Comments from teachers, other adults etc.
Purely qualitative observational study of dictionary use (Doing anything much else would be difficult.)
Initial reactions are very enthusiastic
Could use as a basis for classroom activities (better with some further development: games and puzzles)

Conclusions
Kirrkirr is just a prototype of what one can do to visualize dictionaries
We have addressed the challenge of making dictionary information usable by creating an application which mediates between well-structured data and users’ needs for searching/browsing and presentation
While we have focused our research on Warlpiri, the system can be easily applied to other languages

Conclusions (cont.)
“... The best future applications of MRDs in education will be those most able to respond to the insights and needs of their users” (Kegl 1995)
Kirrkirr can be seen as a step towards the future of e-dictionaries