KO KNOWLEDGE ORGANIZATION - Ergon-Verlag · The inclusion of subjective and social information from...


Knowl. Org. 37(2010)No.4

KO KNOWLEDGE ORGANIZATION Official Quarterly Journal of the International Society for Knowledge Organization ISSN 0943 – 7444

International Journal devoted to Concept Theory, Classification, Indexing and Knowledge Representation

Contents

Articles

Margaret E.I. Kipp and D. Grant Campbell. Searching with Tags: Do Tags Help Users Find Things? ... 239

Zhonghong Wang, Abdus Sattar Chaudhry, and Christopher Khoo. Support from Bibliographic Tools to Build an Organizational Taxonomy for Navigation: Use of a General Classification Scheme and Domain Thesauri ... 256

Papers from Classification at a Crossroads: Multiple Directions to Usability: International UDC Seminar 2009, Part 2

Vanda Broughton. Concepts and Terms in the Faceted Classification: the Case of UDC ... 270

Gordon Dunsire and Dennis Nicholson. Signposting the Crossroads: Terminology Web Services and Classification-Based Interoperability ... 280

Ceri Binding and Douglas Tudhope. Terminology Web Services ... 287

Veslava Osińska. Visual Analysis of Classification Scheme ... 299

Alenka Šauperl. UDC and Folksonomies ... 307

Report

Nancy J. Williamson. Classification Issues in 2008.
- Tenth International ISKO Conference, August 2008, Montreal, Canada ... 318
- IFLA Section on Classification and Indexing ... 326
- International UDC Seminar 2009 ... 327


KNOWLEDGE ORGANIZATION

This journal is the organ of the INTERNATIONAL SOCIETY FOR KNOWLEDGE ORGANIZATION (General Secretariat: Vivien PETRAS, Humboldt-Universität zu Berlin, Institut für Bibliotheks- und Informationswissenschaft, Unter den Linden 6, 10099 Berlin, Germany. E-mail: [email protected]).

Editors

Dr. Richard P. SMIRAGLIA (Editor-in-Chief), School of Information Studies, University of Wisconsin, Milwaukee, Bolton Hall 5th Floor, 3210 N. Maryland Ave., Milwaukee, WI 53211 USA. E-mail: [email protected]

Dr. Joseph T. TENNIS (Book Review Editor Designate), The Information School of the University of Washington, Box 352840, Mary Gates Hall Ste 370, Seattle WA 98195-2840 USA. E-mail: [email protected]

Dr. Ia MCILWAINE (Literature Editor), Research Fellow, School of Library, Archive & Information Studies, University College London, Gower Street, London WC1E 6BT U.K. Email: [email protected]

Dr. Nancy WILLIAMSON (Classification Research News Editor), Faculty of Information Studies, University of Toronto, 140 St. George Street, Toronto, Ontario M5S 3G6 Canada. Email: [email protected]

Hanne ALBRECHTSEN, Institute of Knowledge Sharing, Bureauet, Slotsgade 2, 2nd floor, DK-2200 Copenhagen N, Denmark. Email: [email protected]

David J. BLOOM (Editorial Assistant), School of Information Studies, University of Wisconsin, Milwaukee, Bolton Hall 5th Floor, 3210 N. Maryland Ave., Milwaukee, WI 53211 USA.

Consulting Editors

Dr. Clare BEGHTOL, Faculty of Information Studies, University of Toronto, 140 St. George Street, Toronto, Ontario M5S 3G6, Canada. Email: [email protected]

Dr. Gerhard BUDIN, Dept. of Philosophy of Science, University of Vienna, Sensengasse 8, A-1090 Wien, Austria. Email: [email protected]

Prof. Jesús GASCÓN GARCÍA, Facultat de Biblioteconomia i Documentació, Universitat de Barcelona, C. Melcior de Palau, 140, 08014 Barcelona, Spain. Email: [email protected]

Claudio GNOLI, University of Pavia, Mathematics Department Library, via Ferrata 1, I-27100 Pavia, Italy. Email: [email protected]

Dr. Rebecca GREEN, Assistant Editor, Dewey Decimal Classification, Dewey Editorial Office, Library of Congress, Decimal Classification Division, 101 Independence Ave., S.E., Washington, DC 20540-4330, USA. Email: [email protected]

Dr. José Augusto Chaves GUIMARÃES, Departamento de Ciência da Informação, Universidade Estadual Paulista–UNESP, Av. Hygino Muzzi Filho 737, 17525-900 Marília SP Brazil. Email: [email protected]

Dr. Birger HJØRLAND, Royal School of Library and Information Science, Copenhagen Denmark. Email: [email protected]

Dr. Barbara H. KWASNIK, Professor, School of Information Studies, Syracuse University, Syracuse, NY 13244 USA, (315) 443-4547 voice, (315) 443-4506 fax. Email: [email protected]

Dr. Jens-Erik MAI, Faculty of Information Studies, University of Toronto, 140 St. George Street, Toronto, Ontario M5S 3G6, Canada. Email: [email protected]

Ms. Joan S. MITCHELL, Editor in Chief, Dewey Decimal Classification, OCLC Online Computer Library Center, Inc., 6565 Frantz Road, Dublin, OH 43017-3395 USA. Email: [email protected]

Dr. Widad MUSTAFA el HADI, URF IDIST, Université Charles de Gaulle Lille 3, BP 149, 59653 Villeneuve D’Ascq, France

H. Peter OHLY, GESIS – Leibniz Institute for the Social Sciences, Lennestr. 30, 53113 Bonn, Germany. eMail: [email protected]

Dr. Hope A. OLSON, School of Information Studies, 522 Bolton Hall, University of Wisconsin-Milwaukee, Milwaukee, WI 53201 USA. Email: [email protected]

Dr. M. P. SATIJA, Guru Nanak Dev University, School of Library and Information Science, Amritsar-143 005, India

Dr. Otto SECHSER, In der Ey 37, CH-8047 Zürich, Switzerland

Dr. Winfried SCHMITZ-ESSER, Salvatorgasse 23, 6060 Hall, Tirol, Austria.

Dr. Dagobert SOERGEL, Department of Library and Information Studies, Graduate School of Education, University at Buffalo, 534 Baldy Hall, Buffalo, NY 14260-1020. E-mail: [email protected]

Dr. Eduard R. SUKIASYAN, Vozdvizhenka 3, RU-101000, Moscow, Russia.

Dr. Martin van der WALT, Department of Information Science, University of Stellenbosch, Private Bag X1, Stellenbosch 7602, South Africa. Email: [email protected]

Prof. Dr. Harald ZIMMERMANN, Softex, Schmollerstrasse 31, D-66111 Saarbrücken, Germany

Founded under the title International Classification in 1974 by Dr. Ingetraut Dahlberg, the founding president of ISKO. Dr. Dahlberg served as the journal's editor from 1974 to 1997, and as its publisher (Indeks Verlag of Frankfurt) from 1981 to 1997.


Knowl. Org. 37(2010)No.4 M. E. I. Kipp and D. G. Campbell. Searching with Tags: Do Tags Help Users Find Things?

239

Searching with Tags: Do Tags Help Users Find Things?†

Margaret E. I. Kipp* and D. Grant Campbell**

* School of Information Studies, University of Wisconsin-Milwaukee, Bolton Hall Rm: 510, 3210 N Maryland Ave, Milwaukee, WI, USA 53211 <[email protected]>

** Associate Professor, Faculty of Information and Media Studies, University of Western Ontario, London, ON, Canada N6A 5B7 519-661-2111 ext.88483 <[email protected]>

Margaret E. I. Kipp is an Assistant Professor and member of the Information Organization Research Group, School of Information Studies, University of Wisconsin-Milwaukee. She has a background in computer science and worked as a programmer/analyst. She has a PhD in Library and Information Science from the University of Western Ontario. Her research interests include social tagging, information organisation on the web, classification systems, information retrieval, collaborative web technologies and the creation and visualisation of structures in information organisation systems.

D. Grant Campbell is an Associate Professor in the Faculty of Information and Media Studies at the University of Western Ontario. His research interests include bibliographic description, information and literary theory, metadata and the Semantic Web. He has published in the Journal of Academic Librarianship, Cataloging & Classification Quarterly, Journal of Internet Cataloging, Epilogue, and Papers of the Bibliographical Society of Canada.

† A preliminary version of this study was published as: Kipp, Margaret E.I. 2008. Searching with tags: do tags help users find things? In Arsenault, Clément and Tennis, Joseph T. eds. Culture and identity in knowledge organization: proceedings of the Tenth International ISKO Conference 5-8 August 2008 Montréal, Canada. Advances in knowledge organization vol. 11. Würzburg: Ergon Verlag, pp. 320-35.

Kipp, Margaret E. I. and Campbell, D. Grant. Searching with Tags: Do Tags Help Users Find Things? Knowledge Organization, 37(4), 239-255. 28 references.

ABSTRACT: This pilot study examined whether tags can be useful in the process of information retrieval. Many tags are subject related and could work well as index terms or entry vocabulary; however, folksonomies also include relationships that are traditionally not included in controlled vocabularies, such as affective or time- and task-related tags and the user name of the tagger. Participants searched a social bookmarking tool specialising in academic articles (CiteULike) and an online journal database (PubMed) for articles relevant to a given information request. Screen capture software was used to collect participant actions, and a semi-structured interview asked participants to describe their search process. Preliminary results showed that participants did use tags in their search process, both as a guide to searching and as hyperlinks to potentially useful articles. However, participants also used controlled vocabularies in the journal database to locate useful search terms, and used the links to related articles supplied by PubMed. Additionally, participants reported using the user names of taggers and group names to help select resources by relevance. The inclusion of subjective and social information from the taggers is very different from the traditional objectivity of indexing, and a number of participants reported it as an asset. This study suggests that while users value social and subjective factors when searching, they also find utility in objective factors such as subject headings. Most importantly, users are interested in the ability of systems to connect them with related articles, whether via subject access or other means.


1.0 Introduction

In traditional subject access systems, the indexer is an intermediary: an individual trained in the rules of information organisation who assigns important information about the physical media and the subject matter of the content. On the web, the indexer has typically been either the creator of the item or an automated system collecting basic word frequency information to determine approximate topics. More recently, there has been a growing move to classify materials manually, using consensus classifications created on the web by large groups of users tagging material on social bookmarking sites.

Information retrieval research has traditionally been concerned with the efficiency with which information systems retrieve relevant and useful information, concerning itself with matters of precision, recall, and system effectiveness. Such studies contain an implicit evaluation of the categorisation of the material (since categorisation affects retrieval) but do not often make this evaluation explicit (Cleverdon 1967). This pilot study explores questions of resource discovery in a new context, that of social tagging. Proponents of tagging and social bookmarking often suggest that tags could provide at worst an adjunct to traditional classification systems and at best a complete replacement for them (Shirky 2005). The user-created nature of these organisational schemes suggests that tagging systems may offer a new way of bridging the gap between a user's information need and its translation into a search query, by increasing the user's involvement in the categorisation process and combining it with elements of personal information management.

The ability to discover useful resources is increasingly important when web searches return 300,000 or more sites of unknown relevance, and it is equally important in digital libraries and article databases. The problem of locating information is an old one and led directly to the creation of cataloguing and classification systems for the organisation of knowledge. However, such systems have not proven truly scalable for digital information, especially information on the web. Can the user-created categories and classification schemes of tagging be used to enhance search in these new environments? Much speculation has been advanced on the subject, but so far no studies have examined user perceptions of the utility of tags in a mediated search process.

Social bookmarking tools allow users to store their favourite bookmarks publicly on the web. Users are encouraged to add descriptive terms, or tags, to each bookmark. Tagging is the process of assigning a label (whether classificatory or otherwise) to an item; it is often combined with social bookmarking or with the organisation of other information on the web, for example organising pictures on Flickr.com (Hammond et al. 2005). While other groups have been involved in creating index terms (for example, journal article authors who are asked to provide keywords with their submitted articles), these keywords generally have a small circulation and are not widely used (see Kipp 2005). Such small-scale indexing is common but generally covers a narrow range of topics and is specific to the article. Collaborative tagging systems such as CiteULike (http://www.citeulike.org) or Connotea (http://www.connotea.org) allow users to participate in the classification of journal articles by encouraging them to assign useful labels to the articles they bookmark.
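The data structure behind such tools can be pictured with a minimal sketch. The Python illustration below assumes a simple in-memory store in which each bookmark links a user, an item, and a set of free-text tags; the class `Folksonomy` and its methods are hypothetical names chosen for this example and do not describe the actual data model of CiteULike or Connotea. It shows how a tag, or a tagger's user name, can act as a hyperlink to other items, and how shared tags can connect an article to related ones.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Folksonomy:
    """Hypothetical in-memory store of (user, item, tag) assignments."""

    # tag -> items labelled with that tag by any user
    items_by_tag: defaultdict = field(default_factory=lambda: defaultdict(set))
    # user name -> items that user has bookmarked
    items_by_user: defaultdict = field(default_factory=lambda: defaultdict(set))
    # item -> every tag any user has assigned to it
    tags_by_item: defaultdict = field(default_factory=lambda: defaultdict(set))

    def bookmark(self, user: str, item: str, tags: list[str]) -> None:
        """Record that `user` saved `item` with free-text tags."""
        self.items_by_user[user].add(item)
        for tag in tags:
            label = tag.strip().lower()  # case-folding only; vocabulary stays uncontrolled
            self.items_by_tag[label].add(item)
            self.tags_by_item[item].add(label)

    def browse_tag(self, tag: str) -> set[str]:
        """Follow a tag like a hyperlink: all items carrying that tag."""
        return set(self.items_by_tag.get(tag.strip().lower(), set()))

    def browse_user(self, user: str) -> set[str]:
        """Follow a tagger's user name: everything that person bookmarked."""
        return set(self.items_by_user.get(user, set()))

    def related_items(self, item: str) -> set[str]:
        """Items sharing at least one tag with `item`, excluding `item` itself."""
        related: set[str] = set()
        for tag in self.tags_by_item.get(item, set()):
            related |= self.items_by_tag[tag]
        related.discard(item)
        return related


if __name__ == "__main__":
    f = Folksonomy()
    f.bookmark("alice", "article-1", ["health-informatics", "KM", "toread"])
    f.bookmark("bob", "article-2", ["km", "case-study"])
    f.bookmark("bob", "article-3", ["thesis"])
    print(sorted(f.browse_tag("km")))            # ['article-1', 'article-2']
    print(sorted(f.related_items("article-1")))  # ['article-2']
    print(sorted(f.browse_user("bob")))          # ['article-2', 'article-3']
```

Because case-folding is the only normalisation, the task-related tag `toread` is stored alongside subject tags such as `km`, the same mixture of relationships that distinguishes folksonomies from controlled vocabularies.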

As traditional indexing systems and tagging begin to coexist, a question arises: what is the relationship between tagging and traditional indexing systems? Could tags, combined with traditional subject access, provide a more interactive, mutually determining relationship that evolves over time? Or have the systems that now include tags merely adopted a fad, one that could lower user expectations of retrieval using traditional indexing systems without providing similar or better retrieval or indexing performance? While users have assigned many tags to items in social bookmarking systems, there has been little research into how well these tags serve their suggested function of helping people re-find items they had previously located, or of enabling others to find these items through meaningful tags.

2.0 Related Studies

Previous research in classification suggests that there is a distinct difference between user-created or naive classification systems on the one hand, and those created by professional indexers on the other (Beghtol 2003). While both kinds of system employ subject-based terms, users tend to employ terms that remind them of current or past projects and tasks: terms that may have little meaning to those outside their circle of friends and acquaintances but are very meaningful to the user (Malone 1983; Kwasnik 1991; Jones et al. 2005).


End-user and search thesauri using user-centred and user-generated terminology were developed in the 1980s to enable users to expand their searches and make connections to thesaurus vocabulary while searching, but many systems still do not offer thesaurus-enhanced search (Nielsen 2004, 60). Scholars have also examined the usability and user perceptions of thesaurus-enhanced search tools and found that these tools enhance the search process, but research into user interactions with such systems is limited (Shiri and Revie 2005; Blocks, Cunliffe, and Tudhope 2006; Shiri and Revie 2006).

Mathes proposes that librarians embrace user-assigned tags as a third alternative to traditional library classifications and author-assigned keywords (Mathes 2004), a suggestion that builds on earlier work in end-user and search thesauri. He and others also suggest that user tagging systems would allow librarians to see what vocabulary users actually use to describe concepts, and that this vocabulary could then be incorporated into the system as entry vocabulary to the standard thesaurus subject headings (Mathes 2004; Hammond et al. 2005). Building on this earlier work with search thesauri, preliminary research has explored using tagging to generate user-centred terms for a thesaurus (Schwartz 2008; Yoon 2009). Some libraries and museums have developed systems that attempt to combine the benefits of professional classifications with those of naive classifications by adding tagging to their existing systems. The Steve museum project (Trant 2006), the University of Pennsylvania PennTags project (Allen and Winkler 2007), and Facetag (Quintarelli, Resmini, and Rosati 2006) are all examples of this phenomenon.

Studies comparing the terminology used in tagging journal articles to indexer-assigned controlled vocabulary terms suggest that many tags are subject related and could work well as index terms or entry vocabulary (Kipp 2005; Hammond et al. 2005; Kipp and Campbell 2006); however, the world of folksonomies includes relationships that would never appear in a library classification or thesaurus, including time- and task-related tags, affective tags, and the user name of the tagger (Kipp 2005; Kipp and Campbell 2006; Kipp 2007). These short-term and highly specific tags suggest important differences between user tagging systems and author or intermediary classification systems, differences that must be considered.

Although users searching online catalogues and databases often express admiration for the idea of controlled vocabularies and knowledge organisation systems, they may find it difficult to adapt their vocabulary to the thesaurus and often find the process of searching frustrating (Fast and Campbell 2004). Users also tend not to perform the kind of systematic search common to expert searchers, which limits their ability to gain experience with a system's controlled vocabulary (Markey 2007). Additionally, controlled vocabulary indexing has proven costly and has not proven truly scalable for digital information, especially information on the web (Shirky 2005). Can the user-created categories and classification schemes of tagging be used to enhance resource discovery in these new environments? Much speculation has been advanced on the subject, but so far few empirical studies have been done. Heymann, Koutrika, and Garcia-Molina (2008) analyse tags with respect to the pages to which they are assigned. They find that in over 50% of cases the tags appear in the text of the pages to which they have been assigned, and that in 80% of cases the tags appear somewhere in the page text or in the backlink or forward link text of the page. They suggest that this positive result means tags are a potential asset for improving search (Heymann, Koutrika, and Garcia-Molina 2008). But do users actually use tags when they are present?
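The kind of overlap measure reported by Heymann, Koutrika, and Garcia-Molina can be approximated in a few lines. The sketch below is a simplified illustration, not their procedure: it assumes lower-cased alphanumeric tokenisation, treats a compound tag such as `knowledge-management` as present only when all of its parts occur, and uses invented example text. The function names are hypothetical.

```python
import re
from collections.abc import Iterable

_WORD = re.compile(r"[a-z0-9]+")


def word_set(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens (a deliberately simple tokeniser)."""
    return set(_WORD.findall(text.lower()))


def tag_occurs(tag: str, words: set[str]) -> bool:
    """A compound tag such as 'knowledge-management' counts as present
    only if every one of its parts occurs among `words`."""
    parts = _WORD.findall(tag.lower())
    return bool(parts) and all(p in words for p in parts)


def tag_text_overlap(tags: list[str], page_text: str,
                     link_texts: Iterable[str] = ()) -> tuple[float, float]:
    """Return (share of tags found in the page text,
    share found in the page text or in the text of links to/from the page)."""
    if not tags:
        return 0.0, 0.0
    page_words = word_set(page_text)
    all_words = page_words | word_set(" ".join(link_texts))
    in_page = sum(tag_occurs(t, page_words) for t in tags)
    in_page_or_links = sum(tag_occurs(t, all_words) for t in tags)
    return in_page / len(tags), in_page_or_links / len(tags)


if __name__ == "__main__":
    tags = ["knowledge-management", "health", "hospitals", "toread"]
    page = "A case study of knowledge management in health information systems"
    links = ["Knowledge management in hospitals"]
    print(tag_text_overlap(tags, page, links))  # (0.5, 0.75)
```

In this toy example the subject tags are recoverable from page or link text, while the task-related tag `toread` is not, which illustrates why such tags lie outside what text-based indexing alone can supply.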

3.0 Research Questions

The following exploratory study compares the usefulness of a social bookmarking tool and of a traditional online database in an exercise of mediated resource discovery through keyword search. It seeks preliminary answers to the following research questions:

1. Do tags appear to enhance the subjective experience of resource discovery? Do users feel that they have found what they are looking for?

2. How do apprentice librarians find searching social bookmarking sites compared to searching more classically organised sites? How do tags work when searchers are undergoing a learning process with a problem that is not necessarily familiar?

3. Do tagging structures appear to facilitate resource discovery? How does this compare to traditional structures of supporting resource discovery?

4.0 Methodology

Exploratory studies of emerging social phenomena are particularly amenable to qualitative inquiry; qualitative techniques were therefore employed in the present study. A total of 10 participants were recruited from current and former students in library and information science, for the following reasons:

1. They may be recent graduates from undergraduate programs who have retained a memory of their information use in an academic context, or they may have worked for years in an information-related field;

2. They have an interest in information issues, which makes them familiar with many online search tools that are popular within the broader online community;

3. As librarians or information scientists, they have been exposed to the vocabulary used to articulate problems typically encountered in broader user populations, and have learned to empathise with typical user problems in information searching.

Participants were encouraged to compare their experiences with the online database and the social bookmarking site to their experiences using web search engines, in order to increase the volume of data collected about how users select keywords for search. They were also encouraged to relate their search experiences in the study to past search experiences.

While using information science students in this study may suggest a potential bias in the results, there is no reason to assume that all information science students are particularly well versed in the phenomenon of tagging. There is, moreover, good reason to assume that participants with some searching experience would be able to move between search systems with minimal training, reducing some of the issues involved with differing interfaces. Students in an LIS programme are expected to learn and become comfortable with a variety of search systems with varying interfaces as part of their education, unlike working professionals, who may have grown used to a small suite of frequently used tools on the job.

There have been no empirical studies of users' experience with tagging systems in an LIS context. Given the increasing interest in such qualitative data as user relevance judgements (Tang and Sun 2003; Oppenheim, Morris, and McKnight 2000), this study examines the qualitative dimension of how controlled vocabularies, user index terms, and tags relate to each other. Because of this emphasis on qualitative dimensions, the exploratory study is limited to a small number of participants. The analysis triangulated three primary data sources: interviews, search terms, and screen captures of search sessions.

The searchers were asked to search PubMed (an electronic journal database of articles for use by researchers and practitioners in the health sciences) and CiteULike (a social bookmarking site specialised for academics, with a wide range of health sciences articles already tagged by users) for information on a specific assigned topic (see Table 1). The topic was provided as a paragraph describing an information need:

You are a reference librarian in a science library. A patron approaches the reference desk and asks for information about the application of knowledge management or information organisation techniques in the realm of health information. The patron is looking for five articles discussing health information management and is especially interested in case studies, but will accept more theoretical articles as well.

The researcher chose this topic after preliminary searches showed that both databases contained enough articles on information management techniques used in health information for participants to find far more than the requested number of relevant articles.

Screen capture software (specifically CamStudio and Xvidcap), a “think aloud” protocol (Krug 2006), and a semi-structured exit interview were used to capture users' impressions of traditional classification and of user tags, and of their usefulness in the search process.

Each participant searched for information using both the traditional online database with assigned descriptors and a social bookmarking site. Participants were asked to perform the searches in a specified order, which alternated between participants whether the social bookmarking site or the online database was searched first, to compensate for order effects.
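The paper does not specify how the order was assigned to individual participants. One common counterbalancing scheme, assumed here purely for illustration, is strict alternation by recruitment position, so that half of the participants start with each tool. The function name and participant identifiers below are hypothetical.

```python
def assign_search_orders(participant_ids: list[str]) -> dict[str, tuple[str, str]]:
    """Counterbalance tool order by strict alternation: participants at even
    positions search CiteULike first, those at odd positions PubMed first."""
    orders = (("CiteULike", "PubMed"), ("PubMed", "CiteULike"))
    return {pid: orders[i % 2] for i, pid in enumerate(participant_ids)}


if __name__ == "__main__":
    plan = assign_search_orders([f"P{n:02d}" for n in range(1, 11)])
    first_tools = [order[0] for order in plan.values()]
    print(first_tools.count("CiteULike"), first_tools.count("PubMed"))  # 5 5
```

With an even number of participants, as in this study, alternation yields equal groups for each starting tool.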

Participants selected their own keywords for searches on both tools after reading the paragraph description of the information need. They were then asked to provide a list of terms they would use to start their search. Participants were asked to search until they had located approximately five articles that appeared to match the query, and to assign relevance scores to articles based on an examination of the available metadata. At the end of the search process, participants were asked to make a second list of terms they would now use if asked to search for this information again. To eliminate a learning effect, participants did not have access to their initial set of search terms at this time. Participants' actions were recorded using screen capture software and a microphone. Additionally, participants were interviewed after the search process to allow them to articulate their impressions of it.
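Comparing a participant's initial and final term lists reduces to simple set operations. The following sketch is illustrative only: it assumes keywords are compared after case-folding, the Jaccard overlap is a summary statistic chosen for this example rather than one reported in the study, and the two lists are invented. Two lists alone cannot reveal terms that were dropped during the session and later reappeared; that information has to come from the screen captures and interviews.

```python
def compare_keyword_lists(initial: list[str], final: list[str]) -> dict:
    """Classify a participant's keywords as retained, dropped or added between
    the initial and final lists, plus the Jaccard overlap |A ∩ B| / |A ∪ B|."""
    a = {k.strip().lower() for k in initial}  # case-folding is the only normalisation
    b = {k.strip().lower() for k in final}
    union = a | b
    return {
        "retained": sorted(a & b),
        "dropped": sorted(a - b),
        "added": sorted(b - a),
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    before = ["health information", "knowledge management", "case study"]
    after = ["Knowledge Management", "health informatics", "case study", "MeSH"]
    result = compare_keyword_lists(before, after)
    print(result["added"])    # ['health informatics', 'mesh']
    print(result["jaccard"])  # 0.4
```

An added term such as `MeSH` is the kind of shift the interview questions probe: vocabulary picked up from the system during the search.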

The following questions were used as a guide in the semi-structured interview:

1. Did you find the user assigned tags were a better match for the keywords you chose initially? If not, were they useful in locating the relevant articles? (Also ask this question with respect to subject headings.)

2. Did you find the subject headings useful? Would you have used any of the subject headings or tags to index the document? Would you use any of the subject headings or tags to search for this docu-ment again?

3. Now that you have performed the search, what do you think of the differences/similarities between your initial and final sets of keywords? (Depending on the responses, it may also be useful to discuss individual keywords, especially keywords that may have been dropped from the search process or that were dropped during the search process only to reappear in the participant’s final list.)

4. What are your thoughts on keywords or tags which you chose not to use in your search?

One issue that might have affected data collection is that of differing user interfaces; however, both CiteULike and PubMed offer keyword search, and participants were given a brief introduction to searching with both systems (including an introduction to the MeSH browser in PubMed and the tags in CiteULike). Participants with a library and information science background were specifically chosen for the study because of their prior experience searching multiple systems with different interfaces, so that they would be better able to handle interface differences. The design of this study is based on common information retrieval research designs, with an emphasis on collecting the keywords used in the search (as in web log analysis) in addition to a ranked set of documents judged relevant by the participant.

Three sets of data were thus available for analysis: the sets of initial and final keywords selected by each user, the recordings of the search sessions with their think-aloud commentary, and the recorded exit interviews that followed the search sessions. These three data sets were examined together to balance the users’ perceptions of the search (interviews) against their search strategies (terms) and their behaviour while implementing those strategies (screen captures). Keywords and tags chosen by users were compared and examined to see whether and how they were related, and participants’ recorded video sessions were transcribed along with the interviews to support a deep analysis of the participants’ search processes. These transcripts were then analysed using a grounded theory approach (Strauss and Corbin 1990); coding began early in the observation process, based on initial insights gained while transcribing the video sessions, and was then used to guide the choice of search behaviours to look for in the transcripts. Trustworthiness of the results was ensured through triangulation of participant experiences, deep analysis of the results, and discussion between the researchers.

Activity | Description | Length
Welcome | Initial greeting and welcome | 2-3 minutes
Introduction to session | Introduction to the study, discussing the session itself and the tasks participants will be asked to perform | 5-7 minutes
First search task (CiteULike or PubMed) | The first of two tasks, consisting of: 1) the user’s generation of keywords for search, 2) collection of articles, 3) analysis of retrieved articles for relevance, 4) assignment of relevance judgements to the articles, and 5) assignment of a new set of keywords for search | 15 minutes
Second search task (PubMed or CiteULike) | Same as first task | 15 minutes
Post search discussion | A semi-structured interview discussing the participant’s results and their own thoughts on the usefulness of the terms they used to search and the terms used to describe the documents they retrieved | 15 minutes
Conclusion | Final comments and a thank you for participating | 3-5 minutes

Table 1. Preliminary Timeline for Sessions

Knowl. Org. 37(2010)No.4 M. E. I. Kipp and D. G. Campbell. Searching with Tags: Do Tags Help Users Find Things?

5.0 Results

5.1 Demographics

A total of 10 participants were recruited for this study. Four of the participants were male and six were female. Participants were between 23 and 40 years of age; most self-identified as intermediate-level computer users (80%), while the remaining participants (20%) self-identified as expert users. All but one of the participants listed previous educational backgrounds in the humanities (English and French) or social sciences (Political Science, Sociology, etc.); the remaining participant gave an educational background in mathematics and education. Professional backgrounds were generally in teaching or librarianship/archives; however, 3 of the participants did not list a professional background.

Computer experience ranged from 6 to 22 years, with a median of 19 years. Participants were recruited from among users with some experience searching the Internet, so it is unsurprising that all of them had some experience with computers. Participants’ use of specific Internet tools was mixed. Only 20% of participants reported having a website, and 40% a blog; since one of the users with a blog also maintained a website, half the participants maintained neither. Participants were generally frequent users of both web search engines and journal databases, and were therefore reasonably conversant with searching and web use; however, they were relative novices with tagging systems. Ninety percent (90%) of participants used search engines often or frequently, and 70% used journal databases often or frequently.

While participant use of search engines and journal databases was high, few participants reported using social bookmarking tools on a regular basis: fully 70% reported using them rarely or never. Social bookmarking tools are still relatively new, especially compared with journal databases, and heavy users remain less common.

5.2 Participant Keyword Usage

All users initially used multiword keywords, suggesting that they are indeed experienced searchers who are aware of methods that can improve precision or recall in search. At the end of the search process, when users were asked to generate a new list of keywords they would now use for the search, half the users separated their final keywords by tool, even though they had been asked for only one list.

A total of 28 unique keywords or keyword phrases were listed initially by the participants. Participants entered these keywords and phrases into the systems according to the patterns discussed later in this paper. Each participant initially listed between 1 and 9 keywords, with a median of 6.

Keyword | Frequency
knowledge management | 7
information organisation / information organization | 6
health information | 6
case studies / case study / “case stud” | 4
health information management / health info mgt | 3

Table 2. Initial Keywords

The four most commonly chosen terms were knowledge management, information organisation, health information, and case studies (Table 2). Each of these terms comes directly from the initial text of the information need. Users reported that their choice between knowledge management and information organisation during the search process was determined by the types of results they found when searching with each tool. The fifth search phrase is a fairly obvious combination of health information and knowledge management.

Participants produced 46 unique keywords for their final lists (Table 3). They used between 3 and 16 keywords in their final lists, with a median of 6. Participants who separated their final lists by tool used between 3 and 8 terms for CiteULike (median 5) and between 1 and 8 for PubMed (median 3). One participant chose the term “Information Management”, which is a MeSH descriptor, as the only keyword for searching PubMed.

Keywords | Frequency
knowledge management / km | 9
case studies / case study | 6
health information | 5
information management | 5
health care | 3
health information management | 2
informatics | 2
health | 2

Table 3. Final Keywords

The most commonly used keyword, by far, was knowledge management. This term comes directly from the information need as described above, which is in keeping with previous information retrieval studies in which users tended to select search terms from the given text of the information need (Oppenheim, Morris, and McKnight 2000). Information management was also a commonly used term; it could be seen as a modification of knowledge management to fit the terminology of a different group of users who prefer the term information management. Another commonly chosen term was health information, also from the information need. Information management and health information were tied as the third most popular terms across the two tools. While users often mentioned that they considered their initial keyword sets incomplete, they tended to suggest the same or very similar terms as good search terms for producing better results. This suggests that their initial search terms were well chosen: they closely matched those chosen by users tagging articles in CiteULike, and came close enough to the Medical Subject Headings used in PubMed (or its entry vocabulary), or to terms used by authors published in PubMed, for good results to be retrieved.

Half the participants separated their final keyword lists by tool (Table 4). Again, knowledge management was the clear favourite, having been chosen 6 times in total and 4 times for CiteULike. Opinion was more divided on whether knowledge management or information management was better for PubMed. Participants who discovered that information management was a MeSH descriptor were more likely to suggest it as the preferred term, while other participants found that knowledge management was useful for free-text searching of abstracts.

Keywords | CiteULike | PubMed
knowledge management | 4 | 2
information management | 1 | 3
case studies | 3 | 1

Table 4. Most common terms separated by tool

Case studies is not a descriptor in PubMed, but it is an entry term for the descriptor “case reports”, which includes case studies. Because this term is an entry term for a MeSH descriptor, it connects the user directly to the MeSH vocabulary without requiring a search for the specific descriptor, as was necessary with information management.

The other popular term, knowledge management, is neither a descriptor nor an entry term in MeSH, but it can be used to retrieve articles through free-text searching of abstracts. Knowledge management was chosen less often for PubMed because many participants found it a less useful search term there, since it is not a MeSH descriptor. Knowledge management and information management are very similar concepts, since both deal with organising information into a form usable by others, but the terms tend to be used in different fields. The heavy use of knowledge management in this study and on CiteULike suggests that MeSH would be well advised to consider how the term could fit into its vocabulary, at minimum as an entry term.
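The distinction drawn above between descriptors, entry terms and free-text fallback can be sketched as a small lookup. This is a minimal illustration, assuming a toy vocabulary that contains only the three terms discussed in this section; the dictionaries and the `map_term` function are hypothetical and do not reproduce MeSH data or PubMed’s actual term mapping.

```python
# Toy vocabulary limited to the terms discussed in this section.
# Hypothetical data: NOT an extract of MeSH.
DESCRIPTORS = {"information management": "Information Management"}
ENTRY_TERMS = {"case studies": "Case Reports"}  # entry term -> preferred descriptor


def map_term(term: str) -> tuple[str, str]:
    """Return (how the term is searched, the string actually searched)."""
    key = term.strip().lower()
    if key in DESCRIPTORS:
        return ("descriptor", DESCRIPTORS[key])
    if key in ENTRY_TERMS:
        # The entry vocabulary leads the user straight to the preferred descriptor.
        return ("descriptor via entry term", ENTRY_TERMS[key])
    # No controlled-vocabulary match: fall back to free-text searching.
    return ("free text", term.strip())


for keyword in ["case studies", "information management", "knowledge management"]:
    print(keyword, "->", map_term(keyword))
```

In this sketch, adding “knowledge management” to `ENTRY_TERMS` is all it would take for the term to reach a descriptor instead of falling through to free text, which is the kind of change the paragraph above recommends.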

In all, participants suggested 20 unique terms for use in searching CiteULike (18 used by only one person) and 17 unique terms for use in searching PubMed (15 used by only one person). This wide spread of terms suggested by only one person is further evidence of the long tail in tagging and searching, and supports studies showing that users do not use the same terminology when tagging (Kipp 2005; Kipp 2007).

5.3 Participant Search Experiences

Participants tended to prefer the search experience on the system they used first, regardless of previous experience with either system or with similar systems. Further interviews may be needed to determine whether this trend holds; it may simply be that any frustration with the system used second was still uppermost in the participant’s mind.

“PubMed just didn’t seem as useful. Though I don’t know whether these articles [CiteULike articles] are going to be as academic as something in PubMed. If they’re from the core journals or not.” – Participant 1 (used CiteULike first and had prior experience with Ovid and Medline, but not PubMed)

In contrast to Participant 1, Participant 9 did not like the CiteULike interface and was much more impressed with the PubMed interface and its features, but “would have liked to have subject headings visible along with [the] abstract.” Participant 10 explicitly stated that the PubMed search was easier than the CiteULike search, and that CiteULike’s lack of an advanced search box and a search history made it much less useful.

Other participants found that the interface provided too much rather than too little information. Participant 1 felt that the PubMed interface was overwhelming and preferred a simpler interface with slightly less information up front.

“I think if I knew how to use PubMed better I might have been able to get better results but I don’t have the experience. It was just a little overwhelming. Too many results. … Like in a Google search. … I can’t really tell how many results I was finding with CiteULike. I did find it useful in PubMed how they linked to related articles. That was useful.”– Participant 1

Participants expressed frustration with the interfaces and with the use of keywords in the systems. In general, participants felt that their use of both systems was hindered by the problems of learning different and complex interfaces, including the locations of search boxes, the identification of controlled vocabulary terms, the different sets of metadata displayed in the results, and other features of each system.

“I found it a lot easier to search CiteULike for some reason. I’m not sure. I think with PubMed I could find some better keywords, keywords that might be indexed. It looked like with CiteULike I could just type in things like health care, health organisation.” – Participant 1

Participant 7 expressed a similar view and stated that the PubMed search was frustrating because it was difficult to figure out which terms to use. In contrast, Participant 2 explicitly stated a preference for Google after the search process. The participant described significant search experience on Google and felt that this experience did not translate directly despite the familiar interface of the search box.

“I found that it was sort of frustrating because I wasn’t familiar with the databases. If I had been more familiar, if I had more experience, maybe I would have been able to narrow the keywords faster. Um, yeah, that was it and also being limited to those two databases, um, I would have tried Google. I love Google. I just go onto Google and then what I would do is I would... when I do information searches it’s more scatter brained. I would find one article and I might read through it and then it might suggest something in the article that would lead me to another source and I would look at that and... so it’s more of a, um, following the breadcrumbs sort of way to do things.” – Participant 2 (participant describes favoured citation pearl growing search strategy on Google)

This is an interesting finding because many search systems seem to assume that users will be comfortable with basic searches, since Internet searching is so common. This comment, however, suggests that users may assume that other search systems contain considerable complexity that they do not understand and therefore cannot exploit. Additionally, these users appear to be concerned that this complexity is keeping them from making full use of the system, despite the fact that Google’s organisation is equally complex and it is almost impossible to be sure one is making full use of Google.

“I really should have looked more closely into how their [CiteULike’s] search function worked, because I know I included health, but I’m not sure if it’s assuming the AND operator. So I was getting a lot of stuff that was on knowledge management but not necessarily anything to do with health.” – Participant 6
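Participant 6’s uncertainty matters because the same unquoted query can behave very differently depending on the default operator a system applies between terms. The sketch below contrasts the two defaults on a hypothetical four-record collection; the records, the `search` function and its `default_op` parameter are illustrative assumptions, not a description of how CiteULike actually processed queries.

```python
# Hypothetical toy collection: record id -> set of index terms.
RECORDS = {
    1: {"knowledge", "management", "health"},
    2: {"knowledge", "management"},
    3: {"knowledge", "management", "libraries"},
    4: {"health", "informatics"},
}


def search(query: str, default_op: str = "AND") -> list[int]:
    """Return ids of records matching the whitespace-separated query terms.

    default_op decides how terms are combined when the user types no operator:
    "AND" requires every term, anything else falls back to OR (any term).
    """
    terms = query.lower().split()
    combine = all if default_op == "AND" else any
    return sorted(rid for rid, index in RECORDS.items()
                  if combine(t in index for t in terms))


print(search("knowledge management health", "AND"))  # [1]
print(search("knowledge management health", "OR"))   # [1, 2, 3, 4]
```

Under the OR default, records about knowledge management that never mention health are returned, which is the symptom the participant describes.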


The most popular form of metadata, as articulated by the majority of participants in the post-search interviews, was the abstract. Participants frequently lingered over abstracts and occasionally complained aloud during the search process if the abstract was missing.

Interviewer: “I’m interested in what metadata people find useful when searching. If the lack of an abstract is a huge deal...” Participant: “It is a huge deal. You can’t tell anything about the article without it.” – Participant 1

While participants listed the abstract as the most important piece of information for determining relevance, they also stated that titles or links to related articles were just as useful as, or even more useful than, subject headings or tags.

“I mostly just looked at the titles of the article, read a little bit of the abstract and then the keyword that I used. I would give that to the user and it would be up to them to decide if the articles were in fact useful and they could continue the search from there. ... I did find it useful in PubMed how they linked to related articles. That was useful.” – Participant 1

In fact, many participants felt that the tags were most useful as links to related items rather than as guides to subjects. One participant claimed not to have used the tags, but found the related articles listed in PubMed very useful. This participant thought that, if asked to repeat the search, the tags would be useful as a form of related-article search.

“[I thought] I wasn’t using the tags, but I was actually using them to look at related articles” – Participant 10

“It [the tags] might have been useful for searching but like I was looking for specific things like case studies into information management in health care and uh in order to know if the article was relevant or not I had to go into the abstract and you know if the abstract seemed, um, relevant then I would look into the full article you know to get a better idea of whether it’s good or not.” – Participant 2

Participant 9 reported that it would have been helpful to be able to “select combinations of tags by clicking on them”, a feature which has recently been implemented on another social tagging service, Del.icio.us. This would be similar to the PubMed feature whereby users can combine previous searches to create a new search.
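The tag-combination feature Participant 9 asks for amounts to intersecting the sets of items carrying each selected tag, much as combining previous searches with AND does in PubMed’s search history. A minimal sketch, assuming a hypothetical in-memory collection (`ARTICLE_TAGS`) and an inverted index built from it; real services such as Del.icio.us implement this differently and at much larger scale.

```python
from collections import defaultdict

# Hypothetical bookmarks: article id -> tags applied to it.
ARTICLE_TAGS = {
    "a1": {"health", "km", "case-study"},
    "a2": {"health", "informatics"},
    "a3": {"km", "case-study"},
    "a4": {"health", "km"},
}

# Inverted index: tag -> ids of the articles carrying that tag.
TAG_INDEX = defaultdict(set)
for article, tags in ARTICLE_TAGS.items():
    for tag in tags:
        TAG_INDEX[tag].add(article)


def filter_by_tags(selected: list[str]) -> set[str]:
    """Articles carrying every clicked tag (tag AND tag AND ...)."""
    if not selected:
        return set(ARTICLE_TAGS)  # nothing clicked yet: show everything
    postings = [TAG_INDEX.get(tag, set()) for tag in selected]
    return set.intersection(*postings)


print(sorted(filter_by_tags(["health"])))        # ['a1', 'a2', 'a4']
print(sorted(filter_by_tags(["health", "km"])))  # ['a1', 'a4']
```

Each additional click can only narrow the result set, which is why this style of filtering suits users who start broad and refine as they browse.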

In addition to titles, authors and abstracts, participants also made use of keywords in PubMed. Some participants used various PubMed features to select useful search keywords, including the details tab, which displays the user’s query modified with automatically chosen MeSH headings where appropriate, and the MeSH browser itself.

Many participants found that searching PubMed fit with their previous experience searching journal databases and were quite comfortable with this part of the search process. Participants 4 and 6 both stated that the PubMed interface was much friendlier, since it provided a typical online database searching experience with a thesaurus, whereas CiteULike had only user tags. Participant 10 echoed this view, suggesting that, unlike the MeSH subject headings, the tags were too narrow to be useful. Other participants found that their terminology did not match that used in PubMed and that the MeSH browser did not always provide an alternative.

“What I started off with, what I started off with was using some of the words in here [the initial information need] like knowledge management, information organisation and so on. … And in PubMed when those words didn’t work and I was getting nothing, that’s when I started branching out and putting library and trying to figure out like different synonyms, synonyms or uh.” – Participant 2

Participant opinion was also split on the utility of the tags. Many participants felt that the tags were an excellent addition to the system, while others felt they were either too broad or too narrow for an effective search.

“Um, I found that a lot of the keywords I used were already used as keywords in CiteULike, so I think they were good keywords. To use. But because they list several keywords along the bottom, I can pick up new ones as I go. And again, because they’re only one word, I can remember them. Public health, ehealth, health services, it was a kind of recurring term on a lot of the articles that I thought would be useful.”– Participant 5


“Well, I didn’t really find these tags to be particularly useful to be honest. One of the things that kind of bothered me about them is that they weren’t really grouped... you have care and health but you don’t have health care together. You have care, health and informatics. It would be useful if it was healthcare and informatics together as one tag. Instead, because if you just click on health. It’s not applicable at all, you know, and like km is a term, but then knowledge and management are separate, which is kind of bothersome.” – Participant 1

Participant 1 included the tag “km” in the final list of keywords, despite having found the tags problematic compared with the more familiar controlled vocabularies of traditional databases. Although not personally deeming the tags useful, the participant presumably felt that this tag could be useful to other searchers using CiteULike. A number of participants commented on the use of different terminology for different systems and, as previously noted, many insisted on dividing their final keyword lists by tool.

“Hmmm. Because this is PubMed, we probably don’t need health in here. Because everything is health. Okay, and I probably wouldn’t use km either. It might not be as, uh, common in PubMed” – Participant 2

Some participants were confused by the differences in the visible organisational structures used by PubMed and CiteULike. These participants showed or discussed their confusion when faced with the differences between keywords and tags and the methods used to organise and retrieve information in the two online databases.

Interviewer: “OK, now. Which one did you like the best?” Participant: “Oh, the first one, CiteULike.” Interviewer: “What did you like about it?” Participant: “Just because there was more words, reference words. After the words I put in.... they just eventually appeared. I don’t know what I was doing.”– Participant 8

This result suggests that even library and information science students can be confused when faced with a new and unfamiliar system. Conversely, systems whose organisational structures are hidden from users, such as Google, seem to cause less confusion, since users do not seem to feel they need to know anything about how the system works. This may be because Google is almost certain to return something no matter how little a user knows about a subject (Fast and Campbell 2004). As Participant 7 stated, “It was easy to kind of, uh, expand my search by just clicking on tags. I felt like on PubMed I had to find that one, uh, word that they used.”

Some participants confused tags and descriptors or expressed unfamiliarity with the concept of multiword subject headings. Participant 5 expressed such a view, stating that the tags on CiteULike were friendlier because they were shorter, and overlooking the fact that many CiteULike tags are in fact multiword tags joined by various punctuation marks.

“Oddly enough, CiteULike, which is totally regulated by users, I actually found to be the most similar to Library of Congress: again it picks one, short, nice, concise words as subject headings, that lead into a nice broad topic that I can move around in and play with. Um, PubMed was a little unlike anything I’m used to. Its descriptors were just too long. I’m sure I could make a go of it eventually, but just sitting down to try initially, it is a little more work than it should be. Even things like digg and delicious, the keywords are usually 2 words long, maybe three. And that actually might be why I find CiteULike easier to use; it’s similar to what I’m used to, like digg and delicious.” – Participant 5

A number of participants discussed issues with the interfaces of each system, and specifically with the organisational systems each one used. As previously noted, Participant 9 felt that CiteULike should support the ability to quickly combine tags by clicking on them, a form of results filtering that is present in some journal databases and library catalogues (e.g., catalogues built on Endeca, http://www.endeca.com/, which allows faceted browsing and filtering).

Other participants expressed a desire for more order in online systems, despite often having expressed confusion when faced with this order. This juxtaposition of a user-defined need for order and user-expressed confusion when faced with structured and controlled vocabularies poses significant issues for system designers.

“It would be nice if there was a coherent structure to it as opposed to the way they’ve [CiteULike] done it here. Um, other thoughts, I think if I knew how to use PubMed better I might have been able to get better results but I don’t have the experience. It was just a little overwhelming.” – Participant 1

Participants suggested that CiteULike should adopt additional information organisation techniques, and in general did not mention tag clouds or tag lists as options. At the same time, participants occasionally expressed frustration with PubMed’s search and suggested that subject headings should be displayed more prominently in the search results. “[I] wanted to be able to have subject headings [in PubMed] visible along with the abstract.” – Participant 9

Participants also noted that, in addition to tags, CiteULike shows who posted each article, so that one can see other articles and tags from that same user. “You can search by tags or you can search by people and it also shows the people who are interested in this idea... this search term that I put in.” – Participant 7

This ability to see another person’s tags and articles has no analogue in a traditional journal database. While tagging itself is similar to the use of controlled vocabulary headings, the association of a user or group with a set of articles is not normally present in such systems; such associations are made much more haphazardly, for example through a colleague’s email about an article. Participants often seemed to be searching the tags for recommendations, a personal touch. They appeared to be discovering that, once they were in the right subject area, the tags applied by a particular user could help them and serve as an important guide to the relevance of tagged items.
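The user-to-article association described above can be modelled as two indexes over the same postings: one from each article to the users who posted it, and one from each user to everything that user posted. The sketch assumes hypothetical posting records of the form (user, article, tags); it is not CiteULike’s data model, only an illustration of how “who else posted this, and what else did they tag” can be answered.

```python
from collections import defaultdict

# Hypothetical postings: (user, article, tags that user applied).
POSTS = [
    ("user_a", "a1", {"km", "health"}),
    ("user_a", "a2", {"km", "case-study"}),
    ("user_b", "a1", {"healthcare"}),
    ("user_b", "a3", {"informatics"}),
]

posters_of = defaultdict(set)    # article -> users who posted it
library_of = defaultdict(list)   # user -> [(article, tags), ...]
for user, article, tags in POSTS:
    posters_of[article].add(user)
    library_of[user].append((article, tags))


def other_items_from_posters(article: str) -> dict[str, list[tuple[str, list[str]]]]:
    """For each user who posted `article`, list that user's other articles and tags."""
    return {
        user: [(other, sorted(tags)) for other, tags in library_of[user] if other != article]
        for user in sorted(posters_of.get(article, set()))
    }


for user, items in other_items_from_posters("a1").items():
    print(user, items)
# user_a [('a2', ['case-study', 'km'])]
# user_b [('a3', ['informatics'])]
```

Because the tags are kept per posting rather than merged per article, the sketch preserves exactly the “particular user” signal that participants seemed to find useful.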

While participants’ views were solicited on the search process and their use of interface features and keywords, a key component of this study was the examination of differences between participants’ keyword use, their statements in interviews, and their actual search behaviour. While participants were often quite articulate about their search preferences and behaviours, some inconsistencies were observed between participants’ expressed preferences and their actual behaviours.

5.4 Participant Search Behaviour

When searching, most participants started with a single keyword or keyword phrase, but quickly added keywords from their initial lists to reduce the number of results returned. Some participants immediately made quick assessments and modifications to their initial queries, while others took more time to scan the results. Most participants showed a preference for one behaviour or the other but were willing to shift slightly during the search depending on the number of results.

Keywords: health km case studies
Actions: scrolls slowly down then up again
Keywords: knowledge management case studies
Actions: scrolls more rapidly down the page then up again
Keywords: information case studies
Actions: scrolls part way down then up again
Keywords: library case studies
Article: Realizing what’s essential: a case study on integrating electronic journal management into a print-centric technical services department (PubMed: 17443247), does not select
Keywords: “information management”
– Participant 2

Many participants showed evidence of uncertainty or frustration when searching one system or the other. Participants paused for longer periods, scrolled up and down without making a selection, or hovered over items without selecting anything. Many participants also appeared to browse the results on the first page to see whether their search terms were retrieving enough relevant results before narrowing or broadening their search.

examines metadata, hovers over journal name, hovers over author name, does not select – Participant 9

Pauses for quite some time before scrolling up and down the hit list. Doesn’t go past p. 1 – Participant 5

Keywords: Public health information
Actions: doesn’t scroll; just clears search box
Keywords: Education and health care
Actions: no scrolling; clears search box again – Participant 8

Participants occasionally seemed confused by the differences between controlled vocabularies (such as MeSH descriptors) and tags. It was fairly common for participants to use incorrect terminology when describing the terms they were searching with.


“Um, yes. I found it difficult to actually determine what’s relevant, because the subject headings, they’re basically a sentence. And remembering what’s been said, if there’s 1 2 3 4 5 in each one and I have 2 or 3 up, it’s kinda hard to determine a pattern. … They did, in that, um if I could remember any recurring words in those sentence-long subject headings, I could write them down and try them again for the next search. It wasn’t as easy as remembering one key word on CiteULike, it was trying to read a sentence, picking what might be an appropriate term from that sentence, read the next sentence, and try to compare the two sentences for matching key words that might be useful. It was a lot more work, PubMed...” – Participant 5

Keywords: health km case studies
Actions: scrolls slowly down then up again
“Hmmm.. Because this is PubMed, we probably don’t need health in here. Because everything is health. Okay, and I probably wouldn’t use km either. It might not be as, uh, common in PubMed so...” – Participant 2 (initial search on PubMed)

All participants used Boolean searching in both PubMed and CiteULike to narrow their searches, and appeared to expect it to be available: only a few participants asked the interviewers whether Boolean search was supported. Most participants also used truncation, again expecting it to be supported. One participant even used a proximity (near) operator in a search of CiteULike. Like PubMed, CiteULike does support truncation, wildcards and Boolean search (though only with symbols), but it does not support near as an operator (http://www.citeulike.org/search_help).

“information 2N organization” and “health in-formation” and “case stud*” – Participant 10

All participants used Internet searching techniques such as quotation marks to indicate a phrase search, and many also dropped the AND from Boolean searches, as is customary on Google.
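The conventions observed here (quoted phrases, trailing-asterisk truncation such as “case stud*”, and an AND that users leave implicit) can be illustrated with a small matcher. This is a sketch under stated assumptions: `parse_query`, `term_matches` and `matches` are hypothetical helpers, matching is done against a single text string, and proximity operators are deliberately not interpreted, so a token such as `2N` is simply treated as a literal word and the phrase containing it fails to match.

```python
import re


def parse_query(query: str) -> list[str]:
    """Split a query into quoted phrases and bare words, dropping explicit ANDs."""
    # Each match is (phrase, word); exactly one of the two groups is non-empty.
    tokens = re.findall(r'"([^"]+)"|(\S+)', query)
    terms = [phrase or word for phrase, word in tokens]
    return [t.lower() for t in terms if t.upper() != "AND"]


def term_matches(term: str, text: str) -> bool:
    """Match a word or phrase on word boundaries; a trailing * truncates the last word."""
    words = term.split()
    if not words:
        return True  # an empty phrase places no constraint
    parts = [re.escape(w) for w in words]
    if words[-1].endswith("*"):
        parts[-1] = re.escape(words[-1][:-1]) + r"\w*"
    pattern = r"\b" + r"\s+".join(parts) + r"\b"
    return re.search(pattern, text.lower()) is not None


def matches(query: str, text: str) -> bool:
    """Implicit AND: every parsed term must match the text."""
    return all(term_matches(t, text) for t in parse_query(query))


doc = "Case studies in health information organization"
print(matches('"health information" "case stud*"', doc))                 # True
print(matches('"information 2N organization" and "case stud*"', doc))  # False
```

Treating a bare `and` as an operator rather than a search word mirrors the way participants mixed explicit and implicit conjunction within a single query.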

Many participants expressed a desire for an abstract with the retrieved records on PubMed and CiteULike, and their searching behaviour bore out this desire. Participants selected, hovered over or scrolled slowly through abstracts, and even parts of articles, to determine relevance.

user examined article 561415, scrolled past other metadata to read abstract – Participant 2

scrolls up and down, locates article link and selects, scrolls to read first few pages of article – Participant 2

Tags were used by a number of the participants despite many claims to the contrary. However, participants may not have felt that their use was close enough to the concept of “using a tag as a search term” to constitute the sort of use the interviewers were asking about. A number of participants stated that they did not use the tags, although they had clicked on or otherwise examined them, or even used them in query lists, as Participant 2 did in the previous excerpt. This suggests that participants may see clicking on subject terms to browse the results as an activity distinct from searching with a subject term.

“One of the articles used km. I wonder if that would help.” – Participant 2

Query: “health information” km “case stud” – Participant 2

selects tag labelled healthcare – Participant 10

Scrolls down list and hovers over tags momentarily – Participant 3

mouse hovers over tags; clicks on tag bioinformatics – Participant 9

Selects tag “health-information” from first article in hit list. Gets “cyrille’s health information [8 articles]” – Participant 5

clicks on tag partners-in-health, but does not select article, returns to main list. Health information systems: failure, success and improvisation (CiteULike 312350): pauses over abstract for a short period, then selects this article; clicks on tag health-care, scrolls down, scrolls up and returns to main search list – Participant 1

Participants also used descriptors in PubMed. Some even selected these descriptors from the MeSH browser or the details tab after an initial search.

“Um, really only 2 that immediately jumped out; um, managed care seems to be actually like the key term for both of them. So, if I were to continue I’d probably search that to see what else comes up.” – Participant 5

Actions: Examines details tab (“Health Inf Manag”[Journal] OR “HIM J”[Journal] OR (“health”[All Fields] AND “information”[All Fields] AND “management”[All Fields]) OR “health information management”[All Fields])
Keyword: Health information management – Participant 3

selects MeSH search to find keywords health information management; clicks on Management information systems – Participant 9

Participants used a number of other features of both systems, including related article links in PubMed and group names in CiteULike. This suggests that it would be most useful to provide users with a list of other items with similar subject headings or tags, and with as much additional metadata as possible, so that they can browse related items by as many different definitions of “related” as possible (see Ockerbloom 2006). A related-articles feature of this kind has been implemented as a test in the subject search of the University of Pennsylvania’s Online Books Page (http://onlinebooks.library.upenn.edu/subjects.htm).
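One way to expose several definitions of “related” at once is to start from an item’s subject headings and follow the broader (BT), narrower (NT) and related (RT) links of a thesaurus, grouping the items found by the relationship that connects them. The sketch below is hypothetical: the headings, item identifiers and function names are invented for illustration, and it is not the implementation used by Ockerbloom (2006) or by the Online Books Page.

```python
from collections import defaultdict

# Hypothetical toy thesaurus: each heading lists broader (BT), narrower (NT)
# and related (RT) headings, as in a conventional thesaurus record.
THESAURUS = {
    "Health information management": {"BT": ["Information management"], "NT": [], "RT": ["Medical records"]},
    "Information management": {"BT": [], "NT": ["Health information management"], "RT": []},
    "Medical records": {"BT": [], "NT": [], "RT": ["Health information management"]},
}

# Hypothetical catalogue: item identifier -> subject headings assigned to it.
ITEMS = {
    "A": ["Health information management"],
    "B": ["Health information management", "Medical records"],
    "C": ["Information management"],
    "D": ["Medical records"],
}

def build_index(items):
    """Invert item -> headings into heading -> set of items."""
    index = defaultdict(set)
    for item, headings in items.items():
        for heading in headings:
            index[heading].add(item)
    return index

def related_items(item, items, thesaurus):
    """Group other items by the link to `item`: a shared heading, or a
    heading that is broader, narrower or related in the thesaurus."""
    index = build_index(items)
    groups = defaultdict(set)
    for heading in items[item]:
        groups["same heading"] |= index[heading]
        for rel in ("BT", "NT", "RT"):
            for other in thesaurus.get(heading, {}).get(rel, []):
                groups[rel] |= index[other]
    # Drop the starting item itself and any relationship that found nothing.
    return {rel: sorted(found - {item}) for rel, found in groups.items() if found - {item}}

print(related_items("A", ITEMS, THESAURUS))
# {'same heading': ['B'], 'BT': ['C'], 'RT': ['B', 'D']}
```

Keeping the relationship labels in the result, rather than merging everything into one list, is what would let an interface offer each definition of “related” as a separate browsing path.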

Selects an article after a traditional keyword search then returns to main list, scrolls slowly. Returns to previously selected article. Clicks on user name Evidence-based-medicine (group). Scrolls slowly. Selects article: Information retrieval and knowledge discovery utilising a biomedical Semantic Web (CiteULike 405826) – Participant 9

Selects tag cloud for user who posted the [current] article. Hovers briefly, selects list of [this user’s] recent articles. – Participant 4

A number of participants selected articles from article lists that had been posted under a specific tag by a specific user or user group on CiteULike. While tags themselves can be seen as an analogue to subject headings or descriptors in a traditional journal database, there is no real analogue in traditional information organisation to that of the CiteULike user or group.

That specific users may provide an additional level of information organisation is a new feature of social tagging systems. Even users who did not actively use user or group names in their search process showed that they recognised the presence of other users.

“Um, I found this one [CiteULike] easier to navigate, just because of having actual key one-word subjects. So, I’m looking for knowledge management, then I can just type in knowledge management, and if that user’s already bookmarked lots of articles on knowledge management. I can see what they have on their list. Yeah, I found this one much easier to use.” – Participant 5

selects tag cloud for user who posted the article; hovers briefly, selects list of recent articles – Participant 9

Health services (494 articles) scroll down mouse-over username groups interested in health services back to search box – Participant 8

One participant did not find anything useful on CiteULike using the tags alone, and in fact stated that the tags were too narrow, but did use user and group names to select articles, finally selecting one article from a user group on CiteULike and another from a user’s list of articles.

In addition to subject terms such as descriptors and tags, users made use of other special terms for searching, specifically journal names.

Keywords: “Health Inf Manag”[Journal]
Actions: Scrolls down slowly, selects article
Article: Health online: a health information action plan for Australia (PMID: 11143002)
Notes: After selecting this journal, participant selected all other articles from this list by simply scrolling until an interesting article was reached; occasionally, the participant scrolled back up to an article slightly higher on the list – Participant 3

selects journal name as search term J AHMA[Journal] – Participant 2

Additionally, participants used the related article links in PubMed to locate relevant articles. Many participants praised this feature and considered it as important as, or possibly more important than, subject headings for locating relevant articles.

Goes to this article; scrolls down (scanning abstract); goes to Related Links; mouseovers different links. – Participant 5

“It’s too bad there’s no abstract.” does not select article, but examines related articles on the side and selects one. – Participant 2

Action: pauses for a long time over an abstract, decides to select after all but is not sure of relevance, returns to main list and scrolls slowly. Participant: “Would it help to use these related links?” – Participant 1

One participant suggested that the tags were actually most useful as a form of related article link, rather than as subject headings. “[I thought] I wasn’t using the tags, but I was actually using them to look at related articles” – Participant 10. This participant showed an awareness of the relationships between the tags assigned to the same article and the tags assigned to multiple related articles, and was able to suggest a way in which tags or subject headings could be used to enhance traditional search systems: by providing explicit lists of articles with similar tags or subject headings rather than just supplying a list of subject terms.
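Participant 10’s suggestion, listing articles whose tags overlap with those of an article already judged relevant, could be realised in several ways. A minimal sketch, assuming each article’s tags are pooled into one set and using Jaccard similarity (shared tags divided by all distinct tags) as the measure; the data, the function names and the choice of measure are assumptions, not a description of how CiteULike or PubMed compute related articles.

```python
# Hypothetical articles and their pooled tag sets.
ARTICLE_TAGS = {
    "a1": {"health-information", "case-study", "km"},
    "a2": {"health-information", "km"},
    "a3": {"bioinformatics", "semantic-web"},
    "a4": {"case-study", "healthcare"},
}

def jaccard(x, y):
    """Size of the intersection divided by the size of the union."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def related_by_tags(article, article_tags, limit=3):
    """Up to `limit` other articles sharing at least one tag, most similar first."""
    target = article_tags[article]
    scored = [
        (jaccard(target, tags), other)
        for other, tags in article_tags.items()
        if other != article and target & tags
    ]
    # Highest score first; ties broken alphabetically for a stable display.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(other, round(score, 2)) for score, other in scored[:limit]]

print(related_by_tags("a1", ARTICLE_TAGS))  # [('a2', 0.67), ('a4', 0.25)]
```

Dividing by the union, rather than counting shared tags alone, keeps heavily tagged articles from appearing related to everything.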

Despite the fact that participants exhibited a fair amount of thought and care in the selection of their keywords and in the use of additional features for locating relevant materials, many participants spent a great deal of time scrolling through long lists of results or entering minor variations on their search query and anxiously examining the size of their result sets.

Notes size of result set and tries another query without scrolling – Participant 1

“That didn’t work.” Actions: participant continually enters keywords, performs the search and does not scroll before entering new search terms
Keyword: information management
Keyword: information organization
Keyword: knowledge management
Keyword: knowledge management case studies
Keyword: “information science”
Keyword: knowledge organization – Participant 2

These behaviours suggest that users were concerned with selecting good sources and did not find that searching with keywords alone was sufficient to help them reach this goal. Many participants praised such features as the related article lists provided in the PubMed interface, and other participants made use of tags and tag clouds, user names, and even group names in CiteULike to help them locate promising articles that were related to an article they found relevant, a set of keywords they felt were relevant, or a user who appeared to be collecting relevant articles.

6.0 Discussion

This study examined the relationship between user tags and the process of resource discovery from the perspective of a traditional library reference interview, in which the system was used not by an end user but by an information intermediary trying to find information on another’s behalf. Searching by an intermediary, or mediated search, is a traditional library and information science task tied directly to important library skills in information sources and services and in information organisation. Strong LIS elements were present in the search behaviour of the participants. Participants discussed the importance of learning how the search function works on a system when beginning a search, and how this can affect the results. They discussed narrowing and broadening searches and selecting specific terms as search terms. They used Boolean search, truncation, and even the NEAR operator. They talked about finding different synonyms and antonyms, and were aware of the common (to librarians) paradox that in a health database the word “health” is so common that it could almost be considered a stopword. Participants were able to bring a set of LIS perspectives to the search process, regardless of their relative skill or lack of skill in searching, which helped to frame their expectations for each system. Although this could be seen as a limitation of the study in terms of application to broader user groups, it provides real insight into how tagging systems could be adopted into library and information science systems and practices.

One issue that cannot be ignored in information retrieval studies is Google. Google’s pervasiveness, search techniques, assumptions, and interface have become such a large part of the common Internet experience that all search systems are judged against its apparent ease of use (Fast and Campbell 2004). Participants in this study used many Google-style search techniques and assumptions, including adding further keywords from the initial lists to reduce the number of results returned. In many cases, participants assumed Google-style Boolean search, where the AND is simply understood, as well as the use of quotation marks to signify a phrase search. All of these search behaviours suggest that Google-style search has become a standard, which may explain the confusion felt by some participants when using systems with more obviously complex features. If this is true, tagging systems and library systems will need to consider the impact of the confusion caused by the fact that these systems demand more than the ubiquitous Google search box.

7.0 Conclusions

The preliminary study showed that participants did use the tags to aid in the search process, selecting tags to see what articles would be returned. They also used the tags as a guide to further search terms, suggesting that users do indeed pay attention to subject headings and metadata if these fit a pattern users recognise or make sense in the context of their existing knowledge of the subject. Interestingly, many participants stated that they had not used the tags, though examination of the search process showed that they had been using them as links to related articles or as sources of search terms. It is possible that they had not considered this to be a full use of the tags, as they were not necessarily using the tags as subject headings or search terms.

Participants generally used the same number of keywords for both lists, though many insisted on dividing the final keyword list up by tool. Despite this, the most commonly used terms tended to be the same in each case, and knowledge management was generally selected as a useful term for each tool despite the fact that it is not present in MeSH as a descriptor or as entry vocabulary.
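The distinction between descriptors and entry vocabulary matters for a term like knowledge management: in a controlled-vocabulary search, a user’s term is only usable if it is either a preferred descriptor or an entry term that points to one. The sketch below uses a tiny hypothetical vocabulary; the terms and the function name are illustrative and are not taken from MeSH.

```python
# Hypothetical vocabulary fragment: preferred descriptors, plus entry terms
# (non-preferred synonyms) that point to a descriptor. Illustrative only;
# these terms are not quoted from MeSH.
DESCRIPTORS = {"Health Information Management", "Information Management"}
ENTRY_TERMS = {"health records management": "Health Information Management"}

def resolve(term):
    """Return the preferred descriptor for a user's term, or None when the
    term is neither a descriptor nor an entry term."""
    key = term.strip().lower()
    for descriptor in DESCRIPTORS:
        if descriptor.lower() == key:
            return descriptor
    return ENTRY_TERMS.get(key)

for query in ["information management", "health records management", "knowledge management"]:
    print(query, "->", resolve(query))
```

A term that resolves to None, as knowledge management does here, gives the searcher no foothold in the controlled vocabulary, whereas a tag system would accept it as typed.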

Participants reported a number of interface issues which they found degraded or enhanced the search process. Items such as the presence of full metadata, abstracts, and even full text links to articles were lauded, while lack of vocabulary terms and, especially, missing abstracts were deemed to be impediments to search. Participants found related article links and other newer features of systems to be a significant enhancement to the search process, and some participants reported or were seen using tags or user names in CiteULike for similar purposes.

These findings suggest that users would find direct access to the thesaurus or list of subject headings, showing the articles indexed with these terms, to be a distinct asset in search. Many of the participants in this study made use of the related articles links provided by PubMed and were intrigued by the possibilities of the tags on CiteULike, but did not find that the structures were in place to fully support browsing of related items by keyword or combination of keywords.
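Browsing “by keyword or combination of keywords” can be modelled as an inverted index from subject terms to the articles indexed with them, where a combination of terms is answered by intersecting those article sets. A minimal sketch with hypothetical records and function names; the co_terms helper, which counts the other terms on the matching articles, is our own illustration of how a display might suggest further narrowing.

```python
from collections import defaultdict

# Hypothetical records: article identifier -> subject terms (descriptors or tags).
RECORDS = {
    "p1": ["knowledge management", "health information"],
    "p2": ["health information", "case studies"],
    "p3": ["knowledge management", "health information", "case studies"],
}

def build_postings(records):
    """Inverted index: each subject term maps to the set of articles indexed with it."""
    postings = defaultdict(set)
    for article, terms in records.items():
        for term in terms:
            postings[term].add(article)
    return postings

def browse(postings, *terms):
    """Articles indexed with every selected term (a combination of keywords)."""
    if not terms:
        return []
    # .get avoids inserting empty entries for unknown terms into the defaultdict.
    return sorted(set.intersection(*(postings.get(t, set()) for t in terms)))

def co_terms(records, postings, *terms):
    """Count the other terms on the matching articles, as candidates for narrowing."""
    counts = defaultdict(int)
    for article in browse(postings, *terms):
        for term in records[article]:
            if term not in terms:
                counts[term] += 1
    return dict(counts)

postings = build_postings(RECORDS)
print(browse(postings, "knowledge management", "health information"))  # ['p1', 'p3']
print(co_terms(RECORDS, postings, "knowledge management", "health information"))  # {'case studies': 1}
```

Showing co-occurring terms alongside each result set is one way to make the “web of related documents” visible instead of leaving it implicit in the index.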

As shown by Ockerbloom (2006) and in previous research into end-user and search thesauri (Nielsen 2004, 60; Shiri and Revie 2005; Blocks, Cunliffe, and Tudhope 2006; Shiri and Revie 2006), these webs of related items can be built automatically using existing thesaurus structures and displayed to the user. This suggests that indexing and classification structures are fertile ground for the development of newer and better interfaces to document collections, as demonstrated by the interest in browsing and combining tags to create a web of related documents, a web which often already exists in traditional databases but has generally been hidden from the user’s view.

References

Allen, Laurie, and Winkler, Michael. 2007. PennTags: Creating and using an academic social bookmarking tool. Proceedings of the ACRL 13th National Conference, Baltimore, MD, USA, March 29-April 1, 2007.

Beghtol, Clare. 2003. Classification for information retrieval and classification for knowledge discovery: Relationships between “professional” and “naive” classifications. Knowledge organization 30: 64-73.

Blocks, Dorothee; Cunliffe, Daniel; and Tudhope, Douglas. 2006. A reference model for user-system interaction in thesaurus-based searching. Journal of the American Society for Information Science and Technology 57: 1655–65.

Cleverdon, Cyril. 1967. The Cranfield tests on index language devices. Aslib proceedings 19n6: 173-194.

Fast, Karl V. and Campbell, D. Grant. 2004. ‘I still prefer Google’: University student perceptions of searching OPACs and the Web. Proceedings of the 67th Annual Meeting of the American Society for Information Science and Technology, Providence, Rhode Island, USA, November 13-18, Vol. 41, pp. 138-146.

Hammond, Tony; Hannay, Timo; Lund, Ben; and Scott, Joanna. 2005. Social bookmarking tools (I): A general review. D-Lib magazine 11n4. Available http://www.dlib.org/dlib/april05/hammond/04hammond.html (accessed May 31, 2009).

Heymann, Paul; Koutrika, Georgia; and Garcia-Molina, Hector. 2008. Can social bookmarking improve web search? First ACM International Conference on Web Search and Data Mining (WSDM’08), February 11-12, 2008, Stanford, CA, USA. Available http://ilpubs.stanford.edu:8090/858/ (accessed May 31, 2009).

Jones, William; Phuwanartnurak, Ammy J.; Gill, Rajdeep; and Bruce, Harry. 2005. Don’t take my folders away! Organizing personal information to get things done. CHI 2005, April 2-7 2005, Portland, Oregon, USA. Available http://hdl.handle.net/1773/2031 (accessed May 31, 2009).

Kipp, Margaret E.I. 2005. Complementary or discrete contexts in on-line indexing: A comparison of user, creator and intermediary keywords. Canadian journal of information and library science 29: 419-36. Available http://dlist.sir.arizona.edu/1533/ (accessed May 31, 2009).

Kipp, Margaret E.I. 2007. @toread and cool: Tagging for time, task and emotion. Proceedings of the 8th Information Architecture Summit, Las Vegas, USA, March 22-26. Available http://dlist.sir.arizona.edu/1947/ (accessed May 31, 2009).

Kipp, Margaret E.I. and Campbell, D. Grant. 2006. Patterns and inconsistencies in collaborative tagging practices: An examination of tagging practices. Annual General Meeting of the American Society for Information Science and Technology, Austin, TX, USA, November 3-8, 2006. Available http://dlist.sir.arizona.edu/1704/ (accessed May 31, 2009).

Krug, Steve. 2006. Don’t make me think: A common sense approach to web usability. 2nd ed. Berkeley: New Riders Publishing.

Kwasnik, Barbara H. 1991. The importance of factors that are not document attributes in the organisation of personal documents. Journal of documentation 47: 389-98.

Malone, Thomas W. 1983. How do people organize their desks? Implications for the design of office information systems. ACM transactions on office information systems 1n1: 99-112.

Markey, Karen. 2007a. Twenty-five years of end-user searching, Part 1: Research findings. Journal of the American Society for Information Science and Technology 58: 1071-81.

Mathes, Adam. 2004. Folksonomies - Cooperative classification and communication through shared metadata. Adammathes.com. Available http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html (accessed May 31, 2009).

Nielsen, Marianne Lykke. 2004. Thesaurus construction: Key issues and selected readings. Cataloging & classification quarterly 37n3: 57-74.

Ockerbloom, John Mark. 2006. New maps of the library: Building better subject discovery tools using Library of Congress Subject Headings. Working Paper for the CNI Task Force Meeting, December 5, 2006. Available http://repository.upenn.edu/library_papers/48/ (accessed May 31, 2009).

Oppenheim, Charles; Morris, Anne; and McKnight, Cliff. 2000. The evaluation of WWW search engines. Journal of Documentation 56: 190-211.

Quintarelli, Emanuele; Resmini, Andrea; and Rosati, Luca. 2006. FaceTag: Integrating bottom-up and top-down classification in a social tagging system. Proceedings of the 2nd European Information Architecture Conference, Berlin, Germany, September 30-October 1. Available http://www.facetag.org/download/facetag.pdf (accessed May 31, 2009).

Schwartz, Candy. 2008. Thesauri and facets and tags, oh my! A look at three decades in subject analysis. Library trends 56n4: 830-42.

Shiri, Ali and Revie, Crawford. 2005. Usability and user perceptions of a thesaurus-enhanced search interface. Journal of documentation 61: 640-56.

Shiri, Ali and Revie, Crawford. 2006. Query expansion behavior within a thesaurus-enhanced search environment: A user-centered evaluation. Journal of the American Society for Information Science and Technology 57: 462-78.

Shirky, Clay. 2005. Ontology is overrated: Categories, links, and tags. Shirky.com. Available http://shirky.com/writings/ontology_overrated.html (accessed May 31, 2009).

Strauss, Anselm, and Corbin, Juliet. 1990. Basics of qualitative research: Grounded theory procedures and techniques. London: Sage.

Tang, Muh-Chyun, and Sun, Ying. 2003. Evaluation of web-based search engines using user-effort measures. Libres 13n2: 1-11. Available http://libres.curtin.edu.au/libres13n2/tang.htm (accessed May 31, 2009).

Trant, Jennifer. 2006. Exploring the potential for social tagging and folksonomy in art museums: proof of concept. New review of hypermedia and multimedia 12n1: 83-105. Available http://www.archimuse.com/papers/steve-nrhm-0605preprint.pdf (accessed May 31, 2009).

Yoon, Jungwon. 2009. Towards a user-oriented thesaurus for non-domain-specific image collections. Information processing and management 45: 452-68.

Knowl. Org. 37(2010)No.4 Z. Wang, A. S. Chaudhry, and Ch. Khoo. Support from Bibliographic Tools to Build an Organizational Taxonomy...

Support from Bibliographic Tools to Build an Organizational Taxonomy for Navigation:

Use of a General Classification Scheme and Domain Thesauri

Zhonghong Wang*, Abdus Sattar Chaudhry**, and Christopher Khoo ***

* 625 Meadowlark Street, Troy, IL 62294
** Department of Library and Information Science, College of Social Science, Kuwait University, Kuwait 13060
*** Wee Kim Wee School of Communication & Information, Nanyang Technological University, 31 Nanyang Link, Singapore 637718

Dr. Wang obtained her PhD degree from the Nanyang Technological University, Singapore, in 2010. Before joining NTU, she had been a librarian in university libraries in China for more than 10 years. She is currently working in a public library in the United States. Her research interests are in the area of taxonomies, ontologies, metadata, and social tagging.

Dr. Chaudhry obtained his PhD from the University of Illinois at Urbana-Champaign in 1985. He was Head of the Division of Information Studies, Wee Kim Wee School of Communication and Information, Nanyang Technological University, Singapore, from 2003 to 2008. Before joining NTU, Dr. Chaudhry had held teaching and professional positions in USA, Saudi Arabia, Pakistan, and Malaysia. He is currently in the College of Social Sciences, Kuwait University. His current research focuses on information organization and knowledge management.

Dr. Khoo obtained his PhD at Syracuse University and his MSc in Library & Information Science at the University of Illinois, Urbana-Champaign. He is the head of the Division of Information Studies at Nanyang Technological University, Singapore, where he teaches courses in knowledge organization, information architecture, data mining and Web-based information systems. He has also worked for several years as a science reference librarian, cataloger and online information searcher at the National University of Singapore Libraries. His main research interests are in text mining (information extraction and text summarization), medical decision support systems, knowledge organization, and human categorization behavior.

Wang, Zhonghong, Chaudhry, Abdus Sattar, and Khoo, Christopher. Support from Bibliographic Tools to Build an Organizational Taxonomy for Navigation: Use of a General Classification Scheme and Domain Thesauri. Knowledge Organization, 37(4), 256-269. 25 references. ABSTRACT: A study was conducted to investigate the capability of a general classification scheme and domain thesauri to support the construction of an organizational taxonomy to be used for navigation, and to develop steps and guidelines for constructing the hierarchical structure and categories.

The study was conducted in the context of a graduate department in information studies in Singapore that offers Master’s and PhD programs in information studies, information systems, and knowledge management. An organizational taxonomy, called Information Studies Taxonomy, was built for the learning, teaching and research tasks of the department using the Dewey Decimal Classification and three domain thesauri (ASIS&T, LISA, and ERIC). The support provided by, and the difficulties of using, the general classification scheme and domain thesauri were identified in the taxonomy development process. Steps and guidelines for constructing the hierarchical structure and categories were developed based on problems encountered in using the sources.

1.0 Introduction

Taxonomies are increasingly being used to organize content within organizations and to support navigation of web sites or digital repositories. Several writers have advocated using a top-down approach, with classification schemes and thesauri as sources for building organizational taxonomies (Iyer 1995; Aitchison et al. 2000; Conway and Sligar 2002; Cisco and Jackson 2005). This would allow the taxonomies to leverage the strengths and principles underlying existing classification schemes and thesauri (McGregor 2005; Saeed and Chaudhry 2002), and enable the taxonomies to be developed with less effort than starting from scratch (Wyllie 2005). At the same time, it has been pointed out that organizational taxonomies differ from classification schemes and thesauri in scope, components and roles (Wang et al. 2006). The coverage of organizational taxonomies depends more on the activities of the organizations and the interests of the stakeholders. The hierarchical structure used for navigation is expected to be more flexible and simpler, and the categories of taxonomies must be intuitive to intended users. The construction of an effective organizational taxonomy that supports navigation needs to incorporate the organizational context and take into consideration its navigational role while using components of classification schemes and thesauri.

Several taxonomy projects (McGregor 2005; Saeed and Chaudhry 2002; Bertolucci 2003) have used classification schemes and thesauri to build taxonomies. These projects demonstrated that bibliographic tools have the potential to provide the knowledge context and the terms for categories (Saeed and Chaudhry 2002). Taxonomies built on them would share the consistency of the classification schemes and controlled vocabularies (McGregor 2005). But these projects did not incorporate the organizational context, the activities of the organizations, and the interests of the stakeholders in the taxonomy development process. The organizational context was missing in the medical taxonomy development process that used MeSH (McGregor 2005). The pilot study in the computer science domain using the Dewey Decimal Classification and the IEEE Web Thesaurus (Saeed and Chaudhry 2002) did not define the application scope of the taxonomy. The Snoopy taxonomy built using DDC (Bertolucci 2003) was composed of 50 categories, which indicates the narrow scope of the project. In the SeSDL educational taxonomy, the “subjects” facet was based on the ten main classes of DDC, and the other facets used the British Education Thesaurus as one source of categories. However, no details of the development process have been reported, although the prototype of the taxonomy was accessible on the Internet. In other words, an empirical study of building an organizational taxonomy using classification schemes and thesauri is still lacking.

Keeping in view these previous taxonomy projects, we conducted an empirical study of building an organizational taxonomy using a general classification scheme and domain thesauri. The objectives of the study are: 1) to review the capability of a general classification scheme and domain thesauri to support an organizational taxonomy that is used for navigation; and 2) to develop steps and guidelines for constructing the hierarchical structure and categories. We hope that our report of the advantages and problems we encountered in using the general classification scheme and domain thesauri will provide lessons for others using bibliographic tools as sources. We also hope that the steps and guidelines we developed will help other organizations to build taxonomies.

2.0 Research Approach

The empirical study was conducted in the context of an academic organization (a graduate school) in the information studies domain: the Division of Information Studies, School of Communication and Information, Nanyang Technological University, Singapore. The Division has 15 full-time faculty members and nearly 500 students, and offers three Master’s programs by coursework (MSc in Information Studies, Information Systems, and Knowledge Management) as well as Master’s and PhD programs by research. The students in the MSc coursework programs focus on courses and on project reports in the Critical Inquiry course, which involves small group research projects. The students in the research programs focus on research projects and theses. The Division has four main research groups: Information and Knowledge Management, Knowledge Organization and Discovery, Information Retrieval and Digital Libraries, and User and Usability Studies. We selected the Division as an academic organization because it has explicit goals and divisions of people that are compatible with the two essential features, “goal-oriented” and “coordinated human activities toward the common goal”, found in slightly different definitions of organizations (Barnard 1938; Schein 1970; McAuley et al. 2007), and because it operates at a considerable scale in library and information science education.

Dewey Decimal Classification (DDC) and three domain thesauri, ASIS&T, LISA, and ERIC thesauri, were chosen as sources. DDC was selected because its structure makes it easy to navigate and it has been used in previous projects related to navigation (Saeed and Chaudhry 2002; Vizine-Goetz 2002). The two thesauri (ASIS&T and LISA) in the library and information science area, and the ERIC education thesaurus, were selected based on their relevance to the subject coverage of the taxonomy.

An organizational taxonomy, called Information Studies Taxonomy, was built for the Division. We designed three phases to develop the organizational taxonomy, keeping in view guidelines suggested in the literature (Roberts-Witt 2000; Conway and Sligar 2002; Choksy 2006; Lambe 2007; Sharma et al. 2008). The first phase is taxonomy needs identification. We examined the goals and major tasks of the Division, and the major stakeholders across the tasks. Interviews with 17 stakeholders were conducted to investigate the stakeholders’ tasks, created knowledge assets, and problems encountered in locating information resources for performing tasks. The use of existing knowledge organization systems in the Division intranet was also examined. The second phase is taxonomy design. We determined the taxonomy objectives, roles, target users, organization scheme (facets), subject coverage, and target content based on the problems that the taxonomy aimed to address, the activities of the Division, and the needs of stakeholders. The last phase is taxonomy construction.

We constructed the taxonomy with a focus on the subject facet, keeping in view the research objectives. The hierarchical structure and categories of the subject facet were manually constructed via a combination of top-down and bottom-up approaches, with the emphasis on top-down. The construction of the hierarchical structure proceeded from the top level (the main categories) to the high-level categories (levels 2 and 3) and then to the low-level categories (levels 4 and 5). In addition to DDC and the domain thesauri, we also selected sources related to the tasks of the stakeholders, such as course materials and staff publications; sources from the community of library and information science (LIS) schools (e.g., course descriptions on websites of LIS schools); sources from relevant professional associations (e.g., IFLA Guidelines for Professional Library/Information Educational Programs 2000); and relevant domain taxonomies (Hawkins et al. 2003; Mentzas 1994; Doke and Barrier 1994; Cheung et al. 2005) in information systems and knowledge management. Keeping in view suggestions in the literature (Cheuk 2002; Hunter 2005; Pack 2002; Batley 2005; Raschen 2006; Dickson 2008), we made an effort to incorporate the stakeholders’ interests and perspectives in the construction of the hierarchical structure and categories by employing relevant sources. We employed those additional sources when we found that DDC and the domain thesauri were not adequate. As recommended by Wyllie (2005), Lambe (2007), Dickson (2008), and Singhal and Nath (2008), in the last construction step we delivered the taxonomy draft to 11 stakeholders for review.

3.0 Information Studies Taxonomy

Figure 1 illustrates the objectives, roles, and intended users of the taxonomy. The taxonomy was expected to support the learning/teaching and research tasks in the Division. It focuses on two groups of users: graduate students and instructors. The taxonomy will support navigation and knowledge discovery in a digital repository of materials relevant to the students’ and instructors’ learning/teaching and research tasks, such as course materials, research project reports, dissertations, and so on.

The taxonomy draft comprises 7 facets and about 540 categories. The subject facet comprises about 440 categories spanning 2 to 5 levels. A full listing of the taxonomy prototype, with references and sources of labels, has been reported in a previous paper (Wang et al. 2008). A brief display of the 7 facets and the 12 main (top-level) categories of the subject facet is shown in Fig. 2.

4.0 Support of a General Classification Scheme and Domain Thesauri

4.1 A General Classification Scheme (DDC)

We had assumed that most of the main categories (the top level of the subject facet) could be selected or adapted from DDC. However, the subject coverage of an organizational taxonomy seldom falls within the main classes or sub-classes of a general classification scheme in a way that allows the main categories representing that coverage to be selected from it. The subject coverage of this taxonomy, which was determined from the programs and research groups of the Division and the tasks of the stakeholders, was not in line with the ten main classes of DDC or with the sub-classes in its library science and computer science schedules. None of the 12 main categories of the subject facet listed above was selected from DDC.

Knowl. Org. 37(2010)No.4 Z. Wang, A. S. Chaudhry, and Ch. Khoo. Support from Bibliographic Tools to Build an Organizational Taxonomy...

Figure 1. Objectives, roles, and intended users of the Information Studies Taxonomy

Figure 2. Overview of the Information Studies Taxonomy

We also found that not all high-level categories (levels 2 and 3) within the main categories could be selected from DDC. The high-level categories within only 5 of the 12 main categories (less than 50%) were selected from DDC. Table 1 lists these 5 main categories and the relevant DDC classes. The 5 main categories were fairly similar to relevant sub-classes of DDC. For example, Information Institutions is similar to 026-027 Library and Information Sciences—Specific Kinds of Institutions, and The Information Industry to 338.1-338.4 Economics—Production—Specific Kinds of Industries. The high-level categories within the two main categories Collection Management and User Services and Information and Knowledge Organization were selected from 025 Operations of Libraries, Archives, and Information Centers.

Main categories (relevant)                     DDC classes
Information Institutions                       026, 027
Collection Management and User Services        025.2, 025.5, 025.6, 028
Information and Knowledge Organization         025.3, 025.4
Information Technologies                       004, 005, 006, 384.1-384.6
The Information Industry                       338.47001-338.47999

Main categories / hierarchies (non-relevant)                                      Reasons
Information and Knowledge Management / Archives management                        No related classes
Information Searching and Retrieval / Information storage and retrieval systems   Not compatible with the users’ perspectives (025.04, 025.06)
The Information Profession                                                        No detailed structure (020.9)
Education and Training                                                            No detailed structure (020.07)

Table 1. Relevant main categories and examples of non-relevant main categories

High-level categories within the other 7 main categories were not taken from DDC, in three situations. Table 1 lists examples of these main categories and the reasons why. First, some areas did not fall within the main classes of DDC; for example, no specific classes related to archives management, document management, knowledge management, or scholarly writing. Second, DDC represented some areas in a single class but could not provide detailed structures, for example, library and information science education (020.7), information professionals (020.9), and research methodologies (001.4). These two situations are probably typical, because organizational taxonomies differ from general classification schemes in the nature of their subject coverage, and taxonomies used for navigation require more detailed categories for tagging resources other than book collections. Third, the structures provided by a classification scheme might not be compatible with the perspectives of the intended users, as Bertolucci (2003) also experienced. For example, the structure of class 025.04 was not adopted because it organizes types of information retrieval systems by subject and by person; such a structure would not fit the needs of the students and instructors for learning, teaching, and research in the area. Our findings suggest that a general classification scheme alone would not be sufficient to construct the hierarchical structure.

Within the 5 relevant main categories, 15 of 29 (52%) categories at level 2 and 44 of 106 (41.5%) categories at level 3 were identified from DDC. Table 2 lists examples of these categories, which were identified from DDC classes or Relative Index terms. For example, within the main category Information Institutions, the three level-2 categories Archive, Libraries, and Information Centers were identified from classes 026 and 027. Similarly, within the main category Collection Management and User Services, the three level-2 categories and some level-3 categories were identified from classes 025 and 028 and from Relative Index terms.

Beyond the high-level categories, we observed that general classification schemes might also support the low-level categories, especially those based on a “genus/species” division. Within the 5 relevant main categories, lower categories at levels 4 and 5 were identified from DDC in 14 of 31 hierarchies, and 9 of these 14 hierarchies are based on the “genus/species” division. For example, the lower categories within the hierarchies Classification Schemes, Controlled Vocabularies, and Special Material Cataloging, which are based on the “genus/species” division, were identified from classes 025.3 and 025.4.


However, we found that DDC could not fully support the high-level categories within the 5 relevant main categories, although it played a major role there: about half of the high-level categories within these 5 main categories were not taken from DDC. This happened in three situations. The first was that the discipline-based main classes were difficult to use: categories within the main category Information Technologies were hard to derive from the 000 schedule, which focuses on computer science. Similarly, categories within Archive and Information Centers were selected from the ASIS&T and LISA thesauri because the 020 schedule focuses more on librarianship. The second situation was that it was difficult to make full use of classes that rely on number building, for example, to identify categories within the main category The Information Industry from the 338.47001-338.47999 span. Similarly, lower categories in the hierarchy Computer Applications in Information and Knowledge Organization could not be taken from DDC, because DDC uses only one class (025.30285) for this area. These two situations may be typical, because organizational taxonomies and general classification schemes differ in the nature of their subject coverage and applications. The third situation was that DDC did not cover some new concepts, such as metadata and social tagging. The high-level category Resource Description within the main category Information and Knowledge Organization was therefore added from the Division’s course materials to accommodate these concepts. Similarly, categories such as Ontologies, Cataloging Outsourcing, Semantic Networks, and Mobile Communications were added from the thesauri. Categories could also be added to improve the features of the taxonomy; for example, the category Collection Measurement was added to the hierarchy Collection Development to make that hierarchy comprehensive and consistent with the other hierarchies. As pointed out above, these findings indicate that more than a general classification scheme had to be employed to construct the hierarchical structure of an organizational taxonomy.

Table 2. Examples of categories identified from DDC

A general classification scheme may also fail to supply the division criteria for selecting categories at the same level of an organizational taxonomy. General facets used in DDC, such as subject, geography, and person, were not appropriate for a taxonomy with a narrow coverage. Table 3 summarizes the support that components of DDC provided for the organizational taxonomy.

4.2 Domain Thesauri

As we expected, most categories (concepts and terms) of the subject facet came from the three domain thesauri. However, we found that the thesauri were not sufficient to reflect the organization, the interests of the stakeholders, and the features needed for navigation. About 16.6% of the categories (71 of 427) were not taken from the thesauri. These categories can be grouped into new concepts, compound terms, and terms particularly related to the taxonomy domain and the organization; Table 4 lists examples. New concepts, such as media resource centers, collaborative tagging, mobile information retrieval systems, and digital watermarking, reflected the interests of the stakeholders. Among the new concepts, terms in the areas of information management and knowledge management were not taken from the thesauri, because the thesauri treat these areas from a broad perspective, such as information science or knowledge, whereas the Division used these concepts more from the perspective of organizational communication. Compound terms, such as archival collection development and audiovisual material cataloging, were necessary for the hierarchical structure to support navigation. Other terms, such as knowledge management professionals, knowledge management education, library and information science schools, and information system development methodologies, reflected the organization as well as the taxonomy domain. These findings suggest that more than the domain thesauri had to be used to collect category terms and concepts for an organizational taxonomy.
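The coverage figures reported in Sections 4.1 and 4.2 are simple proportions of category counts. The short Python sketch below recomputes them from the counts given in the text. The helper name `coverage` and the rounding to one decimal place are illustrative choices and not part of the study’s method.

```python
# Recompute the coverage proportions reported in Sections 4.1 and 4.2.
# The counts come from the text; the helper name and rounding are illustrative.


def coverage(found: int, total: int) -> float:
    """Return found/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * found / total, 1)


# (found, total) pairs as reported in the text.
REPORTED = {
    "main categories with high-level categories from DDC": (5, 12),
    "level-2 categories from DDC": (15, 29),
    "level-3 categories from DDC": (44, 106),
    "level 4-5 hierarchies identified from DDC": (14, 31),
    "subject-facet categories not from the thesauri": (71, 427),
}

if __name__ == "__main__":
    for label, (found, total) in REPORTED.items():
        print(f"{label}: {found}/{total} = {coverage(found, total)}%")
```

For example, 15/29 gives 51.7%, which the text rounds to 52%. If levels 2 and 3 are pooled, 59 of 135 categories (about 43.7%) came from DDC. This is consistent with the observation in Section 4.1 that about half of the high-level categories in the 5 relevant main categories were not taken from DDC.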

We observed that the term hierarchy of a thesaurus can support high-level categories. For example, the five level-2 categories within the main category The Information Society were identified from the Hierarchical Index of the ASIS&T thesaurus. We also found the term relationships of the thesauri helpful for identifying low-level categories, and most relevant low-level categories were identified from them. A small number of low-level categories were not taken from them. This happened in two situations, one related to the scope of the thesauri and the other to the division criteria used for creating narrower terms. For example, the narrower terms provided by the ERIC thesaurus in the area of quantitative research methodologies were not adopted because they were not in line with the “genus/species” division used in the hierarchy. In line with Saeed and Chaudhry (2002), we found that the term relationships of thesauri were also helpful for identifying the scope of terms. For example, as mentioned above, the terms in the area of information and knowledge management provided by the ASIS&T and LISA thesauri were not adopted because they were not appropriate for the organizational context. The scope of these terms was identified from their term relationships. Table 3 also summarizes the support that components of the domain thesauri provided for the organizational taxonomy.

Table 3. Support of DDC and the domain thesauri for the Information Studies Taxonomy
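A small sketch can illustrate how thesaurus term relationships help identify low-level categories. Below, a thesaurus is modeled as a Python mapping from each term to its narrower terms (NT). The terms and relationships are hypothetical toy data rather than entries copied from the ASIS&T, LISA, or ERIC thesauri, and `narrower_terms` is an illustrative helper. Starting from a high-level category, the helper collects candidate lower categories down to a chosen depth. The taxonomy builder would still have to review these candidates against the division criterion of the hierarchy.

```python
from collections import deque

# Hypothetical toy thesaurus: term -> list of narrower terms (NT).
# Entries are illustrative only, not taken from the ASIS&T, LISA, or ERIC thesauri.
TOY_THESAURUS: dict[str, list[str]] = {
    "controlled vocabularies": ["thesauri", "subject headings", "taxonomies"],
    "thesauri": ["multilingual thesauri"],
    "taxonomies": ["faceted taxonomies"],
}


def narrower_terms(thesaurus: dict[str, list[str]], start: str, max_depth: int) -> dict[str, int]:
    """Breadth-first walk of NT relationships from `start`.

    Returns each reachable narrower term with its depth below `start`
    (1 = direct NT), without descending more than `max_depth` levels.
    """
    found: dict[str, int] = {}
    queue = deque([(start, 0)])
    while queue:
        term, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nt in thesaurus.get(term, []):
            if nt not in found and nt != start:  # skip repeats and cycles back to start
                found[nt] = depth + 1
                queue.append((nt, depth + 1))
    return found


if __name__ == "__main__":
    for term, depth in narrower_terms(TOY_THESAURUS, "controlled vocabularies", 2).items():
        print("  " * depth + term)
```

The returned candidates are only a starting point. As with the ERIC narrower terms for quantitative research methodologies, terms that do not fit the division criterion of the hierarchy would still be rejected by hand.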

New concepts
– Media resource centers
– Collaborative tagging
– Mobile information retrieval systems
– Digital watermarking
– Mobile commerce
– Information development
– Information audit
– Information distribution
– Information sharing
– Information utilization
– Knowledge audit
– Knowledge development
– Knowledge capture
– Knowledge sharing
– Knowledge utilization

Compound terms
– Archival collection development
– Electronic collection development
– Audiovisual collection development
– Audiovisual material cataloging
– Cartographic material cataloging
– Digital resource cataloging

Terms related to the organization
– Knowledge management professionals
– Information science & systems education
– Knowledge management education
– Library and Information Science Schools
– Information system development methodologies
– Oral presentation skills

Table 4. Examples of categories from sources other than the three thesauri

5.0 Difficulties Encountered

The major difficulties we encountered involved the partial support provided by DDC and the domain thesauri, the manipulation of multiple sources, and the incorporation of the stakeholders’ interests and perspectives into the hierarchical structure and categories. The main categories, and the hierarchical structures within some main categories, had to be constructed without the help of DDC. Concepts and terms for categories had to be collected from sources beyond the domain thesauri.

The selected sources (classification schemes and thesauri) may represent the same concepts in different contexts and with different terms. For example, DDC used the concept of classification schemes in the context of librarianship, whereas the ASIS&T thesaurus represented it by taxonomies. For the concept of automatic thesaurus generation, the ASIS&T thesaurus chose automatic taxonomy generation, while the LISA thesaurus combined the two terms automatic construction and construction of thesauri. Also, the vocabulary of the 020 schedule of DDC focused more on librarianship, and the terms of the ERIC thesaurus focused on general education.

The sources might also provide different structures or term relationships for the same terms or concepts. For example, DDC reflected the concept of ontologies through class 006.33, knowledge-based systems; the ASIS&T thesaurus treated it as a narrower term of controlled vocabularies; and the LISA thesaurus related it to semantic web, controlled vocabularies, and thesauri. Moreover, the term relationships in a thesaurus may not be rigorously applied. For example, the LISA thesaurus lists archival description, archives law, and national archives as narrower terms of archives, although these three terms are not at the same conceptual level.

6.0 Steps and Guidelines for Constructing the Hierarchical Structure and Categories

6.1 Steps

We developed the following steps to address the difficulties we encountered in using the general classification scheme and domain thesauri. We used multiple sources in addition to DDC and the domain thesauri. In particular, sources from the organization were used to reflect the stakeholders’ interests and perspectives, for example in collecting the concepts and terms of categories and in mapping low-level categories to high-level categories. Sources related to the professional association (IFLA) and the community (library and information science schools) were employed to reflect the taxonomy domain and to determine the main categories. We designed the steps to accommodate the need to manipulate multiple sources and to incorporate the stakeholders’ interests and perspectives. The steps involve constructing the facets, main categories, category concepts and terms, hierarchies, and category labels.

6.1.1 Selecting Facets

The facets were not selected from DDC or the domain thesauri; they were determined from the application context of the taxonomy. The specific steps are as follows:

1. Select facets from the application context of the taxonomy, such as the tasks and roles of the stakeholders and the types of target content.

2. Create labels for facets.

6.1.2 Determining the Main Categories (Level 1) of the Subject Facet

We treated the main categories as a separate step because they form the top level and represent the subject coverage of the taxonomy, and because general classification schemes or domain thesauri are not expected to be useful in specifying them. The specific steps are as follows:

1. Identify the major areas and concepts of the stakeholders’ interests by reviewing sources related to the stakeholders’ tasks.

2. Select main categories that cover the concepts of interest and subject areas from industry/community sources (documents from professional associations, such as the IFLA Guidelines, and course descriptions from library and information science school websites) and from domain taxonomies (e.g., Information Science Taxonomy).

3. Create additional main categories to cover concepts of interest and subject areas not found in community/industry sources and domain taxonomies.

4. Select labels from the sources and construct labels for main categories not found in the sources.
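Steps 1 to 3 amount to a coverage check. Areas of stakeholder interest that are covered by industry or community sources yield main categories from those sources. The remaining areas call for additional, home-created main categories. The following is a minimal Python sketch using sets. The function name `plan_main_categories`, the area names, and the mapping from candidate categories to areas are all hypothetical.

```python
# Hypothetical illustration of steps 1-3: which candidate main categories from
# industry/community sources cover the stakeholders' interest areas, and which
# interest areas are left over and need home-created main categories.


def plan_main_categories(
    interest_areas: set[str], source_categories: dict[str, set[str]]
) -> tuple[list[str], list[str]]:
    """Return (selected source categories, uncovered interest areas).

    `source_categories` maps a candidate main category found in an
    industry/community source or domain taxonomy to the areas it covers.
    A candidate is selected if it covers at least one interest area.
    """
    selected = sorted(
        name for name, covers in source_categories.items() if covers & interest_areas
    )
    covered: set[str] = set().union(*(source_categories[name] for name in selected))
    uncovered = sorted(interest_areas - covered)
    return selected, uncovered


if __name__ == "__main__":
    # Toy data for illustration only.
    interests = {"archives management", "knowledge management", "scholarly writing", "information retrieval"}
    candidates = {
        "Information Retrieval": {"information retrieval"},
        "Knowledge and Records Management": {"knowledge management", "archives management"},
        "Preservation": {"conservation"},
    }
    selected, uncovered = plan_main_categories(interests, candidates)
    print("From sources:", selected)
    print("Create new main categories for:", uncovered)
```

In practice, deciding which areas a source category covers is a judgment made while reviewing the sources. The sketch simply assumes that this judgment has already been recorded.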

6.1.3 Collecting Category Concepts and Terms

In a medical taxonomy project, McGregor (2005) used a term list representing the content of an online journal as the basis for selecting terms from the MeSH headings. Similarly, we designed a term list representing the stakeholders’ interests as the basis for selecting terms. The selection involves three steps:

1. Create a list of concepts and terms related to the stakeholders’ interests by selecting and consolidating terms from sources related to the stakeholders’ tasks.

2. Select terms from the general classification scheme (DDC class captions and Relative Index terms), the domain thesauri, and domain taxonomies based on their relevance to terms in the list.

3. Consolidate concepts and terms from different sources.
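Steps 2 and 3 can be sketched together with the guideline that concepts and terms should be annotated with their sources (Section 6.2.4). The sketch makes two assumptions. First, it uses a deliberately simple notion of relevance: an exact match after lower-casing and whitespace normalization. Second, it relies on a hand-built equivalence table that records which wordings in different sources express the same concept. The names `SYNONYMS`, `normalize`, and `consolidate`, and all data, are hypothetical. The two equivalences echo examples from Section 5.0, such as the ASIS&T thesaurus representing classification schemes by taxonomies.

```python
from collections import defaultdict

# Hypothetical equivalence table: normalized source term -> concept used in the taxonomy.
# The two pairs echo examples from Section 5.0; a real crosswalk would be much larger.
SYNONYMS = {
    "taxonomies": "classification schemes",
    "automatic taxonomy generation": "automatic thesaurus generation",
}


def normalize(term: str) -> str:
    """Lower-case and collapse whitespace so trivially different strings match."""
    return " ".join(term.lower().split())


def consolidate(interest_terms: list[str], sources: dict[str, list[str]]) -> dict[str, set[str]]:
    """Select source terms whose concept is on the stakeholders' term list.

    Returns concept -> {"source: original term", ...}, so every selected
    concept stays annotated with where its terms came from.
    """
    wanted = {normalize(t) for t in interest_terms}
    result: dict[str, set[str]] = defaultdict(set)
    for source, terms in sources.items():
        for term in terms:
            key = normalize(term)
            concept = SYNONYMS.get(key, key)  # map source-specific wording to one concept
            if concept in wanted:
                result[concept].add(f"{source}: {term}")
    return dict(result)


if __name__ == "__main__":
    # Toy term lists for illustration only; they are not the actual source vocabularies.
    interests = ["Classification schemes", "Metadata"]
    sources = {
        "DDC": ["Classification schemes"],
        "ASIS&T": ["Taxonomies"],
        "Course materials": ["Metadata"],
        "LISA": ["Archives law"],
    }
    for concept, origins in sorted(consolidate(interests, sources).items()):
        print(concept, "<-", sorted(origins))
```

Exact matching is the weakest point of this sketch. In the study, relevance and equivalence were judged by the taxonomy builders rather than computed.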

6.1.4 Constructing the Hierarchies

Saeed and Chaudhry (2002) proposed three steps for constructing the hierarchical structure using classification schemes and domain thesauri: selecting hierarchies from the classification schemes, selecting terms from domain thesauri, and mapping the selected terms into the hierarchies. Building on their proposal, we designed four stages to construct the hierarchies. In addition to the three stages covering high-level categories, low-level categories, and mapping, we inserted a stage on division criteria to create neat categories at the same level. The specific steps are as follows:

– High-level Categories (Level 2)

1. Based on relevance to the main categories, identify and reconsolidate the high-level categories (concepts and terms) from the structures and term relationships of the general classification scheme (DDC), the Hierarchical Index of the thesaurus (ASIS&T), and relevant domain taxonomies (mainly Information Science Taxonomy).

2. Review whether the selected high-level categories cover the lower category terms. Create new high-level categories based on the lower category terms when the selected high-level categories cannot cover them.

– Division Criteria (To Create Categories at the Same Level)

3. Determine the division criteria by identifying knowledge structures inherent in sources in the organization (e.g., course lectures).


– Low-level Categories (Level 3)

4. Identify and reconsolidate the low-level categories from the multiple sources, using the high-level categories as starting points.

5. Determine the low-level categories based on the chosen division criteria.

– Mapping Low-level Categories and Expansion of Hierarchies (Level 4 or 5)

6. Map the low-level categories to the main categories and high-level categories by identifying knowledge structures inherent in sources in the organization (e.g., course lectures).

7. Build cross-references for categories that can be mapped into more than one perspective or cannot be mapped into the ideal “hierarchies”.

8. Expand the hierarchies by further identifying low-level categories and mapping them to the main categories and higher categories.

9. Balance the hierarchies so that they stay within 5 levels.

6.1.5 Determining Labels of Categories

We created guidelines for selecting labels from the various terms collected from multiple sources. To support user navigation, labels should fully reflect the concepts at their hierarchical levels, be simple expressions, and be consistent. We reviewed whether the terms were appropriate for the organization, the stakeholders, and the target hierarchical levels. Similar guidelines can be found in the literature, but we also had to address aspects of labels and terms that arise from using different sources. The specific steps are as follows:

1. Select labels from the category terms according to the guidelines we created.

2. Determine the scope of the labels (terms) based on their use in organization sources.

3. Modify the vocabulary (class captions and Relative Index terms) taken from the general classification scheme (DDC) into simpler expressions, in a style reflecting the taxonomy domain.

4. Modify the terms from the thesauri into expressions that fully reflect the concepts at the target hierarchical levels.

5. Create labels for higher-level categories when labels cannot be found among the category terms.

6. Format the labels according to the thesaurus construction standard (ANSI/NISO Z39.19-2005).

7. Order the labels of categories at the same level alphabetically (e.g., for a “genus/species” division) or logically (e.g., for “aspects” and “procedure” divisions), depending on the division criterion used. When more than one division criterion is employed at the same level, group together the categories based on the same criterion.

8. Build UF (used for) references for category terms that were not chosen as labels.
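The sketch below illustrates steps 1, 7, and 8. It picks one preferred label per concept, turns the remaining candidate terms into UF (used for) references, and orders sibling labels alphabetically, as for a “genus/species” division. The source ranking (organization sources first, then thesauri, then DDC) is an assumption loosely modeled on guideline 13 in Section 6.2.5. `SOURCE_RANK`, `Candidate`, `Category`, `choose_label`, and `order_siblings` are hypothetical names. The other label guidelines (simplicity, consistency, fit to the hierarchical level) are not modeled here.

```python
from dataclasses import dataclass, field

# Assumed order of preference for label sources (lower number = preferred),
# loosely following guideline 13; not a ranking stated in the study.
SOURCE_RANK = {"organization": 0, "thesaurus": 1, "ddc": 2}


@dataclass
class Candidate:
    term: str
    source: str  # expected to be one of the SOURCE_RANK keys


@dataclass
class Category:
    label: str
    used_for: list[str] = field(default_factory=list)  # UF references


def choose_label(candidates: list[Candidate]) -> Category:
    """Steps 1 and 8: pick a preferred label; the other terms become UF references."""
    if not candidates:
        raise ValueError("need at least one candidate term")
    # Unknown sources sort last; ties keep input order because sorted() is stable.
    ranked = sorted(candidates, key=lambda c: SOURCE_RANK.get(c.source, len(SOURCE_RANK)))
    label = ranked[0].term
    used_for = sorted({c.term for c in ranked[1:] if c.term != label})
    return Category(label=label, used_for=used_for)


def order_siblings(categories: list[Category]) -> list[Category]:
    """Step 7 for a genus/species division: alphabetical, case-insensitive order."""
    return sorted(categories, key=lambda c: c.label.casefold())


if __name__ == "__main__":
    # Toy candidate terms for one concept, for illustration only.
    cat = choose_label([
        Candidate("Taxonomies", "thesaurus"),
        Candidate("Classification schemes", "organization"),
        Candidate("Classification", "ddc"),
    ])
    print(cat.label, "UF", cat.used_for)
    siblings = order_siblings([Category("Thesauri"), cat, Category("Ontologies")])
    print([c.label for c in siblings])
```

A logical ordering, such as one for an “aspects” or “procedure” division, would replace the alphabetical sort key with an explicit sequence supplied by the taxonomy builder.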

Table 5 lists examples of labels modified from DDC or the domain thesauri, together with the reasons for the modifications.

6.2 Guidelines

We developed guidelines based on the difficulties we encountered in using the general classification scheme and domain thesauri and on our experience in constructing the hierarchical structure and categories. The specific guidelines are as follows:

6.2.1 Selecting Facets

1. Labels of facets are not likely to be found in a general classification scheme or domain thesauri, because they are usually very broad concepts or combinations of concepts. They usually have to be home-created.

Table 5. Examples of labels modified from DDC and the thesauri


6.2.3 Constructing the Main Categories (level 1) of the subject facet

1. The main categories are not likely to be found in a general classification scheme or domain thesauri. However, this depends on whether the subject coverage of the taxonomy matches the main classes of the general classification scheme or its sub-classes.

2. When the subject coverage of the taxonomy does not match the main classes of the general classification scheme or its sub-classes, the main categories should be selected from sources in the organization, industry/community sources, and relevant domain taxonomies.

3. The size of the coverage and the breadth of the subject facet need to be considered in determining the number of main categories.

6.2.4 Selecting Category Concepts and Terms

1. Most category concepts and terms are likely to be found in domain thesauri. However, this depends on the availability and scope of the thesauri. Some category concepts and terms are likely to be found in the general classification scheme (class captions) and in relevant domain taxonomies.

2. New concepts, compound concepts, and concepts representing organization tasks and activities (rather than academic subjects) may not be found in the domain thesauri.

3. The concepts and terms should be annotated with their sources.

4. See references provided by the domain thesauri should be kept with terms.

6.2.5 Constructing Hierarchies

– Hierarchies: high-level categories (level 2)

1. If the main categories match the main class or sub-classes of the general classification scheme, about half the high-level categories (concepts and terms) are likely to be found in the general classification scheme.

2. When the main categories are not related or relevant to the main classes or sub-classes of the general classification scheme, the high-level categories are likely to be found in the Hierarchical Index of a domain thesaurus and relevant domain taxonomies.

3. The high-level categories (concepts and terms) can be selected from classes at different levels, as well as from related Relative Index terms of the general classification scheme.

4. The division criteria inherent in the selected high-level categories should be reviewed to ensure they are at the same conceptual level.

5. The categories at level 2 should be based on one division criterion.

6. When the high-level categories cannot be selected from the general classification scheme, the domain thesauri, or relevant domain taxonomies, they have to be selected from sources in the organization or home-created.

7. The selected high-level categories should be annotated with their sources.

– Hierarchies: division criteria (used to create categories at the same level)

8. The division criteria are not likely to be found in the general classification scheme. However, this depends on the size of the subject coverage of the taxonomy. For a taxonomy that involves several subjects and has a narrow coverage, the division criteria have to be home-created.

– Hierarchies: low-level categories (level 3)

9. Most low-level categories (concepts and terms) are likely to be found in the domain thesauri.

10. More than one division criterion can be used to determine the low-level categories, depending on the number of categories and the levels of the hierarchies. These divisions are expected to be at the same conceptual level.

– Hierarchies: mapping low-level categories and expansion of hierarchies (level 4 or 5)

11. The selection of categories at level 4 or 5 depends on the stakeholders’ interests. Areas of high-level categories that are not among the stakeholders’ major interests are not expected to have many lower categories and levels.

12. The number of levels can be reduced by using more than one division criterion at the same level.

– Determining Labels of Categories

13. Prefer labels from sources in the organization.

14. Choose the same terms to represent the same concepts.
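Two of the structural rules above lend themselves to an automated check. The first is keeping hierarchies within 5 levels (step 9 in Section 6.1.4). The second is basing the level-2 categories on one division criterion (guideline 5). The Python sketch below is a hypothetical helper and not part of the taxonomy software used in the study. It assumes that each node records the division criterion that produced it, which the study does not prescribe. Guideline 10 allows more than one criterion at lower levels, so by default the single-criterion rule is applied only at level 2. The example also applies the rule at level 3 to flag a sibling group like the LISA narrower terms of archives discussed in Section 5.0.

```python
from dataclasses import dataclass, field

MAX_LEVELS = 5  # step 9 of Section 6.1.4: keep hierarchies within 5 levels


@dataclass
class Node:
    label: str
    criterion: str = ""  # division criterion that produced this node (hypothetical field)
    children: list["Node"] = field(default_factory=list)


def check(root: Node, single_criterion_levels: frozenset[int] = frozenset({2})) -> list[str]:
    """List structural problems in the hierarchy under a main category.

    `root` is treated as level 1, so its children are level 2. Sibling groups at
    the levels in `single_criterion_levels` must share one division criterion.
    """
    problems: list[str] = []

    def walk(node: Node, level: int) -> None:
        if level > MAX_LEVELS:
            problems.append(f"{node.label}: level {level} exceeds {MAX_LEVELS}")
        child_level = level + 1
        criteria = {child.criterion for child in node.children}
        if child_level in single_criterion_levels and len(criteria) > 1:
            problems.append(f"{node.label}: level-{child_level} children mix {sorted(criteria)}")
        for child in node.children:
            walk(child, child_level)

    walk(root, 1)
    return problems


if __name__ == "__main__":
    # Toy hierarchy echoing the LISA example in Section 5.0: narrower terms of
    # "Archives" produced by different criteria (not real thesaurus data).
    archives = Node("Archives", "kind of institution", [
        Node("Archival description", "process"),
        Node("Archives law", "aspect"),
        Node("National archives", "kind of archives"),
    ])
    root = Node("Information Institutions", children=[archives, Node("Libraries", "kind of institution")])
    print(check(root))                     # level 2 only: no problems
    print(check(root, frozenset({2, 3})))  # also flags the mixed level-3 group
```

Reporting problems as a list, rather than raising an error, reflects the review-oriented character of the guidelines. A mixed sibling group may be acceptable when guideline 10 applies, so the final decision remains with the taxonomy builder.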


7.0 Conclusion

We conducted an empirical study of the extent to which DDC and three relevant domain thesauri can be used in developing an organizational taxonomy for an academic department in the information studies domain. We started from the assumption that a general classification scheme and several relevant domain thesauri would provide excellent support for the subject facet of an organizational taxonomy, particularly in an academic organization. Some modifications to the scope of the concepts, terms (labels), or concept/term relationships selected from these sources would, of course, be necessary to adjust them to the organizational context. However, we found that DDC and the domain thesauri were far from sufficient for the organizational taxonomy. In particular, DDC could not support the top-level categories of the taxonomy, because the taxonomy is not discipline-based: its subject coverage depends on the activities of the organization and the tasks of the stakeholders. DDC also could not fully support the high-level categories (level 2 or 3 of the subject facet), for the same reasons and because of the taxonomy’s focus on supporting navigation. The two selected domain thesauri in library and information science also could not support concepts and terms in the area of information and knowledge management, because the organization treats this area from a different perspective: that of organizational communication.

Organizational taxonomies differ from general classification schemes and domain thesauri in their application scope and their role in navigation. We used additional sources to reflect the stakeholders’ interests, for example the course materials and research publications of the organization. To help determine the top-level categories, we used a domain taxonomy (Information Science Taxonomy) together with sources from the professional association and the websites of sibling organizations. The steps we used to construct the hierarchical structure and categories took into account the need to manipulate multiple sources, the requirements for navigation, and the requirements for a good taxonomy. The guidelines we developed were based on the issues encountered in developing the taxonomy.

The findings of the study and the solutions implemented are to some extent limited to the context of our study. For example, because of the domain of the study, we made use of only partial schedules of the DDC. Also, we used other knowledge organization systems, such as domain taxonomies, to complement the resources of the DDC and domain thesauri. We did not cover ontologies; the GEM ontology, for example, seems to be a good starting point for collecting terms and term relationships in the field of education. The findings about using the DDC and the domain thesauri, as well as the steps and guidelines for constructing the hierarchical structure and categories, need to be validated in other types of organizations and knowledge domains, and with other knowledge organization systems.

The prototype taxonomy was implemented in the University’s e-learning platform using taxonomy software, TLE-Equella version 3, to support navigation. An evaluation of the effectiveness of the taxonomy in supporting end-users’ navigation has been conducted to identify problems in the taxonomy construction steps. The evaluation methods we used and the issues found in the evaluation results will be reported in a separate paper.

References

Aitchison, Jean et al. 2000. Thesaurus construction and use: A practical manual. (4th ed.). London: Aslib IMI.

Barnard, Chester I. 1938. The functions of the executive. Cambridge: Harvard University Press.

Batley, Susan. 2005. Classification in theory and practice. Oxford: Chandos Publishing.

Bertolucci, Katherine. 2003. Happiness is taxonomy: Four structures for Snoopy. Information outlook 7n3: 36-44.

Cheuk, W-Y B. 2002. Real-world taxonomy development: Creating a taxonomy that makes sense to your employees. InsideKnowledge 5n6. Available http://communication.sbs.ohio-state.edu/sense-making/art/artabscheuk02kmtaxon.html (accessed July 20, 2007).

Choksy, Carol E. B. 2006. 8 steps to develop a taxonomy. Information management journal Nov/Dec. Available http://findarticles.com/p/articles/mi_qa3937/is_200611/ai_n16871474 (accessed December 4, 2006).

Cisco, Susan L. and Jackson, Wanda K. 2005. Creating order out of chaos with taxonomies [Electronic version]. Information management journal May/June: 45-50. Available from EBSCO Academic Search Premier database (accessed December 4, 2005).

Conway, Susan and Sligar, Char. 2002. Chapter 6. Building taxonomies. In Susan Conway and Char Sligar. Unlocking knowledge assets. Redmond, Wash.: Microsoft Press, pp. 105-24.


Dickson, Ian. 2008. Taxonomy: Some guidelines for effective design of taxonomies. Available http://drupal.org/node/81589 (accessed January 31, 2009).

Hunter, Anthony. 2000. Taxonomies. Available http://www.cs.ucl.ac.uk/staff/a.hunter/tradepress/tax.html (accessed July 10, 2005).

Iyer, Hemalata. 1995. Classificatory structures: Concepts, relations and representation. Frankfurt/Main: INDEKS Verlag.

Lambe, Patrick. 2007. Organizing knowledge: Taxonomies, knowledge and organizational effectiveness. Oxford: Chandos Publishing.

McAuley, John et al. 2007. Organization theory: challenges and perspectives. Harlow, England; New York: Prentice Hall/Financial Times.

McGregor, Bruce. 2005. Constructing a concise medical taxonomy. Journal of the Medical Library Association 93: 121-23.

Pack, Thomas. 2002. Taxonomy’s role in content management. Available http://www.econtentmag.com/Articles/ArticleReader.aspx?ArticleID=867 (accessed July 10, 2005).

Raschen, Bill. 2006. A resilient, evolving resource: How to create a taxonomy [Electronic version]. Business information review 22: 199-204. Available from Sage Publications database (accessed August 15, 2006).

Roberts-Witt, Sarah L. 2000. Practical taxonomies. Available http://www.destinationKM.com/articles/default.asp?articleid=684 (accessed July 15, 2005).

Saeed, Hamid and Chaudhry, Abdus S. 2002. Using Dewey Decimal Classification Scheme (DDC) for building taxonomies for knowledge organization. Journal of documentation 58: 575-84.

Schein, Edgar H. 1970. Organizational psychology (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.

Sharma, Ravi S. et al. 2008. Developing corporate taxonomies for knowledge auditability: A framework for good practices. Knowledge organization 35: 30-46.

Singhal, M. and Nath, S. S. 2008. Chapter 27. An approach toward taxonomies. In I. V. Malhan and K. Shivarama Rao, eds. Perspectives on knowledge management. Lanham, Maryland: Scarecrow, pp. 391-406.

Vizine-Goetz, Diane. 2002. Classification schemes for Internet resources revisited. Journal of Internet cataloging 5n4: 5-18.

Wang, Zhonghong et al. 2006. Potential and prospects of taxonomies for content organization. Knowledge organization 33: 160-69.

Wang, Zhonghong et al. 2008. Using classification schemes and thesauri to build an organizational taxonomy for organizing content and aiding navigation. Journal of documentation 64: 842-76.

Wyllie, Jan. 2005. Taxonomies: Frameworks for corporate knowledge (2nd ed.). London: Ark Group, in association with Inside Knowledge.

Appendix: Sources of the Information Studies Taxonomy

Cheung, C. F. et al. 2005. A multi-facet taxonomy system with applications in unstructured knowledge management [Electronic version]. Journal of knowledge management 9n6: 76-91. Available from Emerald Full text database (accessed August 14, 2006).

Doke, E. Reed and Barrier, Tonya. 1994. An assessment of information systems taxonomies: Time to be re-evaluated? Journal of information technology 9: 149-57.

Education Resources Information Center. U.S. Department of Education. ERIC Education Thesaurus. Retrieved December 2006 from the Education Resources Information Center database. Access provided by Nanyang Technological University.

Hawkins, D. T. et al. 2003. Information Science Abstracts: Tracking the literature of information science. Part 2: A new taxonomy for information science [Electronic version]. Journal of the American Society for Information Science and Technology 54: 771-81. Available from ProQuest ABI/INFORM Archive Complete database (accessed July 10, 2005).

IFLA. Education and Training Section. 2000. Guidelines for Professional Library/Information Educational Programs – 2000. Available http://www.ifla.org/VII/s23/bulletin/guidelines.htm (accessed November 3, 2006).

Mentzas, G. 1994. A functional taxonomy of computer-based information systems [Electronic version]. International journal of information management 14: 397-410. Available from ScienceDirect database (accessed July 12, 2005).

ProQuest – CSA Social Science. LISA library and information science thesaurus. Retrieved December 2006 from the Library and Information Science Abstracts database. Access provided by Library of Nanyang Technological University.

Redmond-Neal, Alice and Hlava, Marjorie M. K. 2005. Thesaurus of information science, technology, and librarianship, 3rd ed. Medford, New Jersey: Information Today.

OCLC. 1978. Web Dewey. Available http://connexion.oclc.org/ (accessed December 2006). Access provided by WKW School of Communication & Information, Nanyang Technological University.


Knowl. Org. 37(2010)No.4 V. Broughton. Concepts and Terms in the Faceted Classification: the Case of UDC

270

Concepts and Terms in the Faceted Classification: the Case of UDC

Vanda Broughton

Department of Information Studies, University College London, Gower Street, London WC1E 6BT, United Kingdom

Vanda Broughton is Senior Lecturer in Library & Information Studies and Programme Director for the MA LIS at University College London. She has worked on the revision of the Bliss bibliographic classification since 1972 and is Joint Editor of the second edition. She has been involved with the UDC since 1997 and is now Associate Editor. A member of the Classification Research Group from the 1970s, she has also been a member of the IFLA Committee on Classification and Indexing and is the author of a number of books and articles on the theory and design of classifications and thesauri.

Broughton, Vanda. Concepts and Terms in the Faceted Classification: the Case of UDC. Knowledge Organization, 37(4), 270-279. 32 references.

ABSTRACT: Recent revision of UDC classes has aimed at implementing a more faceted approach. Many compound classes have been removed from the main tables, and more radical revisions of classes (particularly those for Medicine and Religion) have introduced a rigorous analysis, a clearer sense of citation order, and the building of compound classes according to a more logical system syntax. The faceted approach provides a means of formalizing the relationships in the classification and making them explicit for machine recognition. In the Bliss Bibliographic Classification (BC2) (which has been a source for both UDC classes mentioned above), terminologies are encoded for automatic generation of hierarchical and associative relationships. Nevertheless, difficulties are encountered in vocabulary control, and a similar phenomenon is observed in UDC. Current work has revealed differences between the vocabularies of the humanities and the sciences, notably in the way terms in the humanities should be handled when they are semantically complex. Achieving a balance between rigour in the structure of the classification and the complexity of natural language expression remains partially unresolved at present, but provides a fertile field for further research.

1.0 Introduction

In recent years the UDC has seen a significant change in the level of consistency and uniformity in the modelling of its content. Work by Cordeiro and Slavic (2002) identified the need for robust models not only for data representation but also for supporting the semantic structure of subject tools, and lamented the lack of universal standards for this. In a networked environment, a lack of structure in the system cannot be compensated for by a sophisticated interface (Slavic 2006, 30):

A poor data structure, however, may impose fundamental limits on the search and interaction options that may be presented at the user interface. If a database does not contain information on relationships (hyperlinks) between, for example, a UDC number and its broader class or a UDC number and its caption, or UDC notation and verbal expressions, no interface technology will overcome these limitations.
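The point made in this passage, that a database should record links between a UDC number and its broader class, can be illustrated with a minimal Python sketch. It assumes that, for a plain main-table number without auxiliaries, the broader class is obtained by dropping the final digit (and any point left dangling). This reflects the general hierarchical expressiveness of UDC notation, but not every class follows it, and an MRF-based system would store such links explicitly rather than infer them. The function name `broader_chain` is hypothetical.

```python
def broader_chain(notation: str) -> list[str]:
    """Return successively broader classes for a plain UDC main-table number.

    Assumption: the number contains only digits and points (no auxiliaries),
    and its broader class is the same number with the last digit removed.
    """
    if not notation.replace(".", "").isdigit():
        raise ValueError("only plain main-table numbers are handled")
    chain = []
    current = notation
    # Stop when a single digit (a main class) remains.
    while len(current.replace(".", "")) > 1:
        current = current[:-1].rstrip(".")  # drop last digit and any trailing point
        chain.append(current)
    return chain


print(broader_chain("621.39"))  # ['621.3', '621', '62', '6']
```

Inferring the chain from notation only works because UDC notation is designed to be hierarchically expressive; where it is not, the explicit link in the database is the only reliable source.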

Other work on the Master Reference File (MRF) (Slavic, Cordeiro, and Riesthuis 2007) shows how important the consistency of the structure is to the efficient management of the classification database. It is certainly clear that the application of facet analysis to the classification scheme has some powerful advantages in terms of confirming the structure, facilitating machine management, and clarifying the semantic relationships between classes. The faceted approach to subject analysis provides a systematic means of formalizing the relationships in the classification and making them explicit for machine recognition, whether this is in a database structure or in an encoded format for exchange and/or display. The terminologies of the Bliss Bibliographic Classification (BC2), which have been a source for UDC revision, are, in their original form, encoded for hierarchical and associative relationships in a way that permits the semi-automatic generation of an associated thesaurus (as well as the creation of the classification display and the alphabetical index). This is made possible by the clarity of the structure, since specialized software can infer the broader and narrower terms from the coding, and can cope to a limited degree with equivalence relationships.

However, some significant questions arise as to how the faceted structure is represented, notationally and structurally, and how compounds built through facet syntax are managed. The potential conflict between notation and language needs to be reconciled, and a clear basis established for the formal delineation of classes. Close control of vocabulary may also be highly significant where interoperability is concerned. The exchange of information is greatly enhanced by the use of a common classification scheme where the notation may act as a surrogate and obviate the need for linguistic control (Balikova 2005).

In Balikova’s paper a project for cross-searching is described in which UDC is used as the basis of a switching process. Here the advantage lies in the classification coding (2005, 6):

It is based mostly on numerical notations and uses language independent coding. The scheme UDC MRF is available among others in English and Czech languages and in machine readable form. It is flexible more than other universal classification schemes; it supports very detailed expressions of complex subjects using a variety of common and special auxiliaries, specific symbols and punctuation.

In principle, notation provides a language-independent means of retrieval and exchange. A paper by Riesthuis (1999) describes an algorithm which allows for sophisticated UDC search strategies based on an understanding of its notational expressiveness. (Other examples of catalogues where the library management system supports such retrieval of notational elements, irrespective of where they appear in the classmark, include those of the British Geological Survey http://geolib.bgs.ac.uk/webview and the Royal Society of Chemistry http://opac.rsc.org/R10305UKStaff/OPAC/index.asp.) The association of codes and terms can also have advantages. In the same paper, Riesthuis suggests that text may be combined with the notation to facilitate word-based searching, without investigating this in any greater detail, and Slavic (2003) also discusses the viability of linking UDC numbers with an external vocabulary.
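Position-independent retrieval of notational elements, of the kind the catalogues above support, can be sketched in Python as follows. This is not Riesthuis's algorithm: the tokenizer `TOKEN`, the helper names `elements` and `matches`, and the sample classmarks are assumptions. The pattern covers only main-table numbers with attached hyphen auxiliaries, place auxiliaries in parentheses, and time auxiliaries in quotation marks; other UDC syntax, such as language auxiliaries, is ignored.

```python
import re

# Simplified pattern (an assumption, not full UDC syntax): place auxiliaries
# in parentheses, time auxiliaries in quotation marks, and main-table numbers
# with any attached hyphen auxiliaries such as -2.
TOKEN = re.compile(r'\([^)]*\)|"[^"]*"|[0-9][0-9.\-]*')


def elements(classmark: str) -> list[str]:
    """Split a classmark into notational elements; ':' and '+' act as separators."""
    return TOKEN.findall(classmark)


def matches(classmark: str, query: str) -> bool:
    """True if the query element occurs anywhere in the classmark."""
    return query in elements(classmark)


records = ["622:669.1(410)", "669.1", '821.111-2"15"', "(410)622"]
print([r for r in records if matches(r, "(410)")])
# ['622:669.1(410)', '(410)622']
```

Matching whole elements rather than substrings means that a query for 669 does not retrieve 669.1; a fuller implementation might add truncation searching to recover narrower classes deliberately.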

However, where terms, either single or complex, form the basis of search and exchange, through the use of mapping or otherwise, the situation becomes much more complicated, and, in practice, the association of notation with anything other than simple class descriptions may be far from straightforward. This can be a particular issue in exchanging between natural languages, where the representation of compound concepts may differ radically; for example, there are many more cases of single-term representation of compound concepts in, say, German or Dutch than in the Romance languages.

2.0 Recent work in improving the UDC structure

Some major work on rationalization of the UDC began in the mid-1990s, when efforts were made to enhance the implicit faceted structure of the classification and to make it more logical and structurally coherent. This was very much a development of earlier ideas about the application of facet analysis to UDC and of some exploratory work in the 1970s (Kyle and Vickery 1961; Dahlberg 1971). Alongside major revisions of main classes along faceted lines, a rolling programme was initiated to remove many examples of compound classes from the schedules, particularly where these represented the enumeration of topics that might more properly have been built using the systematic auxiliaries. This had the additional advantage of reducing the overall number of classes and making room for new topics without compromising the agreed size of the MRF database, at around 65,000 classes.

There were many obvious examples throughout the main tables of unwarranted enumeration of compounds involving concepts from existing auxiliary tables, notably persons, and also materials. These were removed and replaced by the use of the systematic auxiliaries, which were correspondingly expanded and enhanced. The following examples of cancellations and amendments, taken from Extensions and corrections to the UDC 20 (1998) and 24 (2002), alongside the enumerated classes which they replaced (taken from the English Medium Edition of 1993), demonstrate the editorial work going on during that period (although it should be emphasised that they do not necessarily represent the current state of affairs).


Figure 1. Examples of replacements

It was also recognised that the classification contained many other repetitious concepts, not previously acknowledged as such, which could be better represented by new auxiliary tables. The first of these to be formally developed was the properties table, 1k –02 (Broughton 1998), which led quickly to the replacement of enumerated compounds by examples of combination with systematic properties auxiliaries, as these examples (Figure 2) from the proposed revision of Class 77 Photography show (Extensions & Corrections 2002, 65-67):

Figure 2. Proposed revision

The same policy was also applied to eliminate the rudimentary and unsatisfactory ‘Point-of-view’ table, with its miscellany of auxiliary concepts, and to introduce a systematic table, 1k –04, for processes, operations and relations (Broughton 2002).

3.0 Making UDC fully faceted

The rationalization of the systematic auxiliaries is, however, not coextensive with a fully faceted scheme, in which synthesis within classes (as well as between classes, and between main tables and auxiliaries) should also function on the basis of a logically coherent analysis and organization of constituent concepts. Previous efforts to introduce a completely faceted structure into UDC had concentrated on classes such as Literature, where there was relatively little detailed enumeration and the facet structure was very evident and comparatively easy to manage. UDC policy at that time was clear: the classification would not include any enumeration of built classes, other than a very small number of examples of combination to guide the indexer in the application of the schedules. This principle was evident in all of the arts and humanities schedules to be revised, but particularly so in the case of literature.

This class has a very ‘pure’ faceted structure, similar in style to the Colon Classification, where, for the most part, only simple isolates are provided, and classmarks for more complex compounds must be constructed by the indexer. A similar structure is found in the Dewey Decimal Classification, and both are in striking contrast to, for example, the Library of Congress Classification, where the norm is to enumerate individual authors and their works. In both UDC and DDC, individual authors or works of literature may only be specified in terms of the language, form, and period of the work, although UDC makes more systematic provision for expressing other aspects of the subject, either through special auxiliaries or through colon combination. The following combinations (Figure 3) would be typical:

Figure 3. Combinations
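The synthesis of a literature classmark from its facets (language, form, period) in a fixed citation order can be sketched in Python as below. The function name `literature_classmark` is hypothetical, and the codes in the example (821.111 for English literature, the special auxiliary -2 for drama, the time auxiliary "15" for the 1500s) are given for illustration only; they are not taken from Figure 3 and should be checked against the current MRF.

```python
from typing import Optional


def literature_classmark(language: str, form: Optional[str] = None,
                         period: Optional[str] = None) -> str:
    """Combine facets in the citation order language, form, period."""
    mark = f"821.{language}"        # literature of a given language
    if form is not None:
        mark += f"-{form}"          # special auxiliary for literary form
    if period is not None:
        mark += f'"{period}"'       # common auxiliary of time
    return mark


print(literature_classmark("111", "2", "15"))  # 821.111-2"15"
```

Because only the isolates are stored and the indexer (here, the function) supplies the combination, no individual author or work needs to be enumerated in the schedule.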


A similar situation pertains in history, where again, classes are created on the basis of place, time, and subject content, but without the facility to express individual persons, processes or events, other than by alphabetical extension (Figure 4).

Figure 4. Alphabetical extension.

The faceted structures created for these classes were very much of the same order as the rationalization of the systematic auxiliary concepts, and they clearly improved the logical structure of the classification. It is now the practice to include rather more examples of combination, so that major topics and sought terms are clearly represented, but that practice does not affect the principle of conformity and consistency in the basic classification structure which was the objective of the revision.

3.1 Faceted UDC on the BC2 model

Until the mid-1990s, this rationalization and regularization of the UDC structure represented the limits of attempts to fully facet the schedules. The late 1990s saw the initiation of an imaginative and ambitious project to utilise the fully faceted schedules of the Bliss Bibliographic Classification 2nd edition (BC2) (Mills and Broughton 1977) to provide a speedier route to a more fully faceted UDC. It was envisaged that BC2 would be a major source both for terminology and for structure in subsequent revision of UDC (McIlwaine and Williamson 1993, 1994; McIlwaine 1997). The initial work concentrated on two main areas: Religion, which had been published as Class P of BC2 in 1977, and Medicine, published as Class H in 1980. A revised UDC Class 2 was completed in 1999 (Broughton 2000), although work on the much larger medical vocabulary is still in progress. The use of BC2 terminologies as a basis for UDC revision provided the latter with a rich source of data, and obviated the need for much labour in the groundwork of analysis and facet organization. It did, however, raise some difficulties in reconciling the rather different conceptual structures of the two classifications.

3.2 The faceted religion vocabulary in UDC

The new Class 2 was modelled directly on the BC2 1977 vocabulary, with some modifications and expansions. Twenty years on, it was easier to spot weaknesses and omissions in the BC2 structure, and, while maintaining the general principles and the broad facet structure of that class, a more detailed and more rigorously organized terminology was developed for UDC. Using the standard facet-analytical approach, eight major facets were identified: religious concepts, religious evidences, persons, religious activities, religious processes, organization and administration (parts), religious properties, and faiths (entities). Terminology was assigned to these categories, attempting as far as possible to maintain a linguistically neutral tone, although that was to some extent difficult, as religious language in English tends to be Christian in nature.

Combination of ‘simple’ classes to create semantically complex concepts presented some practical difficulties: although any degree of complexity could be managed through colon combination, the resulting classmarks would be very lengthy indeed, with considerable internal repetition of notation. Accordingly, the facets were constituted as a series of special auxiliaries within the class, using the hyphen as a linking device, as was the norm for such auxiliaries. A general classification for theology and religion was created, using 2- numbers, in which any subdivision of 2 could be substituted for the main class number (Figure 5).

Figure 5. General classification for theology and religion

Using this ‘basic’ schedule as a model, classifications for individual faiths could be developed in which faith-specific terms could be substituted for the more neutral concepts of the basic schedule. A number of special expansions were developed to demonstrate how this would be achieved for individual religions and faiths; in the original revised schedule, examples were provided for Hinduism, Judaism, and Christianity, and later those for Buddhism, Islam, and Orthodox Christianity were created (Figure 6), these being published as Extensions and corrections to the UDC.

Figure 6. Extensions for Judaism, Orthodox Christianity, and Islam.
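The substitution of a faith's class number for the base 2 of the general schedule, with faith-specific captions replacing the neutral ones, can be sketched in Python using the notations and captions listed for Judaism in Table 1. The names `GENERAL`, `JUDAISM_TERMS` and `build` are hypothetical, and the fallback to the neutral caption when no faith-specific term exists is an assumed editorial rule rather than documented UDC practice.

```python
# Hyphen auxiliaries of the general schedule (2- numbers), as in Table 1.
GENERAL = {
    "-23": "Sacred books. Scriptures",
    "-24": "Specific texts. Named texts",
    "-252": "Apocrypha. Pseudepigrapha",
    "-254": "Commentary on sacred works",
}

# Faith-specific captions for Judaism (26), as in Table 1.
JUDAISM_TERMS = {
    "-23": "Sacred texts",
    "-24": "Tanakh. The Hebrew Bible",
    "-252": "Pseudepigrapha",
    "-254": "Rabbinic literature",
}


def build(faith_number: str, faith_terms: dict[str, str],
          auxiliary: str) -> tuple[str, str]:
    """Substitute a faith number for 2 and pick the faith-specific caption."""
    if auxiliary not in GENERAL:
        raise KeyError(f"{auxiliary} is not in the general schedule")
    notation = faith_number + auxiliary
    # Assumed rule: fall back to the neutral caption when no faith term exists.
    caption = faith_terms.get(auxiliary, GENERAL[auxiliary])
    return notation, caption


print(build("26", JUDAISM_TERMS, "-254"))  # ('26-254', 'Rabbinic literature')
print(build("2", {}, "-23"))               # ('2-23', 'Sacred books. Scriptures')
```

With the base number 2 and no overrides, the function simply reproduces the general schedule, which is the sense in which it serves as a model for the individual faiths.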

This followed the BC2 practice where many compound concepts were inserted into the faceted schedule in order to aid the indexer, by demonstrating the syntax of the faceted scheme, and also to ensure the inclusion in the alphabetical index of many specific terms which would not be represented in the bare facet structure. It was, however, rather at odds with the policies of UDC as implemented in the schedules for history and literature, and subsequently some doubt was expressed as to whether this was the best way to represent the compound classes in the classification, and whether they should rather be handled as examples of combination.

A second problem came to light during the FATKS project (FATKS n.d.), when a database was constructed to hold the humanities terminologies derived from BC2. It was clear that the vocabulary contained a number of compound classes that could not be represented precisely as the sum of other simpler concepts. This was partly the result of BC2 scheduling conventions, originating in the period before the creation of electronic formats of the classification, but also a consequence of the role of the notation in BC2.

Figure 7. Compound classes.

Here, as is typical in BC2, the notation is used solely as an ordering device. It has no function as an indicator of structure, either of the hierarchical relationships between concepts or of the facet status of a concept. It had been clear to the editors of BC2 for some time that this would be problematic for any future digital representation of BC2, and it was immediately so for the handling of the hybrid terminology in the FATKS database. The consequences for conversion of BC2 terminologies to UDC were inescapable.

Unlike the BC2 coding, the notation in UDC is required to represent both hierarchy and the presence of structural components in a state of ‘compoundness’. The notation in examples of combination must be consistent with the notation for the elements of the combination, reflecting the semantic structure of the classmark. This is clear and unproblematic in the examples of the new Class 2 given above, which are relatively simple. But other parts of the classification had followed the BC2 practice of enumerating subdivisions of a compound class where no equivalent subdivisions existed in the constituent classes.

2          Religion                         26          Judaism
2-23       Sacred books. Scriptures         26-23       Sacred texts
2-24       Specific texts. Named texts      26-24       Tanakh. The Hebrew Bible
                                            26-242      Torah. The Law. The Pentateuch
                                            26-242.2    Genesis
                                            26-242.3    Exodus
                                            26-242.4    Leviticus
                                            26-242.5    Numbers
                                            26-242.6    Deuteronomy
2-252      Apocrypha. Pseudepigrapha        26-252      Pseudepigrapha
2-254      Commentary on sacred works       26-254      Rabbinic literature

Table 1: Examples of combination in UDC Class 2 Religion

In this example, there are no subdivisions of 2-24 corresponding to the subdivisions of 26-24. This makes it difficult to accommodate a term such as 26-242 Torah as an example of combination, since the elements of combination are not present, and the only option at the time was to avoid the inclusion of terms of this kind.
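The difficulty can be made concrete with a small Python check over the data of Table 1: for each Judaism class, strip the faith number 26 and test whether the remaining auxiliary also exists under the general number 2. The helper name `unmatched` is hypothetical; the check simply flags the classes that cannot be presented as examples of combination because their elements are absent from the general schedule.

```python
# Auxiliaries available under the general number 2 (Table 1).
GENERAL_AUXILIARIES = {"-23", "-24", "-252", "-254"}

# Enumerated Judaism classes (Table 1).
JUDAISM = {
    "26-23": "Sacred texts",
    "26-24": "Tanakh. The Hebrew Bible",
    "26-242": "Torah. The Law. The Pentateuch",
    "26-242.2": "Genesis",
    "26-242.3": "Exodus",
    "26-242.4": "Leviticus",
    "26-242.5": "Numbers",
    "26-242.6": "Deuteronomy",
    "26-252": "Pseudepigrapha",
    "26-254": "Rabbinic literature",
}


def unmatched(faith_number: str, faith_classes: dict[str, str],
              general: set[str]) -> list[str]:
    """List faith classes whose auxiliary has no counterpart under 2."""
    missing = []
    for notation in faith_classes:
        if not notation.startswith(faith_number):
            raise ValueError(f"{notation} does not start with {faith_number}")
        auxiliary = notation[len(faith_number):]
        if auxiliary not in general:
            missing.append(notation)
    return missing


print(unmatched("26", JUDAISM, GENERAL_AUXILIARIES))
# ['26-242', '26-242.2', '26-242.3', '26-242.4', '26-242.5', '26-242.6']
```

Every flagged class sits below 26-24, which is exactly the gap the text describes: the subdivisions exist for the compound class but not for its constituent 2-24.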

It appears that this may be a problem associated with terminologies in the humanities, which exhibit a number of features:

– humanities vocabularies tend to contain many examples of named entities, such as individual writers, artists, musicians, individual created texts or other works, or named events such as battles;
– such entities may be semantically very complex, composed of a number of attributes from different facets;
– in most disciplines these greatly outnumber the conceptual classes, and they are likely to be terms sought by end users;
– the question then arises as to how documents are indexed to provide for the retrieval of both the generic class and the named members of a class;
– there may be variation in the way a concept is expressed terminologically in different cultures, even when the fact of different natural languages is discounted; religion is perhaps the worst example here;
– it may be very unclear what relationship exists between named members of a class and the class itself, when the named member is characterized by a variety of attributes, some of them from other facets.

It seems that this situation is not replicated (or not to anything like the same extent) in the sciences, for the following reasons:

– concepts in the sciences are usually members of classes rather than individual entities, e.g., quanta, protons, silicates, chromosomes, rabbits, lasers, nuclear reactors;
– the concept of, for example, a proton (although it may be represented differently in various natural languages) is not differently understood across cultures;
– the relationships between a class and its members are usually straightforward in a hierarchical sense;
– it is therefore easier to map concepts in a general way, and to associate them with terminological labels.

There are, of course, some exceptions to this, particularly when working in a multilingual environment. Conceptual hierarchies are not always consistent across different natural languages, and the way in which the names of complex concepts are formed varies from one language to another. Some very rich and large vocabularies, such as that of medicine, also exhibit a greater degree of representation of compounds by unique terms than is the case with, for example, physics or chemistry. Nevertheless, it appears that the relationship between concept and label is more challenging in the humanities than it is elsewhere.

4.0 The thesaurus approach

UDC has for many years been used as the starting point for the construction of indexing vocabularies such as the Euratom thesaurus (Marosi 1969), and this application of UDC continues to be the subject of much research (Riesthuis and Bleidung 1990; Francu 2004). One of the objectives of the 1990s plan for major revision of UDC was the creation of further examples of such vocabularies (Williamson 1996). A similar initiative is already in progress with the BC2 vocabularies. There are obvious advantages to a conceptually well-structured classification when generating a thesaurus, since the clear identification of relationships allows some degree of mechanical handling of the process, and the value of a faceted classification in this regard has been known for some time (Aitchison 1986). The same attributes of the faceted scheme also facilitate browsing structures and automatic navigation of the vocabulary, whether this is set up on a structural or a term-oriented basis. All the work on improving the structure of UDC represents considerable progress towards this end.

In working on the conversion of the systematic structure to a word-based format in BC2, the generation of the structural relationships between concepts was a very straightforward process, since the rules of the faceted scheme ensure that these are strictly controlled and quite apparent in the structure. Relationships between non-compound classes in the same facet must of necessity be limited to broader term/narrower term, and they are easily detected for the manual compilation of the thesaurus. BC2 is maintained electronically using a very simple mark-up language, and this can be manipulated by specialist software to generate the majority of these hierarchical relationships quite accurately and without human intervention (Broughton 2008a).

Figure 8. Example of BC2 input data with encoding

Although the output from this software requires some degree of editing, it is clear that the conceptual structure of the classification is consistent with the conceptual structure of the associated thesaurus, and that the two can be regarded as interchangeable.
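As a rough illustration of how hierarchical relationships can be generated mechanically from a systematic schedule, the following Python sketch derives broader term/narrower term pairs from the hierarchical level of each class. It assumes a hypothetical, much simplified input of (level, heading) pairs in schedule order; it does not reproduce the BC2 mark-up shown in Figure 8 or the software described in Broughton (2008a), and the sample headings are illustrative rather than taken from BC2.

```python
from typing import Dict, List, Tuple

# Hypothetical simplified schedule line: (hierarchical level, class heading).
# This is NOT the BC2 mark-up; it only mimics the way subordination in a
# systematic schedule encodes genus-species relationships.
ScheduleLine = Tuple[int, str]


def derive_bt_nt(schedule: List[ScheduleLine]) -> List[Tuple[str, str]]:
    """Return (broader, narrower) pairs implied by the levels of a schedule."""
    pairs: List[Tuple[str, str]] = []
    ancestors: List[ScheduleLine] = []  # open chain of superordinate classes
    for level, heading in schedule:
        # Close any classes at the same or a deeper level than this one.
        while ancestors and ancestors[-1][0] >= level:
            ancestors.pop()
        if ancestors:
            pairs.append((ancestors[-1][1], heading))
        ancestors.append((level, heading))
    return pairs


def as_thesaurus(pairs: List[Tuple[str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Group the pairs into reciprocal BT and NT entries for each term."""
    entries: Dict[str, Dict[str, List[str]]] = {}
    for broader, narrower in pairs:
        entries.setdefault(narrower, {"BT": [], "NT": []})["BT"].append(broader)
        entries.setdefault(broader, {"BT": [], "NT": []})["NT"].append(narrower)
    return entries


if __name__ == "__main__":
    # Illustrative headings only, not taken from BC2.
    sample = [
        (1, "Pathology"),
        (2, "Inflammation"),
        (2, "Hypertrophy"),
        (1, "Therapies"),
        (2, "Surgery"),
        (3, "Curettage"),
    ]
    for term, relations in as_thesaurus(derive_bt_nt(sample)).items():
        print(term, relations)
```

Output of this kind would still need editing: the sketch handles only genus-species relationships within a single hierarchy, and says nothing about compound classes or related-term links.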

However, the same cannot be said of the verbal representation of the classification, and some considerable problems of vocabulary control were encountered (Broughton 2008a; 2008b). Many class headings were simply unsuitable (and sometimes unusable) as thesaurus terms. This arose for a number of reasons:

– for the most part, in the formation of class headings, little attention had been paid to the role of the class heading as a descriptor, as the notation acts as a surrogate for retrieval purposes;

– the form of the entry had not been considered at all (since it was irrelevant);

– some part of the meaning was often taken from the contextual location of the term, qualification being inferred from the hierarchy; and,

– a convoluted form of class heading had been constructed in order to define precisely the scope and nature of the class.

These difficulties are also encountered in UDC, and are indeed likely to occur in any system that has been constructed on a systematic rather than a verbal basis. In addition, the problem encountered in Class 2 (Religion) with the provision of very specific terminology, particularly where a single term represents a compound concept, reveals some direct conflict between the culture and conventions of the thesaurus and those of the classification scheme. In BC2, compounds of this kind have been managed by building appropriate classmarks according to the rules for synthesis, and adding the classmarks to the published schedules. An example from the other major BC2/UDC conversion terminology, medicine, demonstrates this nicely (Figure 9).

Figure 9. Synthesis.

Here the syntax of the faceted classification is applied rigorously to generate the linear order and the hierarchical structure in respect of compound concepts, but the notation does not reflect this at all, despite some limited correspondence between terminal characters (e.g., Curettage HON KV, Hypertrophy HPT J, Inflammation HPY). Nor does the notation (or the encoding of electronic files) represent the facet status of classes: neither facet nor role indicators are used, and the syntax of the classification is imposed entirely intellectually. It should also be stressed that there is nothing comparable to the UDC MRF for BC2. Although, in the process of generating schedules, alphabetical index, and thesaurus, the software creates a database of classes that holds information about the notation, class value, hierarchical level, and various index data, no independently maintained BC2 database exists which can be interrogated or can function as an authority for the classification.


In UDC such compound concepts would be represented as examples of combination, and the classmarks would represent more exactly the structure of the built compound (the example in Figure 10 is taken from Extensions and corrections to the UDC 2006, 83).

Figure 10. Built compound.

Here the notation represents the direct addition of special auxiliary facets for therapies and pathology, and the structure of the compound is evident; additionally, the notation is searchable for, for example, all instances of ‘inflammation’ or ‘hypertrophy’.
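The practical difference can be sketched in Python under a deliberately simplified, hypothetical model in which each built compound is stored as a base concept plus the auxiliary facet components added to it; no real UDC notation is used. Because the components remain explicit, a search for ‘inflammation’ finds every compound built with it, whereas a search limited to captions, which is all an opaque classmark leaves available, misses single-term compounds such as gingivitis and gastritis.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class BuiltCompound:
    """A compound class recorded with its constituent parts.

    Hypothetical model: real UDC notation is not used here; the base concept
    and the added auxiliary facet components are simply kept as strings.
    """
    label: str
    base: str
    auxiliaries: Tuple[str, ...]


def find_by_component(classes: Sequence[BuiltCompound], component: str) -> List[str]:
    """Labels of all compounds whose base or auxiliaries include `component`."""
    return [c.label for c in classes
            if c.base == component or component in c.auxiliaries]


def find_by_caption(captions: Sequence[str], word: str) -> List[str]:
    """Captions containing `word`: all that an opaque classmark allows."""
    return [caption for caption in captions if word.lower() in caption.lower()]


if __name__ == "__main__":
    compounds = [
        BuiltCompound("Gingivitis", "gums", ("inflammation",)),
        BuiltCompound("Gastritis", "stomach", ("inflammation",)),
        BuiltCompound("Cardiac hypertrophy", "heart", ("hypertrophy",)),
    ]
    print(find_by_component(compounds, "inflammation"))  # ['Gingivitis', 'Gastritis']
    print(find_by_caption([c.label for c in compounds], "inflammation"))  # []
```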

Guidelines for current thesaurus practice as expressed in the British Standard (BS8723-2 2005) do not explicitly address the question of how semantically complex single-term concepts should be managed. BS8723-2 does, however, concede that the ‘availability of so many choices presents the thesaurus editor with a difficult and subjective decision: whether to admit the complex concept or whether to rely on simpler concepts used in combination’ (BS8723-2 2005, 18). The only real guidance provided is the rule that ‘if the concept is frequently sought, and especially if the term representing it is widely used and understood by the audience, then some provision should be made for it’ (BS8723-2 2005, 15), but no examples are given of a single-term compound. The expression ‘factoring’, used in the past to denote the analysis of complex concepts, is now replaced by ‘splitting’, and the discussion is confined to multi-word terms. The distinction between semantic factoring (the deconstruction of a single term into its constituent semantic elements) and syntactic factoring (the division of a multi-word term) is now defunct. Earlier literature does, however, make this distinction, and the determination of good practice in this respect is highly significant. BS8723 tells us that ‘the establishment of procedures for dealing consistently with compound terms introduces one of the most difficult areas of subject indexing’ (BS8723-2 2005, 11).

The standard defines semantic factoring as the re-expression of a complex notion ‘in the form of simpler or definitional elements, each of which can also occur in other combinations to represent a range of different concepts’. The example given is that of thermometers, which are expressed as the combination of three terms:

TEMPERATURE + MEASUREMENT + INSTRUMENTS

It is then very firmly stated (in bold) that this technique is not recommended, and that ‘it is generally recognized that semantic factoring leads to a loss of precision in retrieval.’ We might assume that this is now so widely acknowledged that it was felt unnecessary even to mention it in the revised Standard of 2005. So in UDC a term such as gingivitis ought not to be represented as ‘gums + inflammation’ if there are any expectations that the classification data can be used in the future in a thesaurus format.
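The precision argument can be illustrated with a small Python sketch over a hypothetical post-coordinate index, using the thermometer example: once ‘thermometers’ is factored into TEMPERATURE + MEASUREMENT + INSTRUMENTS, a Boolean AND search cannot distinguish a work on thermometers from one on, say, temperature effects on measuring instruments. The documents and their indexing are invented for illustration.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical post-coordinate index: document id -> descriptors assigned.
Index = Dict[str, FrozenSet[str]]

# Semantic factoring of the standard's example: one concept, three elements.
FACTORED: Dict[str, FrozenSet[str]] = {
    "THERMOMETERS": frozenset({"TEMPERATURE", "MEASUREMENT", "INSTRUMENTS"}),
}


def search_and(index: Index, query: FrozenSet[str]) -> List[str]:
    """Ids of documents indexed with every query descriptor (Boolean AND)."""
    return sorted(doc for doc, descriptors in index.items() if query <= descriptors)


def precision(retrieved: Sequence[str], relevant: Sequence[str]) -> float:
    """Proportion of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)


def demo() -> Tuple[float, float]:
    relevant = ["d1"]  # only d1 is actually about thermometers
    # With factoring, both documents receive the same three descriptors.
    factored_index: Index = {
        "d1": FACTORED["THERMOMETERS"],  # a work on thermometers
        "d2": frozenset({"TEMPERATURE", "MEASUREMENT", "INSTRUMENTS"}),  # temperature effects on instruments
    }
    # With a precoordinated descriptor, the two can be told apart.
    precoordinated_index: Index = {
        "d1": frozenset({"THERMOMETERS"}),
        "d2": frozenset({"TEMPERATURE", "MEASUREMENT", "INSTRUMENTS"}),
    }
    p_factored = precision(search_and(factored_index, FACTORED["THERMOMETERS"]), relevant)
    p_precoordinated = precision(
        search_and(precoordinated_index, frozenset({"THERMOMETERS"})), relevant)
    return p_factored, p_precoordinated


if __name__ == "__main__":
    print(demo())  # (0.5, 1.0)
```

In this toy case the loss of precision arises because the factored descriptors record which elements are present but not how they are combined.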

However, semantic factoring is essentially an inversion or deconstruction of the process of synthesis that is used in building up the classification structure in a fully developed faceted system. Compound classes, or ‘examples of combination’, are determined in this way, and there is a clear correspondence between synthesis on the one hand, and factoring on the other, in establishing the semantic basis of compound terms.

5.0 Conclusion

In the faceted scheme there is some conflict between the representation of conceptual classes and the use of class names as descriptors, a conflict that is not encountered in enumerative systems. The difference between the use of concepts or classes (for the organization of knowledge) and the use of terms (for resource description) manifests itself in several quite distinct ways, and raises a number of significant questions, particularly in the maintenance and application of UDC:

– firstly, how should a semantically complex topic be handled in the schedule;

– how is the complex topic notated;

– how is it regarded by, and entered in, the MRF database;

– what view should be taken of the desirability of factoring complex compound terms (particularly single-term complexes);

– how might differences in the approach of encoded systems, such as classifications, and terminologies proper, such as thesauri, be reconciled;


– what are the implications for forming class headings, and the way in which vocabulary control in the narrower sense is carried out; and,

– what are the implications of decisions made here for, on the one hand, the retrieval of specific named classes (e.g., Mozart, the Bible, Gettysburg) and, on the other, the retrieval of conceptual classes (Austrian music, sacred texts, battles)?

A significant question is the way in which terms in the humanities should be handled when these are semantically complex. The initial schedules for Class 2 Religion included a number of expansions of the basic class structure to accommodate terminology peculiar to individual faiths. It became apparent that these compound classes are not always easily represented, and that care should be taken to ensure that they are exact mappings to combinations of simpler classes. There is some advantage in retaining these culture-specific terms, but they should be regarded as examples of combination rather than as classes in their own right. Particular problems occur when such examples have named sub-classes, as this phenomenon may be difficult to represent accurately in terms of the classification structure. The use of differential facets, which remedies this problem in paper-based classifications, is more complex in an automated classification and can lead to confusion and duplication.

This situation is paralleled in Medicine, where many unique terms are generated by the combination of concepts, notably in the names of conditions and diseases related to particular parts of the body. But these are relatively straightforward to express as examples of combination, unlike humanities vocabularies, where named individual examples of persons, events, created works, and culture-specific concepts proliferate. Achieving a balance between rigour in the conceptual structure of the classification and the complexity of natural language expression remains partially unresolved at present, but provides a fertile field for further research.

There may need to be a trade-off between the rigour of the conceptual structure and the representation of a rich semantic dimension as expressed by the vocabulary. The identification and inclusion of sought terms (mainly in the humanities, but also to a more limited extent in the sciences) may be addressed by the extensive use of ‘examples of combination’, but the status of these within the classification structure needs to be clarified if adequate index description (and subsequent retrieval) is not to be compromised.

References

Aitchison, Jean. 1986. Bliss and the thesaurus: the Bibliographic Classification of H. E. Bliss as a source of thesaurus terms and structure. Journal of documentation 42: 160-81.

Aitchison, Jean. 2004. Thesauri from BC2: problems and possibilities revealed in an experimental thesaurus derived from the Bliss Music schedule. Bliss classification bulletin 46: 20-26.

Balikova, Marie. 2005. Multilingual subject access to catalogues of national libraries (MSAC): Czech Republic's collaboration with Slovakia, Slovenia, Croatia, Macedonia, Lithuania and Latvia. 71st IFLA General Conference and Council, August 14th-18th 2005, Oslo, Norway. http://www.ifla.org/IV/ifla71/papers/044e-Balikova.pdf.

British Geological Survey GEOLIB catalogue. http://geolib.bgs.ac.uk/webview (Accessed 04.09.2009).

BS5723:1987. Establishment and development of monolingual thesauri. London: British Standards Institution.

BS8723-2:2005. Structured vocabularies for information retrieval – Part 2: Thesauri. London: British Standards Institution.

Broughton, Vanda. 1998. The development of a common auxiliary schedule of property: a preliminary survey and proposal for its development. Extensions and corrections to the UDC 20: 37-42.

Broughton, Vanda. 2000. A new classification for the literature of religion. International cataloguing and bibliographic control 29(4), Oct/Dec: 59-61. (Also presented as a paper at the 66th IFLA Conference, Jerusalem 2000, and available at: www.ifla.org/IV/ifla66/papers/034-130e.htm).

Broughton, Vanda. 2002. A new common auxiliary for relations, processes and operations. Extensions and corrections to the UDC 24: 29-35.

Broughton, Vanda. 2008a. A faceted classification as the basis of a faceted terminology. Axiomathes 18(2): 193-210. Available at: http://www.springerlink.com/content/6mm3r57j1r44k5u5/.

Broughton, Vanda. 2008b. Language related problems in the construction of faceted terminologies and their automatic management. In Clément Arsenault and Joseph Tennis eds., Culture and identity in knowledge organization: proceedings of the Tenth International ISKO Conference, 5-8 August 2008, Montréal, Canada. Würzburg: Ergon, pp. 43-49.

Cordeiro, Maria Inês and Riesthuis, Gerhard J. A. 2006. A new editorial support system for UDC. Extensions and corrections to the UDC 28: 17-22.


Cordeiro, Maria Inês and Slavic, Aïda. 2002. Data models for knowledge organization tools: evolution and perspectives. In Maria Huertas-Lopez ed. Challenges in knowledge representation and organization for the 21st century; integration of knowledge across boundaries. Proceedings of the Seventh International ISKO Conference, Granada, Spain. Würzburg: Ergon. Also available at: http://dlist.sir.arizona.edu/1303/01/ISKO_ICM_AS.DOC.

Dahlberg, Ingetraut. 1971. Possibilities for a reorganization of the UDC. In R. Mölgaard-Hansen and M. Westring-Nielsen eds. Proceedings of Second seminar on UDC and mechanized information systems, Frankfurt, 1st-5th June 1970. Copenhagen: Danish Centre for Documentation, pp. 193-211.

Extensions and Corrections to the UDC, 20. 1998. The Hague: UDC Consortium.

Extensions and Corrections to the UDC, 24. 2002. The Hague: UDC Consortium.

Extensions and Corrections to the UDC, 28. 2006. The Hague: UDC Consortium.

Facet analytical theory in knowledge structures. www.ucl.ac.uk/fatks.

Francu, Victoria. 2004. The impact of specificity on the retrieval power of a UDC-based multilingual thesaurus. Cataloging and classification quarterly 37(1/2): 49-64.

Kyle, Barbara and Vickery, Brian C. 1961. The Universal Decimal Classification: present position and future developments. UNESCO bulletin for libraries 15: 2.

Marosi, Aviva. 1969. Euratom thesaurus and UDC: combined use for the subject organization of a small information service. Journal of documentation 25: 197-213.

McIlwaine, Ia C. 1997. Classification schemes: consultation with users and cooperation between editors. Cataloging & classification quarterly 24(1/2): 91-92.

McIlwaine, Ia C. and Williamson, Nancy J. 1993. Future revision of UDC: progress report on a feasibility study for restructuring. Extensions and corrections to the UDC 15: 11-17.

McIlwaine, Ia C. and Williamson, Nancy J. 1994. A feasibility study on the restructuring of the Universal Decimal Classification into a fully faceted classification system. In Hanne Albrechtsen and Susanne Oernager eds. Knowledge organization and quality management. Proceedings of the Third International ISKO Conference, 20-24 June 1994, Copenhagen, Denmark. Frankfurt/Main: Indeks Verlag, pp. 406-13.

Mills, J. and Broughton, V. 1977. Bliss Bibliographic Classification: Introduction and auxiliary schedules. 2nd ed. London: Butterworth.

Riesthuis, Gerhard J. A. 1999. Searching with words: re-use of subject indexing. Extensions and corrections to the UDC 21: 24-31.

Riesthuis, Gerhard J. A. and Bliedung, Steffi. 1990. Thesaurification of the UDC. In Tools for knowledge organization and the human interface: Proceedings of the First International ISKO-Conference, Darmstadt, 14-17 August 1990, pp. 109-17.

Royal Society of Chemistry Library catalogue. http://opac.rsc.org/R10305UKStaff/OPAC/index.asp (Accessed 04.09.2009).

Slavic, Aïda. 2003. UDC implementation: from library shelves to a structured indexing language. [Paper presented at the 69th IFLA Council and General Conference, Berlin, 2003]. Available at: http://archive.ifla.org/IV/ifla69/papers/032e-Slavic.pdf.

Slavic, Aïda. 2006. Interface to classification: some objectives and options. Extensions and corrections to the UDC 28: 24-45. Also available at: http://dlist.sir.arizona.edu/1621/.

Slavic, Aïda, Cordeiro, Maria Inês, and Riesthuis, Gerhard J. A. 2007. Enhancement of UDC data for use and sharing in a networked environment: [presentation at the Librarian Workshop in conjunction with “The 31st Annual Conference of the German Classification Society on Data Analysis, Machine Learning, and Applications”, March 7-9, 2007, Freiburg i. Br., Germany]. Available at: http://dlist.sir.arizona.edu/2093/01/freiburg_udc_enhancement.pdf.

Williamson, Nancy. 1996. Deriving a thesaurus from a restructured UDC. In R. Green, ed. Knowledge organization and change: Proceedings of the 4th International ISKO Conference, Washington, 15-18 July, 1996. Frankfurt/Main: Indeks Verlag, pp. 370-77.


Knowl. Org. 37(2010)No.4 G. Dunsire and D. Nicholson. Signposting the Crossroads: Terminology Web Services and Classification-Based Interoperability


Signposting the Crossroads: Terminology Web Services and Classification-Based Interoperability

Gordon Dunsire* and Dennis Nicholson**

University of Strathclyde, CDLR, 26 Richmond Street, Glasgow, UK

Gordon Dunsire is Head of the Centre for Digital Library Research at Strathclyde University in Glasgow, Scotland. He is a member of the Chartered Institute of Library and Information Professionals (CILIP) and British Library Committee on AACR and the CILIP Committee on DDC, and is Chair of the Cataloguing and Indexing Group in Scotland. He is a member of the Classification and Indexing Section of the International Federation of Library Associations and Institutions (IFLA). He is the principal developer of the SCONE (Scottish Collections Network) collection descriptions service and other components of the Scottish Common Information Environment, and has been involved in several projects investigating the use of collection-level description and metadata aggregation in wide-area resource discovery.

Dennis Nicholson is a private consultant with expertise in the area of distributed digital libraries and library-related information technology. Between 1999 and 2009, he was Director of the Centre for Digital Library Research at the University of Strathclyde and Director of Research in Strathclyde University's Information Resources Directorate. He has been actively involved in research in the area of distributed digital libraries and information systems since 1991. He managed and led a range of funded research projects, including the High Level Thesaurus Project, and the Co-operative Academic Information Retrieval Network for Scotland project.

Dunsire, Gordon and Nicholson, Dennis. Signposting the Crossroads: Terminology Web Services and Classification-Based Interoperability. Knowledge Organization, 37(4), 280-286. 20 references.

ABSTRACT: The focus of this paper is the provision of terminology- and classification-based interoperability data via web services, initially using interoperability data based on the use of a Dewey Decimal Classification (DDC) spine, but with an aim to explore other possibilities in time, including the use of other spines. The High-Level Thesaurus Project (HILT) Phase IV developed pilot web services based on SRW/U, SOAP, and SKOS to deliver machine-readable terminology and cross-terminology mappings data likely to be useful to information services wishing to enhance their subject search or browse services. It also developed an associated toolkit to help information services technical staff to embed HILT-related functionality within service interfaces. Several UK information services have created illustrative user interface enhancements using HILT functionality, and these demonstrate what is possible. HILT currently has the following subject schemes mounted and available: DDC, CAB, GCMD, HASSET, IPSV, LCSH, MeSH, NMR, SCAS, UNESCO, and AAT. It also has high-level mappings between some of these schemes and DDC, and some deeper pilot mappings.

1.0 Introduction

It has become increasingly difficult for users to satisfy their information needs due to the rapid expansion of the Web and its sprawling nature; it is becoming progressively impractical for users to consult a wide range of sources to satisfy an information query. Consequently, it is of growing importance that users are able to search multiple distributed heterogeneous digital repositories simultaneously. With such a wide variety of resources available, however, the feasibility of achieving interoperability between them is gradually diminishing. Services employ different technical standards, indexing practices, search facilities, and algorithms. There is wide variation in the language and terminology on which retrieval systems are founded. As a result, it is no longer sufficient for users to make decisions on whether to use keyword or phrase searching, employ Boolean operators, or try their luck with truncation; they must also now give consideration to the terminology they use. Problems relating to disparate terminology use have been an impediment to information retrieval for many years, but the growth of the Web, associated heterogeneous digital repositories, and the need for distributed cross-searching within information environments employing multiple terminologies have recently drawn the issue into sharp focus.

2.0 HILT project

The High Level Thesaurus (HILT) project comprised four phases of activity carried out by the Centre for Digital Library Research (http://cdlr.strath.ac.uk/) at the University of Strathclyde and funded by the UK’s JISC (Joint Information Systems Committee, http://www.jisc.ac.uk/) with support from OCLC (http://www.oclc.org/). The project has been investigating mechanisms to assist the further and higher education community in the UK with problems associated with providing users the ability to find appropriate learning, research, and information resources by subject search-and-browse in an environment where most national and institutional service providers use different subject schemes to describe such resources. Those mechanisms, possibly applied through a JISC Shared Infrastructure Service, would help optimise the value obtained from expenditure on content and services by facilitating resource sharing. The environment is essentially monolingual, with English as the predominant language in terms of resources and their users, although the project has carried out some investigation into non-English vocabularies.

The first phase (Nicholson et al. 2001) established that the preferred approach of the various services in the JISC domain to resolving the issue is one based on mapping the various subject schemes together through a central shared service that provides users with the correct alternative terms to use in the various different schemes. This architecture is referred to as a “spine”, “switching language”, or “hub-and-spoke.” Phase II (Nicholson et al. 2004) built a pilot to illustrate the functions required of a terminologies service capable of taking a user-input subject term, identifying JISC collections relevant to the subject of the query, and providing the user with the correct subject term to use for the subject scheme employed by any given identified collection. The project then conducted a feasibility study for developing this into a machine-to-machine (M2M) pilot service to supply terminologies and mapping data for the use of other services, and scoped out an outline design for it. The third phase built the M2M pilot and scoped out a design for an initial entry-level service meeting the needs of a shared infrastructure.

HILT Phase IV (Nicholson, McCulloch, and Joseph 2009) developed pilot solutions for some of the problems encountered when cross-searching multi-scheme subject-based information environments, as well as providing a variety of other terminological searching aids. This phase delivered a range of simple M2M terminology services based on SRU (Search/Retrieve via URL, http://www.loc.gov/standards/sru/), SOAP (Simple Object Access Protocol, http://www.w3.org/TR/soap12/), and SKOS (Simple Knowledge Organization System, http://www.w3.org/2004/02/skos/), using a database of terminologies and mappings of terms to the DDC (Dewey Decimal Classification, http://www.oclc.org/dewey/), along with an embryonic toolkit (CDLR 2009) to help developers of information services embed M2M interactions in user interfaces to improve subject retrieval, browse, and deposit functions. The project also developed a generic distributed subject interoperability and terminology services architecture and demonstrated its feasibility at a very basic level. A short extension project embedded interaction with HILT M2M services in the user interfaces of various information services serving the JISC community.

3.0 HILT architecture

A diagram of the architecture of the pilot HILT system is given in Figure 1.

Client applications are services such as information retrieval interfaces aimed ultimately at end-users. They access the content of the terminologies database using programmes containing functions from the HILT Application Programming Interface (API). These functions include:

a) get_collections: Takes a specified DDC notation and returns metadata about collections classified under the notation or its stems. The metadata include the subject scheme used by the collection’s catalogue or other finding-aid.

b) get_DDC_records: Takes a subject term and returns DDC notations and captions mapped to terminologies containing the term. Terminologies include the DDC captions and relative index.

c) get_non_DDC_records: Takes a DDC notation and returns terms from terminologies mapped to the notation. Broader terms mapped to the stem chain of the notation are included in the output. Terminologies include DDC captions.

d) get_all_records: Takes a subject term and returns the combined outputs of get_DDC_records and get_non_DDC_records.

e) get_filtered_set: Takes a subject term and one or more specified terminologies and returns matching terms from the terminology, optionally together with their related terms, including broader, narrower, see also, and non-preferred terms.

f) get_sp_suggestions: Takes a subject term and returns terms with similar spellings.

g) get_wordnet_suggestions: Takes a subject term and returns definitions and descriptions of the term.
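The stem-chain behaviour described for get_collections and get_non_DDC_records can be sketched in Python as follows. This is not the HILT API: the function names, the in-memory tables, and the rule that a stem is formed by truncating digits (so that the stem of 641 is written 64 rather than 640) are all assumptions made for illustration, and the sample data are invented.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-ins for tables in a terminologies database.
Mappings = Dict[str, List[Tuple[str, str]]]  # notation -> [(vocabulary, term)]
Collection = Tuple[str, str, str]            # (name, DDC notation, subject scheme)


def stem_chain(notation: str) -> List[str]:
    """Return a DDC notation followed by its successively shorter stems.

    Assumes stems are formed by plain truncation of digits, re-inserting the
    decimal point after the third digit: '641.59' -> '641.5', '641', '64', '6'.
    """
    digits = notation.replace(".", "")
    chain = []
    for length in range(len(digits), 0, -1):
        prefix = digits[:length]
        chain.append(prefix if length <= 3 else f"{prefix[:3]}.{prefix[3:]}")
    return chain


def non_ddc_terms_for(notation: str, mappings: Mappings) -> List[Tuple[str, str]]:
    """Terms mapped to the notation, then broader terms mapped to its stems."""
    records: List[Tuple[str, str]] = []
    for stem in stem_chain(notation):
        records.extend(mappings.get(stem, []))
    return records


def collections_for(notation: str, collections: List[Collection]) -> List[Collection]:
    """Collections classified under the notation or one of its stems."""
    stems = set(stem_chain(notation))
    return [c for c in collections if c[1] in stems]


if __name__ == "__main__":
    # Invented mappings and collections, for illustration only.
    mappings = {"123.45": [("VocabA", "narrow term")], "123": [("VocabB", "broad term")]}
    collections = [("Collection X", "123", "VocabB"), ("Collection Y", "124", "VocabA")]
    print(non_ddc_terms_for("123.45", mappings))
    # [('VocabA', 'narrow term'), ('VocabB', 'broad term')]
    print(collections_for("123.45", collections))
    # [('Collection X', '123', 'VocabB')]
```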

4.0 HILT terminologies database

The terminologies database contains mappings to DDC notations from all or part of the following vocabularies:

– Art & Architecture Thesaurus (AAT) (Getty Research Institute);

– Commonwealth Agricultural Bureaux (CAB) thesaurus (CABI 2009);

– Global Change Master Directory (GCMD) science keywords (NASA 2009);

– Humanities and social science electronic thesaurus (HASSET) (UK Data Archive 2009);

– Integrated public sector vocabulary (IPSV) (Great Britain. E-Government Unit. 2006);

– Joint academic coding system (JACS) (UCAS 2007);

– JITA classification schema of library and information science (JITA) (Barrueco Cruz et al. n.d.);

– Library of Congress Subject Headings (LCSH) (Library of Congress 2009);

– Medical Subject Headings (MeSH) (US NLM 2009);

– National monuments record thesauri (NMR) (English Heritage 1999);

– Standard classification of academic subjects (SCAS), a precursor of JACS;

– UNESCO thesaurus (UNESCO 2003).

Most of the vocabularies are only partially mapped, to provide a pilot testbed. The database also contains the intrinsic mapping of the DDC captions to their DDC notations.

The DDC notations thus form the hub, spine, or switching language between the vocabularies. Any two vocabularies have an implicit cross-walk mapping via the DDC notation. This cross-walk is instantiated in an application programme using the get_DDC_records, get_non_DDC_records, and get_all_records functions. An advantage of this approach is that only one primary mapping between a vocabulary (a spoke) and the hub is required, and it can be maintained independently of other vocabularies. A disadvantage is the increased possibility of semantic misalignment in the two-stage correspondence between terms from different spoke vocabularies.
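The two-stage correspondence can be sketched in Python with invented data. Each spoke vocabulary is mapped only to hub notations, so n spoke vocabularies need n mappings rather than the n(n-1)/2 needed for direct pairwise mapping; translating a term then means going from the source term to its notations, and from those notations to every target term that shares one. The names, notations, and terms below are hypothetical, and this is not how the HILT functions are implemented.

```python
from collections import defaultdict
from typing import Dict, List, Set

# A spoke mapping links each term of one vocabulary to hub notations.
SpokeMapping = Dict[str, Set[str]]


def terms_by_notation(spoke: SpokeMapping) -> Dict[str, Set[str]]:
    """Invert a spoke mapping so that each hub notation lists its terms."""
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for term, notations in spoke.items():
        for notation in notations:
            inverted[notation].add(term)
    return inverted


def crosswalk(term: str, source: SpokeMapping, target: SpokeMapping) -> List[str]:
    """Translate a term between two spokes in two stages via the hub.

    Stage 1: source term -> hub notations.
    Stage 2: hub notations -> target terms.
    Any target term sharing at least one notation is returned, which is
    where semantic misalignment can creep in.
    """
    hub_notations = source.get(term, set())
    index = terms_by_notation(target)
    return sorted({t for n in hub_notations for t in index.get(n, set())})


if __name__ == "__main__":
    # Invented notations and terms, for illustration only.
    vocab_a = {"Cars": {"111.1"}, "Lorries": {"111.2"}}
    vocab_b = {"Motor vehicles": {"111.1", "111.2"}, "Road haulage": {"111.2"}}
    print(crosswalk("Cars", vocab_a, vocab_b))     # ['Motor vehicles']
    print(crosswalk("Lorries", vocab_a, vocab_b))  # ['Motor vehicles', 'Road haulage']
```

In the example, ‘Lorries’ also brings back ‘Road haulage’ merely because both are mapped to the same notation, the kind of semantic misalignment the two-stage correspondence can introduce.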

There are a number of benefits in using a classification schema as the hub, rather than a subject vocabulary based on natural language. Classification notations are independent of natural language and so avoid many of the problems associated with terminology, such as case, plurals, antonyms, and synonyms. The notation is usually shorter than the corresponding caption and is unique to the concept, so it can readily form the basis of a term or concept identification system; homonyms (and translations) make this difficult to achieve with natural language. Classification systems can also provide methods for synthesising notations for new concepts from those that already exist.

The database also contains sample collection-level description metadata to support the get_collections function, and WordNet data to support get_wordnet_suggestions. The get_filtered_set function is supported by the inclusion of term relationships within relevant vocabularies in the database. The get_sp_suggestions function uses an index of all terms recorded in the database.

Figure 1. Architecture of the HILT pilot

5.0 Embedding HILT in end-user services

An extension project carried out between January and May 2009 had the aim of demonstrating enhanced functionality of a number of information services by embedding HILT M2M terminology and interoperability facilities within user interfaces. The services are:

1) The Depot (http://www.depot.edina.ac.uk/): An e-prints repository service for researchers who do not have access to an institutional repository. Metadata are user-generated by the e-print depositor, and include subject headings taken from JACS. The service offers hierarchical browsing by subject heading.

2) Intute (http://www.intute.ac.uk/): An online finding aid for web resources, selected by academics, to support study and research. The service offers browsing by a subject heading scheme of 19 categories, followed by browsing and keyword searching of the various subject heading schemes used by different component services.

3) Scottish Collections Network (SCONE, http://scone.strath.ac.uk/Service/Index.cfm): A collection-level descriptions service for identifying and accessing library, archive, and museum collections located in Scotland. Subject-based collections are classified by DDC and are assigned LCSH topics. The service offers browsing by DDC notation and LCSH topic, hierarchical browsing by DDC Summaries caption (the top three levels, approximately 1000 classes), and keyword searching by LCSH topic.

The Depot's embedding experiment (http://lucas.ucs.ed.ac.uk/cgi-bin/hilt-depot) displays JACS headings that match a keyword search term entered by the user. If no headings are found directly, it displays JACS headings which are mapped (via the DDC notation) to DDC captions containing the search term. The user can then select one or more headings as the subject metadata for their e-print.

The demonstrator developed by Intute (http://www.intute.ac.uk/search_hilt.html) displays up to 10 related terms for a keyword search term input by the user, along with metadata for resources matching the input term. If the search term is not found, it displays up to five terms with alternate spellings. The displayed terms can be used to run a further search. The demonstrator also displays DDC captions and notations based on the search term; these are inactive, but could be used as a source of keywords for searching component catalogues.
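The fallback of offering up to five alternate spellings can be approximated with fuzzy string matching against an index of known terms. The sketch below is an assumption, not the Intute or HILT algorithm: it uses Python's standard `difflib.get_close_matches` over a small, made-up term list standing in for an index of all terms in the database.

```python
from difflib import get_close_matches

# Hypothetical term index; in HILT this role is played by an index of all
# terms recorded in the terminologies database.
TERM_INDEX = ["archaeology", "architecture", "archives", "astronomy", "anthropology"]


def suggest_spellings(query: str, index: list[str], limit: int = 5) -> list[str]:
    """Return up to `limit` indexed terms that look like variants of `query`.

    If the query is already in the index, no suggestions are needed.
    """
    term = query.strip().lower()
    if term in index:
        return []
    # cutoff=0.6 is difflib's default similarity threshold (a ratio in [0, 1]);
    # results come back best match first.
    return get_close_matches(term, index, n=limit, cutoff=0.6)


print(suggest_spellings("archeology", TERM_INDEX))
```

A production service would more likely combine this with the vocabulary's own variant forms, but the best-first ordering already gives the behaviour described: the closest known terms are offered when the exact term is absent.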

The SCONE subject retrieval pilot (http://scone.strath.ac.uk/Service/SCONEServiceHilt/DDCsearchinput.cfm) displays the full hierarchy of DDC captions matching a keyword search term entered by the user. Captions are matched directly and via the other vocabularies mapped to DDC notations. The user can select a matched caption from one of the hierarchies as the input to a search for collections with DDC notations matching the notation, or its stem, for the chosen caption. This is equivalent to the get_collections function operating on non-HILT collection-level metadata. SCONE also developed a spellchecking pilot (http://scone.strath.ac.uk/Service/SCONEServiceHilt/IndexSpellCheck.cfm).
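Matching collections by the chosen caption's notation "or its stem" can be read as prefix matching on DDC notations, falling back to a shorter stem when nothing matches. This is a minimal sketch under stated assumptions: the collection records, the function name `collections_for_notation`, and the rule of trimming one digit at a time (never below three digits) are all hypothetical, not SCONE's implementation.

```python
# Hypothetical collection-level records: (collection name, DDC notation).
COLLECTIONS = [
    ("Scottish local history pamphlets", "941.1"),
    ("Edinburgh maps", "941.34"),
    ("British history collection", "941"),
    ("Scottish Gaelic poetry", "891.631"),
]


def normalise(notation: str) -> str:
    """Drop the decimal point so '941.34' compares as the digit string '94134'."""
    return notation.replace(".", "")


def collections_for_notation(notation, collections=COLLECTIONS, min_stem=3):
    """Return names of collections whose notation begins with `notation`.

    If nothing matches, shorten the notation one digit at a time (its stem),
    but never below `min_stem` digits, and try again.
    """
    key = normalise(notation)
    while len(key) >= min_stem:
        hits = [name for name, n in collections if normalise(n).startswith(key)]
        if hits:
            return hits
        key = key[:-1]
    return []


print(collections_for_notation("941.1"))
```

Prefix matching means a search on a broad class also retrieves collections classed at more specific notations beneath it, which is one plausible reading of matching "the notation, or its stem".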

These demonstrators illustrate the utility of higher-level terminology functions such as:

a) Deriving terms from a target vocabulary;
b) Deriving alternate terms from a secondary vocabulary mapped to a target vocabulary;
c) Deriving related terms from a target vocabulary;
d) Deriving notations from a target classification scheme;
e) Deriving notations from a secondary vocabulary mapped to a target classification scheme;
f) Deriving terms with alternate spelling; and
g) Disambiguating terms.

6.0 Developing a distributed approach

Hub-and-spoke mapping architectures are efficient to maintain and scale, but the sheer effort of mapping thousands of terms to the switching language prevents operational scaling of HILT, even within a monolingual environment, to cover all vocabularies likely to be significant for wide-area information retrieval. Direct mappings between vocabularies, as in the MACS project (https://macs.hoppie.nl/pub/), are even more difficult to scale because of the combinatorial explosion: two vocabularies require 1 mapping, four vocabularies 6 mappings, six vocabularies 15 mappings, and in general n vocabularies require n(n-1)/2 mappings. Machine-generated mappings can considerably reduce the cost, but require a critical mass of instance data to be even remotely effective. The LCSH to DDC mapping in HILT is derived from WebDewey (http://www.oclc.org/dewey/versions/webdewey/) and is based on statistical associations found in WorldCat (http://www.worldcat.org/) records. Reliability should increase with the amount of instance data processed, due to the law of large numbers. However, human review and amendment of such mappings is often required to make them successful, and costs increase as a result. It seems, therefore, that all centralised approaches to improving subject interoperability at national and international scales are doomed to fail, leaving the retrieval environment littered with the corpses of incomplete and out-of-date mappings.
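The growth of direct mappings can be checked with a short calculation: linking n vocabularies pair by pair needs n(n-1)/2 undirected mappings (1, 6 and 15 for two, four and six vocabularies), whereas a hub-and-spoke design needs one mapping per vocabulary. The sketch assumes the switching language is a separate vocabulary to which each of the n vocabularies is mapped once, and counts each undirected pair once; if every mapping must be made in both directions, the direct figure doubles.

```python
def direct_mappings(n: int) -> int:
    """Undirected pairwise mappings needed to link n vocabularies directly."""
    return n * (n - 1) // 2


def hub_mappings(n: int) -> int:
    """Mappings needed when each of n vocabularies is mapped once to a separate hub."""
    return n


for n in (2, 4, 6, 16):
    print(f"{n:>2} vocabularies: direct={direct_mappings(n):>3}, hub={hub_mappings(n):>2}")
```

The direct count grows quadratically while the hub count grows linearly, which is why even a modest number of vocabularies makes all-pairs mapping impractical.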

Fortunately, recent developments in the semantic web offer a way forward. SKOS, used by HILT when returning data called by a function, is a means of representing vocabularies and term relationships so they can be effectively processed by machines. It currently has the status of a W3C Proposed Recommendation. LCSH has recently become available in SKOS as a download or via a web service from the Library of Congress Authorities and Vocabularies service (http://id.loc.gov/authorities/). The dewey.info service (http://dewey.info/) offers SKOS representations of the DDC Summaries in nine languages as an experimental linked data web service. There are proposals to treat the whole of DDC and its translations in a similar way (Panzer 2008). The European DDC Users Group (http://www.slainte.org.uk/edug/index.htm) is monitoring these developments with keen interest. Many other initiatives are underway world-wide to provide such services for other terminologies, including RAMEAU (http://www.cs.vu.nl/STITCH/rameau/) in French.

The process of representing vocabularies in SKOS assigns a unique identifier, the uniform resource identifier (URI), to each term as well as to the vocabulary itself. Linked data, a semantic web concept exemplified by the LinkingOpenData project (SWEOIG 2009), uses URIs (with specified properties) to expose, share, and connect data on the world-wide web. Data is broken down into simple statements (called triples) such as “TermA has broader term TermB”, represented in a machine-processable format such as RDF/XML (Resource Description Framework in Extensible Markup Language). At the time of writing, the project had identified around 13 billion triples with around 150 million links, including RAMEAU linked to LCSH via the mappings created by the MACS project.
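To make a statement such as "TermA has broader term TermB" concrete, the sketch below writes SKOS triples in N-Triples, a line-based RDF syntax that is easier to read than RDF/XML. The SKOS namespace and the properties skos:Concept, skos:prefLabel and skos:broader come from the SKOS specification; the base URI, term identifiers and labels are made-up placeholders, and the literal escaping covers only backslashes and double quotes.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical base URI for an illustrative vocabulary; not a real service.
BASE = "http://example.org/vocab/"


def uri(term_id: str) -> str:
    return f"<{BASE}{term_id}>"


def literal(text: str, lang: str = "en") -> str:
    # Minimal N-Triples escaping: backslash and double quote only.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_triples(term_id, label, broader_id=None):
    """Yield N-Triples lines describing one SKOS concept."""
    s = uri(term_id)
    yield f"{s} <{RDF_TYPE}> <{SKOS}Concept> ."
    yield f"{s} <{SKOS}prefLabel> {literal(label)} ."
    if broader_id is not None:
        # The triple "TermA has broader term TermB".
        yield f"{s} <{SKOS}broader> {uri(broader_id)} ."


for line in concept_triples("termA", "Castles", broader_id="termB"):
    print(line)
```

Because every subject and object is a URI, a triple published by one service can point at a concept published by another, which is what allows mappings such as RAMEAU to LCSH to be expressed as links.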

This distributed approach has the potential to replace the terminologies database in the HILT system architecture. SPARQL (W3C 2008), an RDF query language which has the status of a W3C Recommendation, can substitute for HILT's M2M access and API facilities, as shown in Figure 2.

Client applications would still have to program higher-level end-user functions using SPARQL, but would benefit from the improved reusability and interoperability that result from applying a common query language. A coordinating framework for such activity would be highly desirable.
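As an illustration of programming a higher-level function (deriving related terms from a target vocabulary) in SPARQL, the sketch below builds the query and the HTTP GET URL a client would send; under the SPARQL Protocol the query travels in the `query` parameter. The endpoint URL is a placeholder, and the query assumes the vocabulary is published in SKOS using skos:prefLabel and skos:related; nothing is sent over the network.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the SPARQL endpoint of a real terminology service.
ENDPOINT = "http://example.org/sparql"


def related_terms_query(label: str, lang: str = "en") -> str:
    """SPARQL for a higher-level function: labels of concepts related to `label`."""
    safe = label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?relatedLabel WHERE {{
  ?c skos:prefLabel "{safe}"@{lang} .
  ?c skos:related ?r .
  ?r skos:prefLabel ?relatedLabel .
  FILTER (lang(?relatedLabel) = "{lang}")
}}"""


def query_url(query: str, endpoint: str = ENDPOINT) -> str:
    # The query string is URL-encoded into the `query` parameter of a GET request.
    return endpoint + "?" + urlencode({"query": query})


print(query_url(related_terms_query("Castles")))
```

Note that the client has to know which SKOS properties the vocabulary uses in order to write this query; that dependence on the data schema is the price of SPARQL's flexibility.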

The true power of the semantic web will be realised when bibliographic records containing subject metadata are also represented as linked data. Client applications will then be able to retrieve records seamlessly, processing user input terms with terminology services and then linking directly to metadata for relevant resources for display. As yet, only a tiny fraction of the world's online catalogues and other machine-readable resource description records has been exposed in this way, although many initiatives are underway or being planned. One prerequisite for recasting records as linked data, and subsequently accessing them via specific metadata elements, is to expose the metadata schemas in use to the semantic web. Again, several significant projects are working towards this, including the ISBD/XML Study Group (http://www.ifla.org/en/events/isbdxml-study-group) for the International Standard Bibliographic Description (ISBD) and the DCMI RDA Task Group (http://dublincore.org/dcmirdataskgroup/) for the RDA: Resource Description and Access element set. Discussions about the MARC21 (http://www.loc.gov/marc/) and UNIMARC (http://www.ifla.org/en/unimarc) formats are also in progress. Eventually, the schemas can be used to parse instance records into triples, and semantic equivalence between schema elements can be established as linked data; for example, MARC21 tag 651 corresponds to the Dublin Core dc.spatial element.

Figure 2. Distributed architecture based on the semantic web
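Parsing instance records into triples through equivalences between schema elements can be sketched with a crosswalk table. Assumptions: records are reduced to (tag, value) pairs, ignoring MARC indicators and subfields; the record URI is a placeholder; only the 651 to spatial pair comes from the text, and 650 to dcterms:subject is added for illustration.

```python
# Hypothetical crosswalk from MARC 21 tags to DCMI Metadata Terms properties.
CROSSWALK = {
    "651": "http://purl.org/dc/terms/spatial",  # geographic subject -> spatial
    "650": "http://purl.org/dc/terms/subject",  # topical subject (illustrative)
}


def record_to_triples(record_uri, fields):
    """Turn (tag, value) pairs from a simplified record into (s, p, o) triples.

    Fields whose tag has no crosswalk entry are skipped rather than guessed.
    """
    triples = []
    for tag, value in fields:
        prop = CROSSWALK.get(tag)
        if prop is not None:
            triples.append((record_uri, prop, value))
    return triples


record = [("245", "Castles of Scotland"), ("650", "Castles"), ("651", "Scotland")]
for t in record_to_triples("http://example.org/record/1", record):
    print(t)
```

Once the schemas themselves are published, the crosswalk entries could be expressed as linked data too, rather than being hard-coded in each application.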

Names and “work” titles as subjects are not being neglected. It is likely, for example, that the Library of Congress Name Authority File (LCNAF) will be made available via the same service as LCSH. And when a large number of instance records using, or otherwise linked to, two or more subject schemes, such as a classification and a controlled subject heading list, are made available, they provide a critical mass for statistical mappings between the schemes.
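Statistical mappings of this kind, like the WorldCat-based LCSH to DDC mapping described in section 6.0, rest on counting how often a heading and a class occur in the same records. The sketch is a generic co-occurrence method, not WebDewey's actual procedure; the records and the support and confidence thresholds are invented for illustration, and pairs below threshold are left unmapped for human review.

```python
from collections import Counter, defaultdict

# Hypothetical instance data: each record carries LCSH headings and DDC numbers.
RECORDS = [
    {"lcsh": ["Castles"], "ddc": ["728.81"]},
    {"lcsh": ["Castles", "Scotland--History"], "ddc": ["728.81"]},
    {"lcsh": ["Castles"], "ddc": ["941.1"]},
    {"lcsh": ["Scotland--History"], "ddc": ["941.1"]},
]


def statistical_mapping(records, min_support=2, min_confidence=0.5):
    """Map each heading to the class it most often co-occurs with.

    support    = number of records containing both the heading and the class
    confidence = support / number of records containing the heading
    """
    pair_counts = defaultdict(Counter)
    heading_counts = Counter()
    for rec in records:
        for h in set(rec["lcsh"]):
            heading_counts[h] += 1
            pair_counts[h].update(set(rec["ddc"]))
    mapping = {}
    for h, classes in pair_counts.items():
        cls, support = classes.most_common(1)[0]
        confidence = support / heading_counts[h]
        if support >= min_support and confidence >= min_confidence:
            mapping[h] = (cls, support, round(confidence, 2))
    return mapping


print(statistical_mapping(RECORDS))
```

With only four records, "Scotland--History" has no class that clears the support threshold; as the volume of instance data grows, more headings cross it, which is the critical-mass effect the text describes.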

It should be noted that the linked data environment does not favour any specific mapping architecture. If Scheme A is directly mapped to Scheme B, and Scheme B is directly mapped to Scheme C, then Scheme A is indirectly mapped to Scheme C; Scheme B automatically becomes a hub with two spokes. If all of the existing equivalence mappings between terms in different schemes, along with all of the semantic mappings between terms in the same scheme, were published as linked data, the result would be a net of mappings, a mixture of hubs and spokes linked to other spokes. In many cases there would be more than one chain of links between any two terms. In this scenario, the reliability of each mapping (the authority of its creators) and the granularity of equivalence (exact, near, partial, etc.) will become important indicators of “best” pathways between terms.

7.0 Potential role of UDC

A prerequisite for a hub connecting subject schemes for a wide range of disciplines and domains is that it encompasses the range; the classification must be universal, covering all areas of human knowledge and endeavour. It is desirable that the classification has already been mapped to one or more subject schemes. Added value is available if the scheme is in widespread use; the hub can then be used to retrieve catalogue records directly, instead of through an indirect link via a spoke. The Universal Decimal Classification (UDC) therefore qualifies as a potential hub. It is already in a machine-readable format from which a semantic web representation can be made. Like DDC, UDC has been translated fully or partially into between 30 and 40 languages. UDC is used by many special libraries, so the potential for machine-generated associative mappings with special subject vocabularies is high.
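Returning to the net of mappings described at the end of section 6.0, the idea that mapping reliability and granularity of equivalence pick out the "best" pathway can be sketched as a shortest-path search. Everything here is hypothetical: the scheme prefixes, the numeric weights for exact, close and broad equivalence, and the rule that a pathway's score is the product of its links' scores.

```python
import heapq
import math

# Assumed weights for granularity of equivalence (illustrative values only).
EQUIVALENCE_WEIGHT = {"exact": 1.0, "close": 0.8, "broad": 0.5}


def edge(reliability: float, match: str) -> float:
    """Score of one mapping: creator reliability times equivalence weight."""
    return reliability * EQUIVALENCE_WEIGHT[match]


# Hypothetical net: term -> {mapped term: score in (0, 1]}.
NET = {
    "A:castles": {"hub:728.81": edge(0.9, "exact"), "B:fortifications": edge(0.6, "broad")},
    "hub:728.81": {"C:burgen": edge(0.8, "close")},
    "B:fortifications": {"C:burgen": edge(0.9, "exact")},
}


def best_pathway(net, start, goal):
    """Find the chain of mappings with the highest product of scores.

    Maximising a product of scores in (0, 1] is the same as minimising the
    sum of -log(score), so Dijkstra's algorithm applies.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return list(reversed(path)), math.exp(-d)
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nxt, score in net.get(node, {}).items():
            nd = d - math.log(score)
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return None, 0.0


print(best_pathway(NET, "A:castles", "C:burgen"))
```

Here the route through the hub wins (0.9 × 0.64) over the route through scheme B (0.3 × 0.9), even though both are two links long.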

The Renardus project (Koch, Neuroth, and Day 2001) rejected UDC as its hub, for reasons summed up as “When it comes to digital library applications … the UDC system and its development efforts are clearly insufficient and fall far behind the DDC.” This assessment would change significantly if UDC could catch up with semantic web developments. In particular, exposing the UDC schedules as open linked data or, as second-best, allowing UDC instance data to be used without licensing restrictions, would allow UDC to become a passive hub as direct and associative mappings are added or created as linked data.

A significant milestone for UDC in this respect was the launch of the UDC Summary service (http://www.udcc.org/udcsummary/php/index.php) in November 2009. This provides online access to a selection of around 2000 classes from the UDC scheme, comprising main numbers, common auxiliary numbers, and special auxiliary numbers. The captions are available in 16 languages, with plans to add a further 7. The data provided by the service is released under a Creative Commons licence allowing copying and re-use on condition that attribution is given to the UDC Consortium and any redistribution uses the same licence. A further milestone will be reached with the planned release of export formats and mappings to other schemes early in 2010. Export formats will include RDF/XML, making the classes compatible with the semantic web and allowing them to be published as linked data. These developments will make UDC as viable a potential hub as DDC. Indeed, if the existing mapping between the high-level classes of UDC and DDC is included in the UDC Summary service in an RDF representation, there is further potential for hybrid hubs using a mix of UDC and DDC, and super-hubs (hubs of hubs based on UDC or DDC).

Classification is indeed at the crossroads. It has the potential to become the spaghetti junction (Wikipedia 2009) of the information superhighway. But, like so many other professional library and information practices, services, and systems, classification is also at an administrative, business and technical crossroads, where decisions must be made as to which direction is best for the future.

References

Barrueco Cruz, José Manuel, et al. No date. JITA classification schema of library and information science. Available http://eprints.rclis.org/jita/.


CABI. 2009. CAB thesaurus. Available http://www.cabi.org/cabthesaurus/.

CDLR. 2009. [HILT toolkit]. Centre for Digital Library Research. Available http://hilt4.cdlr.strath.ac.uk/toolkit.zip.

English Heritage. 1999. National monuments record thesauri. Available http://thesaurus.english-heritage.org.uk/.

Getty Research Institute. No date. Art & architecture thesaurus online. Available http://www.getty.edu/research/conducting_research/vocabularies/AAT/.

Great Britain. E-Government Unit. 2006. IPSV - Integrated public sector vocabulary. Version 2.00. Available http://www.esd.org.uk/standards/ipsv/.

Koch, Traugott; Neuroth, Heike; and Day, Michael. 2001. DDC mapping report: Renardus D7.4. Available http://homes.ukoln.ac.uk/~tk213/Mappingreport-d74.htm.

Library of Congress. 2009. Library of Congress authorities. Available http://authorities.loc.gov/.

NASA. 2009. GCMD's science keywords and associated directory keywords. Available http://gcmd.nasa.gov/Resources/valids/archives/keyword_list.html.

Nicholson, Dennis, et al. 2001. HILT: High-Level Thesaurus project final report to RSLP & JISC. Available http://hilt.cdlr.strath.ac.uk/Reports/FinalReport.html.

Nicholson, Dennis, et al. 2004. HILT: High-Level Thesaurus project phase II: a terminologies server for the JISC Information Environment: final report to JISC. Main report. Available http://cdlr.strath.ac.uk/pubs/nicholsond/HILT2FinalMain.pdf.

Nicholson, Dennis; McCulloch, Emma; and Joseph, Anu. 2009. HILT IV and embedding extension. JISC final report. Available http://hilt.cdlr.strath.ac.uk/hilt4/documents/finalreport.pdf.

Panzer, Michael. 2008. Cool URIs for the DDC: towards web-scale accessibility of a large classification system. In Greenberg, J. and Klas, W., eds. Metadata for Semantic and Social Applications. Proceedings of the International Conference on Dublin Core and Metadata Applications, Berlin, 22-26 September 2008, pp. 183-190. Available http://webdoc.sub.gwdg.de/univerlag/2008/DC_proceedings.pdf.

SWEOIG. 2009. LinkingOpenData. W3C Semantic Web Education and Outreach Interest Group. Available http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData.

UCAS. 2007. JACS 2.0. Available http://www.ucas.com/he_staff/datamanagement/jacs/jacs20.

UK Data Archive. 2009. Humanities and social science electronic thesaurus. Available http://www.data-archive.ac.uk/search/hassetSearch.asp.

UNESCO. 2003. UNESCO thesaurus. Available http://www2.ulcc.ac.uk/unesco/.

US NLM. 2009. Medical subject headings. United States National Library of Medicine. Available http://www.nlm.nih.gov/MeSH/.

W3C. 2008. SPARQL query language for RDF. Available http://www.w3.org/TR/rdf-sparql-query/.

Wikipedia. 2009. Spaghetti junction. Available http://en.wikipedia.org/wiki/Spaghetti_junction.


Knowl. Org. 37(2010)No.4 C. Binding and D. Tudhope. Terminology Web Services

287

Terminology Web Services†

Ceri Binding* and Douglas Tudhope**

Hypermedia Research Unit, University of Glamorgan, Pontypridd, CF37 1DL, UK, *<[email protected]>, ** <[email protected]>

Ceri Binding is a Research Associate in the Hypermedia Research Unit, Faculty of Advanced Technology, University of Glamorgan. Ceri graduated with a BSc in Computer Studies in 1997 whilst working as an Analyst Designer / Programmer for Hyder IT, before joining Glamorgan in 2000. He had responsibility for development work on the FACET project and implemented various standalone and web systems for the project. He is currently conducting research and development work for the STAR project, involving use of SKOS and CRM data. Related research interests include Knowledge Organisation Systems, intelligent web-based retrieval and interface design.

Douglas Tudhope is Professor in the Faculty of Advanced Technology, University of Glamorgan and leads the Hypermedia Research Unit. His area of research is knowledge organization systems and services. He was principal investigator on the FACET and STAR projects. Since 1997, he has been editor of New Review of Hypermedia and Multimedia. He is a member of the Networked Knowledge Organisation Systems/Services (NKOS) network. He co-authored the 2006 JISC State of the Art Review on Terminology Services and Technology and the JISC Terminology Registry Scoping Study.

† This research was supported by the Arts and Humanities Research Council (AHRC). The authors would like to thank Keith May, Phil Carlisle and other staff from English Heritage for their assistance, as well as Andy Priest, Janine Rigby and Caroline Williams, from MIMAS.

Binding, Ceri and Tudhope, Douglas. Terminology Web Services. Knowledge Organization, 37(4), 287-298. 16 references.

ABSTRACT: Controlled terminologies such as classification schemes, name authorities, and thesauri have long been the domain of the library and information science community. Although historically there have been initiatives towards library-style classification of web resources, there remain significant problems with searching and quality judgement of online content. Terminology services can play a key role in opening up access to these valuable resources. By exposing controlled terminologies via a web service, organisations maintain data integrity and version control, whilst motivating external users to design innovative ways to present and utilise their data. We introduce terminology web services and review work in the area. We describe the approaches taken in establishing application programming interfaces (API) and discuss the comparative benefits of a dedicated terminology web service versus general purpose programming languages. We discuss experiences at Glamorgan in creating terminology web services and associated client interface components, in particular for the archaeology domain in the STAR (Semantic Technologies for Archaeological Resources) Project. We go on to consider the case for more specialised terminology services for different kinds of controlled vocabulary.

1.0 Introduction

Conventional web search involves users manually resolving any ambiguity after the search, by choosing relevant documents from a sea of textual matches. Users eventually learn to use term co-occurrence coupled with unusual or less ambiguous terms. Term suggestion tends to be based on transient popularity metrics. Keyword search and manual disambiguation of a vast and diverse range of resources is still disappointing. Certain search engine features originate from library science and historically there have been initiatives towards categorisation of online resources, but there remains a chasm between library content and online content.

Controlled vocabularies are frequently cited as beneficial resources in this area, providing a useful mediating interface for search operations. Controlled vocabularies consist of terms considered useful for retrieval purposes, which are used to represent concepts. These vocabularies can be used by Knowledge Organization Systems (KOS), which structure their concepts via various forms of semantic relationships. Exposing access in the form of terminology services enables programmatic integration of these useful resources into other applications.

2.0 What are terminology services?

A JISC (Joint Information Systems Committee, UK) review of terminology services and technology (Tudhope, Koch, and Heery 2006, 7) describes terminology services as:

a set of services that present and apply vocabularies, both controlled and uncontrolled, including their member terms, concepts and relationships.… They can be applied as immediate elements of the end-user interface (e.g. pick lists, browsers or navigation menus, search options) or can underpin services behind the scenes.

We are referring in this paper specifically to terminology web services: distributed data service functionality that opens up programmatic access to controlled terminologies so that other organisations can base applications on them. Ideally, such services expose open, freely accessible data.

Web services in general have been applied for some time in a variety of applications and with different underlying bindings. Gardner (2001) gives an introduction in a digital library context. Terminology web services are a more recent development, although we can trace one line of descent to earlier work on protocols for programmatic access to networked (distributed) KOS; see, for example, Davies (1996). In 1998, the second NKOS workshop had as one of its themes a “functional model of the process of using a KOS over a network.” Johnson (2004) outlined a proposed theoretical network of thesaurus access and navigation services. Binding and Tudhope (2004) detailed some early approaches to defining coherent service protocols, notably the CERES (California Environmental Resources Evaluation System), Zthes, and the ADL (Alexandria Digital Library) thesaurus protocols.

SKOS (Simple Knowledge Organization System) aims to provide specifications and standards to support the use of knowledge organization systems (KOS) within the framework of the Semantic Web. SKOS allows Knowledge Organization Systems to be represented in the Resource Description Framework (RDF) for purposes of interoperability. SKOS is being developed by the W3C Semantic Web Deployment Working Group (SWDWG). In an earlier project leading up to this effort, the Semantic Web Advanced Development (SWAD) Europe project defined the SKOS API and implemented the DREFT (Demo of RDF Thesaurus) server demo.

In common with other APIs, terminology services offer developers the major advantage of not having to implement all functionality from scratch. Basic programmatic patterns can be invoked by calling on already existing program libraries. If the patterns correspond to commonly agreed or widely applicable use cases, then development proceeds faster by building on previous work. Terminology services also have a number of advantages over other forms of distribution. The terminology provider can maintain version control, and the user always has access to the most up-to-date version of the vocabulary. Services are platform and location agnostic; the calling application does not have to be implemented using the same programming language and operating system as the service. Furthermore, service providers do not have to be the KOS creators or owners, but may offer services based on KOS developed elsewhere. One possible downside is that applications become reliant on constant network availability (assuming the service is located externally) and on external server infrastructure, but in general the positives appear to outweigh the negatives.

2.1 Users and uses of terminology services

End users might wish for some ready-made “widgets” to slot into their systems, so service users may be systems developers looking to incorporate vocabulary data into their own applications. They may be cataloguers seeking to annotate their repository content with established terminology (see, for example, Vizine-Goetz et al. 2006), or web searchers wishing to improve search performance via various forms of vocabulary-based query expansion (Binding and Tudhope 2004).

Terminology services could find usage in a number of complementary areas. Improved search facilities involving term suggestion are already being implemented within commercial search interfaces (e.g., Google Suggest, Flickr). Tag suggestion systems are used to improve search engine rankings by manipulating metadata indexing for competitive advantage (deriving popular synonyms describing an organisation's core competencies). In the digital library area, suggestion systems can be used to catalogue, index, or annotate repository content with controlled vocabulary terms.
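Term suggestion backed by a controlled vocabulary differs from popularity-based suggestion in that a typed prefix can match a non-preferred entry term and still be resolved to the preferred term. A minimal sketch with a made-up vocabulary and a sorted-list prefix search from the standard library; a real service would draw on the vocabulary's own preferred and non-preferred labels.

```python
from bisect import bisect_left

# Hypothetical vocabulary: lower-cased entry term -> preferred term.
# Non-preferred entry terms (e.g. synonyms) point at their preferred term.
ENTRY_TERMS = {
    "castles": "Castles",
    "fortified houses": "Castles",
    "forts": "Fortifications",
    "fortifications": "Fortifications",
    "fonts": "Fonts (Baptismal)",
}
SORTED_KEYS = sorted(ENTRY_TERMS)


def suggest(prefix: str, limit: int = 10) -> list[str]:
    """Return distinct preferred terms whose entry terms start with `prefix`."""
    p = prefix.lower()
    i = bisect_left(SORTED_KEYS, p)  # first key >= prefix in sorted order
    out = []
    while i < len(SORTED_KEYS) and SORTED_KEYS[i].startswith(p):
        preferred = ENTRY_TERMS[SORTED_KEYS[i]]
        if preferred not in out:
            out.append(preferred)
            if len(out) == limit:
                break
        i += 1
    return out


print(suggest("fort"))
```

Typing "fort" here yields both Fortifications and Castles (via the entry term "fortified houses"), steering the user or cataloguer towards controlled terms rather than whatever happens to be popular.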

Social tagging systems could also benefit from alignment with established common indexing terminology (Golub et al. 2009). The growth of social bookmarking sites indicates a desire for the personal organisation and structuring of web resources. Social tagging produces some interesting results, but also produces ambiguous vocabularies that mix index terms with opinions. Intuitive tools incorporating established controlled terminologies in fields other than libraries remain sparse, yet there are clearly potential gains in facilitating their use in this area.

3.0 Existing terminology services

We review a selection of terminology web services to illustrate some interesting contemporary projects and the breadth of applications in this area (this is not intended as an exhaustive list). Some general definitions are given first.

XML (eXtensible Markup Language) is a standard markup language for Web documents. RDF (Resource Description Framework) is a standard conceptual modelling language for the Semantic Web, based on subject-predicate-object triples. SOAP (Simple Object Access Protocol) is a protocol specification for exchanging structured information using Web services, while REST (Representational State Transfer) is a lighter-weight architectural style that uses HTTP directly. JSON (JavaScript Object Notation) is a lightweight data interchange format used for serializing and transmitting structured data over a network connection. SRU (Search/Retrieval via URL) is a REST-based protocol for Internet search queries. SPARQL (SPARQL Protocol and RDF Query Language) is a standard RDF query language. The concept of Linked Data forms part of the vision of a ‘web of data’; content is made available in RDF, addressed via virtual but persistent URIs that allow HTTP clients to “negotiate” their preferred representation of the content.

i. The German National Library of Economics (ZBW) has published an experimental REST web service interface to the STW Thesaurus for Economics. The service offers both XML and JSON output formats.

ii. OCLC have produced a set of services accessible via SRU (Search/Retrieval via URL) using the CQL query language. Concept details can be retrieved in a variety of formats (HTML, MARC XML, SKOS, and Zthes) from a number of controlled vocabulary resources.

iii. The CATCH (Continuous Access To Cultural Heritage, NL) programme, in the context of the STITCH (Semantic Interoperability to Access Cultural Heritage) and TELplus projects, has developed a SKOS-based Vocabulary and Alignment service prototype. The core of the service is SOAP-based, with a REST-like access layer, returning RDF/SKOS data and JSON output for concepts.

iv. The European Environment Information and Observation Network (EIONET) GEMET thesaurus has a REST interface, derived from the SKOS API definition.

v. The Library of Congress Authorities and Vocabularies service is a groundbreaking demonstrator of a REST Linked Data service exposing LCSH SKOS data.

vi. The HILT (High Level Thesaurus) Phase IV project has produced an SRU/W (Search and Retrieve Web) Service operating against a number of common vocabulary resources.

vii. The Food and Agriculture Organisation of the United Nations (FAO), under their Agriculture Information Management Standards (AIMS) initiative, have produced the Agrovoc Concept Server with a set of terminology web services.

viii. The Getty Vocabularies Web Services offer retrieval and update of Getty vocabularies to licensees of the vocabularies in real time.

ix. The American National Biological Information Infrastructure (NBII) Biocomplexity Thesaurus is exposed as a terminology web service based on the SKOS API.

x. The Finnish Semantic Computing Research Group (SeCo) have implemented ONKI SKOS – a server for lightweight vocabularies in SKOS and ontologies in RDFS/OWL (RDF Schema/Web Ontology Language) format with web service support (Tuominen et al. 2009).

xi. The UK Becta Vocabulary Bank provides an SRU web services interface to its educational vocabularies via the Zthes profile, with some additional indexes.

xii. The British Oceanographic Data Centre (BODC) Data Grid's Vocabulary Server provides web service access to its vocabularies represented in SKOS. A mapping service is based on the SKOS mapping relationships.


xiii. As part of the Explicator project, Gray et al. (2009) have implemented a vocabulary search web service, applied to SKOS astronomy related vocabularies, which focuses on identifying the best vocabulary concept for a given query string.
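Several of the services listed (OCLC's, HILT's, Becta's) are reached through SRU, where a searchRetrieve request is an ordinary HTTP GET whose parameters carry a CQL query. The sketch builds such a URL for SRU version 1.1; the base URL is a placeholder, and the index names and record schemas a given server accepts should be taken from that server's explain response rather than from this example.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical SRU base URL; not a real service.
SRU_BASE = "http://example.org/sru/vocab"


def cql_clause(index: str, term: str) -> str:
    """Build one CQL search clause of the form: index = "term"."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index} = "{escaped}"'


def sru_search_url(cql: str, max_records: int = 10, schema: Optional[str] = None) -> str:
    """Build an SRU 1.1 searchRetrieve request as a plain HTTP GET URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql,
        "maximumRecords": str(max_records),
    }
    if schema is not None:
        # Which record schemas (e.g. Zthes, SKOS) are accepted is server-specific.
        params["recordSchema"] = schema
    return SRU_BASE + "?" + urlencode(params)


print(sru_search_url(cql_clause("cql.serverChoice", "castles")))
```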

Despite the clear success of early terminology service implementations, there are still some hurdles to overcome to facilitate greater adoption and use. Some existing large-scale “standard” vocabularies have licensing restrictions on their usage. In order to offer terminology as a persistent service, licensing and copyright issues must first be resolved. Perhaps this would be an opportune moment to suggest that (part of) UDC (Universal Decimal Classification) could be released for public use and for incorporation into some of the existing terminology services?

4.0 Programmatic API approaches / protocols

Currently various approaches are taken to exposing programmatic access to vocabulary data via a network:

a. Linked Data
b. SKOS API, SRU/W
c. SPARQL Endpoints
d. Combinations of the above

The SKOS API (say) and Linked Data are not necessarily mutually exclusive. The SKOS API is an abstract interface, so it could be implemented via a RESTful approach. While current Linked Data implementations tend to be more “atomic”, exposing data at the level of individual resources (e.g., concepts), a terminology service could offer various forms of search functionality over associated linked data. This may be necessary for some use cases, where following individual links in turn may be impractical.
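In the Linked Data approach each concept has its own URI, and the client states the representation it wants in the HTTP Accept header (the content negotiation mentioned among the definitions in section 3.0). A sketch using only the standard library; the concept URI is a placeholder, and which media types a given service honours (RDF/XML, Turtle, HTML) varies from service to service.

```python
from urllib.request import Request

# Placeholder concept URI; a Linked Data service would mint one URI per concept.
CONCEPT_URI = "http://example.org/vocab/concept/castles"


def concept_request(uri: str, want_rdf: bool = True) -> Request:
    """Build (without sending) a GET request that asks for a given representation.

    The Accept header tells the server which format the client prefers;
    the same URI can then yield RDF for programs or HTML for people.
    """
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return Request(uri, headers={"Accept": accept}, method="GET")


req = concept_request(CONCEPT_URI)
print(req.get_method(), req.full_url, req.get_header("Accept"))
# Sending it: urllib.request.urlopen(req), which follows redirects such as 303 See Other.
```

Each request retrieves one resource, which is why search across many concepts may need an additional service layer on top of the individual links.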

However, a discussion of the relative merits of SOAP vs. REST vs. XML-RPC (XML Remote Procedure Call), etc., would risk missing the point. A service API is abstract: it specifies what you are able to ask for and what you can expect to get back. The value of an established API can get lost in occasionally zealous discussions about what is essentially a low-level delivery mechanism. The real choice, then, is between using a specific API (Linked Data, SKOS API, SRU) and a more flexible query interface (SPARQL).
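The argument that an API is a contract rather than a transport can be made concrete. The sketch below defines an abstract interface with two calls and one in-memory back end. Client code depends only on the interface, so a SOAP, REST or XML-RPC layer could sit in front of the same calls without changing the client. The method names are hypothetical, loosely modelled on the SKOS API calls named in this paper.

```python
from abc import ABC, abstractmethod


class TerminologyService(ABC):
    """The abstract contract: what a client may ask for and what it gets back.

    Method names are hypothetical, loosely modelled on SKOS API calls."""

    @abstractmethod
    def get_concept(self, uri: str) -> dict:
        """Return a description of one concept."""

    @abstractmethod
    def get_keyword_match(self, keyword: str) -> list:
        """Return URIs of concepts whose preferred label contains keyword."""


class InMemoryService(TerminologyService):
    """One possible back end. A SOAP, REST or XML-RPC layer could expose the
    same two calls without changing the contract."""

    def __init__(self, concepts: dict):
        self._concepts = concepts  # uri -> {"prefLabel": ...}

    def get_concept(self, uri: str) -> dict:
        return self._concepts[uri]

    def get_keyword_match(self, keyword: str) -> list:
        kw = keyword.lower()
        return sorted(uri for uri, c in self._concepts.items()
                      if kw in c["prefLabel"].lower())


def client_labels(service: TerminologyService, keyword: str) -> list:
    """Client code written only against the abstract interface."""
    return [service.get_concept(u)["prefLabel"] for u in service.get_keyword_match(keyword)]
```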

The specific API approach has a number of attractive features:

i. Abstracts and hides underlying architecture and implementation details.

ii. Predefined functionality – a limited, defined set of function calls. The user does not need to know anything about the underlying data schema, just the expected syntax for calls and responses.

iii. Can implement efficient methods with server-side optimisation.

iv. Can take advantage of browser caching for more efficient use of services.

SPARQL endpoints, on the other hand, are a slightly different proposition. Whilst SPARQL undoubtedly offers very powerful server-side facilities with advantages of flexibility, there are also some not insignificant associated disadvantages. These may limit the viability or attractiveness of SPARQL endpoints as a reliable outward-facing terminology service mechanism.

4.1.1 SPARQL advantages

i. Flexibility – the end user decides the nature of the query and the data to be returned.

ii. Standardisation – a query is compatible with any SPARQL-enabled system.

iii. Native implementations within some platforms – no need to deploy any specific server application.

4.1.2 SPARQL disadvantages

i. To construct a SPARQL query the end user needs to have detailed knowledge of the underlying data schema. It also delegates optimization of queries to the end user.

ii. Use of SPARQL as the API rather dictates the underlying implementation.

iii. Does not easily support implementation of concept expansion and other algorithm-based / probabilistic functionality.

iv. Publicly available SPARQL endpoints are elegant but in practice not necessarily an appropriate solution. The same arguments apply as to exposing a public SQL (Structured Query Language) interface – they may expose the server to excessive / malicious activity.

v. SPARQL queries incorporating full-text querying can be inefficient, as they involve regular expression filtering. As a consequence, performance may not be sufficient for real-time applications. In fact, we worked around this limitation in the STAR project by supplementing the underlying triple store database with a full-text index. Alistair Miles also reported encouraging experience of using the Lucene full-text search engine in concert with LARQ (a Jena bridge between ARQ and Lucene) to work around the same issues (see SKOS list, February 2009).
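Disadvantages (i) and (v) can be seen side by side in code. The sketch below builds a SPARQL query that matches labels with `regex`. The client must already know that labels live in `skos:prefLabel` on `skos:Concept` resources, and the filter is generally applied to every candidate label. The sketch then builds a toy inverted index as a stand-in for the kind of full-text index (for example Lucene via LARQ) used to work around the problem. The query shape and the index are illustrations only. They are not the STAR implementation.

```python
import re
from collections import defaultdict

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def regex_label_query(pattern: str) -> str:
    """SPARQL that finds concepts by regular-expression match on skos:prefLabel.

    The caller must already know the schema (skos:Concept, skos:prefLabel), and
    the FILTER is typically evaluated against every candidate label."""
    literal = pattern.replace("\\", "\\\\").replace('"', '\\"')  # escape for a SPARQL string
    return (
        f"PREFIX skos: <{SKOS_NS}>\n"
        "SELECT ?concept ?label WHERE {\n"
        "  ?concept a skos:Concept ; skos:prefLabel ?label .\n"
        f'  FILTER regex(str(?label), "{literal}", "i")\n'
        "}"
    )


def build_index(labels: dict) -> dict:
    """Token -> set of concept URIs: a toy stand-in for a full-text index."""
    index = defaultdict(set)
    for uri, label in labels.items():
        for token in re.findall(r"\w+", label.lower()):
            index[token].add(uri)
    return dict(index)


def index_lookup(index: dict, word: str) -> list:
    # A dictionary lookup replaces a scan of every label.
    return sorted(index.get(word.lower(), set()))
```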

To some degree, the appropriate choice depends on the particular circumstances and development context, along with user requirements. This is also currently a fairly fast-moving field.

5.0 Use of terminology services at Glamorgan

A series of projects has explored the use of terminology services at Glamorgan and developed various service and client implementations.

5.1 Pilot Client for SKOS API

In 2003, a use-case-driven, low-level SKOS API was developed by ILRT (Institute for Learning & Research Technology, Bristol) for the SWAD Europe project. The demonstrator implementation (DREFT) took the form of a set of SOAP-based web services. Even so, the API was intended as an abstract definition of the standard functionality that a SKOS thesaurus service might typically offer at the API level, independent of whether machine access was via a web service. Development and maintenance of the DREFT software effectively ended when that project ended in 2004. There has, however, been continuing interest in exposing vocabulary resources to programmatic access, and a number of practical approaches have come to the fore.

In 2005, the University of Glamorgan created a Windows-based client application as a research prototype (Tudhope and Binding 2006). It worked against the existing SKOS API DREFT service at ILRT Bristol, which was still running but no longer supported. The application was a ‘rich client’ browser displaying concept details and facilitating browsing via semantic links, as shown in Figure 1 (accessing the GEMET thesaurus).

Figure 1. Initial SKOS API client application

Due to limitations imposed by the remote server configuration, the application used only a small subset (two) of the possible SKOS API calls: ‘getConcept’ and ‘getAllConceptRelatives’. At the time these calls did not return sufficient relationship information, so the browser could only display immediate semantically related terms, without indicating the specific nature of the relationship. The application did, however, provide a fast enough response for satisfactory real-time interaction, and a further enhancement involving the caching of previously retrieved data significantly improved the user experience. The exercise provided initial empirical evidence that the SKOS API in the form of a web service could be used to support real-time client applications. This motivated the development of further services and applications within the scope of our later projects.

5.2 STAR Project services and clients based on SKOS API

The STAR project subsequently developed a pilot set of web services based on a subset of the SWAD-Europe SKOS API, with extensions for concept expansion. Our implementations typically concentrated on providing the functionality necessary for our own purposes, rather than a complete (re)implementation of the original SKOS API DREFT server. The service currently consists of seven function calls (listed below). The services provide string matching across the associated thesauri, which are represented in SKOS, along with browsing and semantic concept expansion within a chosen thesaurus. Figure 2 summarises the services. The STAR website provides more details under Semantic Terminology Services, including a WSDL (Web Services Description Language) file, a service description and an example client that can be downloaded.

– GetConceptSchemes Returns an array of all supported ConceptSchemes in the triple store.

– GetConceptScheme Given the URI of a particular ConceptScheme, returns a data structure representing that ConceptScheme.




– GetTopmostConcepts Given the URI of a particular ConceptScheme, returns an array of Concepts that are positioned at the top of the hierarchical structure.

– GetConcept Given the URI of a particular Concept, returns a data structure representing that Concept.

– GetAllConceptRelatives Given the URI of a particular Concept, returns an array of ConceptRelative – consisting of all directly related Concepts and their associated relationship.

– expandConcept Given the URI of a particular Concept, performs a spreading expansion of that Concept, using supplied weighting parameters for core thesaurus relationships. Returns an array of ConceptRelative which includes a distance metric representing the semantic distance of each Concept from the originating Concept.

– getKeywordMatch General free-text search against the preferredLabel (and optionally the non-PreferredLabels) of all Concepts in the triple store. Returns an array of RDFTriple indicating the individual triples where the match occurred.
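To make the shapes of two of these calls concrete, here is a minimal in-memory sketch of GetAllConceptRelatives and getKeywordMatch over a list of triples. It assumes SKOS `prefLabel`/`altLabel` play the roles of the preferred and non-preferred labels, and it infers inverse links (broader/narrower, related) for relationships stated from the other end. The toy data and Python names are hypothetical. The sketch only mirrors the return types described above (ConceptRelative, RDFTriple) and is not the STAR code.

```python
from typing import NamedTuple

SKOS = "http://www.w3.org/2004/02/skos/core#"


class Triple(NamedTuple):
    s: str
    p: str
    o: str


class ConceptRelative(NamedTuple):
    uri: str
    relationship: str


# Hypothetical toy data, not taken from the English Heritage thesauri.
TRIPLES = [
    Triple("ex:coin", SKOS + "prefLabel", "Coin"),
    Triple("ex:coin", SKOS + "altLabel", "Currency"),
    Triple("ex:coin", SKOS + "broader", "ex:object"),
    Triple("ex:hoard", SKOS + "prefLabel", "Coin hoard"),
    Triple("ex:hoard", SKOS + "related", "ex:coin"),
    Triple("ex:object", SKOS + "prefLabel", "Object"),
]

# Inverse of each semantic relationship, used for links stated from the other end.
INVERSE = {"broader": "narrower", "narrower": "broader", "related": "related"}


def get_all_concept_relatives(uri: str) -> list:
    """All directly related concepts together with the relationship to each."""
    out = []
    for t in TRIPLES:
        rel = t.p[len(SKOS):] if t.p.startswith(SKOS) else None
        if rel not in INVERSE:
            continue  # labels and other properties are not relationships
        if t.s == uri:
            out.append(ConceptRelative(t.o, rel))
        elif t.o == uri:
            out.append(ConceptRelative(t.s, INVERSE[rel]))
    return out


def get_keyword_match(keyword: str, include_alt: bool = False) -> list:
    """Free-text match on prefLabel (optionally altLabel); returns the matching triples."""
    props = {SKOS + "prefLabel"}
    if include_alt:
        props.add(SKOS + "altLabel")
    kw = keyword.lower()
    return [t for t in TRIPLES if t.p in props and kw in t.o.lower()]
```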

The thesauri used for the STAR project were SKOS conversions of thesaurus data received from English Heritage. The services were used in conjunction with applications for cross-searching archaeological datasets, allowing searching to be augmented by SKOS-based vocabulary resources. A series of demonstrator client applications was developed (Figure 3), extending the functionality of the initial SKOS API client application.

Queries are often expressed at a different level of generalization from document content or metadata, or may employ a slightly different semantic perspective. In combination with the search system, the services allowed queries to be expanded by synonyms or by concept expansion over the SKOS semantic relationships. Concept expansion was based on a measure of “semantic closeness” (Binding and Tudhope 2004). Subsequently a number of web-browser-based “widget” controls were developed (Figure 4), working against the same underlying services. These were developed to be incorporated within online STAR demonstrators and other applications.
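Weighted concept expansion of the kind offered by expandConcept can be sketched as a cheapest-path search. Each relationship type has a traversal cost, and a concept's distance is the smallest total cost of reaching it from the starting concept. Expansion stops at a threshold. The relationship costs and threshold below are invented for illustration. The actual STAR measure of semantic closeness is the one described in Binding and Tudhope (2004), and this generic Dijkstra-style expansion is not claimed to reproduce it.

```python
import heapq

# Hypothetical traversal costs per relationship (lower = semantically closer).
DEFAULT_COSTS = {"narrower": 1.0, "broader": 1.5, "related": 2.0}


def expand_concept(graph, start, costs=None, max_distance=2.5):
    """Weighted concept expansion: {concept: distance} within max_distance of start.

    graph maps each concept to a list of (neighbour, relationship) pairs; the
    distance of a concept is the cheapest sum of relationship costs on any path."""
    costs = DEFAULT_COSTS if costs is None else costs
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, concept = heapq.heappop(queue)
        if dist > best[concept]:
            continue  # a cheaper path to this concept was already processed
        for neighbour, rel in graph.get(concept, []):
            new_dist = dist + costs[rel]
            if new_dist <= max_distance and new_dist < best.get(neighbour, float("inf")):
                best[neighbour] = new_dist
                heapq.heappush(queue, (new_dist, neighbour))
    return best
```

The returned distances play the role of the distance metric attached to each ConceptRelative. A search system could use them to weight or rank the expanded query terms.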

Figure 2. STAR SKOS_WS Service Interface




Figure 3. SKOS API client application further developed for the STAR project

Figure 4. Browser widgets developed for the STAR project




During the course of the project the STAR services have gone on to be used by other projects – notably the ADS (Archaeology Data Service) ArchaeoTools project and a DELOS prototype Digital Library Management System (Binding et al. 2007). They have also been used by undergraduate projects within the University. This demonstrates their utility beyond the particular domain for which they were originally developed.

6.0 A case for more specialised services

Current service implementations tend to conflate different kinds of vocabularies in a common programmatic interface. Indeed, in areas where there is a degree of commonality it makes sense to provide common service functionality across multiple vocabularies. However, there is also a potential case for more specialist services.

The reference documentation for SKOS refers to a common data model for knowledge organization systems. Short of creating specialized subclasses of skos:ConceptScheme, there is currently no way to specify the “type” of a vocabulary in SKOS. Applications accessing the data would therefore potentially treat thesauri and classification schemes (for example) as if they were the same. Thus there is a general case for specialised extensions to SKOS.
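The subclassing workaround mentioned above can be sketched as a handful of RDF statements. The code emits N-Triples that declare a specialised scheme class as an `rdfs:subClassOf` `skos:ConceptScheme` and type one scheme with it. It keeps the generic `skos:ConceptScheme` type so that SKOS-only clients still recognise the scheme. The extension namespace and class names are hypothetical and do not correspond to an existing vocabulary.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
# Hypothetical extension namespace; not an existing vocabulary.
EXT = "http://example.org/kos-types#"


def typed_scheme_triples(scheme_uri: str, kos_type: str) -> list:
    """N-Triples lines declaring a specialised scheme class and typing one scheme with it."""
    type_uri = EXT + kos_type
    triples = [
        (type_uri, RDFS_SUBCLASS_OF, SKOS + "ConceptScheme"),
        (scheme_uri, RDF_TYPE, type_uri),
        # Keep the generic type too, so SKOS-only clients still see a ConceptScheme.
        (scheme_uri, RDF_TYPE, SKOS + "ConceptScheme"),
    ]
    return [f"<{s}> <{p}> <{o}> ." for s, p, o in triples]


def declared_type(lines: list, scheme_uri: str) -> str:
    """Return the specialised type asserted for scheme_uri, if any ('' otherwise)."""
    prefix = f"<{scheme_uri}> <{RDF_TYPE}> <{EXT}"
    for line in lines:
        if line.startswith(prefix):
            return line[len(prefix):].split(">", 1)[0]
    return ""
```

A client that understands the extension could switch behaviour on the returned type, for example offering notation-based ordering for a classification scheme.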

Our work to date has primarily involved exposing thesauri for programmatic access. More recently, however, building on core elements of the STAR work, we developed a term suggestion service working against the Dewey Decimal Classification (DDC), with a URL-based service call interface returning JSON/XML data. This service was developed for the PERTAINS project (PERsonlisation Tagging interface Information in Services), led by MIMAS (University of Manchester), to explore personalization of tag suggestions for users of their COPAC and Intute systems. This initial work surfaced a number of observations concerning the differences between thesauri and other vocabularies. Particularly in the case of major schemes, classification schemes:

i. tend to be more general, covering a wider subject area (i.e., a whole library);

ii. tend to have longer, more descriptive captions;

iii. have an associated notation (often encompassing a specific ordering principle);

iv. tend to be more associated with browsing usage;

v. tend to be intended for classification, not indexing;

vi. tend to encourage pre-coordinated descriptor strings for use in indexing and browsing (as opposed to post-coordinated thesauri) – see, for example, Broughton (2001) and FATKS (Facet Analytical Theory in Managing Knowledge Structures).
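A URL-based service call returning JSON, as used for the DDC term suggestion service, amounts to encoding the request in a GET URL and decoding a JSON body. The sketch below shows both halves. The endpoint, parameter names (`q`, `format`, `limit`) and response shape are all hypothetical, because the actual PERTAINS interface is not specified here.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
ENDPOINT = "http://example.org/ddc/suggest"


def suggest_url(prefix: str, fmt: str = "json", limit: int = 10) -> str:
    """Encode a term-suggestion request as a plain URL (GET) call."""
    query = urlencode({"q": prefix, "format": fmt, "limit": limit})
    return f"{ENDPOINT}?{query}"


def parse_suggestions(body: str) -> list:
    """Pull (notation, caption) pairs out of a JSON response body of an assumed shape."""
    data = json.loads(body)
    return [(item["notation"], item["caption"]) for item in data.get("suggestions", [])]
```

Because the call is just a URL, any client platform that can issue an HTTP GET and parse JSON can use it. This is one reason such interfaces suit browser-based widgets.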

Pre-coordinated descriptors and ordering based on notation have been emphasised as important distinctive elements of classification schemes (Broughton 2001, Gnoli and Hong 2006). These differences have potential implications for the service calls to be exposed. Possible specialised extensions for classification schemes include services to handle pre-coordination of terms informed by facet grammar or synthesis rules, incorporating validity-checking constraints, as well as ranking/ordering services.
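A pre-coordination service with validity checking might, at its simplest, take facet-tagged terms, reject unknown facets, require a primary facet, and emit the terms in a fixed citation order. The sketch below does just that. The facet names and citation order are hypothetical and only loosely in the spirit of a facet formula. Real schemes define their own facets and much richer synthesis rules.

```python
# Hypothetical facet citation order; real schemes define their own facets,
# citation order and synthesis rules.
CITATION_ORDER = ["thing", "kind", "part", "process", "place", "time"]


def precoordinate(facet_terms: dict) -> str:
    """Build one pre-coordinated string from facet-tagged terms, in citation order.

    Raises ValueError when a facet is unknown or the primary facet is missing,
    a simple form of validity checking."""
    unknown = set(facet_terms) - set(CITATION_ORDER)
    if unknown:
        raise ValueError(f"unknown facet(s): {sorted(unknown)}")
    if "thing" not in facet_terms:
        raise ValueError("the primary 'thing' facet is required")
    return " -- ".join(facet_terms[f] for f in CITATION_ORDER if f in facet_terms)
```

The order in which the caller supplies the terms does not matter, since the citation order alone determines the output.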

Term suggestions in a “type ahead” style interface work well when every term is unique, as is the case in a thesaurus. Term lookup in classifications and subject heading schemes, however, becomes more complex, since a term can appear in many more places within captions. The context of DDC terms depends on their ancestry for clarity in online display (this issue was observed in another Glamorgan project, EnTag: Enhanced Tagging for Discovery). When offering suggestions starting with the characters typed, even just within the 1000 top-level classes of the DDC Summaries, the term “Philosophy and theory” occurs over 100 times. Only with the associated context of the broader term would each suggestion be useful.
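The ambiguity problem, and the fix of qualifying each suggestion by its ancestry, can be sketched with a tiny hierarchy. Each caption that matches the typed prefix is returned together with the captions of its broader classes. An optional notation-prefix filter stands in for restricting suggestions to an area of interest, as the PERTAINS service did. The notations, captions and the `within` parameter are illustrative and are not quoted from the DDC or from the PERTAINS interface.

```python
# Toy fragment of a hierarchical scheme: notation -> (caption, broader notation).
# Notations and captions are illustrative, not quoted from the DDC.
CLASSES = {
    "100": ("Philosophy", None),
    "101": ("Philosophy and theory", "100"),
    "500": ("Science", None),
    "501": ("Philosophy and theory", "500"),
    "520": ("Astronomy", "500"),
}


def with_context(notation: str) -> str:
    """Caption qualified by its ancestors, e.g. 'Science > Philosophy and theory'."""
    parts = []
    current = notation
    while current is not None:
        caption, current = CLASSES[current]
        parts.append(caption)
    return " > ".join(reversed(parts))


def suggest(prefix: str, within: str = "") -> list:
    """Type-ahead suggestions for captions starting with prefix.

    `within` is a hypothetical notation-prefix filter for an area of interest."""
    p = prefix.lower()
    hits = sorted(n for n, (caption, _) in CLASSES.items()
                  if caption.lower().startswith(p) and n.startswith(within))
    return [(n, with_context(n)) for n in hits]
```

Without the ancestry string, the two “Philosophy and theory” suggestions would be indistinguishable in a drop-down list.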

The “reverse order” characteristic of LCSH (Library of Congress Subject Headings) terms (see Figure 5) would make them less appropriate for interactive type-ahead style interfaces, as they often share a common prefix:

Laurence-Moon Syndrome — ultrastructure

Laurence-Moon Syndrome — therapy

Laurence-Moon Syndrome — surgery

Laurence-Moon Syndrome — rehabilitation

Laurence-Moon Syndrome — radiotherapy

(etc.)

Figure 5. LCSH subject headings

In order to reduce the volume of suggestions (due to the nature of the DDC captions as described previously), the term suggestion service for the PERTAINS project incorporated an extra parameter allowing the user to specify areas of interest from the higher-level categories. In the demonstration application (Figure 6), a search on “moon” is restricted to suggestions from class 520 (Astronomy). This prevents the return of suggestions from, e.g., astrology, author names or place names. The problem of qualifying the returned suggestions is, however, still evident in this particular example.

7.0 Conclusions

This paper has introduced terminology services and reviewed work in the area. Implementations in various projects at Glamorgan have been discussed, along with some issues arising. The choice of employing a terminology service over alternative methods of delivering programmatic access to vocabularies depends on the application use cases and the skill set of the developers involved. Some situations may involve a combination of (say) terminology services, linked data, and general query languages not designed specifically for vocabularies.

Section 4 discusses the pros and cons. General-purpose languages (such as SPARQL or SRU) may offer flexibility if developers are familiar with the language. Furthermore, terminology services rely on network availability (assuming the service is located externally) and on external server infrastructure. On the other hand, the limited set of function calls provided by a terminology service can offer advantages in hiding details of the underlying architecture or representation, while being optimised for common use cases involving online vocabularies. A terminology web service is not restricted to any particular client platform or development language. This may suit some development situations.

Thus terminology services enjoy a set of distinctive advantages for many contexts and situations. These include:

i. Abstracts and hides underlying architecture and implementation details;

ii. Predefined functionality – a limited, defined set of function calls. The user does not need to know anything about the underlying data schema, just the expected syntax for calls and responses;

iii. Services are platform/location agnostic;

iv. Can implement efficient methods with server-side optimisation;

v. Can take advantage of browser caching for more efficient use of services;

vi. Can help the terminology provider maintain version control.

Figure 6. DDC search within specific categories




Strong commonality exists in the abstract API of current terminology services. We have discussed programmatic API approaches and observed how this commonality can sometimes be lost in technical discussions of low-level delivery mechanisms such as REST/SOAP/RPC. Current terminology services and associated data models have tended to conflate various types of vocabulary in the interests of common purpose. However, there are compromises inherent in this approach, and we have discussed the case for more specialised services, particularly for major classification schemes.

References

Binding, Ceri et al. 2007. DelosDLMS: infrastructure and services for future digital library systems. 2nd DELOS Conference, Pisa (2007). Available http://www.delos.info/index.php?option=com_content&task=view&id=602&Itemid=334

Binding, Ceri and Tudhope, Douglas. 2004. KOS at your service: Programmatic access to knowledge organization systems. Journal of Digital Information, 4 (4). Available http://journals.tdl.org/jodi/article/view/110/109

Broughton, Vanda. 2001. Faceted classification as a basis for knowledge organization in a digital environment: The Bliss Bibliographic Classification as a model for vocabulary management and the creation of multi-dimensional knowledge structures. New review of hypermedia and multimedia, 7: 67-102.

Davies, Ron. 1996. Thesaurus-aided searching in search and retrieval protocols. In Rebecca Green ed., Knowledge organization and change: proceedings of the 4th International ISKO Conference. Frankfurt: Indeks Verlag, pp. 137-43.

Gardner, Tracy. 2001. An Introduction to Web services. Ariadne, 29. Available http://www.ariadne.ac.uk/issue29/gardner/intro.html

Gnoli, Claudio and Hong Mei. 2006. Freely faceted classification for web based information retrieval. New review of hypermedia and multimedia 12: 63-81.

Golub, Koraljka, et al. 2009. EnTag: enhancing social tagging for discovery. Proc. 9th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2009), Austin, TX, June, ACM Press, pp 163–72.

Gray, Alasdair et al. (2009 in press). Finding the right term: Retrieving and exploring semantic concepts in astronomical vocabularies. Information processing and management doi:10.1016/j.ipm.2009.09.004

Johnson, Eric H. 2004. Distributed thesaurus Web services. Cataloging and classification quarterly 37 (3-4): 121-53.

SKOS public discussion list: public-swd-wg@w3.org: Mail Archives. Available http://lists.w3.org/Archives/Public/public-swd-wg/

Summers, Ed; Isaac, Antoine; Redding, Clay; Krech, Dan. 2008. LCSH, SKOS and linked data. Proceedings of the International Conference on Dublin Core and Metadata Applications, (DC 2008). Available http://arxiv.org/abs/0805.2855

Tudhope, Douglas and Binding, Ceri. 2006. Toward terminology services: experiences with a pilot web service thesaurus browser. ASIS&T bulletin 32(5): 6–9, June/July 2006.

Tudhope, Douglas and Binding, Ceri. 2008. Faceted thesauri. Axiomathes 18(2): 211–22. Available http://www.springerlink.com/content/m67378t5576g8670/fulltext.pdf

Tudhope, Douglas, Koch, Traugott and Heery, Rachel. 2006. Terminology services and technology: JISC state of the art review. Available http://www.jisc.ac.uk/media/documents/programmes/capital/terminology_services_and_technology_review_sep_06.pdf

Tuominen, Jouni; Frosterus, Matias; Viljanen, Kim; Hyvönen, Eero. 2009. ONKI SKOS server for publishing and utilizing SKOS vocabularies and ontologies as services. Proceedings of the 6th European Semantic Web Conference (ESWC 2009), Heraklion, Greece, May 31 - June 4, 2009. Springer-Verlag. Available http://www.seco.tkk.fi/publications/2009/tuominen-et-al-onki-skos-2009.pdf

Vizine-Goetz, Diane; Houghton, Andrew; Childress, Eric. 2006. Web services for controlled vocabularies. ASIS&T bulletin 32 (5-6): 9-12.

All URLs were last checked in February 2010.




Appendix 1: Terminology Web Services

Alexandria Digital Library. The ADL Thesaurus Protocol. http://www.alexandria.ucsb.edu/thesaurus/specification.html

Archaeology Data Service. ADS ArchaeoTools project. http://ads.ahds.ac.uk/project/archaeotools/

Becta Vocabulary Bank Web Services. http://bank.vocman.com/bank-webapp/technical

BODC British Oceanographic Data Centre, Natural Environment Research Council, NERC Vocabulary Server. http://www.bodc.ac.uk/products/web_services/vocab/

CATCH Vocabulary and alignment repository demonstrator. http://www.cs.vu.nl/STITCH/repository/

CERES and National Biological Information Infrastructure (NBII) Biological Resources Division (BRD). The CERES/NBII Thesaurus Partnership Project. http://ceres.ca.gov/thesaurus/

Copac National, Academic, and Special Library catalogue. http://copac.ac.uk/

EIONET GEMET web services. http://www.eionet.europa.eu/gemet/webservices?langcode=en

EnTag - Enhanced Tagging for Discovery Project. http://www.ukoln.ac.uk/projects/enhanced-tagging/

Explicator Project. http://explicator.dcs.gla.ac.uk/

FAO Agrovoc web services. http://aims.fao.org/website/Documentation/sub

FATKS - Facet Analytical Theory in Managing Knowledge Structures. http://www.ucl.ac.uk/fatks/

Getty vocabularies web services. http://www.getty.edu/research/conducting_research/vocabularies/vocab_web_services.pdf

HILT SRU/W Server. http://hilt4.cdlr.strath.ac.uk/hilt_sru.cgi

Jena – A Semantic Web Framework for Java. http://openjena.org/

LARQ - Free Text Indexing for SPARQL. http://jena.sourceforge.net/ARQ/lucene-arq.html

Library of Congress. Authorities and Vocabularies service. http://id.loc.gov/authorities/

Linked Data: Connect Distributed Data Across the Web. http://linkeddata.org/

MIMAS. Centre of Excellence, University of Manchester. http://mimas.ac.uk/

CSA/NBII Biocomplexity Thesaurus Web Services. http://nbii-thesaurus.ornl.gov/thesaurus/

OCLC terminology services project and prototype. http://www.oclc.org/research/projects/termservices/ [accessed 7/17/09, no longer available]

PERTAINS - PERsonlisation Tagging interface INformation in Services presenting tag recommenders in UK national services. http://www.jisc.ac.uk/whatwedo/programmes/resourcediscovery/pertains.aspx

SKOS - Simple Knowledge Organisation Systems - W3C Semantic Web Deployment Working Group. http://www.w3.org/2004/02/skos/

SKOS API (SWAD Europe). http://www.w3.org/2001/sw/Europe/reports/thes/skosapi.html

SPARQL endpoint. Semantic Web. http://semanticweb.org/wiki/SPARQL_endpoint




STAR - Semantic Technologies for Archaeological Resources Project. University of Glamorgan Hypermedia Research Unit. http://hypermedia.research.glam.ac.uk/kos/STAR/

STITCH @ CATCH - Semantic Interoperability to access Cultural Heritage. http://www.cs.vu.nl/STITCH/

STW Web Services (beta) - German National Library of Economics REST web service. http://zbw.eu/beta/stw-ws

SWAD Europe. http://www.w3.org/2001/sw/Europe/Overview.html

SWAD-Europe Thesaurus Activity. Deliverable 8.7. RDF Thesaurus Prototype. http://www.w3.org/2001/sw/Europe/reports/thes/8.7/#sec-demo-server

The Zthes specifications for thesaurus representation, access and navigation. http://zthes.z3950.org/


Knowl. Org. 37(2010)No.4 V. Osińska. Visual Analysis of Classification Scheme


Visual Analysis of Classification Scheme

Veslava Osińska

Institute of Information Science and Book Studies, Nicolaus Copernicus University, ul. Gagarina 13a, 87-100 Toruń, Poland

Veslava Osińska received an MSc in physics from Vilnius University and a Ph.D. in Information Science and Bibliography from Nicolaus Copernicus University in Toruń (Poland), where she teaches Information and Communication Technology and Computer Graphics. She has applied her Computer Science background and programming skills to research areas that include effective visualization of multidimensional information, for example bibliographical data generated in digital libraries. She is a member of the Polish Chapter of the International Society for Knowledge Organization and of the Polish Computer Science Society.

Osińska, Veslava. Visual Analysis of Classification Scheme. Knowledge Organization, 37(4), 299-306. 25 references.

ABSTRACT: This paper proposes a novel methodology to visualize a classification scheme. It is demonstrated with the Association for Computing Machinery (ACM) Computing Classification System (CCS). The collection was derived from the ACM digital library and contains 37,543 documents classified by CCS. The assigned classes, subject descriptors, and keywords were processed in a dataset to produce a graphical representation of the documents. The general conception is based on the similarity of co-classes (themes), taken as proportional to the number of common publications. The final number of all possible classes and subclasses in the collection was 353, and therefore the similarity matrix of co-classes had dimension 353 × 353. A spherical surface was chosen as the target information space. Class and document node locations on the sphere were obtained by means of Multidimensional Scaling coordinates. By representing the surface on a plane like a map projection, it is possible to analyze the visualization layout. The graphical patterns were organized into colour clusters. For evaluation of the resulting visualization maps, graphics filtering was applied. The proposed method can be very useful in interdisciplinary research fields. It allows a great amount of heterogeneous information to be conveyed in a compact display, including topics, relationships among topics, frequency of occurrence, importance, and changes of these properties over time.

1.0 Introduction

With the exponential growth of Internet resources, it has become more and more difficult, on the one hand, to find relevant information and, on the other, to organize professional information services. From this perspective, this article will focus on the visual analysis and evaluation of a classification system in Computer Science (CS), which has evolved into a very dynamic domain.
New computer technology branches emerge, some of them split into smaller ones, while other sub-fields of the CS domain have disappeared. Professional CS classifications are nowadays challenged by rapid changes in taxonomy and by users’ need to retrieve relevant information. To strengthen research in this area, there is a need to build upon the innovative efforts of information visualization (Infovis), computer, and library scientists.
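The similarity measure described in the abstract, co-classes related in proportion to the number of publications they share, can be computed as a symmetric co-occurrence matrix over the classes assigned to each document. For the collection studied here that matrix would be 353 × 353. The minimal sketch below counts raw co-classifications only. The class codes in the usage test are illustrative, and any normalisation and the Multidimensional Scaling step onto the sphere are not shown.

```python
from itertools import combinations


def co_class_matrix(doc_classes):
    """Symmetric matrix m where m[i][j] counts documents assigned to both class i and class j.

    doc_classes holds one collection of class codes per document; the diagonal
    m[i][i] counts the documents assigned to class i at all."""
    classes = sorted({c for cs in doc_classes for c in cs})
    index = {c: i for i, c in enumerate(classes)}
    n = len(classes)
    m = [[0] * n for _ in range(n)]
    for cs in doc_classes:
        cs = sorted(set(cs))
        for c in cs:
            m[index[c]][index[c]] += 1
        for a, b in combinations(cs, 2):
            m[index[a]][index[b]] += 1
            m[index[b]][index[a]] += 1
    return classes, m
```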

While library resources are continuously extended, LIS researchers may use new tools derived from Infovis methodologies in order to support collection management. Visualization of the complex structures of large amounts of information may help in understanding relations between components and in visually searching for relevant information. Visualization has thus become a phase of data analysis. In recent decades, Infovis projects have specialized in both LIS and medical data representation tasks (Börner 2003, Chen 2006, Kosara and Miksch 2002). The main aim of the presented work is to visualize the chosen classification scheme and its universe. To our knowledge, only one publication presents an effort to visualize a classification: a treemap visualization of a specific library collection, performed to facilitate document retrieval in bibliographic collections (Pfeffer et al. 2008).




First, modern techniques of hierarchy mapping are introduced. Some of them inspired the author’s final conception of the information space. The results encouraged the researchers to find new experimental problems and ways of solving them. The primary task – visualization – developed further into several other tasks with more specific interests, such as classification scheme evaluation, domain evolution, and document retrieval. Results of the experiments show that librarians may use the proposed methods in classification modernization, evaluation, and analysis, as well as in studying the organization of a scientific domain.

2.0 Hierarchical representation of the structures

2.1 Trees

A natural way to present the hierarchical nature of a data structure is a tree. The starting element, the root, is usually positioned at the top. The names of relationships between nodes are modelled after kinship relations. A node is the parent of the nodes directly below it and linked to it, which are its children. Sibling nodes share the same parent node. Tree diagrams impose a linear order and a vertical direction (Figure 1a). In a tree structure, information travels along a single axis: from parents to children and vice versa. Hierarchical information is the most frequent type of data occurring in the human environment. Such hierarchies exist in library classification systems, genotype systems, genealogy data, as well as computer directory structures and class definitions in object-oriented programming languages. If paths between siblings are added, the tree structure becomes a network. E-book texts with hyperlinks between chapters are an example of this type of information space.
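The kinship vocabulary used for trees maps directly onto a small data structure. In the sketch below, each node keeps a link to its parent and a list of children, siblings are the parent’s other children, and depth counts levels below the root. The class and method names are hypothetical. Equality is left as object identity so that the parent links cannot trigger recursive comparisons.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    """A tree node; kinship terms name the relationships between nodes."""
    name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list = field(default_factory=list, repr=False)

    def add_child(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def siblings(self) -> list:
        """Nodes that share this node's parent; the root has none."""
        if self.parent is None:
            return []
        return [c for c in self.parent.children if c is not self]

    def depth(self) -> int:
        """Number of levels below the root (the root is at depth 0)."""
        return 0 if self.parent is None else 1 + self.parent.depth()
```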

Traditional library classifications are presented in deductive, top-down schemes with a set of mutually exclusive classes (Jacob 2004). Exclusivity means that a given entity must be assigned to one and only one class within a system of mutually exclusive and non-overlapping classes. The top class is the most inclusive class and depicts the domain of the classification. Being a system of classes and subclasses, a classification is organized according to predetermined and essential properties of a set of entities. Construction of the scheme involves the logical process of division and subdivision of the original universe. As a consequence, a hierarchical tree of generic relationships is formed, with successively more subordinate classes nested within superordinate ones. To simplify the task of classification visualization, it is convenient to limit it to the scheme’s mono-hierarchical structure; this is the case with a classification universe that encompasses only one hierarchy tree.
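Whether a scheme really is mono-hierarchical can be checked mechanically. Every class except a single root must have exactly one broader class, and following broader links from any class must reach that root without looping. The sketch below performs these checks on a list of (child, parent) pairs. The function name and the example class names are hypothetical.

```python
def is_monohierarchy(edges, nodes):
    """True if (child, parent) edges form a single tree over nodes.

    Each node other than one root has exactly one parent, and walking up from
    any node ends at that root without a cycle."""
    parents = {}
    for child, parent in edges:
        if child in parents:
            return False  # a second parent: a polyhierarchy
        parents[child] = parent
    roots = [n for n in nodes if n not in parents]
    if len(roots) != 1:
        return False
    for start in nodes:
        seen, node = set(), start
        while node in parents:
            if node in seen:
                return False  # cycle
            seen.add(node)
            node = parents[node]
        if node != roots[0]:
            return False  # walked up to something other than the root
    return True
```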

Kwasnik (1999) described the browsing of a classification scheme as follows: "[it] involves moving down the hierarchy, from superordinate to subordinate and from left to right, to generate a series of relationships between classes that can be translated into the linear order of the library shelf." This feature of linearity, along with exclusivity, aggregation, and infinite hospitality, is identified as a characteristic of a bibliographic classification scheme (Shera 1965). Librarians appreciate the tree as a representation of the relative placement of entities because of its good local visibility (the child nodes of each class are directly visible). On the other hand, this form of knowledge representation has some disadvantages that are specific to library classifications (Kwasnik 1999):

1) Lack of flexibility in adding new entities and coping with emerging knowledge. This often requires changing the general shape of the tree, which is determined a priori;
2) Partial inference: trees are limited in the volume of knowledge they can represent;
3) Only a vertical direction of information dissemination, which imposes the same understanding on all individuals;
4) Selective perspective: by emphasizing a certain relationship, a tree masks other equally interesting relationships.

The distinct methods of displaying hierarchical structures are discussed below. Tree mapping has inspired many InfoVis (information visualization) researchers, HCI (Human-Computer Interaction) experts, and designers of purely commercial data-mining applications (Börner et al. 2003).

2.2 TreeMaps

The main distinguishing feature of the treemap technique is the unlimited recursive nesting of geometric primitives (rectangles, circles, arcs, and so forth), which produces mosaic plots. This property allows the final layout to be extended to hierarchical data with any number of levels. The idea was invented by Ben Shneiderman (1998-2009) in the early 1990s “in response to the common problem of a filled hard disk.… Since the 80 Megabyte hard disk in the HCIL was shared by 14 users it was difficult to determine how and where space was used.


Knowl. Org. 37(2010)No.4 V. Osińska. Visual Analysis of Classification Scheme


Finding large files that could be deleted, or even determining which users consumed the largest shares of disk space were difficult tasks.” In treemap algorithms, the original rectangle (or other shape) is divided into sub-rectangles recursively, once for each level of the structure. This technique is sometimes called tiling or puzzling. Each sub-rectangle has an area proportional to a specified dimension of the data, usually the size or population of the node (Figure 1b). Colour is used to distinguish data types (for example, file formats in directory trees). Because colour and size are correlated in this way with the tree structure, the resulting pattern can reveal interesting properties of the data. A second advantage of treemaps is their efficient use of space: thousands of items can be displayed legibly on the screen simultaneously. Treemaps initiated an entire generation of software for visualizing large datasets. Some of these tools are open source or demo versions, such as the Treemap 4.0 tool designed by the HCIL at the University of Maryland (HCIL 2003).
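The recursive tiling described above can be illustrated with the simplest treemap layout, usually called slice-and-dice, in which the cut direction alternates from one level to the next. The following Python sketch is illustrative only: the `Node` class, the `slice_and_dice` function, and the sizes assigned to a fragment of the CCS tree are hypothetical, and it makes no claim about how Treemap 4.0 or any other tool lays out its maps.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node; only leaves carry a size (internal nodes sum their leaves)."""
    name: str
    size: float = 0.0
    children: list[Node] = field(default_factory=list)

    def weight(self) -> float:
        if not self.children:
            return self.size
        return sum(child.weight() for child in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Tile the rectangle (x, y, w, h) recursively, one split per tree level.

    Each child receives a strip whose area is proportional to its weight.
    The cut direction alternates with depth: vertical strips at even
    depths, horizontal strips at odd depths.
    Returns a list of (name, depth, (x, y, w, h)) tuples.
    """
    if out is None:
        out = []
    out.append((node.name, depth, (x, y, w, h)))
    total = node.weight()
    offset = 0.0
    for child in node.children:
        share = child.weight() / total if total else 0.0
        if depth % 2 == 0:  # split along the x axis
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:               # split along the y axis
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Hypothetical sizes for a fragment of the CCS tree (not real ACM data).
root = Node("H", children=[
    Node("H.2", 3),
    Node("H.3", children=[Node("H.3.3", 2), Node("H.3.7", 1)]),
])
for name, depth, rect in slice_and_dice(root, 0, 0, 100, 60):
    print(name, depth, rect)
```

Squarified and other layouts give up this strict alternation to obtain rectangles with better aspect ratios; slice-and-dice is used here because each tree level corresponds to exactly one subdivision, as in the description above.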

Figure 1. Graphical representations of hierarchies: a) traditional, as a tree; b) rectangle treemap; c) circle treemap; d) sunburst map; e) in the hyperbolic space.

If one replaces rectangles with circles, a circle treemap results (Figure 1c). Sunburst mapping, which also exploits circles, is shown in Figure 1d. The root node is located at the centre, and each successive level is drawn farther out from the centre. In principle, any number of hierarchy levels can be represented. The size and type of the data are indicated by the sweep angle and the colour of the item, respectively.
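The angular allocation of a sunburst map can be sketched in the same recursive way: each node's sweep angle is its parent's sweep scaled by the node's share of the parent's size, and its depth selects the ring. In this Python sketch the nested-dictionary input format, the function names, and the document counts are hypothetical.

```python
import math


def weight(subtree):
    """Leaves are numbers (sizes); internal nodes are dicts of children."""
    if isinstance(subtree, dict):
        return sum(weight(child) for child in subtree.values())
    return float(subtree)


def sunburst(subtree, name="root", start=0.0, sweep=2 * math.pi, depth=0, out=None):
    """Assign every node an angular sector and a ring.

    A node at `depth` occupies the ring between radii depth and depth + 1
    (the root is the central disc). Its sweep angle is its parent's sweep
    multiplied by its share of the parent's weight.
    Returns a list of (name, depth, start_angle, sweep_angle) tuples.
    """
    if out is None:
        out = []
    out.append((name, depth, start, sweep))
    if isinstance(subtree, dict):
        total = weight(subtree)
        angle = start
        for child_name, child in subtree.items():
            child_sweep = sweep * weight(child) / total if total else 0.0
            sunburst(child, child_name, angle, child_sweep, depth + 1, out)
            angle += child_sweep
    return out


# Hypothetical document counts, not real ACM data.
tree = {"H": {"H.2": 3, "H.3": 1}, "I": 4}
for name, depth, start, sweep in sunburst(tree):
    print(f"{name:4} ring {depth}: start {math.degrees(start):6.1f} deg, "
          f"sweep {math.degrees(sweep):6.1f} deg")
```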

An important technique for enlarging the exploration space is to construct the workspace in hyperbolic (2D or 3D) geometry. The first applications to use this “fisheye” technique (also called “focus+context”) were hyperbolic browsers; this approach provides more room to visualize the hierarchy (Chen 2006). It relies on a convenient property of hyperbolic geometry: the circumference of a circle grows exponentially with its radius, which means that exponentially more space is available with increasing distance (Lamping et al. 1995). The hyperbolic view of various types of data can be studied, and graphs constructed from samples, using freely available online applications such as Hyperbolic 3D (Munzner 1998) or Walrus (CAIDA 2005-2009). Figure 1e illustrates such a hyperbolic layout.
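The exponential growth that hyperbolic browsers exploit can be checked numerically. In the hyperbolic plane with constant curvature -1 (assumed here), a circle of radius r has circumference 2π sinh r instead of the Euclidean 2πr; since the number of nodes at depth d of a tree with branching factor b also grows as b^d, the extra room keeps pace with the tree. The short Python sketch below only compares the two formulas; it does not implement a hyperbolic browser.

```python
import math


def euclidean_circumference(r):
    """Circumference of a circle of radius r in the Euclidean plane."""
    return 2 * math.pi * r


def hyperbolic_circumference(r):
    """Circumference of a circle of radius r in the hyperbolic plane
    with constant curvature -1: C(r) = 2*pi*sinh(r)."""
    return 2 * math.pi * math.sinh(r)


for r in range(1, 6):
    ratio = hyperbolic_circumference(r) / euclidean_circumference(r)
    print(f"r={r}: Euclidean {euclidean_circumference(r):7.2f}, "
          f"hyperbolic {hyperbolic_circumference(r):7.2f}, ratio {ratio:5.2f}")
```

For large r, sinh r behaves like e^r / 2, so each unit of radius multiplies the available circumference by roughly e.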

3D visualization is very promising because of the continuously growing capabilities of hardware. In response to user demand, 3D models have become standard in computer games, simulations, and films. The shapes and properties of treemaps described above, especially the use of a sphere as an information space, were used in the conception of the classification visualization presented here.

3.0 Visualized Classification

The tested classification, the Computing Classification System (CCS) (ACM 1998), is a subject classification for computer science devised by the Association for Computing Machinery, the first scientific and educational computing society in the world. The latest version of the CCS was published in 1998 and is still being updated. The ACM Digital Library is a vast collection of citations and full texts (accessible to members) from ACM journals, newsletter articles, and conference proceedings. Citations consist of a title, author, publication data, abstract, references, CCS codes, and other metadata.

The ACM CCS consists of a four-level tree [containing three levels coded by 11 capital letters (from A to K) and numbers, plus a fourth uncoded level], General Terms, and implicit subject descriptors. Thus, the upper level consists of 11 main classes:

A. General Literature
B. Hardware
C. Computer Systems Organization
D. Software
E. Data
F. Theory of Computation
G. Mathematics of Computing
H. Information Systems
I. Computing Methodologies
J. Computer Applications
K. Computing Milieux

Each top-level category has two standard subcategories: “general”, coded with “0”, and “miscellaneous”, coded with “m”. For instance, H.0 denotes the “general” subcategory of Information Systems, while H.m denotes its miscellaneous subcategory. Because the CCS is still being updated, new subdivisions appear with a “New” label, while some existing categories are marked as “Revised”. Besides a primary classification, every document may be assigned additional ones; as a result, two or more classification trees are generated. Thus, for example, the book Semantic Digital Libraries, by S.R. Kruk and B. McDaniel, Springer, 2008, has the following classifications:

Example:

Primary Classification:
H. Information Systems
   H.3 INFORMATION STORAGE AND RETRIEVAL
      H.3.7 Digital Libraries

Additional Classification:
A. General Literature
   A.m MISCELLANEOUS
I. Computing Methodologies
   I.2 ARTIFICIAL INTELLIGENCE
      I.2.4 Knowledge Representation Formalisms and Methods
         Subjects: Semantic networks
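The coded part of the notation can be handled mechanically, since each code names its ancestors through its dot-separated prefixes. This Python sketch, with hypothetical function names, expands the codes of the book example into their paths and recognises the standard "0" (general) and "m" (miscellaneous) subcategories at the second level; the uncoded fourth level (subject descriptors such as "Semantic networks") and the "New"/"Revised" labels are not modelled.

```python
from __future__ import annotations

MAIN_CLASSES = "ABCDEFGHIJK"  # the 11 CCS main classes


def ccs_ancestors(code: str) -> list[str]:
    """Return the path of coded CCS levels from the main class down to `code`.

    Assumes dot-separated notation such as "H.3.7": a main-class letter
    (A-K) followed by at most two subclass components.
    """
    parts = code.strip().split(".")
    if len(parts[0]) != 1 or parts[0] not in MAIN_CLASSES or len(parts) > 3:
        raise ValueError(f"not a coded CCS class: {code!r}")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def subcategory_kind(code: str) -> str:
    """Classify the second level only: "0" is general, "m" is miscellaneous."""
    parts = code.strip().split(".")
    if len(parts) == 1:
        return "main class"
    return {"0": "general", "m": "miscellaneous"}.get(parts[1], "regular")


# The book example: one primary and two additional classifications.
book = {"primary": ["H.3.7"], "additional": ["A.m", "I.2.4"]}
for role, codes in book.items():
    for code in codes:
        print(role, code, ccs_ancestors(code), subcategory_kind(code))
```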

4.0 Methodology

Our previous articles describe in detail the construction of a new graphical representation of an original classification scheme (Osińska and Bala 2008, 2009). Metadata of articles published in 2007 were collected from the ACM Digital Library and processed. The key feature of the method presented is the non-exclusive nature of CCS classification: overlapping classes and subclasses appear simultaneously among a document's citation attributes (as in the above example). The author assumes that these “common” articles determine the semantic similarity of the thematic categories of the classification. The main idea was to estimate co-occurrences of classes, i.e., to count the documents shared by every pair of classes and subclasses. The similarity of such co-classes (themes) is proportional to the number of common publications. The final number of all classes and subclasses in the collection was 353, so the co-class similarity matrix had dimensions 353 × 353. To reduce this high dimensionality, Multidimensional Scaling (MDS) into three dimensions was used. A sphere surface was chosen as the target information space because of its symmetry, curved surface, and ergonomic properties (Osińska and Bala 2009). The locations of class and document nodes were mapped onto the sphere by means of the MDS coordinates.

5.0 Mapping results

By projecting the sphere surface onto a plane according to cartographic rules, it is possible to analyze the layout of class and item nodes. Moreover, nonlinear digital filtering can be applied to the resulting pattern of 2D maps (see Figure 2a, b). After evaluating the evolution of the Computer Science domain by means of longitudinal maps, i.e., a series of chronologically sequential maps (Garfield 1994), a novel technique derived from fractal theory was successfully used.
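The pipeline of Sections 4.0 and 5.0 (co-occurrence counts, MDS to three dimensions, placement on a sphere, and a cartographic projection) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the procedure of Osińska and Bala: the conversion of co-occurrence counts to distances (1 - C / max C), the use of classical (Torgerson) MDS, the radial projection of the MDS coordinates onto the unit sphere, and the equirectangular (longitude/latitude) projection are all assumptions, and the four-document input is invented.

```python
from itertools import combinations

import numpy as np


def cooccurrence(doc_classes, classes):
    """C[i, j] = number of documents assigned both classes[i] and classes[j]."""
    index = {c: k for k, c in enumerate(classes)}
    C = np.zeros((len(classes), len(classes)))
    for assigned in doc_classes:
        for a, b in combinations(sorted(set(assigned)), 2):
            C[index[a], index[b]] += 1
            C[index[b], index[a]] += 1
    return C


def classical_mds(D, dims=3):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances
    approximate the distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:dims]          # largest eigenvalues first
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def to_sphere(X):
    """Push each point radially onto the unit sphere (assumed mapping)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def lon_lat(P):
    """Longitude/latitude in radians, i.e. an equirectangular projection."""
    return np.column_stack([np.arctan2(P[:, 1], P[:, 0]),
                            np.arcsin(np.clip(P[:, 2], -1.0, 1.0))])


# Hypothetical class assignments for four documents (not ACM data).
docs = [["H.3", "I.2"], ["H.3", "I.2"], ["H.3", "D.2"], ["D.2", "D.3"]]
classes = sorted({c for d in docs for c in d})
C = cooccurrence(docs, classes)
D = 1.0 - C / C.max()        # assumed similarity-to-distance conversion
np.fill_diagonal(D, 0.0)
print(dict(zip(classes, lon_lat(to_sphere(classical_mds(D))).round(2).tolist())))
```

With the 353 classes of the 2007 collection, `C` would simply be a 353 × 353 matrix; the functions apply unchanged, although other MDS variants and projections are equally plausible choices.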

5.1 Classes and documents visualization

Figure 2 presents the resulting visualization layouts on the sphere surface (a) and its projection onto a plane (b). Three attributes were used: colour, colour luminosity, and node size indicate the main class, the subclass level, and the class population, respectively. The documents (37,543) inherit the colour of their main class; therefore, the final patterns shown in Figure 2 consist of 11 colourful irregular spots.
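The three visual attributes can be expressed as a small mapping from a CCS code and its population to a colour and a node size. The following Python sketch uses the standard `colorsys` module; the `node_style` function, the hue spacing, the lightness range, the saturation, and the square-root size scaling are hypothetical choices, not the values used for Figure 2.

```python
import colorsys

MAIN_CLASSES = "ABCDEFGHIJK"  # the 11 CCS main classes


def node_style(code, population, max_population, max_depth=2):
    """Return (rgb, radius) for a CCS class node under a hypothetical encoding:

    hue       <- main class (11 evenly spaced hues)
    lightness <- coded level (main class darkest, deeper levels lighter)
    radius    <- population, scaled so that disc area is proportional to it
    """
    parts = code.split(".")
    hue = MAIN_CLASSES.index(parts[0]) / len(MAIN_CLASSES)
    depth = len(parts) - 1                       # 0 = main class
    lightness = 0.35 + 0.45 * depth / max_depth  # 0.35 .. 0.80
    rgb = colorsys.hls_to_rgb(hue, lightness, 0.8)
    radius = (population / max_population) ** 0.5
    return rgb, radius


# Hypothetical populations, not real ACM counts.
for code, pop in [("H", 400), ("H.3", 100), ("H.3.7", 25)]:
    (r, g, b), radius = node_style(code, pop, 400)
    print(code, f"#{int(r * 255):02x}{int(g * 255):02x}{int(b * 255):02x}", round(radius, 2))
```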

Finally, nonlinear graphic filtering techniques, a median filter followed by a contour filter, were applied to remove noise and detect cluster edges. These algorithms gave access to essential information about the boundaries between main classes and mutually related fields, and made it possible to study thematic diversity (clusters of some classes are shown in Figure 3).

5.2 Keywords mapping

In the next stage of the research, document keywords were used. Within each cluster, keywords were ranked statistically. Figure 3 illustrates the clusters of class I and the keyword sets that characterize them. By analyzing each cluster in this way, it is possible to build a semantic map of all classified documents based on the keyword set.
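The per-cluster keyword ranking can be approximated by counting, for each keyword, how many documents in the cluster carry it. This Python sketch assumes document frequency as the ranking statistic (the text does not name the statistic used); the `rank_keywords` function and the keyword lists are invented examples.

```python
from collections import Counter


def rank_keywords(cluster_docs, k=5):
    """Rank the keywords of one cluster by document frequency.

    Each document contributes at most once per keyword; keywords are
    case-folded and stripped so that trivial variants are merged.
    """
    counts = Counter()
    for keywords in cluster_docs:
        counts.update({kw.strip().casefold() for kw in keywords})
    return counts.most_common(k)


# Hypothetical keyword lists for documents in one cluster of class I.
cluster = [
    ["Neural networks", "learning"],
    ["neural networks", "Pattern recognition"],
    ["Ontologies", "Knowledge representation", "neural networks "],
]
print(rank_keywords(cluster, k=3))
```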

It is worth comparing the first map (class themes) with the second (keywords) with respect to semantic conformity. Previous work reports the local accuracy of the tested maps, i.e., a paradigmatic and intuitive comprehension of themes (Osińska and Bala 2009). This issue is considered in more detail below, in the Discussion section.

Figure 2. a) Class and document nodes visualization on a sphere surface; b) cartographic projection of previous layout


5.3 Longitudinal Mapping

Using a series of chronologically sequential maps, one can study how knowledge advances and how knowledge organization changes. The term longitudinal mapping was first introduced by Garfield (1994, 1998) to describe this method of domain analysis. He emphasized that longitudinal maps become forecasting tools because major trends can be detected by observing changes from year to year.
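Visual comparison of consecutive maps can be complemented by a simple numeric proxy: the change in each main class's share of the documents between two years. The following Python sketch is a hypothetical illustration, not part of the method described here; the function names are invented and the class counts are made up.

```python
def class_shares(counts):
    """Fraction of a year's documents that falls in each main class."""
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def share_changes(series):
    """For consecutive maps in `series` ({year: {class: count}}), return the
    change in each class's share, keyed by (earlier_year, later_year)."""
    years = sorted(series)
    changes = {}
    for y0, y1 in zip(years, years[1:]):
        s0, s1 = class_shares(series[y0]), class_shares(series[y1])
        changes[(y0, y1)] = {c: s1.get(c, 0.0) - s0.get(c, 0.0)
                             for c in sorted(set(s0) | set(s1))}
    return changes


# Hypothetical counts per main class (invented for illustration only).
series = {1988: {"D": 40, "H": 60}, 1998: {"D": 30, "H": 50, "I": 20}}
print(share_changes(series))
```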

A set of visualization maps of ACM documents published at roughly ten-year intervals was prepared. This approach should reveal essential changes in Computer Science (CS) literature within the time frame of the Computing Classification System's existence. Combined with expert knowledge of computing history, these results allow inferences about the evolution of the classification, the integration of CS with other domains, and future trends. The tested collections comprise 209, 545, 19,950, 27,149, and 37,543 classified documents from 1968, 1978, 1988, 1998, and 2007, respectively. The first two layouts contain too few nodes to show significant patterns and are omitted from further analysis.

Information Systems is the category from which the CCS scheme started to grow. Since 1998, the arrangement of classes such as D. Software, C. Computer Systems Organization, and H. Information Systems has been detectable. The three visualization layouts (Figures 2b, 4a, and 4b) suggest that clustering started in the 1990s. This means that over the last two decades the CCS scheme has become closely adapted to the ACM Digital Library resources.

Figure 3. Map of keywords within 5 clusters of main class I.

Figure 4. Visualization maps of classified documents from the ACM Digital Library published in: a) 1988; b) 1998

6.0 Discussion

ACM Digital Library editors continuously make corrections to the CCS scheme. They are responsible for keeping the classification tree up to date with the dynamics of computer technologies. The authors of articles are well acquainted with the Computer Science domain, both in practice and in theory. The ACM website provides detailed instructions to authors about how to classify their documents (ACM 2010). Authors have to add keywords, specify the documents' main and additional categories, and apply subject descriptors. ACM editors can correct the classification assignment and make final decisions about the trees' topology. It should be noted that the quality of the keywords depends on the authors' competence and precision. While the primary visualization maps were based on class correlations, the keyword maps were constructed from keywords, and the latter became a way to verify the first graphical layout. Therefore, as these independent knowledge paths are confronted, two separate social structures can be identified, in line with a modern approach to domain analysis (Hjørland 2002).

Another important feature of the map resulting from clustering is the arrangement by colour. In this visualization process the original taxonomy was discarded. All document nodes inherit the colour of their main class, so they form clusters that present only one hierarchical level; the structural hierarchy is thus reduced from three levels to one. If the outcome of clustering reflects the logical categorization of modern Computer Science literature, then the CCS scheme would not need so many structural levels. Consequently, the coverage of thematic-semantic categories within the clusters on the visualization map can indicate the quality of the organization of the input classification.

The formal analogy between the resulting clusters and faceted classification is considered below. Faceted classification has been a major development in current library research, especially regarding information retrieval tasks (Mills 2004). Analytico-synthetic classification systems are inductive, bottom-up schemes generated through a process of analysis and synthesis (Jacob 2004). A faceted classification comprises logically defined, mutually exclusive, and collectively exhaustive aspects, properties, or characteristics of a class or specific subject (Taylor 2006). In faceted systems, instead of a predetermined taxonomic order, there are multiple ways of assigning classification information.

The first process in the present research was a statistical analysis of co-occurring (sub)classes of the initial classification. Clustering is performed on the basis of the graphical representation and can be considered the next step, the synthesis of clusters. In traditional faceted classification (Adkisson 2003), analysis breaks subjects down into basic concepts (semantic analysis), and synthesis performs functional categorization. The present case shows the opposite method: the synthesis is semantic in nature, while the analysis explored the configuration of nonlinear features of the original classification scheme, since the primary units of analysis, the (sub)class symbols, relate to themes and areas of scientific research. The resulting thematico-semantic clusters can be considered final multi-aspect facets with dynamic parameters such as the number of data points, density, size, and above all keyword sets.

Traditional taxonomic classifications impose a vertical flow of information and thus support top-down exploration of the structure. Faceted classifications, used in faceted search systems, enable users to browse data along multiple paths corresponding to different orderings of the facets (Taylor 2006). Similarly, the resulting information space allows the retrieval of similar documents in neighbouring locations, regardless of navigation direction and the primary hierarchy of categories.

7.0 Conclusion

This work presented a novel method for visualizing the Computing Classification System (CCS) and its classified universe, a large body of scientific literature in the Computer Science domain. An analysis of the initial classification scheme by independent thematic categories was proposed. The key feature of transforming the original information space into clusters is the reduction of the hierarchy: a one-level structure was found to be sufficient to present a logical division of the Computer Science literature graphically. The coverage of thematic-semantic categories within the clusters on the visualization map can indicate the quality of the organization of the input classification. Local accuracy was observed within the clusters of the visualization maps. Citation gathering and data processing were repeated for articles published in 1968, 1978, 1988, 1998, and 2007. Longitudinal mapping allows the discovery of the structure of knowledge within the CS domain as well as the social patterns of its scientific output.

With the proposed visualization method, librarians could depict the organization of a contemporary knowledge domain, investigate multidisciplinary research fronts, and predict future trends. The author demonstrated its usefulness for LIS problems such as the evaluation of classification schemes and their further improvement. The method could also be useful in automatic classification tasks (Golub 2006) and, for example, in the automatic generation and updating of classification trees. Scientists from interdisciplinary research fields will be able to make full use of the multidimensional navigation space. The approach described allows large amounts of heterogeneous information and multidimensional data to be conveyed in a compact display, and allows data to be retrieved by topic, relationships among topics, frequency of occurrence, relevance, and changes in these properties.

References

ACM. 1998. ACM computing classification system. Association for Computing Machinery. Available http://www.acm.org/about/class/1998.

ACM. 2010. How to use the Computing Classification System. Available http://www.acm.org/about/class/how-to-use.

Adkisson, Heidi P. 2005. Use of faceted classification. Web design practices. Available www.webdesignpractices.com/navigation/facets.html.

Börner, Katy et al. 2003. Visualizing knowledge domains. In Cronin, Blaise, ed., Annual Review of Information Science & Technology 37. Medford, NJ: Information Today, pp. 179-255.

Browse maps. Places@Spaces: Mapping Science. Available http://scimaps.org/maps/browse/

CAIDA (2005-2009). Walrus – Graph visualization tool. The Cooperative Association for Internet Data Analysis. Available http://www.caida.org/tools/visualization/walrus/

Chen, Chaomei. 2006. Information visualization: Beyond the horizon, 2nd ed. London: Springer.

Garfield, Eugene. 1994. Scientography: Mapping the tracks of science. Current contents: social & behavioural sciences 7n45: 5-10.

Garfield, Eugene. Since 1998. Essays / Papers on “Mapping the World of Science”. Available http://garfield.library.upenn.edu/mapping/mapping.html.

Golub, Koraljka. 2006. Automated subject classification of textual web documents. Journal of documentation 62: 350-71.

HCIL. 2003. Treemap. University of Maryland. Human-Computer Interaction Lab. Available: http://www.cs.umd.edu/hcil/treemap/

Hjørland, Birger. 2002. Domain analysis in information science: eleven approaches – traditional as well as innovative. Journal of documentation 58: 422-62.

Jacob, Elin K. 2004. Classification and categorization: a difference that makes a difference. Library trends 52n3: 515-40. Available http://findarticles.com/p/articles/mi_m1387/is_3_52/ai_n6080402/

Kosara, Robert and Miksch, Silvia. 2002. Visualization methods for data analysis and planning in medical applications. International journal of medical informatics 68n1-3: 141-53.

Kwasnik, Barbara H. 1999. The role of classification in knowledge representation and discovery. Library trends 48n1: 22-47. Available: http://findarticles.com/p/articles/mi_m1387/is_1_48/ai_57046525/

Lamping, John et al. 1994. Laying out and visualizing large trees using a hyperbolic space. In Proceedings of the ACM Symposium on User Interface Software and Technology, 1994, pp. 13-14.

Mills, Jack. 2004. Faceted classification and logical division in information retrieval. Library Trends 52n3: 541-570. Available http://findarticles.com/p/articles/mi_m1387/is_3_52/ai_n6080403/.

Munzner, Tamar. 1998. Exploring large graphs in 3d hyperbolic space. IEEE computer graphics and applications 18n4: 18-23. Available http://graphics.stanford.edu/papers/h3cga/

Osińska, Veslava and Bala, Piotr. 2008. Classification visualization across mapping on a sphere. In: New trends of multimedia and network information systems. Amsterdam: IOS Press, pp. 95-107.

Osińska, Veslava and Bala, Piotr. 2009. Nonlinear approach in classification visualization and evaluation. In: New perspectives for the dissemination and organization of knowledge: Proceedings of the IX Spain Group ISKO Congress 11-13 March Valencia, Spain. pp. 222-31. Available http://dialnet.unirioja.es/servlet/fichero_articulo?codigo=2923178&orden=0

Pfeffer, Magnus et al. 2008. Visual analysis of classification systems and library collections. In Proceedings of the 12th European Conference on Research and Advanced Technology for Digital Libraries. Lecture Notes in Computer Science, vol. 5173. Berlin; Heidelberg: Springer-Verlag, pp. 436-39.

Randelshofer, Werner. Visualization of large tree structures. Available http://www.randelshofer.ch/treeviz/index.html.

Shneiderman, Ben. 1998-2009. Treemaps for space-constrained visualization of hierarchies. Last updated June 25th, 2009 by Catherine Plaisant. Available http://www.cs.umd.edu/hcil/treemap-history/

Shera, J. H. 1965. Classification as the basis of bibliographic organization. In Libraries and the organization of knowledge. Hamden, CT: Archon.

Taylor, Arlene G. 2006. Introduction to cataloging and classification. 8th ed. Englewood, Colorado: Libraries Unlimited.

All URLs were last checked in February 2010.


Knowl. Org. 37(2010)No.4 A. Šauperl. UDC and Folksonomies


UDC and Folksonomies

Alenka Šauperl

Faculty of Arts, Department of Library and Information Science and Book Studies, University of Ljubljana, Aškerčeva c. 2, 1000 Ljubljana, Slovenia


Alenka Šauperl is an Associate Professor in the Department of Library and Information Science and Book Studies at the Faculty of Arts, University of Ljubljana. Her teaching and research areas are in the organization of information, including descriptive and subject cataloguing, as well as abstracting. She is the author of several books and many articles in the field of cataloguing and subject indexing.

Šauperl, Alenka. UDC and Folksonomies. Knowledge Organization, 37(4), 307-317. 28 references.

ABSTRACT: Social tagging systems, known as “folksonomies,” represent an important part of web resource discovery as they enable free and unrestricted browsing through information space. Folksonomies consisting of subject designators (tags) assigned by users, however, have one important drawback: they do not express semantic relationships, either hierarchical or associative, between tags. As a consequence, the use of tags to browse information resources requires moving from one resource to another, based on coincidence and not on the pre-established meaningful or logical connections that may exist between related resources. We suggest that the semantic structure of the Universal Decimal Classification (UDC) may be used in complementing and supporting tag-based browsing. In this work, two specific questions were investigated: 1) Are terms used as tags in folksonomies included in the UDC?; and, 2) Which facets of UDC match the characteristics of documents or information objects that are tagged in folksonomies? A collection of the most popular tags from Amazon, LibraryThing, Delicious, and 43Things was investigated. The universal nature of UDC was examined through the universality of topics and facets covering diverse human interests which are at the same time interconnected and form a rich and intricate semantic structure. The results suggest that UDC-supported folksonomies could be implemented in resource discovery, in particular in library portals and catalogues.

1.0 Introduction

Folksonomy is a form of indexing system. People participating in blogs, social networks, and other shared Web 2.0 systems and services assign tags to different information objects. When these tags are grouped, counted, and used automatically for browsing and searching, a folksonomy is formed.
According to Vander Wal (2007) and Smith (2008), the first social tagging system, Del.icio.us, emerged in 2003, and the term “folksonomy” appeared in 2004. It quickly attracted the attention of researchers in information and library science. A brief look at LISTA reveals that six documents on the topic were indexed in 2005, 20 in 2006, 35 in 2007, 23 in 2008, and seven in the first half of 2009. Folksonomies are popular among Web users because they allow users to tag documents, i.e., to assign keywords to resources they find on the Web or submit themselves. This enables users to retrieve documents they have already accessed or to find new documents that other users have tagged. Browsing is aided by tag clouds, i.e., groups of tags sorted alphabetically and displayed in sizes that reflect how frequently each tag is used (see Fig. 1).
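The size encoding of a tag cloud can be sketched as an interpolation between a minimum and a maximum font size. This Python sketch assumes logarithmic scaling of frequencies (one common convention; sites differ and may scale linearly or in buckets); the `tag_cloud` function, the point sizes, and the tag counts are invented rather than taken from Figure 1.

```python
import math


def tag_cloud(freqs, min_pt=10, max_pt=32):
    """Return (tag, font_size) pairs in alphabetical order.

    Font size is interpolated on the logarithm of each tag's frequency, an
    assumed convention. Frequencies must be positive.
    """
    logs = {tag: math.log(n) for tag, n in freqs.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = (hi - lo) or 1.0   # avoid division by zero when all counts are equal
    return [(tag, round(min_pt + (max_pt - min_pt) * (logs[tag] - lo) / span))
            for tag in sorted(freqs, key=str.casefold)]


# Hypothetical tag counts for a single book.
print(tag_cloud({"fiction": 100, "civil war": 10, "historical fiction": 1}))
```

Logarithmic scaling keeps a handful of very popular tags from dwarfing everything else, which matters because tag frequencies are highly skewed.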

Some researchers (Mathes 2004; Noruzi 2006; Munk and Mørk 2007; Spiteri 2007; Smith 2008) have welcomed folksonomies. Folksonomies allow the spontaneous, quick, and easy assignment of terms, which then serve as search and browse entries. They are democratic because everyone can join and contribute. There is no central authority or hierarchy that edits tags, censors, or supervises the system. Users are free to assign any term they choose or to make up tags; there is no incorrect tag. Yet this pluralistic system also allows an individual user to develop his or her own coding system and use it, disregarding other users, if he or she so chooses. In some systems, users can keep their tags private or make them publicly available. As a consequence, tags can display a rich and contemporary vocabulary.


The same researchers (Mathes 2004; Noruzi 2006; Munk and Mørk 2007; Spiteri 2007) have also pointed out drawbacks of folksonomies, one being that they have no terminology control. They do not manage synonyms and homonyms, and they allow singular and plural forms as well as different spellings, including intentional or unintentional misspellings. Users can create new terms that may not be understood by the general public, and such terms may quickly become obsolete. Folksonomies do not support any semantic structure among terms; relationships among terms have to be inferred by individual users. According to Smith (2008) and Steele (2009), this is actually a strength compared to traditional indexing systems. Until recently, many folksonomies did not allow the use of compound terms (or phrases). Searching or browsing was also possible only with a single tag or term (not a combination of two or more). There is no limit on the number of tags assigned to one document or on their specificity, which can lead to great inconsistencies. Because folksonomies rank tags and documents by popularity, less popular tags (and topics) may be hidden or difficult to find. Additionally, a few tags are used frequently and many only a few times, following Zipf's law (Halpin, Robu, and Shepherd 2007).
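The skew described by Zipf's law can be quantified under an idealised assumption that the frequency of the tag at rank r is proportional to 1/r^s. The Python sketch below, with a hypothetical `zipf_top_share` function, computes the share of all tag uses captured by the top k tags; real folksonomies only approximate this distribution, and the exponent s = 1 is an assumption.

```python
def zipf_top_share(n_tags, k, s=1.0):
    """Share of all tag uses taken by the k most frequent of n_tags tags,
    assuming an ideal Zipf distribution: frequency of rank r ~ 1 / r**s."""
    weights = [1.0 / r ** s for r in range(1, n_tags + 1)]
    return sum(weights[:k]) / sum(weights)


for k in (1, 10, 100):
    print(f"top {k:>3} of 1000 tags: {zipf_top_share(1000, k):.1%} of all uses")
```

For 1,000 tags and s = 1, the ten most popular tags account for roughly 39% of all uses, which illustrates how easily less popular tags are buried.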

Searching and browsing are very different operations, particularly when performed in bibliographic databases of scientific and professional documents. Researchers need comprehensive and preferably complete knowledge of the publications in their research area. Searching or browsing with folksonomies in a database when looking for articles to support one's research is likely to be unreliable, because different and unrelated tags would most probably be used to mark relevant documents. While bibliographic databases have problems with indexing consistency, there is at least consistency in terminology: one particular term is consistently used to name a particular concept, as opposed to the many tags inherent in folksonomies. The situation may be different when one looks for information without requiring comprehensiveness, quality, and reliability. In such cases, any information on the topic may be interesting and useful, and browsing by following hyperlinks may actually lead to the serendipitous discovery of highly relevant information.

Advocates of folksonomies believe that social tagging is a useful and more convenient alternative to traditional indexing systems (Furner 2007; Hayman and Lothian 2007). In fact, many libraries have already included social tagging in their library catalogues (e.g., Ann Arbor District Library, Michigan, US, at http://www.aadl.org/catalog). This strategy is mainly aimed at attracting users to the library catalogue. It is not likely that users would be any more satisfied with this new search or browse process than with the results that can be obtained from the Web. It is natural to think that keeping track of the tags one has assigned is easier than following traditional indexing systems in a library catalogue. However, this is only true if these tags are few in number and the user has enough time to navigate the system. We may anticipate that users will have difficulties when tags become too numerous. Over a longer period of time, users are likely to forget the terms they have selected and assigned to similar documents (Iivonen 1990; Olson and Wolfram 2008). Frustration and dissatisfaction may cause users to abandon social tagging systems. This could be prevented by traditional indexing systems supporting folksonomies, an idea also presented by Binding and Tudhope at the UDC Seminar in The Hague in October 2009, by Hayman and Lothian at the IFLA conference in 2007, and by Kwan at the annual meeting of the American Society for Information Science and Technology in 2008. Kwan showed how Library of Congress Subject Headings could be transformed to support folksonomies. Our question was related: we wanted to know whether the semantic structure of the UDC could be used in complementing and supporting tag-based browsing.

Students of indexing at the Department of Library and Information Science and Book Studies, Faculty of Arts, University of Ljubljana, have examined folksonomies. They have confirmed the strengths and weaknesses of folksonomies highlighted above. They have also found that most of the concepts are present in the UDC (Demšar et al. 2009; Matoh and Koželj 2009). This paper takes their research a step further. Two specific questions were investigated:

1) Are the terms used as tags in folksonomies included in the UDC?
2) Which characteristics of documents or information objects are tagged in folksonomies, compared to those that can be expressed in the UDC?

Figure 1. Tag cloud for Cold Mountain by Charles Frazier in Library Thing (accessed on 2009-07-07 from http://www.librarything.com/work/2421)

A collection of tags from Amazon (www.amazon.com), Library Thing (www.librarything.com), Del.icio.us (delicious.com), and 43 Things (www.43things.com) was investigated. The universal nature of the UDC was examined through the universality of its topics and attributes. These cover diverse human interests and are at the same time interconnected, forming a rich and intricate semantic structure.

2.0 A Closer Look at Four Folksonomies

Our selection of websites offering folksonomies and tags was not random. How does one draw a random sample from these constantly changing systems of unknown size? The systems change constantly for two reasons. First, users keep adding tags: in 43 Things, for example, 139,786 people have made 324,109 resolutions and added an unknown total number of tags. Second, the tags displayed depend on the user’s browsing strategy. Even the most popular tags change over time in response to events (e.g., tags for Michael Jackson peaked in Del.icio.us after his death) (Del.icio.us trend graphs, 2010). We therefore opted for a purposeful sample of an undefined population of tags in just four of the countless social tagging systems on the Web. For obvious reasons (this paper is presented in English), the selected systems are in the English language.

Del.icio.us is a Web service that allows users to save, share, and organize their favourite bookmarks, i.e., addresses of web resources. This site was selected because web resources are not traditional library resources, even if they could potentially become so. The top 197 most popular tags from Del.icio.us were selected for analysis in June 2009. The top 198 tags of 43 Things were also analysed. This site allows users to note and share personal goals, and it was selected on purpose to see how well the UDC embraces concepts for non-library materials. Another site we examined, Library Thing, is intended for users to catalogue books and similar traditional library materials. Three works were selected for analysis: Cold Mountain, a novel by Charles Frazier; The Little Mermaid by Hans Christian Andersen; and The Sound of Music by Maria von Trapp. In each case, tags associated with the books, sound recordings, and movies were analysed (173 tags). These particular works were selected for two reasons: their popularity in Western culture and their existence in all three forms. The same sample was also analysed in Amazon, the popular Web bookstore (471 tags). Both the Library Thing and the Amazon samples included items that would usually be considered library material. We wanted to see whether the UDC covers all the concepts users express in tags, and what kinds of document attributes get expressed in tags.

Our content analysis (Neuendorf 2002; Lincoln and Guba 1985) consisted of categorizing tags. Some categories were expected and prepared in advance (such as place, time, and genre). These expected categories were based on the disciplines expressed by the main UDC numbers and by the groups of auxiliary numbers. Other categories (e.g., accessibility, instrument, experience) emerged during the analysis. The Slovenian translation of the UDC Master Reference File (MRF) 2006 was used to identify the appropriate UDC numbers for concepts expressed in tags. Spiteri (2007) performed a similar analysis of tags. Her categorization was based on the seven types of concepts listed in the NISO guidelines for thesaurus construction: things, materials, activities, events, properties, disciplines, and measures. These categories are certainly valid. For our research question, however, they were not helpful, because they do not distinguish the usual facets of the UDC, such as topic, place, and time.
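The coding step described above can be pictured as a lookup that assigns each tag an attribute category and a candidate UDC notation. The Python sketch below is hypothetical. The lookup table `LOOKUP`, its notations, and the function `code_tag` are illustrative assumptions rather than the study's coding sheet or MRF data, and the study itself coded tags by hand. Where a concept has several candidate classes, the sketch keeps the first one, mirroring the rule the study applied. Tags missing from the table are simply marked "not clear".

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup: normalized tag -> (attribute category, candidate UDC notations).
# The notations are illustrative only and were not checked against the UDC MRF.
LOOKUP = {
    "music": ("topic", ["78"]),
    "education": ("topic", ["37"]),
    "food": ("topic", ["641", "613.2"]),  # concept found in more than one class
    "novel": ("genre", ["82-31"]),
    "julie andrews": ("name", []),        # a name has no notation of its own
}


@dataclass
class CodedTag:
    tag: str
    category: str
    udc: Optional[str]  # first candidate notation, or None if there is none


def code_tag(tag: str) -> CodedTag:
    """Code one tag: normalize case and spacing, then look it up."""
    key = " ".join(tag.lower().split())
    category, candidates = LOOKUP.get(key, ("not clear", []))
    # When a concept appears in several UDC classes, keep the first candidate.
    udc = candidates[0] if candidates else None
    return CodedTag(tag, category, udc)


for t in ["Food", "Julie  Andrews", "3844"]:
    print(code_tag(t))
```

A real coding scheme would need many more categories (Table 2 lists over twenty). Unmatched tags would still have to be checked manually in the folksonomy, as was done in the study.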

Mapping folksonomies to the UDC posed problems similar to those faced in other mapping projects. Some concepts matched one-to-one in both systems. In other cases, the concept from a folksonomy appeared in several classes of the UDC, and then the first class was selected. We were able to understand the concepts from their context, and some unknown concepts were checked by exploring the particular tag in the folksonomy. An automatic matching process would likely have found fewer concepts.

2.1 Presence of Concepts in Folksonomies and UDC

We expected that more concepts from Amazon and Library Thing would be found in the UDC, and fewer from Del.icio.us and 43 Things. We expected terms from 43 Things to be present less frequently in the UDC because the site focuses on people’s wishes and plans, not on library materials. We expected the opposite from Library Thing and Amazon because we selected traditional library materials for their parts of the study sample. Our expectations proved wrong. More concepts represented in tags of Del.icio.us and 43 Things were present in the UDC than concepts represented in tags of Amazon and Library Thing (see Fig. 2). Among the terms not included in the UDC were a large number of names (discussed further in section 2.2). These could in fact be expressed in the UDC when associated with a main number. The share of tags we were unable to understand (such as “ada,” “e” or “3844”) was not very large in any of the analysed systems. The maximum was 6% in Library Thing and 4% in Amazon, compared to 8% in Amazon reported by Demšar et al. (2009).

The largest number of tags from Del.icio.us could be found in UDC class 0, followed by classes 3, 6, and 7 (see Table 1). Auxiliary numbers held the third rank. Most tags from this sample were in the area of computer science, as was also observed by Spiteri (2007). In 43 Things, UDC class 6 contained the largest number of tags, closely followed by classes 3 and 7. Classes 1 and 0 held the third rank. People seem to be mostly interested in health, arts, and topics of a social and ethical nature. Thirteen cases (7%) could be expressed by auxiliary numbers (such as time or place). Of the Library Thing tags that could be represented in the UDC, most fell in class 7. Excluding concepts not found in the UDC, auxiliary numbers rank second. Class 8, which mostly expresses genre, is ranked third, followed by classes 0 and 9. Cold Mountain and The Sound of Music are works concerned with historical topics, namely the Civil War and World War II, so the association with class 9 is not surprising. Class 0 represents reading (tags like ‘reading’ or ‘to read’) and medium (tags such as ‘CD’ or ‘DVD’). Most tags in Library Thing were assigned to the books and movies categories, with the fewest assigned to soundtracks. Tags assigned to the book frequently referred to the movie.

Excluding the leading category of concepts not represented in the UDC, tags in Amazon were most frequently placed in class 7, expressing music and film. Auxiliary numbers were the second largest group of tags, and they express the time or place of the story. The historical nature of two works (Cold Mountain and The Sound of Music) is expressed with tags in class 9, which ranks third. Class 8 ties with class 9. The large number of concepts not present in the UDC is due to the numerous names assigned as tags to these works. They are names of authors, performers (artists), or literary characters. These names (118 of 471 tags) could in fact be expressed in UDC numbers if associated with a main number.

Figure 2. Percentage of concepts from Amazon (n=471), Library Thing (n=173), 43 Things (n=198) and Del.icio.us (n=197) present in UDC

2.2 Nature of Concepts Expressed in Tags

As mentioned above, names were frequently assigned as tags for books, movies, and soundtracks (see Fig. 3). Most frequently they were actors’ names (e.g., Julie Andrews). These were followed by owners’ names (“Martha dvd movie collection”), authors (Musker), literary characters (Ariel), names of people who were the topic of the work (biographical treatment of the Von Trapp family), and trade names (Amazon). It is interesting that names of actors in the film version of the story were assigned to books (e.g., Julie Andrews to the book The Sound of Music). Ownership was also expressed by pronouns (“my dvd”), verbs (“own”), or phrases (“never got it”). These names appeared in Library Thing and Amazon. In Del.icio.us, software names appeared (Linux), while in 43 Things only three actors and four trade names appear among the most popular tags. Such names can be important for subject description in library catalogues, and the UDC and some other indexing languages have provisions for including them. Curiously, the title of the tagged work was repeated among the tags for the same work in about 10% of cases in Amazon and Library Thing. Titles of other works were also mentioned in Amazon and Library Thing, establishing a connection within the bibliographic universe. They may also be expressed in the UDC, but only if they are the topic of discussion in the document. Otherwise, the relationship can only be expressed in the note area of the bibliographic description. It seems that some support for the Functional Requirements for Bibliographic Records (IFLA 1999) could also be found among the tags users assigned to these documents.

In total, topics were the attribute most frequently expressed by tags in the sample (see Table 2). In Del.icio.us, tags like “programming,” “art,” “food” or “education” represent topics. In 43 Things, tags like “career,” “education,” “languages,” or “memories” appear. In Library Thing and Amazon, topics are similar because of the sample. Tags such as “romance,” “ocean,” “dance,” or “american civil war” represent topics in the two systems. Names such as Charles Frazier or adjectives like American are rarely capitalized. Tags can also be misspelled (e.g., Hans Christian Anderson instead of Andersen). Other researchers have noted both characteristics (misspelling and lack of capitalization) as drawbacks of folksonomies (Mathes 2004; Noruzi 2006; Munk and Mørk 2007; Spiteri 2007; Steele 2009). Regardless of these drawbacks, the topics can frequently be expressed with the UDC.

UDC class | Amazon n (%) | Library Thing n (%) | 43 Things n (%) | Delicious n (%) | Total n (%)
0 | 6 (1) | 6 (3) | 21 (11) | 84 (43) | 117 (11)
1 | 0 (0) | 2 (1) | 24 (12) | 2 (1) | 28 (3)
2 | 0 (0) | 0 (0) | 10 (5) | 1 (0) | 11 (1)
3 | 4 (1) | 0 (0) | 38 (16) | 21 (11) | 63 (6)
5 | 5 (1) | 1 (0) | 3 (2) | 3 (1) | 12 (1)
6 | 0 (0) | 5 (0) | 31 (17) | 21 (11) | 57 (5)
7 | 79 (17) | 32 (46) | 32 (16) | 22 (11) | 165 (16)
8 | 16 (3) | 14 (8) | 5 (3) | 4 (2) | 39 (4)
9 | 15 (3) | 9 (5) | 4 (2) | 3 (1) | 31 (3)
Auxiliary | 32 (7) | 27 (16) | 13 (7) | 17 (9) | 89 (9)
No (not in UDC) | 297 (63) | 66 (38) | 17 (9) | 17 (9) | 397 (38)
Ambiguous | 17 (4) | 11 (6) | 0 (0) | 2 (1) | 30 (3)
Total | 471 (100) | 173 (100) | 198 (100) | 197 (100) | 1039 (100)

Table 1. Number of tags from four folksonomies and their occurrence rate as concepts in the Universal Decimal Classification

Figure 3. Percentage of names among all tags from Amazon (118 of 471 tags), Library Thing (26 of 173 tags), 43 Things (7 of 198 tags) and Del.icio.us (16 of 197 tags)
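The counts in Table 1 and the name counts in Table 2 can be converted into percentages as a cross-check on the shares reported in the text. The Python sketch below assumes a counting rule of our own. Under it, a tag is "present in UDC" when it appears in neither the "No" (not in UDC) row nor the "Ambiguous" row of Table 1. This rule gives about 90% for Del.icio.us, 91% for 43 Things, and 59% overall, which matches the figures quoted in Section 4.0. It also shows that names make up 16% of all tags. For Library Thing and Amazon, the same rule gives lower shares than the 79% and 63% quoted there, so Figure 2 may rest on a different counting rule for those two systems.

```python
# Counts transcribed from Table 1 (totals, "No" and "Ambiguous" rows)
# and from the "Name" row of Table 2.
TOTAL = {"Amazon": 471, "Library Thing": 173, "43 Things": 198, "Del.icio.us": 197}
NOT_IN_UDC = {"Amazon": 297, "Library Thing": 66, "43 Things": 17, "Del.icio.us": 17}
AMBIGUOUS = {"Amazon": 17, "Library Thing": 11, "43 Things": 0, "Del.icio.us": 2}
NAMES = {"Amazon": 118, "Library Thing": 26, "43 Things": 7, "Del.icio.us": 16}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def udc_share(system: str) -> float:
    # Assumed rule: a tag is "present in UDC" unless it is in the "No" or "Ambiguous" row.
    present = TOTAL[system] - NOT_IN_UDC[system] - AMBIGUOUS[system]
    return pct(present, TOTAL[system])


all_tags = sum(TOTAL.values())
all_present = all_tags - sum(NOT_IN_UDC.values()) - sum(AMBIGUOUS.values())

for system in TOTAL:
    print(f"{system:14} in UDC {udc_share(system):5.1f}%  names {pct(NAMES[system], TOTAL[system]):5.1f}%")
print(f"{'All':14} in UDC {pct(all_present, all_tags):5.1f}%  names {pct(sum(NAMES.values()), all_tags):5.1f}%")
```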

While names rank second among the most frequently expressed concepts in total, genre ranks third in total. It is the second most frequently expressed attribute among Library Thing, Amazon, and Del.icio.us tags, but appears only once in 43 Things. “Novel,” “fiction,” “biopic,” or “adventure” are examples of this attribute. “Folk music” could be perceived as a topic, but when it was assigned to the soundtrack of Cold Mountain, it described genre. All such terms can be expressed by the UDC. The form of the document is a closely related attribute. It is represented by terms such as “soundtrack,” “movie,” or “news”. The third related attribute, carrier, expresses the technology supporting the media, e.g., “blue ray,” or the media itself, e.g., “DVD”. These two attributes may also be expressed by the UDC. Edition is expressed just a few times in Library Thing and Amazon tags. It is also related to the form of the document and can refer to the movie version, as opposed to the book. The two codes were kept separate because, in some cases, the movie tag would appear among the tags assigned to the book. In such a case, it denotes a different edition of the same work, not the form of the information object it describes. The term edition is used much more loosely in this analysis and does not strictly follow the formal definition in ISBD.

Names (% of each system's names)
Attribute | Amazon n (%) | Library Thing n (%) | 43 Things n (%) | Delicious n (%) | Total n (%)
Artist | 52 (44) | 9 (34) | 3 (43) | 0 (0) | 64 (38)
Author | 25 (21) | 8 (31) | 0 (0) | 0 (0) | 33 (20)
Unknown role of name | 15 (13) | 1 (4) | 0 (0) | 0 (0) | 16 (10)
Title of work | 14 (12) | 3 (11) | 0 (0) | 0 (0) | 17 (10)
Literary character | 9 (7) | 0 (0) | 0 (0) | 0 (0) | 9 (5)
Biographical | 1 (1) | 2 (8) | 0 (0) | 0 (0) | 3 (2)
Trade name | 1 (1) | 1 (4) | 4 (57) | 16 (100) | 22 (13)
Owner | 1 (1) | 2 (8) | 0 (0) | 0 (0) | 3 (2)
All names | 118 (100) | 26 (100) | 7 (100) | 16 (100) | 167 (100)

All categories (% of each system's tags)
Attribute | Amazon n (%) | Library Thing n (%) | 43 Things n (%) | Delicious n (%) | Total n (%)
Name | 118 (25) | 26 (15) | 7 (3) | 16 (8) | 167 (16)
Evaluation | 78 (17) | 6 (3) | 0 (0) | 5 (2.5) | 89 (9)
Form | 54 (11) | 35 (21) | 0 (0) | 0 (0) | 89 (9)
Topic | 52 (11) | 21 (12) | 188 (95) | 95 (48) | 356 (34)
Genre | 49 (10) | 16 (9) | 1 (1) | 52 (26) | 118 (11)
Not clear | 17 (4) | 10 (6) | 0 (0) | 22 (11) | 49 (5)
Audience | 16 (3) | 9 (5) | 0 (0) | 1 (0.5) | 26 (2)
Series/Collection | 14 (3) | 4 (2) | 0 (0) | 0 (0) | 18 (2)
Related work | 13 (3) | 4 (2) | 0 (0) | 0 (0) | 17 (2)
Plan/Action | 9 (2) | 7 (4) | 0 (0) | 1 (0.5) | 17 (2)
Ownership (no name) | 9 (2) | 4 (2) | 0 (0) | 0 (0) | 13 (1)
Time | 8 (2) | 7 (4) | 2 (1) | 1 (0.5) | 18 (2)
Gift | 8 (2) | 1 (1) | 0 (0) | 0 (0) | 9 (1)
Place | 6 (1) | 9 (5) | 0 (0) | 1 (0.5) | 16 (1)
Carrier | 6 (1) | 8 (5) | 0 (0) | 0 (0) | 14 (1)
Award | 5 (1) | 3 (2) | 0 (0) | 0 (0) | 8 (1)
Occasion | 4 (1) | 0 (0) | 0 (0) | 0 (0) | 4 (0.5)
Experience | 3 (1) | 0 (0) | 0 (0) | 0 (0) | 3 (0.16)
Instrument | 2 (0) | 0 (0) | 0 (0) | 0 (0) | 2 (0)
Edition | 0 (0) | 3 (2) | 0 (0) | 0 (0) | 3 (0.16)
Accessibility | 0 (0) | 0 (0) | 0 (0) | 3 (2.5) | 3 (0.16)
Total | 471 (100) | 173 (100) | 198 (100) | 197 (100) | 1039 (100)

Table 2. Number of tags expressing different characteristics (attributes) of information objects from four folksonomies

Evaluation and form are used with the same frequency in total. Evaluation is expressed as “favourites,” “oh brother,” “singers who can’t sing,” “classic,” “best movies ever,” etc. It ranks second in Amazon, fourth in Del.icio.us, and ninth in Library Thing tags. This may be because Amazon invites customers to rate an item a week or so after the purchase. Awards are an attribute closely related to evaluation. In contrast to the subjective evaluation of items by users, awards are given by a professional authority and are regarded as an objective evaluation. Subjective evaluation has always been discouraged in library catalogues, but awards could be part of the notes (area 7 of ISBD). Neither of these attributes can be expressed in the UDC. It would be possible, for example, to construct a UDC number for the Annual Academy Award (Oscar). However, assigning it to a document would mean that the award is discussed in it, not that the document has received the award.

Collection or series is a relatively popular group of tags among users of Library Thing and Amazon, but it is not used in 43 Things or Del.icio.us. Users express it by assigning actual series names, like Golden Books or Disney Classics, or by naming their own collection, e.g., “Disney DVD” or simply “collection.” This attribute can be closely related to ownership when it is expressed with a tag like “my DVDs.” It is recorded in area 6 of ISBD, but not in the subject description in library catalogues.

Neither collection (series) nor audience is expressed in 43 Things tags, while audience is mentioned only once in Del.icio.us. They are the sixth most frequent attributes in Library Thing and Amazon. It therefore seems that both attributes are more frequently associated with traditional library materials (printed books and sound or video recordings) than with information objects that are not traditional library materials (web pages or personal goals). Users can note that a book, soundtrack, or movie is appropriate for a certain audience with tags like “children’s book,” “family movie,” or “toddler”. The tag “toddler,” an example from the category “audience,” was assigned to the soundtrack of The Sound of Music. One wonders what the user intended to say, since there seems to be no relationship between the two. This is one of the problems associated with folksonomies that have been noted in the scientific literature (Mathes 2004; Noruzi 2006; Munk and Mørk 2007; Spiteri 2007; Steele 2009).

Some tags for sound recordings name instruments (“fiddle,” “banjo”). They could easily be represented by a UDC number. The same is true of places, which can be expressed either by general terms (“mountains”) or by proper names (“North Carolina” or “Blue Ridge Mountains”). Place names were not coded with the other names above, but only in this category. Place and time attributes were mentioned in 43 Things, Library Thing, and Amazon, but not in Del.icio.us. Both are represented by auxiliary numbers in the UDC. Time was expressed with terms like “daily,” “old time,” “1990s,” or “1999.”

A group of tags expresses intention, plan or action (“gift 4 raven,” “CD to review,” “must own”) or occasion (“mother’s day”). Such tags could hardly be part of a subject or bibliographic description in library catalogues, for an obvious reason: they are highly subjective. This attribute appears only among tags in Library Thing and Amazon and cannot be expressed with a UDC number. Experience, expressed with tags such as “silly” or “suspense,” is just as subjective. It is therefore never part of subject description in library catalogues, nor can it be expressed by a UDC number.

A very small number of tags express colour (“color movie”), source of purchase [“donation (T.Nicholas)”], accessibility (“free”), nation (“American Indians”), web address (“URL”), or past actions (“tagged,” “seen”). Some of them could easily be expressed by UDC numbers, others (“URL” or “seen”) could not. These tags are of little importance because they are used so rarely.

3.0 Discussion

We expected topics, expressed as main UDC class numbers, to appear as attributes among tags in the four analysed systems. We expected the same of place, time, language, and other facets expressed by auxiliary numbers. In fact, topic and genre appeared most frequently among tags. Of the four attributes common to all four analysed services, topic shows the highest frequency. The other three are name, genre, and time. Place and form of document appeared fairly frequently and in at least three systems, which means that they are not limited to traditional library materials. Form of document and medium (carrier) are two related attributes. Both are usually part of the bibliographic description and can also be expressed with UDC numbers. The question is whether repeating the same data in the bibliographic description and in the subject description of the catalogue record is reasonable and economically justifiable. The repetition of the title proper of the work in about 10% of cases in Amazon and Library Thing seems particularly troublesome. Trant and Bearman (2008) report a similar finding. They compared user-assigned tags to museum documentation and found that the information most frequently repeated in both included object (45%), primary title (25%), materials (21%), creator (7%), and creation date (2%). They did not question whether repeating data was reasonable. Furthermore, the museum professionals surveyed in their study believed that user tags were helpful in searching.

We were surprised by the high rank of evaluation tags in Library Thing and Amazon. Librarians carefully avoid such categorisation because it is subjective and could offend users. It also cannot be expressed with UDC numbers. Another surprise was the large number of names among tags for traditional library materials in Library Thing and Amazon. Names can be added to UDC numbers. However, the question is whether names in all their different roles could and should be expressed in the UDC. A recent discussion on the ISKO mailing list revealed a need to distinguish in the UDC between real persons, literary characters, and human, divine, and imaginary beings. This is only appropriate for persons or characters that are the topic of the work, e.g., in a biographical description or a literary study. People also have different roles in the intellectual creation of a work: they may be actors, animators, musicians, etc. All these roles are usually expressed in the 700 fields of the UNIMARC catalogue record, where they can also be linked to the appropriate name authority files. Authority files then offer uniform headings in case the user searches by a different form of the name. In our study, we had the case of a misspelled author’s name, Anderson instead of Andersen. The UDC would not be able to help the user in such a case, but a name authority file would. We therefore join Richard Hartley (2009) in his suggestion to use name authority files in connection with folksonomies.
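A small Python sketch can illustrate how an authority file helps where the UDC cannot. The authority list and its headings are hypothetical. Fuzzy matching with the standard library's `difflib.get_close_matches`, at an assumed cutoff of 0.85, stands in for whatever matching a real catalogue would use. The sketch only shows that a variant or misspelled name tag can be led to one authorized form.

```python
import difflib
from typing import Optional

# Hypothetical mini authority file: authorized heading -> known variant forms.
AUTHORITY = {
    "Andersen, Hans Christian": ["hans christian andersen", "h. c. andersen"],
    "Frazier, Charles": ["charles frazier"],
    "Andrews, Julie": ["julie andrews"],
}

# Index every normalized variant to its authorized heading.
VARIANTS = {v: heading for heading, forms in AUTHORITY.items() for v in forms}


def suggest_heading(tag: str, cutoff: float = 0.85) -> Optional[str]:
    """Return the authorized heading closest to a name tag, or None."""
    key = " ".join(tag.lower().split())
    if key in VARIANTS:
        return VARIANTS[key]
    # Fuzzy fallback for misspellings such as "Anderson" for "Andersen".
    close = difflib.get_close_matches(key, list(VARIANTS), n=1, cutoff=cutoff)
    return VARIANTS[close[0]] if close else None


print(suggest_heading("Hans Christian Anderson"))  # Andersen, Hans Christian
print(suggest_heading("Ariel"))                    # None
```

In a tagging interface, such a lookup would only offer the authorized form as a suggestion. The user's own tag could still be kept.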

A number of attributes seem to be important to users and are also part of the UDC: audience, musical instrument, action, and occasion. Audience is usually expressed in coded fields in UNIMARC records, and this information may also be included in the UDC number. Purpose (e.g., “gift idea”), intention (e.g., “to review”), or occasion (e.g., “mother’s day”) could be expressed by a UDC number. However, the number would express the topic of the document, not the intended attribute. Because this information can be very subjective, it is unlikely to become part of the UDC in future. This means that folksonomies could accompany, but not replace, classification systems and subject heading languages in library catalogues.

A number of attributes that are important to users are not part of the UDC. Some of these are part of the bibliographic description: collection/series, edition, accessibility, and URL. The bibliographic description in Amazon is clearly poor and does not allow the user to search by date of publication or edition. Users therefore have to resort to other means, and they use tags to overcome the weakness of the system. One wonders whether the designers of Amazon and similar bookstores do this on purpose, to make users spend a long time searching and browsing for the appropriate title. During this process, the user is exposed to a large number of other titles. The share of attributes that are important to users but not part of the UDC is larger than the share of attributes that are part of the UDC. They are award, experience, gift, occasion, and action. Award is a very useful category, as it is intended to stimulate interest in the document. It should therefore be part of the bibliographic description, but not of the subject description. If it were included in the subject description, it would mean that the work is about the award, rather than that the work received the award.

Ambiguity is a frequent complaint about folksonomies. Neologisms may be culturally biased, and some tags that we do not understand may be part of a user’s everyday vocabulary. This is why we did not name our category “neologisms,” but rather “unclear terms”. One percent of tags were unclear in Del.icio.us, 6% in Library Thing, and 4% in Amazon. According to Demšar et al. (2009), unclear tags represented 8% of all tags for Cold Mountain in Amazon in 2007.


Spiteri (2007) does not analyse concepts she was unable to understand, but she found that 10% of tags in Del.icio.us were neologisms, slang, or jargon. We do not know whether these percentages are high or low, but they can be expected to cause difficulties in searching and browsing. It takes time to include new concepts in established indexing languages. Until these terms become well known and commonly used, indexing with the UDC or other indexing languages could not help users who search or browse for them.

4.0 Conclusion and Summary

Inspired by numerous research reports on folksonomies, we analysed tags in four folksonomies: Del.icio.us, 43 Things, Library Thing, and Amazon. Not all of the analysed systems are intended for categorising usual library materials. However, their selection was intentional: we wanted to see whether the UDC is in fact universal and can cover the subject description of information objects beyond library materials. We were not able to draw a random sample, and our sample is a very small representation of an unknown population of tags in each of the social tagging systems. We can therefore only claim that the findings hold for the sample. We cannot know whether they would hold for the entire population.

We found that 90% of tags for bookmarks in Del.icio.us and 91% of tags expressing people’s goals in 43 Things could be represented by a UDC number. In contrast, only 79% of tags for books, sound recordings, and movies in Library Thing could be represented with a UDC number, and only 63% of tags for the same items in Amazon. This lower share is not surprising when we analyse the kinds of concepts that are used as tags.

Names are among the most frequent tags in Library Thing and Amazon (16% of all tags in the sample). They could be part of the UDC if an indexer constructs a number for that purpose. However, this cannot be an automatic process. It would be more appropriate to link authority files to folksonomies to help users select the appropriate form of name. Topic and genre, which rank among the three top categories of tags, could be represented by UDC numbers. Other categories that could also be represented with a UDC number include the form of the document or information object, the technology or media supporting its use, audience (e.g., children’s book), musical instruments, place, and time. These categories represent 60% of all tags in the sample.

Categories that could not be expressed with a UDC number constitute 24% of all tags in the sample. They represent awards, series or collection, edition, evaluation, experience, action, occasion and purpose, availability, ownership, and related work. Some of these categories form part of a bibliographic description (6% of all tags in the sample). However, none of the analysed sites has adopted ISBD, and as a consequence their bibliographic information is incomplete. This information nevertheless seems to be important to users for information objects like books and sound and video recordings. This finding can contribute to the development of ISBD or its successors. Among the remaining tags that could not be represented by UDC numbers, evaluation alone holds a 9% share. It is unlikely that evaluation would become part of a bibliographic or subject description. However, if this information were included among users’ tags, it would probably be helpful to some library users. One could envisage a user evaluating library items, and another user finding that the first user’s tags fit his or her literary preferences.

Golub et al. (2009) present a project in which user tagging was enhanced by traditional indexing languages (DDC and LCSH). They found that users like to make use of the assistance offered by those indexing languages. We would take their suggestion further, in the direction of Smith’s (2008) observations of structured reports in Buzzillions.com (2009) and Mefeedia.com. These services identified the most frequently used facets among tags and structured their input according to these facets. We propose that users be offered a structured form for adding their tags. The structure would separate personal and geographic names, without distinguishing real and imaginary persons and places. Authority files for personal and corporate names, and thesauri of geographical names, could be offered here to help users select the appropriate form of name. The UDC could be offered to help users select topic, genre, form, and medium. When recording time, users should be offered examples of standardized forms of reporting time. It should not be too difficult to link data from ISBD area 6 or the UNIMARC field for collection. Suggestions could also be offered regarding awards. On the other hand, evaluation, action, purpose, and experience should remain entirely free of suggestions. We also believe that users should not be forced to use only the suggested terms. They should be able either to use a suggested term or to write their own. It would also be appropriate to ask users to suggest similar works. The form should provide space for any other terms a user wishes to enter that do not fit the prescribed fields. Our suggestion for a more structured interface is also based on Markey’s (2007a, 2007b) observation about people’s behaviour. They are generally inclined to work on the principle of least effort, but are likely to be quite persistent if the system supports them during their work, search, and exploration.
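The structured tagging form proposed above can be sketched as a small data model. In the Python sketch below, the suggestion lists, facet names, and class names are hypothetical. They stand in for UDC captions, name authority files, a gazetteer, series data, and standard time forms. Only the overall behaviour follows the proposal:

- guided fields offer suggestions but accept the user's own term;
- evaluation, action, purpose, and experience stay free of suggestions;
- similar works get their own field;
- anything else goes to a catch-all field.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical suggestion lists standing in for UDC captions, name authority
# files, a gazetteer, series data and standard time forms (illustrative only).
SUGGESTIONS: Dict[str, List[str]] = {
    "topic": ["American Civil War", "Music", "Dance"],
    "genre": ["Novel", "Musical film"],
    "form": ["Sound recording", "Film"],
    "medium": ["DVD", "CD"],
    "person": ["Frazier, Charles", "Andrews, Julie"],
    "place": ["North Carolina", "Blue Ridge Mountains"],
    "time": ["1999", "1861/1865"],  # examples of an ISO 8601-style form
    "series": ["Golden Books", "Disney Classics"],
    "award": ["Academy Award"],
}
# Facets that deliberately get no suggestions.
FREE_FACETS = {"evaluation", "action", "purpose", "experience"}


@dataclass
class TagEntry:
    value: str
    suggested: bool  # True if the term came from a suggestion list


@dataclass
class TaggingForm:
    guided: Dict[str, List[TagEntry]] = field(default_factory=dict)
    free: Dict[str, List[str]] = field(default_factory=dict)
    similar_works: List[str] = field(default_factory=list)
    other_terms: List[str] = field(default_factory=list)

    def add(self, facet: str, term: str) -> None:
        term = term.strip()
        if facet in SUGGESTIONS:
            # Prefer the suggested form when it matches, but keep the
            # user's own term otherwise: suggestions are never enforced.
            match = next((s for s in SUGGESTIONS[facet]
                          if s.lower() == term.lower()), None)
            entry = TagEntry(match if match is not None else term, match is not None)
            self.guided.setdefault(facet, []).append(entry)
        elif facet in FREE_FACETS:
            self.free.setdefault(facet, []).append(term)
        elif facet == "similar work":
            self.similar_works.append(term)
        else:
            # Catch-all for terms that fit no prescribed field.
            self.other_terms.append(term)


form = TaggingForm()
form.add("topic", "american civil war")   # matches a suggestion
form.add("place", "mountains")            # user's own term is kept
form.add("evaluation", "best movies ever")
form.add("occasion", "mother's day")      # no prescribed field
print(form)
```

The free facets are kept in a separate structure, so a catalogue could display them alongside the subject description rather than inside it.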

We doubt that folksonomies could contribute significantly to the description of library resources. This thesis is based on two results of our analysis: 1) a relatively small share of concepts represented by tags in Library Thing and Amazon can be found in the UDC, and 2) a few tags are used frequently and a large number rarely (Zipf’s law). The first finding means that librarians and users have different views on the representation of information objects in catalogues and other information resources. Nevertheless, 59% of tags could be expressed with UDC numbers, and about half of the 38% of tags that could not be expressed with the UDC are names. The second finding means that only a small part of universal knowledge would be accessible through folksonomies, while indexing languages should still provide access to the rest of the knowledge world.
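The second finding can be illustrated with an idealized rank-frequency model. The Python sketch below assumes Zipf's law with exponent s = 1, so that the frequency of the tag at rank r is proportional to 1/r. It also assumes a hypothetical vocabulary of 10,000 distinct tags. Neither value was measured in this study. Under these assumptions, the share of all tag assignments that fall on the k most frequent tags is H_k / H_N, the ratio of two harmonic numbers.

```python
def harmonic(n: int, s: float = 1.0) -> float:
    """Generalized harmonic number: sum of 1 / r**s for r = 1..n."""
    return sum(1.0 / r ** s for r in range(1, n + 1))


def top_k_share(k: int, n_tags: int, s: float = 1.0) -> float:
    """Share of all tag assignments covered by the k most frequent of n_tags
    distinct tags, if frequency(rank r) is proportional to 1 / r**s (Zipf)."""
    if not 0 < k <= n_tags:
        raise ValueError("need 0 < k <= n_tags")
    return harmonic(k, s) / harmonic(n_tags, s)


# Hypothetical vocabulary of 10,000 distinct tags.
for k in (10, 100, 1000):
    print(k, round(top_k_share(k, 10_000), 2))
```

Under these assumptions, the 10 most frequent tags account for about 30% of all assignments and the top 1,000 for about 76%. The remaining 9,000 tags, where rarely used concepts sit, share what is left.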

Our finding that a larger proportion of tags used in Del.icio.us and 43 Things can be found in the UDC, compared to Library Thing and Amazon, tells us that the UDC is indeed universal and could support not only library catalogues but diverse social networks and digital repositories as well. One could envisage either Del.icio.us or some company with a large document repository applying UDC to support navigation through large quantities of documents and information in other formats. The new decision of the UDC Consortium, made public at the UDC Seminar in The Hague in October 2009, to make a multilingual collection of about 2,000 UDC numbers publicly available on the Web may actually stimulate UDC’s widespread use (UDC Consortium 2009a, 2009b). It may be appropriate to invest further in the development of the UDC and a free-access version. This approach may be the way to connect library and internet communities and bring users back to libraries and library catalogues.

References

Buzzillions.com. 2009. Available http://www.buzzillions.com/

Delicious trend graphs. 2010. Del.icio.us blog. Available http://blog.delicious.com/

Demšar, Blaž, et al. 2009. Folksonomije = Folksonomies. Šolska knjižnica, 19: 15-23.

Furner, Jonathan. 2007. User tagging of library resources: toward a framework for system evaluation. In World Library and Information Congress: 73rd IFLA General Conference and Council, 19-23 August 2007, Durban, South Africa. Available http://archive.ifla.org/IV/ifla73/papers/157-Furner-en.pdf

Golub, Koraljka et al. 2009. Enhancing social tagging with a knowledge organization system. Available http://www.ukoln.ac.uk/projects/enhanced-tagging/dissemination/entag-ifla-v3-final.pdf

Halpin, Harry; Robu, Valentin; and Shepherd, Hana. 2007. The complex dynamics of collaborative tagging. In 16th International World Wide Web Conference, May 8-12, 2007, Banff, Alberta, Canada. Available http://www2007.org/papers/paper635.pdf

Hartley, Richard J. 2009. Folksonomies to ontologies: the changing nature of controlled vocabularies. In Griffiths, Jillian R. and Craven, Jenny, Eds., Access, delivery, performance: the future of libraries without walls: a festschrift to celebrate the work of Professor Peter Brophy. London: Facet. pp. 145-58.

Hayman, Sarah and Lothian, Nick. 2007. Taxonomy directed folksonomies: integrating user tagging and controlled vocabularies for Australian education networks. In World Library and Information Congress: 73rd IFLA General Conference and Council, 19-23 August 2007, Durban, South Africa. Available http://www.ifla.org.sg/IV/ifla73/papers/157-Hayman_Lothian-en.pdf

IFLA. 1998. Functional requirements for bibliographic records: Final report. München: K.G. Saur. Amended and corrected Feb. 2009. Available http://www.ifla.org/files/cataloguing/frbr/frbr_2008.pdf

Iivonen, Mirja. 1990. Interindexer consistency and the indexing environment. International forum on information and documentation 15: 16-21.

Kwan Yi. 2008. A conceptual framework for improving information retrieval in folksonomy using Library of Congress Subject Headings. In Griffiths, José-Marie, ed. ASIST 2008: Proceedings of the 71st ASIS&T Annual Meeting: people transforming information – information transforming people. Proceedings of the ASIS&T annual meeting 45. Silver Spring, Md.: ASIS&T, pp. 1-6.

Lincoln, Yvonna S. and Guba, Egon G. 1985. Naturalistic inquiry. Beverly Hills, Ca: Sage.

LISTA – Library, Information Science & Technology Abstracts. Available http://www.ebscohost.com/customerSuccess/default.php?id

Markey, Karen. 2007a. Twenty-five years of end-user searching, Part I: Research findings. Journal of the American Society for Information Science and Technology 58: 1071-1081.

Markey, Karen. 2007b. Twenty-five years of end-user searching, Part II: Future research directions. Journal of the American Society for Information Science and Technology 58: 1123-1130.

Mathes, Adam. 2004. Folksonomies – cooperative classification and communication through shared metadata [term paper]. Available http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html.

Matoh, Robert and Koželj, Aljoša. 2009. Predstavitev in analiza dveh folksonomij [Introduction and analysis of two folksonomies]. Knjižničarske novice 3. Available http://www.nuk.uni-lj.si/knjiznicarskenovice.

Munk, Timme Bisgaard and Mørk, Kristian. 2007. Folksonomy, the power law & the significance of the least effort. Knowledge organization 34: 16-33.

Neuendorf, Kimberly A. 2002. The content analysis guidebook. Thousand Oaks, Ca: Sage.

Noruzi, Alireza. 2006. Folksonomies: (un)controlled vocabulary? Knowledge organization 33: 199-203.

Olson, Hope A. and Wolfram, Dietmar. 2008. Syntagmatic relationships and indexing consistency on a larger scale. Journal of documentation 64: 602-15.

Smith, G. 2008. Tagging: People-powered metadata for the social web. Berkeley, CA: New Riders.

Spiteri, Louise F. 2007. The structure and form of folksonomy tags: The road to the public library catalogue. Webology 4: article no. 41. Available http://webology.ir/2007/v4n2/a41.html

Steele, Tom. 2009. The new cooperative cataloging. Library hi tech, 27: 68-77.

Trant, Jennifer and Bearman, David. 2008. Public and professional vocabularies: Comparing user tagging with museum documents and documentation. In Networked Knowledge Organization Systems and Services: The 7th European Networked Knowledge Organization Systems (NKOS) Workshop: Workshop at the 12th ECDL Conference, Aarhus, Denmark, Sept. 19, 2008. Available http://www.comp.glam.ac.uk/pages/research/hypermedia/nkos/nkos2008/programme.html.

UDC Consortium. 2009a. UDC Licences. Available http://www.udcc.org/licence.htm

UDC Consortium. 2009b. UDC Summary. Available http://www.udcc.org/udcsummary/php/index.php

Vander Wal, T. 2007. Folksonomy coinage and definition. Available http://www.vanderwal.net/folksonomy.html

All URLs were checked February 2010.

Knowl. Org. 37(2010)No.4 N. J. Williamson. Classification Issues in 2008

318

Classification Issues in 2008 Report

Nancy J. Williamson

Faculty of Information, University of Toronto, 140 St. George Street, Toronto M5S 3G6 Ontario Canada

Tenth International ISKO Conference, August 2008, Montreal, Canada

The International Society for Knowledge Organization held its tenth biennial international conference at the École de bibliothéconomie et des sciences de l’information, Université de Montréal, in August 2008. The theme of the conference was “Culture and Identity”. As with the earlier conferences, this analysis follows the organization of the text of the published proceedings. However, unlike earlier proceedings, which were ordered according to the order of presentation in the programme, this volume groups the 57 papers according to the 9 sub-themes of the programme. Thus the papers on a particular sub-theme will have been presented in several separate sessions across the programme. For purposes of analysis this grouping has some advantages over the earlier arrangement. Some of the groups are quite large (as many as 11 papers), and the discussion below attempts to further group papers within the sub-themes.

The conference was opened with a keynote address entitled “Interrogating Identity: a Philosophical Approach to an Enduring Issue in Knowledge Organization,” by Jonathan Furner. In this paper he focuses on the empirical evaluation of systems and of the tools and techniques that we use in building them. He raises a number of questions that must be addressed in determining the “goodness” of the models that we use in our attempts to build better systems. Ultimately he is concerned with the use of philosophical theories in evaluating KO systems and the “extent to which KO schemes reflect the cultural identities of their users.” His presentation is represented in the proceedings by an extended abstract.

Section 1

The first section, entitled Models and Methods in Knowledge Organization, contains ten papers. Three of these papers tackle classification in its broadest sense. Louise Spiteri (Canada) discusses “Causality and Conceptual Coherence in Assessments of Similarity.” Starting with the notion that objects, events or entities form a concept because they are similar to one another, Spiteri examines traditionally based concept theories and finds that they do not adequately support concept coherence. To support her findings she uses two types of theory: those that are similarity-based and those that are knowledge-based. She concludes that library and information science needs to further explore “the impact of knowledge and causality upon people’s construction of concepts to see whether it is possible to achieve a consensus of coherence for these concepts within a given domain.” In their paper on “Hermeneutic Approaches in Knowledge Organization: An Analysis of Their Possible Value,” Fulvio Mazzocchi (Italy) and Mela Bosch (Argentina) consider how hermeneutics and other related theories may bring new insights into KO. They briefly compare the heuristic model, whose methodologies take one of two forms (procedural or declarative), with the hermeneutic approaches. Sources are cited and the main features of the two types are described. A case study of selected computer applications in Europe, especially in Italy, is briefly described. As an example from the language context, the English term “Education” is compared with the French “Éducation”. The purpose is to show how some of these theories might be used “to provide a more realistic representation of the complexity of knowledge and language in KO systems.”

Three of the papers in Section 1 focus on aspects of existing universal systems, specifically Dewey, Bliss
and UDC. “Making Visible Relationships in the Dewey Decimal Classification: How Relative Index Terms Relate to DDC Classes” by Rebecca Green (United States) is amply described by its title. The author deals with the relationship between two types of informational notes that regularly appear in DDC. These are “class here” notes, in which topics are described as “approximation of the whole,” and “including” notes, which contain topics that are given “standing room” only. These notes perform slightly different roles. The terms in “class here” notes are assumed to be more or less coextensive with the topic under which they sit, while “standing room” terms fall within the topic but are restricted in number building. In reality, they are presented as subtopics of the main class and may at some future date be given class numbers of their own. Currently, the two types are not differentiated in the DDC Relative Index, and no record is kept of which terms belong to which type or of how semantic factoring may have been applied in each group. The author explains that it would be useful to be able to ascertain the relationship between these terms, and to do so automatically. As explained in the methodology, the distinctions are not always applied consistently, and the difficulties are outlined. The process of distinguishing the two types is described, and the matching of Relative Index terms with schedule entries is discussed. Some statistics have been derived, but there is still work to be done. This investigation is part of an analysis of the classification that is intended to lead to support for automated reasoning within the scheme.
In a paper entitled “Language Related Problems in the Construction of Faceted Terminologies and Their Automatic Management,” Vanda Broughton (United Kingdom) “describes current work on the generation of a thesaurus format from the schedules of the Bliss Bibliographic Classification, 2d edition (BC2). This paper capitalizes on the long-held recognition of the possible use of faceted classification in the construction of a thesaurus and on the recent acknowledgement of faceted terminologies reference in British Standard BS8723. This research is further related to current work on BC2 where it is desirable to produce an integrated classification, index and thesaurus.” It appears possible that facet methodology can be applied to all three formats, but there is work to be done on terminology control. It is this latter aspect that is addressed in this paper. Four aspects of the language problem are discussed: 1) automatic generation of the thesaurus from the classification; 2) vocabulary control in class headings; 3) managing equivalence relationships; and 4) compound terms and semantic factoring. Findings indicate that “semi-automatic management … is shown to be viable.” One of the problems to be faced is the fact that, up to now, vocabulary control has not been applied in BC2. For fully automated derivation of terms, much work is needed “on establishing rules for formatting of class headings and the control of vocabulary.” In the third paper in this group, entitled “Medicine and the UDC: The Process of Restructuring,” Ia McIlwaine (United Kingdom) and Nancy Williamson (Canada) describe the progress in the development and revision of Class 61 Medical Sciences in the Universal Decimal Classification. This is an ongoing experiment in the possible conversion of UDC into a fully faceted system. Intellectual support for the project comes from the work of the Classification Research Group and from Class H: Anthropology, Human Biology, Health Sciences of BC2. Phase I of the project (now completed) is briefly described and illustrated. Phase I has produced a workable base for proceeding to Phase II. Findings from Phase I are identified and the procedures being used in Phase II are described. It is hoped that Phase II will bring the project to a usable conclusion.

Another three of the papers in Section 1 deal with specialized subject areas.

John DiMarco (United States) presented a paper entitled “Examining Bloom’s Taxonomy and Peschl’s Modes of Knowing for Classification of Learning Objects on the PBS.org/teachersource Website.” Learning objects are described as videos and animated clips that are “deployed into classrooms through public television websites.” The research is a study of metadata representations of learning objects. The goal of the study is to propose and apply a comparative taxonomy to classify learning objects using these two systems. In his paper entitled “Cultural Markers and Localising the MIC Site,” James M. Turner (Canada) addresses language problems in making websites produced in one language usable by people who do not speak the language of the original. His starting point is the fact that simply “translating” a website is not sufficient. To be understandable, the results must be “localised” so that users can understand the content in terms of their own culture. This paper describes a project in which a kit for “localising” a chosen website (MIC) was developed and tested using selected pages from the site in French, Spanish and Arabic. The kit, in the form of a PDF file, can be used with other languages. João Alberto de Oliveira et al. (Brazil and Spain) presented “A Time-aware Ontology for Legal Resources,” which describes a new approach to associating metadata with
legal documents. The system exploits a fully developed information ontology of legal resources. Their model builds on the Functional Requirements for Bibliographic Records (FRBR) model and takes time into account. The derived model is described in detail and is accompanied by illustrations.

The remaining papers in Section 1 address a variety of factors and methods. Melanie Feinberg (United States) presented a paper entitled “Classificationist as Author: the Case of the Prelinger Library” in which she describes the system used in the Prelinger Library in San Francisco. The library is a privately owned, non-circulating collection of 50,000 items, not catalogued but arranged in a progressive order from one end to the other. The order has been determined by the owners, Megan Shaw Prelinger and her husband Rick (hence, they are the authors of the scheme). The system is intended for browsing and reflects a conscious attempt “to represent the realms of thought that bounce around the insides of both our (i.e. the authors’) minds.” Different sections are marked with subject headings written on masking tape. For example, a series of headings on shelf 5 runs from US Internal Dissent to Nuclear Threat, then to War, Conflict and on to Peace, followed by Radical Studies and then Utopia. When location is relevant, location precedes subject in most cases, although the scheme adapts genre innovatively. Using this collection as a base and delving more deeply into the nature of order, Feinberg finds that the authorial voice “works as a persuasive mechanism, facilitating a rhetorical purpose for the collection.” Finally in Section 1, a paper by Yves Marcoux and Élias Rizkallah (Canada) discussed “Knowledge Organization in the Light of Intertextual Semantics: A Natural Language Analysis of Controlled Vocabularies.” In short, the authors provide an example to show that intertextual semantics might be applied to controlled vocabularies expressed in SKOS (Simple Knowledge Organization System).

Section 2

A section on Multilingual and Multicultural Environments contains five papers on various aspects of the subject. K.S. Raghavan and A.
Neelameghan (India) presented “Design and Development of a Bilingual Thesaurus for Classical Tamil Studies: Experiences and Issues.” In doing so, the authors examined aspects of the design and development of vocabulary management in multilingual thesauri in a culture-specific domain particular to the Tamil language, looking at alternative ways of linking certain descriptors to long lists of NTs and RTs. They discuss advantages of the integrated use of two or more knowledge organization tools, and the use of a bilingual thesaurus for certain types of research in Tamil. Among the concerns are issues related to equivalence, non-hierarchical associative relationships, homographs and NTs. Elaine Ménard (Canada) focuses on “Indexing and Retrieving Images in a Multilingual World.” Her paper presents the problem statement, methodology and preliminary results of a project comparing two approaches to image indexing: traditional image indexing using a controlled vocabulary, and free image indexing. The preliminary results suggest that controlled vocabularies and natural language together enhance the results.

Maria Odaisa Espinheiro de Oliveira (Brazil) dealt with “Knowledge Representation Focusing [on] Amazonian Culture.” Her research uses cultural terms from popular histories collected from residents of eight municipal districts in the country. Knowledge representation in the Amazon culture is discussed and the methodology described. A classification and a thesaurus were constructed. The project resulted in a deeper knowledge of the Amazon culture and “a better understanding about the linguistics, the terminology and the theory of the classification.” A fourth paper, by Ágnes Hajdu Barát (Hungary), was concerned with “Knowledge Organization in the Cross-cultural and Multicultural Society.” Her interest is in the fact that “cross language retrieval systems are needed for those who can search in only one language.” She identifies three problems: “the lack of consensus on the definition of culture, the distinction between cultural and national boundaries, and the measurement of cultural attributes of organizational functioning due to lack of clarity of the definition of culture.” Possible solutions to the problem (use of multilingual thesauri, use of multilingual subject headings, and the adaptation and use of classification systems not based on language) are discussed in turn in the paper. Finally, Joan Mitchell (United States), Ingebjørg Rype (Norway) and Magdalena Svanberg (Sweden) addressed “Mixed Translation Models for the Dewey Decimal Classification (DDC) System.” They look at the issues involved in using two languages in a single edition of DDC. Two models, based on Norwegian/English and Swedish/English DDC data, are described, together with the design of a pilot study “to evaluate use of a mixed translation as a classifier’s tool.” Some challenges and issues related to content and representation were identified. The study addresses DDC as a classifier’s tool; the implications for end users are yet to be considered.

Section 3

Six papers fall under the broad heading of Knowledge Organization for Libraries, Archives and Museums. Kathryn La Barre (United States) considers facet analysis in “Discovery and Access Systems for Websites and Cultural Heritage Sites.” She finds that facets function equally well as “browsing and searching devices” in digital museum portals and online catalogues. The author surveyed American practice on 200 websites in 2005 and repeated the survey in 2008. The 200 sites formed a base for comparison with 6 online library catalogues and 3 museum interface prototypes “that self-identify as using facets.” The results are obviously not perfect, but the author feels certain that improvements will come. Mats Dahlström and Joacim Hansson (Sweden) presented a paper “On the Relation Between Qualitative Digitization and Library Institutional Identity.” It highlights and discusses “concepts and practices of national library digitization.” Two conceptual models are suggested, and the purpose of the paper is “to discuss and rethink concepts of digitization and knowledge organization (KO) practices in relation to cultural heritage digitization and library identity.” The authors do not provide answers but rather aim to provide a basis for further questions and a platform for future research. Amelia Abreu (United States), in a paper entitled “Every Bit Informs Another,” provides a “Framework Analysis for Descriptive Practice and Linked Information.” Her concern is the problem of coherent description in the convergence of information from the divergent sources of libraries, archives and museums on the web. In this paper, she examines the practices of description in subject cataloguing and archival practice, along with social tagging, in search of “possible new paths for integration.”

Jean Riley (United States) addresses problems of “Moving from a Locally-developed Data Model to a Standard Conceptual Model.” She points out that while work is being done on the “connection between conceptual models and system functionality,” the situation is as yet unclear. The purpose of her paper is to summarize recent developments in work with conceptual models in the LIS field. She examines the effects on interoperability and describes work done and lessons learned “from conceptual modeling efforts to improve interoperability in a set of metadata.” Finally in this section, Jan Pisanski and Maja Žumer (Slovenia) tackle the intriguing subject “How Do Non-librarians See the Bibliographic Universe?” A pilot

study was carried out on three tasks to test the instruments for acquiring mental models of a bibliographic universe. The three tasks included sorting cards into pairs based on substitutability and card sorting in concept exercises. Not surprisingly, it was found that users do not have a consistent model of the bibliographic universe. The experiments are described in detail and reasons for the results carefully identified. The experiment is part of a larger ongoing study. While the study failed to identify a consistent model of the bibliographic universe, it provided an interesting profile of users’ inadequacies in understanding the nature of information.

Section 4

Section 4 of the proceedings, Knowledge Organization for Information Management and Retrieval, contains 11 papers, making it one of the three largest sections. Examination of the contents suggests that “information management” is somewhat of a misnomer here. The papers included here address various aspects of knowledge organization systems (KOS) and fall into such areas as the design of new systems, the improvement of existing systems, improvements for retrieval, and the organization of special materials using existing methods.

In response to the inadequacies of existing systems, two papers focus on the design of radically new approaches. Rick Szostak (Canada) and Claudio Gnoli (Italy) presented a paper on “Classifying by Phenomena, Theories and Methods.” It uses a variety of theories across the social sciences to demonstrate how documents might be classified by theory type, using an approach by phenomena as opposed to classification by discipline. The approach taken follows from the Leon Manifesto (http://www.iskoi.org/ilc/leon.htm), developed at the conference of ISKO/Italy in Leon in 2007, and uses the Integrative Level Classification (ILC) (http://www.iskoi.org/ilc/). The theories are explained and the methodology described. Examples focusing on the social sciences are used. Michael Buckland and Ryan Shaw (United States) wrote a paper entitled “4W Vocabulary Mapping Across Diverse Reference Genres.” The term ‘genre’ refers here to various reference sources (e.g., bibliographies, biographical dictionaries, catalogues, encyclopedias, gazetteers). These are divided into the facets “what, where, when and who,” and mapping is done between both similar and dissimilar vocabularies, using the principle that “understanding requires a knowledge
of context” to create the functionality of a reference library in a digital environment. The process is described and examples given.

A number of papers in Section 4 focus on improvements to the existing tools and processes that serve our KOS. Two papers provide new tools for the use of LCSH. Kwan Yi and Lois Mai Chan (United States), in a paper entitled “A Visualization Software Tool for Library of Congress Subject Headings,” describe a software tool (VisualCSH) to be used for effective searching, browsing and maintenance of LCSH. A conceptual framework for converting the hierarchical structures of LCSH into tree structures is described and implemented. Similarly, Nicolas George, Elin Jacob, et al. (United States) presented “A Case Study of Tagging Patterns in del.icio.us.” The project proposes a conceptual framework for LCSH and develops a new tool for visualizing the structure of LCSH. Referred to as VisualCSH, the new tool’s features are described and demonstrated. Both sets of authors describe its features, which, as enumerated by the researchers, reveal multiple aspects of a heading; normalize the hierarchical relationships; show multilevel hierarchies of terms in LCSH sub-trees; improve the navigational functions of LCSH in retrieval; and enable the implementation of generic searching (i.e. the ‘exploding’ feature in LCSH).
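
The ‘exploding’ (generic searching) feature mentioned above amounts to expanding a heading to every narrower term at all levels below it, once the broader/narrower links have been turned into a tree. A minimal Python sketch, assuming the hierarchy is available as (broader, narrower) pairs; the headings are a made-up fragment rather than real LCSH records, and the function names are hypothetical, not part of VisualCSH.

```python
from collections import defaultdict

def build_tree(bt_nt_pairs):
    """Map each broader term to the list of its narrower terms."""
    narrower = defaultdict(list)
    for broader, nt in bt_nt_pairs:
        narrower[broader].append(nt)
    return narrower

def explode(heading, narrower):
    """Return the heading plus every narrower term below it, depth-first."""
    result, stack, seen = [], [heading], set()
    while stack:
        term = stack.pop()
        if term in seen:  # skip terms reached twice (polyhierarchy or cycles)
            continue
        seen.add(term)
        result.append(term)
        # Push children in reverse so they are visited in their listed order.
        stack.extend(reversed(narrower.get(term, [])))
    return result

def print_tree(heading, narrower, depth=0):
    """Print an indented multilevel hierarchy (assumes no cycles)."""
    print("  " * depth + heading)
    for nt in narrower.get(heading, []):
        print_tree(nt, narrower, depth + 1)

# Made-up fragment of a subject hierarchy, not real LCSH data.
pairs = [("Animals", "Mammals"), ("Animals", "Birds"),
         ("Mammals", "Cats"), ("Mammals", "Dogs"), ("Birds", "Owls")]
tree = build_tree(pairs)
print_tree("Animals", tree)
print(explode("Mammals", tree))  # ['Mammals', 'Cats', 'Dogs']
```

An explicit stack with a `seen` set keeps the expansion safe even if a heading appears under more than one broader term.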

Another attempt to improve a system is a paper by Amanda Hill (United Kingdom) entitled “What’s in a Name?”, a discussion of “Prototyping a Name Authority Service for UK Repositories.” This paper is concerned with name authority control as part of the “Names” project, funded to investigate issues related to the identification of individuals and institutions in repositories of research outputs in the United Kingdom. It deals with names of researchers and research institutions that are unlikely to appear in library authority files. This project is intended to “right” a situation in which names have previously been entered in unorthodox and inconsistent ways, resulting in problems in retrieval. The paper describes the existing situation and the approach to improving it.

Similarly, Xu Chen (Germany) studied “The Influence of Existing Consistency Measures on the Relationship Between Indexing Consistency and Exhaustivity.” The research examines previous studies and carries out research on a large sample (6,614 records) from two Chinese bibliographic catalogues. Measurements were taken using two formulae from earlier studies. The levels of consistency found were 64.21% in one case and 70.71% in the other, and consistency was high when two indexers had

the same exhaustivity and low when they used different levels of exhaustivity.
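
The report does not say which two formulae Chen used. Two measures widely used in interindexer consistency studies are Hooper’s measure, C / (A + B - C), and Rolling’s measure, 2C / (A + B), where A and B are the numbers of terms assigned by the two indexers and C is the number of terms they share (these are the Jaccard and Dice coefficients, respectively). The Python sketch below assumes each indexing is a set of terms and uses invented terms. It also shows why exhaustivity matters: when one indexer assigns far more terms than the other, both scores drop even though every term of the sparser indexing is matched.

```python
def hooper(terms_a: set, terms_b: set) -> float:
    """Hooper's measure: C / (A + B - C)."""
    c = len(terms_a & terms_b)
    denom = len(terms_a) + len(terms_b) - c
    return c / denom if denom else 1.0  # two empty indexings: treated as agreeing

def rolling(terms_a: set, terms_b: set) -> float:
    """Rolling's measure: 2C / (A + B)."""
    total = len(terms_a) + len(terms_b)
    return 2 * len(terms_a & terms_b) / total if total else 1.0

# Invented indexings of one record by three indexers.
a = {"water", "pollution", "rivers"}
b_same = {"water", "pollution", "lakes"}  # same exhaustivity as a (3 terms)
b_deep = {"water", "pollution", "rivers", "lakes", "fish",
          "law", "policy", "europe", "monitoring"}  # 9 terms, includes all of a

for name, b in [("same exhaustivity", b_same), ("different exhaustivity", b_deep)]:
    print(f"{name}: Hooper={hooper(a, b):.2f}, Rolling={rolling(a, b):.2f}")
```

With these invented sets, the pair with equal exhaustivity scores 0.50 (Hooper) and 0.67 (Rolling), while the pair with unequal exhaustivity scores 0.33 and 0.50, even though it shares more terms.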

Turning to the Internet, in a paper on “A Survey of the Top-level Categories in the Structure of Corporate Websites,” Abdus Sattar Chaudhry and Christopher S.G. Khoo (Singapore) take another approach to improving access. They examined websites “to identify common categories, structures, facets and terms used to organize these websites.” The researchers drew on the websites of IT companies. From these, a taxonomy was constructed and used to analyze the top-level websites of corporate product types. New categories found were incorporated into the taxonomy. The resulting taxonomy is expected to be used as a tool in designing websites.

Two papers in Section 4 focused on the construction and use of thesauri. Veronica Vargas and Catalina Naumis (Mexico), in their “Water-related Language Analysis,” addressed “The Need for a Thesaurus of Mexican Terminology.” They are faced with two problems: the need for uniformity in the terminology of the Spanish language in their chosen subject area, and the lack of a reliable thesaurus on the subject of water in Spanish. The domain itself presents problems of multi- and inter-disciplinarity, while the literature of water management is diverse, requiring the researcher to think broadly in terms of “the phenomena of information transmission and retrieval.” This paper presents the methodology used and the results of the analysis. The authors conclude that a new thesaurus is needed. Also concerned with thesaurus use, Ali Shiri and Thane Chambers (Canada) describe research into “Information Retrieval from Digital Libraries” by investigating and “Assessing the Potential Utility of Thesauri in Supporting Users’ Search Behaviour in an Interdisciplinary Domain.” Transaction log data was obtained from the use of a nanoscience and technology digital library. The characteristics of users’ queries and search terms were analyzed. These were used to determine the extent to which users’ search terms matched terms found in two established thesauri: the INSPEC thesaurus and the Compendex database. The methodology is described and data analysis provided. Encouraging results indicate that the thesauri can be helpful to users, especially in query formulation and expansion of searches. There is potential to support both interactive and automatic query formulation. The investigation also revealed that acronyms as well as full forms are needed in the thesauri. The authors believe that the research has implications for aspects of knowledge organization and for search behaviour studies.
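
The core measurement in Shiri and Chambers’ study, the share of users’ search terms that also appear in a thesaurus, and the observation that acronyms are needed alongside full forms, can be illustrated with a short sketch. This Python example assumes a thesaurus is a set of terms plus an optional acronym-to-full-form map; the terms and queries are invented and do not reproduce INSPEC or Compendex vocabulary, and exact string matching after case and whitespace normalization is a simplification of our own.

```python
def normalize(term: str) -> str:
    """Lower-case and collapse whitespace for a simple exact-match comparison."""
    return " ".join(term.lower().split())

def match_rate(query_terms, thesaurus_terms, acronyms=None):
    """Fraction of query terms found in the thesaurus, optionally via acronyms."""
    if not query_terms:
        return 0.0
    vocab = {normalize(t) for t in thesaurus_terms}
    expand = {normalize(k): normalize(v) for k, v in (acronyms or {}).items()}
    matched = sum(1 for q in map(normalize, query_terms)
                  if q in vocab or expand.get(q) in vocab)
    return matched / len(query_terms)

# Invented thesaurus fragment and logged query terms.
thesaurus = {"Carbon nanotubes", "Atomic force microscopy", "Quantum dots"}
acronyms = {"CNT": "carbon nanotubes", "AFM": "atomic force microscopy"}
queries = ["carbon nanotubes", "AFM", "Quantum  dots", "nanowire sensors"]

print(match_rate(queries, thesaurus))            # 0.5
print(match_rate(queries, thesaurus, acronyms))  # 0.75
```

Adding the acronym map raises the match rate in this toy example because the query “AFM” now resolves to a thesaurus term, which mirrors the authors’ point that both forms are needed.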


Knowl. Org. 37(2010)No.4 N. J. Williamson. Classification Issues in 2008


One paper in this section focuses on KOS for special materials. Sabine Mas, L’Hédi Zaher, and Manuel Zacklad (France) described “Design and Evaluation of Multi-viewed Knowledge System for Administrative Electronic Document Organization.” The research is taking place at the Université de Technologie de Troyes and investigates the creation and use of a faceted classification system for handling personal administrative documents in electronic format. The authors’ findings indicate that a faceted classification is a viable alternative to the “hierarchical paradigm.”

Gercina Ângela Borem (Brazil), in her paper “Hypertext Model – HTXM: A Model for Hypertext Organization of Documents,” reported on the construction and implementation of a system for the organization and representation of human knowledge. It is based on four components: facet analysis theory (FAT), conceptual map theory (CM), a semantic structure of hyperlinks and a set of technical guidelines. It is envisioned that the prototype might eventually be used to organize a digital library.

Section 5, Epistemological Foundations of Knowledge Organization, is a very cohesive group of 11 papers. As this analysis indicates, there is still a strong sense of the importance of the fundamentals and theories of knowledge underlying knowledge organization.

Peter Ohly (Germany) presented a paper entitled “Knowledge Organization Pro and Retrospective.” The author takes a long view of the nature of the term ‘knowledge organization’ as it has come to be understood. Beginning with Ingetraut Dahlberg’s answer to the question “What is knowledge organization?”, he describes the German definition of “Organization” and concludes that the term means more than organizing and extends to “the processes of saving, finding and communicating thoughts.” In further discussion, knowledge organization is seen as a counterpart of society. The factors in its development are discussed, and its major steps (content, sustainability, public availability, persistence, coding, processing and organizing) are identified. Finally, future expectations for the discipline are discussed. The paper is a fitting introduction to the section.

In her discussion of “Knowledge and Trust in Epistemology and Social Software/Knowledge Technologies,” Judith Simon (Austria) indicates that her paper aims “to identify connections between trust and knowledge inherent in the software/technologies and connections of knowledge and trust.” She identifies various points and argues for “intensified intellectual exchange between different theoretical approaches to knowledge as well as between … theoreticians and ICT (information and communication technologies) developers.” A paper by Grant Campbell (Canada) entitled “Derrida, Logocentrism and the Concept of Warrant on the Semantic Web” uses Derrida’s theories to consider “warrant” as understood in the traditional library and on the Semantic Web. Following an analysis of the two types of systems, he concludes that “library information practice has evolved as a complex discourse around questions of warrant that provide a subtlety and richness to knowledge organization that the Semantic Web has not yet attained.” Further, he says that the Semantic Web would need to find new approaches to handling this problem. Jian Qin (United States), in a paper entitled “Controlled Semantics Versus Social Semantics: An Epistemological Analysis,” makes comparisons and gives examples. The purpose of the paper is to explain the differences and connections between the two types of semantics from the perspective of knowledge theory. These connections have implications for further research.

In her presentation “Wind and Rain and Dark of Night,” Hope Olson (United States) addresses “Classification in Scientific Discourse Communities.” This paper explores the links between natural or scientific classification and the classification of knowledge. It uses discourse analysis of selected standards for natural phenomena to address two research questions: “Are scientific categorization standards of natural phenomena subject to the same principles as bibliographic classification (warrant, hierarchical force, etc.)?” and “What discourses operate in scientific communities that shape their categorization standards?” In the analysis she uses temperature scales as examples of standards that measure single variables, along with two standards that categorize complex phenomena: those classifying hurricanes and planets. The author states that this line of research is worth pursuing further “because of its potential to reveal the discourses behind both approaches to classification.” Further research might address the question “Are these discourses parallel to discourses that operate in relation to bibliographic classifications?”

A presentation by Thomas Dousa (United States) entitled “Empirical Observation, Rational Structures and Pragmatist Aims: Epistemology and Method in Julius Otto Kaiser’s Theory of Indexing” applies to Kaiser’s theory a typology, developed by Birger Hjorland, of the epistemological positions underlying methods for designing KO systems. The analysis centres on consistency in indexing, a goal in which Kaiser did not entirely succeed. By using Hjorland’s work, the author sought “to pierce Kaiser’s veil of consistency, uncover the hybrid nature of his epistemology, and learn the ways in which the epistemological position(s) related to his methodological prescriptions.” He notes the usefulness of Hjorland’s typology in uncovering aspects of Kaiser’s system and suggests that applying it to other KO systems could lead to a better understanding both of historical KO systems and of the ways in which different epistemological positions interact with other classification and indexing schemes.

In Richard Smiraglia’s (United States) paper “Noesis,” perception is seen as a “crucial element” in the viability of knowledge organization systems. The author sees it as a filter of contextual information with potential for categorization. In this research he seeks to “increase understanding of the role of cognition in every day classification by developing a fuller profile of perception.” Pictures of mailboxes from various locales are the everyday objects used to demonstrate the noetic process. In the analysis, tag clouds are used to reveal perceptual differences among the participants. Findings indicate that “social contexts, cultural moderation and perceptual fluidity are constants in the ego acts of classification.”

Another ever-present concern in knowledge organization is “bias.” Birger Hjorland (Denmark) addresses this problem in his paper “Deliberate Bias in Knowledge Organization?” Starting from Melanie Feinberg’s view that if we cannot eliminate bias in classification we should acknowledge it, be responsible about it and defend it, the author suggests that history shows classificationists seeing their role as that of documentalists and compilers rather than designers. In examining these claims, Hjorland raises such questions as “Is KO an objective and neutral activity? Can it be? Should it be?” In conclusion, he suggests that the epistemological arguments put forward by Feinberg and Hjorland should be applied to specific domains. Some domain analysis is available from the humanities and the social sciences, but “further investigation is needed, especially in the social sciences.”

Two papers in Section 5 address the theories that underlie knowledge organization. Joseph Tennis and Elin Jacob (United States) work “Toward a Theory of Structure in Information Organization Frameworks.” In their paper they seek to lay the groundwork for such a theory and begin defining “structure” in the context of a number of previous writings. Finally, they examine Mooers’ method of descriptors. Jack Andersen (Denmark) then looks at “Knowledge Organization as a Cultural Form.” In doing so, he draws on Lev Manovich’s arguments about the database as a cultural form. He argues that knowledge organization is “a prime communication and production form of new media, turning knowledge organization into knowledge design.” He begins by outlining Manovich’s argument and follows it with a discussion of its implications for knowledge organization research. “Aesthetics,” he says, “brings a new dimension to knowledge organization theory.”

In the final paper in Section 5, Hur-Li Lee (United States) describes the “Origins of the Main Classes in the First Chinese Bibliographic Classification.” Her purpose was to provide an “improved understanding” of the classification “applied in the Seven Epitomes, the first documented classified library catalogue in China,” which was completed in the first century BC. The author discusses the findings of an analysis of the first six classes and identifies three major issues for further consideration: the concept of ‘discipline’, the limitations of the classification in relation to literary warrant, and “the political overtones of the classification stemming from the fact that the catalogue was a by-product of a government-sponsored collation project.”

Section 6

Section 6 is a small group of two papers on Non-Textual Materials. This is surprising, given the recent emphasis on the organization and representation of non-print material. Abby Goodrum et al. (Canada) presented a paper entitled “The Creation of Keysigns: American Sign Language Metadata.” It sets out preliminary results of a pilot test on the creation of “a folksonomic gestural taxonomy for sign language indexing and retrieval.” Sign language interpreters and deaf participants were involved in creating the metadata. This kind of metadata is not commonly understood, which made the project cognitively challenging. The paper concludes with suggestions for making the creation of such data easier from “cognitive and physical perspectives.” The second presentation, “Visual Knowledge Organization” by Ulrika Kjellman (Sweden), addresses the question “Towards an International Standard or a Local Institutional Practice?” The context of the paper is the digitization of visual heritage collections to make them accessible through the Internet. The author acknowledges that there are obvious reasons for following standards but in this paper has chosen instead to “discuss the pitfalls with this development.” Kjellman raises two points: first, over the years different institutions have developed different ways of collecting and organizing pictures; second, this differentiation is challenged by digitization. Specifically, the context is one institution where the representation of the collection and the KO tools are in conflict.

Section 7

Section 7 is also a small group, of three papers, covering “Discourse Communities and Knowledge Organization.” Aaron Loehrlein (United States) considers “The Benefits of Participating in a Form of Life” in the context of “Interpretations of Complex Concepts Among Experts and Novices in Records Management.” The paper is concerned with the understanding of concepts and language in a specialized discipline. The participants were presented with passages representing complex concepts, which they were asked to rank. The responses of the experts were then compared with those of two groups of novices. The experiment is explained in detail. Findings indicate “that specific wording has a great effect on use of these complex concepts.” In the second of these three papers, Widad Mustafa El Hadi (France) presented a “Discourse Community Analysis: Sense Construction Versus Non-Sense Construction.” It examines the nature of the political discourse of international organizations such as the World Bank, the UN and the European Union. The discussion originates “from a fundamental paradox: how can we use the same descriptive linguistic tools which we use in analyzing the production of sense for the production of non-sense?” Through this analysis, the author proceeds to answer the question “How can this paradox be explained?” In the third paper, Chaomei Chen, Roberto Pinho, et al. (United States, Brazil) investigated “The Impact of the Sloan Digital Sky Survey on Astronomical Research,” looking at “The Role of Culture, Identity, and International Collaboration.”

Texts from the three areas were analyzed using text mining systems. The research is described and supported by illustrations and diagrams.

Section 8

Section 8 contains eight papers on “Users and Social Context.” Not surprisingly, the researchers still consider the user to be an important component in the knowledge organization equation. In “Social Tagging and Communities of Practice,” Edward Corrado and Heather Moulaison (United States) presented the results of two “Case Studies.” The studies describe how two disparate communities of practice use tagging to disseminate information to other members of the community. The first study looks at Code4Lib, a community of users made up largely of librarians and systems developers. The second looks at tagging on video sharing sites used by French teenagers. Methodology, results and discussion are provided in both cases. The two studies show similarities in the way social tagging can be used in organization and retrieval. Suggestions are made for future research in this area, including the use of larger data sets. In “Searching with Tags,” Margaret Kipp addresses the question “Do Tags Help Users Find Things?” The paper reports an experiment in which users were asked to use “a social bookmarking tool specializing in academic articles (CiteULike) and an online journal database (Pubmed)” to see whether they found tags useful in their searches. It was found that they did use the tags, both as guides to searching and as hyperlinks. However, they also used the controlled vocabularies in the journal database.

Lynne Howarth (Canada) described “Creating Pathways to Memory: Enhancing Life Histories Through Category Clusters.” She discussed the part memory plays in enabling humans to categorize knowledge and to add new knowledge to these categories. She raises the question “When memory and/or language is impaired, how does such contextualizing and categorizing occur?” The paper reports on a preliminary pilot study of “mixed methods research examining the sense-making, sorting, categorization and recall strategies” of individuals with mild cognitive impairment in the early stages of dementia. Details of the research are given and preliminary findings identified. In a paper entitled “Machine Versus Human Clustering of Concepts Across Documents,” Christopher Khoo (Singapore) and Shiyan Ou (United Kingdom) discuss “an automated method for clustering terms/concepts from a set of documents on the same topic.” The clustering method, which “makes use of a combination of lexical overlap between multiword terms, syntactical restraints, and semantic considerations,” is evaluated, as is the human clustering approach. The research raises questions “about whether machine-generated clustering can be evaluated by comparing with human clustering.”

In keeping with the main theme of the Conference, María López-Huertas presented a paper on “Cultural Impact on Knowledge Representation and Organization in a Subject Domain.” The aim of this discussion is to consider how different cultures affect subject areas and how this may in turn affect knowledge organization and representation in KOS. A methodology was developed and applied to a sample. Gender studies was chosen as the subject area, and Uruguay and Spain as the cultures. To determine differences, the areas studied were terminology, categorization and conceptualization. To gather the data, an information analysis of gender studies was carried out for each of the countries. Data from the analysis are presented in the paper and several conclusions drawn.

Inge Alberts (Canada) provided “A Pragmatic Perspective of E-mail Management Practices in Two Canadian Public Administrations.” The author examines contextual factors involved in the use of e-mail by middle managers in Canadian institutions. The intent is to find a way to alleviate some issues in e-mail management. As a result of the research, an E-mail Pragmatic Framework is presented.

Aiming at the needs of a different group of users, June Abbas (United States) gave a paper entitled “Daddy, How do I Find a Book on Purple Frogs?” that focused on “Representation Issues for Children and Youth.” Subject access tools and controlled vocabularies are examined, and Wittgenstein’s Language Games Theory is presented as a possible framework for controlled vocabulary construction. Background is provided and four research questions are given, each accompanied by a discussion. There are interesting findings, as well as revealing gaps in the relevant research and literature.

In the last paper in this section, José Guimarães, Juan Fernández-Molina, et al. (Brazil and Spain) gave a presentation entitled “Ethics in the Knowledge Organization Environment: An Overview of Values and Problems in the LIS Literature.” The premise that library and information science literature has been more focused on information access and dissemination than on the ethical aspects of knowledge organization and representation led the authors to investigate the existence of ethical values and problems in the field. They analyzed the contents of five well-known journals in the field over the years 1995 to 2004. They found two complementary dimensions: “one reflecting the respect of diversity and the other concerning the specificity of warrant.” An analysis of the results led them to reflect on KO education, suggesting that “the focus must not only be set on content issues but also social (and consequently ethical) issues.” This is because subject access to information systems is intended to serve diverse types of users.

Section 9

In the last section, Section 9, there were two papers on the broad topic of Systems, Tools and Evaluation. Ismail Timimi and Stéphane Chaudiron (France) investigated “Information Filtering as a Knowledge Organization Process,” with emphasis on “Techniques and Evaluation.” They begin by showing that information filtering systems may be considered “semi-automatic knowledge organization devices.” They then point out how the technical dimension of the system must be related to the user dimension. Finally, they give an overview of software called InFile (Information FILtering, Evaluation). At the time of writing the software had not been tested, but the goal of defining the evaluation protocol was in place. In the final paper, “Retrieving Terminological Information on the Net,” Carles Tebé and Mari-Carmen Marcos (Spain) pose the question “Are Linguistic Tools Still Useful?” The paper is a comparative evaluation of the effectiveness of search engines and linguistic tools in retrieving information from the Net. The experiment used student translators. Two scientific texts in English were selected. Participants read the texts and were asked to propose translations and to indicate the level of success they thought they had achieved. The search engines proved more effective than the linguistic tools.

Reference

Arsenault, Clément and Tennis, Joseph T., eds. 2008. Culture and Identity in Knowledge Organization: Proceedings of the Tenth International ISKO Conference, 5-8 August 2008, Montreal, Canada. Würzburg: Ergon-Verlag.

Report. IFLA Section on Classification and Indexing

At each annual IFLA Conference, the Section on Classification and Indexing mounts a programme that includes two or three papers on aspects of subject analysis, classification and indexing. The topics are germane to the Section’s interests and activities but are also of interest to researchers and practitioners at large. Full texts of the papers can be accessed through the IFLA website.

In 2008, at Quebec, Canada, three papers were presented. Anila Angjeli (France) and Antoine Isaac (Netherlands) presented a paper entitled “Semantic Web and Vocabularies Interoperability: An Experiment With Illuminations Collections.” The paper describes research carried out in collaboration by the Bibliothèque nationale de France and the Koninklijke Bibliotheek (National Library of the Netherlands) under the framework of the Dutch project STITCH (Semantic Interoperability To Access Cultural Heritage). It investigates semantic interoperability in relation to searching, in an attempt to answer the question “How can we conduct semantic searches across several digital heritage collections?” The experiment is carried out on two iconographic collections. The collections are similar in content but differ in two significant ways: they have been processed differently, and the vocabularies used to index them are very different. The vocabularies are both hierarchical and controlled but have different semantic structures. The experiment began with a “precise analysis” of each vocabulary. The researchers then “studied and implemented mechanisms of alignment of the two vocabularies.” Because the models were different, a common standard was needed to accomplish the alignment; RDF and SKOS were used. The product was a prototype that permits querying “in both databases at the same time through a single database.” Further research is needed.

Somewhat akin to the first presentation is a paper entitled “Cross-concordances: Terminology Mapping and its Effectiveness for Information Retrieval” by Philipp Mayr and Vivien Petras (Germany). The project, a “terminology mapping initiative to organize, create and manage cross-concordances” between various controlled vocabularies, was funded by the German Federal Ministry of Education and Research and the German Research Foundation. At the time of presentation, “64 crosswalks and more than 500,000 relations” had been established. A major evaluation was carried out to test and measure the effectiveness of the vocabulary mappings in an information system. This paper reports on the development of the cross-concordances and the evaluation results. The project is ongoing.

The third paper on this programme, by Michael Kreyche (United States), discussed “Subject Headings for the 21st Century: The lcsh-es.org Bilingual Database.” In this case the subject headings are in Spanish. The situation is one in which various institutions have developed their own systems, with very little effort to collaborate or to use the same system. The author has posited that current technology could be used to improve this situation. This project “demonstrates this concept in a practical way and suggests a new model for international cooperation in authority control.”

In 2009, at Milan, Italy, two papers were presented. “Introducing FRSAD and Mapping it With SKOS and Other Models” by Marcia Zeng (United States) and Maja Žumer (Slovenia) introduces the Functional Requirements for Subject Authority Data and considers it in relation to other conceptual models. The second paper on this programme, by Alberto Cheti, Anna Lucarelli, and Federica Paradisi (Italy), dealt with “Subject Indexing in Italy,” with a focus on “recent advances and future perspective” of the Italian library scene. For many years there has been a tradition of including in the Section’s programme, when possible, a paper on some aspect of the subject analysis methods used in the libraries of the conference’s host country. Italy has recently published a new cataloguing code (REICAT). This paper documents recent developments in subject indexing, standards and systems in Italian libraries.

Report. International UDC Seminar 2009

A two-day International Seminar entitled Classification at a Crossroads: Multiple Directions to Usability was presented at the Koninklijke Bibliotheek in The Hague, Netherlands, on October 29th and 30th 2009. The Seminar itself was preceded by a one-day UDC Round Table and policy session of some 20 editors and contributors to the UDC system. Approximately 133 people attended the Seminar, at which two keynote addresses were made and 22 papers on various aspects of classification research were presented in 6 sessions.

The Seminar opened with a keynote address by Dagobert Soergel (United States) entitled “Illuminating Chaos: Using Classification to Harness the Web.” He described the Web as a chaotic place, increasingly complicated by wikis, blogs and social tagging. His purpose was to present some of the ways in which classification might help the situation. In the first part of his talk he concentrated on the need for structure and gave examples of ways in which classification might provide that structure and help users develop queries. In the second part he discussed the partial overlap between ontologies and other KO systems and introduced a conceptual hub approach to KOS mapping, to provide the basis for universal facet-based search of the Web. His presentation set the stage for the presentations of the first day of the seminar.

In Session 1, three papers addressed the topic Classifying Web Resources. Anders Ardö (Sweden) spoke on “Automated Classification: Insights into Benefits, Costs and Lessons Learned.” Ardö recognized that automated methods of classification have been around for some time, but that the exponential growth of the World Wide Web has brought these methods to the forefront of a number of different research areas, including “machine learning (Artificial Intelligence), document clustering (Information Retrieval) and weighted string-matching against controlled vocabulary (Library and Information Science).” In this context he described research carried out in NetLab at Lund University Library, beginning with the use of “UDC in Nordic WAIS/WWW as early as 1992 and continued research in the 1990s testing automatic classification on Engineering Index classification and DDC.” The similarities and differences of three approaches were discussed, and the problems of automatic classification were recognized. A major issue is evaluation and comparison, related to “the challenge of identifying the aboutness” of the documents and to the quality of the indexing. Nevertheless, an effort was made to “discuss general benefits and costs, resulting quality and lessons learned.” Linda Kerr (United Kingdom) described “Intute: From a Distributed Network to a Unified Database, Lessons Learned and Future Developments.” Intute (http://www.intute.ac.uk) is a UK service funded by the Joint Information Systems Committee (JISC) that catalogues the best Internet resources for education and research. The system is a unification of seven subject catalogues previously funded separately by the JISC. The paper describes the processes and challenges of integrating the systems into one catalogue using one standard metadata scheme, and also describes a ‘course and theme’ view onto the resources. It also outlines two projects for evaluating the cost effectiveness of manual and automatic metadata creation.
The projects are designed to assess the requirements for the most effective retrieval of resources, with the aim of improving both the efficiency of metadata generation processes and user satisfaction in retrieval. In the third paper in this session, Jakob Voss (Germany) addressed the topic “Wikipedia as Knowledge Organization System.” The paper began with a general introduction describing Wikipedia as a system designed for the distribution of knowledge and went on to show how it could also be used in knowledge organization and how it is connected with other knowledge organization systems. He described how it could be viewed as a controlled vocabulary “built of articles, languages, categories and links.” In doing so, he refers to the possibilities of semantic linking and dynamic concept hierarchies. Since it is not limited to a subject domain, he sees Wikipedia as a top-level ontology like UDC, DDC, Cyc and WordNet. He also outlines how Wikipedia could be used in subject indexing and how it can be “linked and mapped” to other controlled vocabularies using Linked Open Data and Resource Description Framework (RDF) technology.

Session 2 focused on Classification and Thesaurus and contained four papers. The emphasis was on the integrated use of a classification and a thesaurus. In a paper on thesaurus construction and use, Marlene van Doorn and Katrien Polman (Netherlands) addressed the question “From Classification to Thesaurus … and Back? Subject Indexing Tools at the Library of the Afrika-Studiecentrum Leiden.” This is an African Studies thesaurus, constructed from 2001 to 2006 for use in subject indexing and retrieval at the University of Leiden. Word-based, it was developed as a more user-friendly alternative to the UDC codes used at the time. The UDC codes served as the starting point for construction and were ‘translated’ into thesaurus descriptors using the basic thesaurus relationships. “In a parallel but separate operation … each UDC code … assigned to an item in the library’s catalogue was subsequently converted into one or more thesaurus descriptors.” The updated UDC codes were also included in the thesaurus, leaving “open the possibility of linking the thesaurus to different language versions of the UDC MRF in the future.” Victoria Frâncu and Cosmin-Nicolae Sabo (Romania), in a paper entitled “Implementation of a UDC-Based Multilingual Thesaurus in a Library Catalogue: The Case of BiblioPhil,” described an approach to improving classification-based subject access in a library catalogue. The authors represented UDC classification numbers with thesaurus descriptors and used them in an “automated way.” The system is called BiblioPhil, and the standard formats used are UNIMARC for bibliographic and subject authority records, with MARCXML support for data transfer. “The verbal equivalents, descriptors and non-descriptors, are used to expand the number of concepts and are given in Romanian, English and French.” This approach is seen as a time-saver for the indexer and as easier access for the user.
Similarly, in her paper “Integration of Thesaurus and UDC to Improve Subject Access: the Hungarian Experience,” Ágnes Hajdu Barát (Hungary) explores two possible solutions for integrating a thesaurus and a classification scheme. She reports on two projects: one in which UDC and thesauri are combined under a homogeneous framework called MÁTrIkSz (Hungarian Comprehensive Information Retrieval Language Dictionary), and a thesaurus construction project in the Hungarian National Library (Széchényi). The role of UDC is analyzed, and structured, well-documented examples are given, supported by literature research into UDC theory and use. The importance of cognition as a basis for concept-building is emphasized, and some possibilities for integrating thesauri and UDC are identified. The final paper in Session 2, “Providing for Interoperability Between Thesauri and Classification Schemes in ISO 25964,” presented by Stella Dextre Clarke (United Kingdom), discussed the importance of interoperability across systems in general. The ISO 25964 standard, being developed to replace the existing thesaurus standards ISO 2788 and ISO 5964, will cover not only the construction of thesauri but also interoperability with classification schemes and other types of controlled vocabulary. Clarke explained how this will be handled. Issues that need to be resolved include the handling of pre-coordinated classes, the provision for classes not enumerated in the scheme but synthesised on demand, and the question of whether (and if so, how) to include a data model for each type of KOS. At the time of presentation, ISO 25964 was still in the “initial drafting stage,” and Clarke was hoping for useful ideas from this Seminar to help solve some of these problems.

Session 3, the final session of the first day, contained three papers focusing on Classification Frameworks, Concepts, Structure and Relationships. The first paper, “Concepts and Terms in Faceted Classification,” presented by Vanda Broughton (United Kingdom), addressed the importance of faceted classification and its role in the development of modern classification systems. Specifically, she noted the impact of faceted classification on recent revisions of UDC. In particular, she identified the removal of compound classes from the main UDC tables and the more radical revisions of some classes (especially Medicine and Religion). Among the effects are rigorous analysis, a clear sense of citation order, and the building of compound classes according to a more logical system of syntax. The result is the formalization of relationships in the classification, making them explicit and enabling machine recognition. However, she notes that vocabulary control is not without difficulties, notably in the differences in the way terminologies in the humanities and the sciences should be handled. Still to be resolved is the balance between rigour in the structure of the classification and the complexities of natural language: “a fertile field for further research.” In his paper entitled “Classification Transcends Library Business,” Claudio Gnoli (Italy) addressed the need for the classification of objects, as opposed to bibliographic classification, and called for “a broader conception of classification … that can be applied to any knowledge item.” The subject of his research was bagpipes in Northern Italian folklore, using a variety of sources, including published documents, police archives, painting details, museum specimens and ethnographic organizations. For this kind of search he found traditional classification inadequate. What was needed were tools from which knowledge items could be “retrieved independently from other topics with which they are combined or the context where they occur.” He concludes that the concept ‘bagpipes’ should be retrievable and browsable in combination with other phenomena, disciplines or media. Examples were provided using notation from a draft of the Integrative Level Classification. In the third paper, “Specifying Intersystem Mapping Relations: Requirements, Strategies and Issues,” by Felix Boteram and Jessica Hubrich (Germany), the focus was on the improvement and development of intersystem relations at the level of comprehensive international knowledge organization systems and between typologically different indexing languages. Intersystem relations may differ considerably from interconcept relations. In the authors’ experience, the characteristics of a specific mapping depend largely on the characteristics of the systems being connected.
They examine the differences and peculiarities of mapping systems, and first approaches to such a system are made in linkages between the Universal Decimal Classification and thesauri.

The second day of the Seminar began with a keynote address on “Open Web Standards and Classification: Foundations for a Hybrid Approach” by Dan Brickley (Netherlands). Brickley began with a discussion of the current state of knowledge and its increasing accessibility through machine-processable formats, the creation of communally maintained data sets via the Web, and the use of open Web standards “to ensure these works are all cross-referenced and richly linked. New Web standards are bridging the gaps between thesauri, ontologies and databases.” This approach is opening up vast opportunities for collaboration, information sharing and user interface design. The author used examples from television, subject-based information gateways and Web 2.0 trends to propose some foundational steps to ensure that “professional subject classification remains central to resource discovery, annotation and linking.”

Session 4 included four papers on Classification and the Semantic Web. Ceri Binding and Douglas Tudhope (United Kingdom) gave a paper on “Terminology Services,” which addressed the problem that traditional classification and vocabulary control have not solved all the problems of subject access to online resources. The authors note that examination of social bookmarking sites suggests a need for structuring Web resources. Moreover, social tagging has terminological problems, and the use of controlled vocabularies outside libraries is sparse. The authors suggest that terminology services should provide solutions to some of these problems. In this paper they related their experiences in “creating terminology Web services and associated client interface components for the archaeology domain in the STAR project (http://hypermedia.research.glam.ac.uk/kos/STAR/) and demonstrate how the same principles can be readily adapted to other subject areas (http://hypermedia.research.glam.ac.uk/kos/terminology_services/).” The second paper, “Signposting the Crossroads: Terminology Web Services and Classification-Based Interoperability,” by Gordon Dunsire and Dennis Nicholson (United Kingdom), focused on the JISC-funded HILT project. The paper dealt specifically with HILT Phase IV, which developed pilot Web services for delivering “machine-readable terminology and cross-terminology mappings data likely to be useful to information services” seeking to enhance their subject search or browsing services. The authors described some of the user interface enhancements created by UK information services. HILT currently has 11 subject schemes mounted, including DDC, MeSH and AAT. It also has high-level mappings between DDC and some of the other schemes. The last two papers were experimental in nature. A.R.D.
Prasad and Devika Madalli (India) presented a paper entitled “Classificatory Ontologies.” Their presentation described an application of Colon Classification, as enunciated by Ranganathan, in developing ontologies. They explored issues in modeling the Colon Classification using the Web standard Simple Knowledge Organization System (SKOS). In another application of the SKOS standard, Antoine Isaac (Netherlands) discussed “Using SKOS in Practice, with Examples from the Classification Domain.” He began with a brief presentation of the features of the SKOS model and its role with respect to knowledge organization systems and the Semantic Web, and identified some practical problems that need to be overcome in using SKOS. Examples were taken from typical classification schemes such as UDC, and the author demonstrated what the SKOS model can accomplish, identifying some key features, such as concept coordination, “which are still lacking proper means of representation.” Hints were given as to how SKOS might be extended to overcome these problems, and the author endeavoured to answer the question: “To what extent can consensual extensions be devised to use SKOS successfully with classification systems?”

In Session 5, three papers addressed the topic New Approaches to Classification. Veslava Osińska discussed “Visual Analysis of a Classification Scheme,” in which she proposed “a novel methodology to visualize a classification scheme.” The Association for Computing Machinery (ACM) Computing Classification System (CCS) was used in the demonstration. “The attributes, classes, subject descriptors and keywords were processed in a dataset to make a graphic representation of the documents.” A similarity matrix of co-classes was made and “a spherical surface was chosen as the target information space. Classes and documents node locations on the sphere were obtained by means of Multidimensional Scaling coordinates.” By representing the surface on a plane, like a map projection, it is possible to analyze the visualization layout. The author sees this methodology being used in interdisciplinary research fields. Alenka Šauperl (Slovenia) discussed “UDC and Folksonomies.” Folksonomies are social tagging systems that have come to represent an important part of Web resource discovery. Their main advantage is that they “enable free and unrestricted browsing through information space.” Because the tags are assigned by users, however, there is a drawback: no semantic relationships are expressed, as they are in a thesaurus-supported system. Searching is based on coincidence rather than on logical and meaningful connections between related resources. This paper proposes the use of UDC semantic structure to support and complement tag-based browsing of the system. “Two specific questions were investigated: (1) Are terms used as tags in folksonomies included in the UDC? and (2) Which facets of UDC match the characteristics of documents or information objects that are tagged in taxonomies?” The universality of UDC was addressed. The results suggested that UDC-supported folksonomies could be used in resource discovery, “in particular library portals and catalogues.” The final paper in this session, by Philippe Cousson (France), focused on “UDC as a Non-disciplinary Classification System for a High-School Library.” The problem addressed in this project was the needs of students, who often require access to interdisciplinary subjects, parts of which may be scattered in UDC. It dealt with establishing “a user-friendly systematic collection arrangement” resulting from the merging of two collections, a high school library collection and a college library collection classified by UDC. By interpreting UDC topics as phenomena and doing some local indexing, the project brought together topics that UDC disperses. In practice, it may be necessary to overcome the constraints of a disciplinary classification system.

In the final session, Session 6, the Seminar addressed Classification in Library Networks. Three papers were presented. Marie Balíková (Czech Republic) spoke on “The role of UDC Classification in the Czech Subject Authority File.” She outlined the standardization function of the authority file and explored the role of UDC as a switching language between various indexing systems. In doing so, she addressed compatibility problems such as level of specificity, syntax, and usage of terminology, and suggested ways in which the difficulties may be overcome using UDC. The subject systems considered included those used in libraries, museums, galleries and archives. Darija Rozman (Slovenia) considered “The Practical Value of Classification Summaries in Information Management and Integration.” The paper explored the use of short extracts from UDC classification tables to provide broader classes for use in bibliographic listings, organization of physical documents, presentation of Web resources and information integration in network resources. Illustrations were drawn from the Slovenian union catalogue COBISS/OPAC. In the final paper, Rosa San Segundo (Spain) discussed “Using MARC Classification Format for UDC and Mappings to Other KO Systems for an Enriched Authority File.” The Seminar closed with a brief panel discussion and a question-and-answer session.

This report has been prepared from the abstracts. All papers will be published. The full text of several papers appears in Knowledge Organization vol. 37 nos. 3 and 4 (2010). Shorter versions of some of the papers will appear in the UDC Consortium’s annual publication Extensions and Corrections to the UDC, no. 31. This was the second of these biennial UDC Seminars, the first having been held in 2007. The next one will be held in 2011.


KNOWLEDGE ORGANIZATION KO Official Quarterly Journal of the International Society for Knowledge Organization ISSN 0943 – 7444

International Journal devoted to Concept Theory, Classification, Indexing and Knowledge Representation Publisher ERGON-Verlag GmbH, Keesburgstr. 11, D-97074 Würzburg Phone: +49 (0)931 280084; FAX +49 (0)931 282872 E-mail: [email protected]; http://www.ergon-verlag.de

Editor-in-chief (Editorial office) Dr. Richard P. SMIRAGLIA (Editor-in-Chief), Palmer School of Library and Information Science, Long Island University, 720 Northern Blvd., Brookville NY 11548 USA. Email: [email protected]

Instructions for Authors

Manuscripts should be submitted electronically (in Word, WordPerfect, or RTF format) in English only to the editor-in-chief and should be accompanied by an indicative abstract of 100 or 200 words. Submissions via email are preferred; submissions will also be accepted via post provided that they are accompanied by a 3.5” diskette encoded in Word, WordPerfect, or RTF format.

A separate title page should include the article title and the author’s name, postal address, and E-mail address, if available. Only the title of the article should appear on the first page of the text. To protect anonymity, the author’s name should not appear on the manuscript, and all references in the body of the text and in footnotes that might identify the author to the reviewer should be removed and cited on a separate page. Articles that do not conform to these specifications will be returned to authors.

Criteria for acceptance will be appropriateness to the field of the journal (see Scope and Aims), taking into account the merit of the contents and presentation. The manuscript should be concise and should conform as much as possible to professional standards of English usage and grammar. Manuscripts are received with the understanding that they have not been previously published, are not being submitted for publication elsewhere, and that if the work received official sponsorship, it has been duly released for publication. Submissions are refereed, and authors will usually be notified within 6 to 10 weeks. Unless specifically requested, manuscripts and illustrations will not be returned.

The text should be structured by numbered subheadings. It should contain an Introduction, giving an overview and stating the purpose, a main body, describing in sufficient detail the materials or methods used and the results or systems developed, and a conclusion or summary.

Reference citations within the text should have the following form: (author year). For example, (Jones 1990). Specific page numbers are optional, but preferred when applicable, e.g. (Jones 1990, 100). A citation with two authors would read (Jones & Smith, 1990); three or more authors would be: (Jones et al., 1990). When the author is mentioned in the text, only the date and optional page number should appear in parentheses, e.g. According to Jones (1990), …

References should be listed alphabetically by author at the end of the article. Author names should be given as found in the sources (not abbreviated). Journal titles should not be abbreviated. Multiple citations to works by the same author should be listed chronologically and should each include the author’s name. Articles appearing in the same year should have the following format: “Jones 2005a, Jones 2005b, etc.” Issue numbers are given only when a journal volume is not through-paginated. Examples:

Dahlberg, Ingetraut. 1978. A referent-oriented, analytical concept theory for INTERCONCEPT. International classification 5: 142-51.

Howarth, Lynne C. 2003. Designing a common namespace for searching metadata-enabled knowledge repositories: an inter-national perspective. Cataloging & classification quarterly 37n1/2: 173-85.

Pogorelec, Andrej and Šauperl, Alenka. 2006. The alternative model of classification of belles-lettres in libraries. Knowledge organization 33: 204-14.

Schallier, Wouter. 2004. On the razor’s edge: between local and overall needs in knowledge organization. In McIlwaine, Ia C. ed., Knowledge organization and the global information society: Proceedings of the Eighth International ISKO Conference 13-16 July 2004 London, UK. Advances in knowledge organization 9. Würzburg: Ergon Verlag, pp. 269-74.

Smiraglia, Richard P. 2001. The nature of ‘a work’: implications for the organization of knowledge. Lanham, Md.: Scarecrow.

Smiraglia, Richard P. 2005. Instantiation: Toward a theory. In Vaughan, Liwen, ed. Data, information, and knowledge in a networked world; Annual conference of the Canadian Association for Information Science … London, Ontario, June 2-4 2005. Available http://www.cais-acsi.ca/2005proceedings.htm.

Footnotes are not permitted; all narration should be included in the text of the article.

Illustrations should be kept to a necessary minimum and should be submitted electronically when possible. Photographs (including color and half-tone) should be scanned with a minimum resolution of 600 dpi and saved as .tif files (Tagged Image File Format preferred). Tables and figures should be embedded within the document or, alternatively, saved as separate files with clear instructions indicating their placement in the text. Tables should contain a number and title at the top, and all columns and rows should have headings. All illustrations should be cited in the text as Figure 1, Figure 2, etc. or Table 1, Table 2, etc. Illustrations submitted in hard copy only should be marked to indicate their placement in the text.

Upon acceptance of a manuscript for publication, authors must provide a wallet-size photo and a one-paragraph biographical sketch. The photograph should be scanned with a minimum resolution of 600 dpi and saved as a .tif file (Tagged Image File Format).

Advertising Responsible for advertising: ERGON-Verlag GmbH, Keesburgstr. 11, 97074 Würzburg (Germany).

© 2010 by ERGON-Verlag GmbH. All rights reserved. KO is published quarterly by ERGON-Verlag GmbH. The price is € 129,00/ann. including airmail delivery.