Subject access to information in Web 2.0 environments

Sonja Špiranec, PhD, Assistant Professor

Dept. of Information Sciences, Zagreb, Croatia

IP LibCMASS Sofia 2011, Contract № 2011-ERA-IP-7

Sofia, 04.-17. September, 2011

Content

• context: Web 2.0 and libraries

• KOS (knowledge organization systems)

• folksonomies
– strengths, weaknesses
– research, studies
– tag/folksonomy improvement techniques

• seminar / discussion / group work

Context

• Web 2.0 influences and transforms information landscapes

• addresses issues of generating and using information, organization and access to information

• the influence on the LIS sector is natural

Web 2.0 in library context

• enthusiasm, excitement

• do we really have a win-win situation?

Added-value for users?

• personalisation of services, interactivity

• users will appreciate the possibility to “privatize” their space on the library web site

• growing expectation from library users that they will be able to interact with the catalogue, not just passively receive information delivered by the cataloguers

• strengthens the connection between the library and its users

• psychologically, the connection is stronger if users participate and create something by themselves

The downside: quality of information, quality of intellectual access to information

• what if the content is inappropriate or of low quality?

• consistency in the organization of information

• providing access to materials

• ...

Short group discussion

• discuss within your group the meaning of the term Library 2.0

• What does it denote for you?

• Strengths? Weaknesses?

A short reminder:

KOS in libraries


• To organize is to: give orderly structure to; frame and put into working order

• libraries are in the business of organizing information, namely documents, from their beginnings

• to this end, languages for document representation and organization were used

• information professionals have developed indexing languages (sets of terms used to represent topics or features of documents, and the rules for combining or using those terms)

• since the 1960s, much research has been done to test whether controlled or uncontrolled indexing languages provide better retrieval results

• E. Svenonius suggested that “free text and controlled vocabulary terms each contribute to precision and recall”

Controlled vs. natural languages

• control is exercised over which terms are used and over the relationships between the terms

• terms are standardised and similar or related resources are collocated for ease of discovery by the user (Lancaster, 1972).

Features of controlled languages

• controls the use of synonyms (and near-synonyms) by establishing a single form of the term. This ensures that indexers apply the same term to describe the same or similar concepts (e.g. “car”, “automobile”, “motorcar”, or “motor vehicle”).

• discriminates between homonyms, allowing the indexer to resolve clashes of meaning that arise when several terms share the same form but have distinct meanings (e.g. “jaguar”)

• controls lexical anomalies by minimising superfluous vocabulary or grammatical variations that could create further noise in the user's result set: spelling variants, singular and plural forms, verb tenses

• unites similar terms, or systematically refers the indexer to closely related alternatives, in order to ensure that similar or related resources are collocated

• this is normally achieved by displaying the “genus/species” relationship between terms within some form of semantic hierarchical structure
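The features listed above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the terms, the choice of "automobile" as the preferred form, the qualifiers for "jaguar", and the function names `index_term` and `broader_chain` do not come from any real thesaurus or library.

```python
# Hypothetical mini controlled vocabulary; all entries are illustrative.

# Synonym control: every variant maps to one preferred term.
PREFERRED = {
    "car": "automobile",
    "motorcar": "automobile",
    "motor vehicle": "automobile",
    "automobiles": "automobile",  # plural form folded into the singular
}

# Homonym control: one word form, several qualified preferred terms.
HOMONYMS = {
    "jaguar": ["jaguar (animal)", "jaguar (car make)"],
}

# Genus/species hierarchy: narrower term -> broader term.
BROADER = {
    "automobile": "vehicle",
    "jaguar (car make)": "automobile",
}


def index_term(raw):
    """Return the candidate preferred terms for a raw term.

    A homonym yields several candidates and the indexer has to choose;
    any other term yields exactly one candidate.
    """
    term = raw.strip().lower()
    if term in HOMONYMS:
        return HOMONYMS[term]
    return [PREFERRED.get(term, term)]


def broader_chain(term):
    """Walk up the hierarchy and return all broader terms, nearest first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


print(index_term("Motorcar"))
print(index_term("jaguar"))
print(broader_chain("jaguar (car make)"))
```

The point of the sketch is that each feature requires an explicit, maintained table, which is where the costs discussed next come from.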

Negative features of controlled languages

• investments of time, money, training, expertise and professional intervention

• current schemes are incapable of reflecting the transient nature of knowledge and therefore the demands of the modern information user.

Folksonomies

Why folksonomies?

• in the context of Web 2.0 developments

• the growth of user-generated content increases demand for suitable methods and facilities of storage and retrieval

• companies (and individuals) have developed collaborative information services: social bookmarking, photo sharing, video sharing
– enable users to store and publish information, but also to index and organize it
– via tags; the totality of tags >>> folksonomy

• the magic?
– users do it by themselves
– no guidance, no structure, no rules, no fields...

• folksonomies turned our professional views and standpoints upside down

Folksonomies vs. KOS

• the perspective of folksonomies on KOS is an altered one

• instead of choosing criteria, subject departments and classes and filling them with resources, the point of departure for folksonomies is the resources themselves

• folksonomies employ a resource-centric approach (instead of a criteria-centric one)

Weller, K. & Peters, I. (2007). Reconsidering relationships for knowledge representation. In: Proceedings of I-KNOW ’07, Graz, September 5-7 (pp. 501-504).

Two new players in indexing and knowledge representation

A folksonomy represents simultaneously some of the best and worst in the organization of information.

Mathes, 2005.

Proposed alternative terms:

• democratic indexing

• social classification system

• collaborative classification system

• ethnoclassification

• grassroots taxonomy

• user-generated metadata

• folk wisdom

• folksabulary

• mob indexing

• tagsonomy

• metadata for the masses

• lightweight knowledge representation

Folksonomies: basic features

“...a conflation of the words ‘folk’ and ‘taxonomy’ used to refer to an informal, organic assemblage of related terminology” (Vander Wal)

• organic structures that mirror the understanding users have of resources

• nothing is predetermined

• develop and advance with usage

• progress based on collective intelligence (the group is smarter than the individual)

• the collective creation of tags ought to be semantically richer than a controlled vocabulary (several opinions and perspectives)

• statistical consensus

Terminology

• the basic unit of a folksonomy: the tag

• tags are user-generated keywords that have been suggested as a lightweight way of enhancing descriptions of online information resources

• social tagging refers to the practice of publicly labelling or categorizing resources in a shared online environment

• the resulting assemblage of tags forms a “folksonomy”

Exercise

• Basic features of folksonomies
– problems
– strengths

• Examples:
– Connotea tag cloud
– Amazon tag cloud
– LibraryThing tag cloud

Folksonomies: limitations

• ambiguity of tags (emerges as users apply the same tag in different ways)

• the lack of synonym control can lead to different tags being used for the same concept, precluding collocation

• spaces, multiple words

• different word forms, plural and singular, inconsistent and ambiguous assignment of tags

• the user proclivity towards exhaustive tags (e.g. “marketing”, “technology”), popular tags and personal tags (e.g. “me”, “to read”) further compromises precision and contributes to high levels of recall and noise

• can folksonomies collapse due to a rising number of users, tags or resources?
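Some of the formal problems above (spaces and separators, capitalisation, singular versus plural) can be reduced mechanically. The following Python sketch shows one possible approach under stated assumptions: `normalize_tag` and `group_variants` are hypothetical helpers, and the plural rule is a deliberately crude heuristic for English, not a linguistic stemmer. It does nothing for ambiguity or synonyms, which need vocabulary control.

```python
import re


def normalize_tag(tag):
    """Return a normalized form of a tag.

    Lowercases, turns runs of spaces, underscores and hyphens into a
    single space, and strips a trailing plural "s" (crude heuristic;
    it will mangle some words, e.g. those ending in "ies").
    """
    t = tag.strip().lower()
    t = re.sub(r"[\s_\-]+", " ", t)
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]
    return t


def group_variants(tags):
    """Group raw tags that share the same normalized form."""
    groups = {}
    for tag in tags:
        groups.setdefault(normalize_tag(tag), []).append(tag)
    return groups


raw_tags = ["Blogs", "blog", "web-design", "Web Design", "social_bookmarks"]
for form, variants in group_variants(raw_tags).items():
    print(form, "<-", variants)
```

Grouping variants instead of overwriting them keeps the users' original tags intact, so the clean-up can sit on top of the folksonomy rather than replace it.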

Folksonomies: strengths

• browsing (finding things unexpectedly)

• up-to-date (can more easily accommodate new terms and concepts than heavily controlled vocabularies)

• reflects the vocabulary of users

• cheaper

• feasibility for large data collections

Folksonomies: strengths

• cataloguers or indexers will attempt to keep similar or related concepts together
– Shirky argues that it is impossible to “collapse” such terms without losing the essence of what each term conceptually denotes. He therefore states that it is impossible to disentangle terms such as “queer”, “gay” or “homosexual”

• in traditional controlled vocabulary-based indexing, all terms assigned to a document carry more or less equal weight

• in social tagging, certain tags will be much more popular than others

Considering new approaches to knowledge organization in library contexts

LC Working Group

• Report of The Library of Congress Working Group on the Future of Bibliographic Control

• the tightly controlled consistency designed into library standards thus far is unlikely to be realized or sustained in the future, even within the local environment.

• Integrate User-Contributed Data into Library Catalogs

• develop methods to guide user tagging through techniques that suggest entry vocabulary (e.g., term completion, tag clouds).
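Term completion, one of the techniques the report mentions, can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `suggest_terms`, the sample vocabulary and the simple prefix matching are all hypothetical and are not taken from the report or from any catalogue system.

```python
def suggest_terms(prefix, vocabulary, limit=5):
    """Suggest vocabulary terms that start with what the user has typed.

    Matching is case-insensitive; results are sorted alphabetically
    (ignoring case) and cut off at `limit`. An empty prefix yields
    no suggestions.
    """
    p = prefix.strip().lower()
    if not p:
        return []
    matches = [term for term in vocabulary if term.lower().startswith(p)]
    return sorted(matches, key=str.lower)[:limit]


# Hypothetical entry vocabulary, e.g. drawn from existing subject headings
# or from tags that are already popular.
vocabulary = ["Folklore", "Folk music", "Food", "Folk art"]
print(suggest_terms("folk", vocabulary))
```

The user remains free to ignore the suggestions and type a new tag, so the guidance nudges towards consistency without imposing control.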

user tags

indexing terms

question:

• can tagging solve the problem of organizing knowledge?
– strengths
– today more important
– reflect the spirit of modern times (new problems, new issues, subjects, innovations, research fields)

• decentralization

• communities of practice

• multiperspective (before, one viewpoint was OK because collections had a local character)

• user-friendly

• defining the term “tag” on LibraryThing

“Tags are a simple way to categorize books according to how you think of them, not how some official librarian does.”

Research, studies

• studies concerning tag distributions, tag categories, users’ tagging behavior, comparison between tags and subject headings...

Tag distributions

• studies have determined that in folksonomies the distribution of tags on resource level resembles a power-law curve; a few tags are very popular, but the majority are used infrequently

• the frequently used tags are extremely general and make only the vaguest allusions to the content
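How skewed the tags of a single resource are can be checked with a simple frequency count. The Python sketch below is an illustration only: the sample tags are invented, and `tag_distribution` and `head_share` are hypothetical helper names. It ranks tags by frequency and reports the share of all assignments taken by the top tags; it does not fit or test a power law.

```python
from collections import Counter


def tag_distribution(assignments):
    """Rank the tags assigned to one resource by frequency.

    `assignments` holds one entry per tagging act, so a tag applied by
    six users appears six times. Returns (tag, count) pairs, most
    frequent first.
    """
    return Counter(assignments).most_common()


def head_share(ranked, top_n):
    """Fraction of all tag assignments covered by the top_n tags."""
    total = sum(count for _, count in ranked)
    if total == 0:
        return 0.0
    return sum(count for _, count in ranked[:top_n]) / total


# Invented example data: one general tag dominates, the rest form a long tail.
tags = ["web"] * 6 + ["design"] * 2 + ["css", "howto"]
ranked = tag_distribution(tags)
print(ranked)
print(head_share(ranked, 1))
```

A high share for one or two tags together with many tags used only once is the pattern the studies describe.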

Users’ tagging behavior

• users pick out particular aspects that are important/interesting to them and express them via tags

• it’s not a representation of the whole entity

Tag categories

• linguistic level: occurrence of regularities regarding certain genres or forms of tags

• Golder and Huberman (2006):
– “topics”
– “type”: format, e.g. blog, article
– adjectives: reflect the author’s opinion: funny, boring
– “self reference”: relation between the tagger and resource, e.g. my stuff
– “task organizing”: to read, to do
– refining tags: tags which describe another tag in detail

• categories depend on the social bookmarking system; e.g. on Flickr: geographic tags, time/events

Tag redundancy:

• 19% of tags reflect the title (added value?)

• the majority of compound tags are used only once
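One way such a share could be measured is to check how many of a resource's tags already occur as words in its title. The Python sketch below is an assumption about the general idea, not the method used in the study behind the 19% figure; `title_words` and `title_redundancy` are hypothetical names and the sample tags are invented.

```python
import re


def title_words(title):
    """Split a title into a set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def title_redundancy(title, tags):
    """Fraction of tags that merely repeat a single word from the title.

    Only exact matches of single-word tags are counted; multi-word tags
    and inflected forms are ignored in this simplified measure.
    """
    if not tags:
        return 0.0
    words = title_words(title)
    repeated = [tag for tag in tags if tag.strip().lower() in words]
    return len(repeated) / len(tags)


title = "Folksonomies: indexing and retrieval in Web 2.0"
tags = ["folksonomies", "tagging", "web", "to read"]
print(title_redundancy(title, tags))
```

Tags that repeat the title add little to retrieval when the title is already searchable, which is the question behind "added value?".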

Basic-level tags

• Basic-level theory states that terms can be cognitively structured in a hierarchical system with different levels of specificity

• The basic level often contains the one term which is the most demonstrative, but not the most specific
– furniture – chair (basic level) – kitchen chair

• Basic level terms occur much more frequently in natural language

• Heavily used in folksonomies

Folksonomy data set: how to improve it?

• no good guys vs. bad guys

• critiques are mainly based on comparisons of folksonomies with traditional knowledge organization systems (thesauri, classification systems etc.) and professional indexing techniques

• boundaries between structured KOS and folksonomies are not at all solid but rather blurred.

– folksonomies can adopt some of the principal guidelines available for traditional KOS and may gradually be enriched with some elements of vocabulary control and semantics.

– folksonomies provide a useful basis for the stepwise creation of semantically richer KOS and for the refinement of existing classifications, thesauri or ontologies

• gradual refinement of folksonomy tags and a stepwise application of additional structure to folksonomies is a promising approach

• some platforms already provide different features to actually manipulate, revise and edit folksonomy tags

Flickr’s clusters

Theoretical approaches for structural enhancement of folksonomies

• “emergent semantics”, “semantic upgrades” or “semantic enrichments”...

• Tag gardening

• processes of manipulating and re-engineering folksonomy tags in order to make them more productive and effective

• on top of existing folksonomies (don’t inhibit the user)

Conclusion

• No clear picture of folksonomies has emerged yet
– strong tool with major flaws

• Great enthusiasm of users to participate in indexing

• Strengthen the positive effects and minimize the negative ones

• Additional method for knowledge organization which complements traditional controlled vocabularies

Literature

• Golder, S.A.; Huberman, B.A. Usage Patterns of Collaborative Tagging Systems. // Journal of Information Science 32, 2(2006), 198–208.

• Peters, I. Folksonomies: indexing and retrieval in Web 2.0. Berlin : De Gruyter/Saur, c2009.

• Rolla, P.J. User Tags versus Subject Headings: Can User-Supplied Data Improve Subject Access to Library Collections? // Library Resources & Technical Services 53, 3(2009), 174–184.

• Spiteri, L.F. Structure and Form of Folksonomy Tags: The Road to the Public Library Catalogue. // Webology 4, 2(2007).

• Svenonius, E. The intellectual foundation of information organization. Cambridge, Mass.; London : The MIT Press, 2000.

• Yi, K.; Chan, L.M. Linking folksonomy to Library of Congress subject headings: an exploratory study. // Journal of Documentation 65, 6(2009), 872–900.