
Taxonomies, folksonomies, ontologies? What are they and how do they support information retrieval?

Madely du Preez

INTRODUCTION

Until fairly recently thesauri, lists of subject headings and taxonomies were the most important controlled vocabularies that were used in formal information retrieval systems. The development of more sophisticated information retrieval tools triggered debates on the usefulness of controlled vocabularies and classification systems. Despite these debates, people involved in information organisation and retrieval have become acutely aware of the value these tools have for information retrieval.

When the World Wide Web (WWW) first started, only a few experts were able to use it to distribute information while the majority of users dealt with the WWW as consumers (Stock & Stock, 2013, p. 611). However, technological advances and the development of Web 2.0 or the Semantic Web (that is, a web of linked data) have made it possible for digital libraries or digital repositories to provide interfaces that allow endless information access (Hwang, Yang, & Ting, 2010, p. 297). These digital repositories or libraries still use thesauri, ontologies and taxonomies to organise their resources and to assist users in locating the information they require (Garcia, Martin-Moncunill, Sanchez-Alonso, & Garcia, 2014, p. 285).

Apart from the availability of numerous digital libraries and online databases, Web 2.0 has also made social media possible, and terms like blogging, wikis, social tagging, and cloud computing have become part of our communication technology related vocabularies. Furthermore, it seems as if social media sites such as Facebook and YouTube, as well as social catalogues such as LibraryThing.com and Amazon, also contributed to the development of folksonomies to support information retrieval.

The purpose of this presentation is to learn a bit more about what taxonomies, folksonomies, ontologies and thesauri are and about their roles in information retrieval.

CONTROLLED VOCABULARIES

Two indexing languages are generally used when indexing or searching for information in retrieval systems such as databases and the Internet. These are natural language and controlled vocabularies. Fourie and Burger (2005, p. 54) explain that when the indexer or the user uses the words of the author or words that are generally used within a subject field to index a document or search a database, they are using natural language terms. However, when they select words from a list of possible indexing terms, they are using a controlled vocabulary.

When viewed from an information organisation point of view, the objective of controlled vocabularies is to ensure consistency in indexing, tagging or categorising (Hedden, 2010, p. 135). However, when viewed from an information retrieval point of view, controlled vocabularies also support users when they search for information. This means that the user does not need to think of all the possible terms (including spelling variations and synonyms) that would retrieve the required information. By using terms from a specific information system’s controlled vocabulary, users are assured of retrieving some information that is relevant to their search.

Ontologies, thesauri, taxonomies and folksonomies are four examples of controlled vocabularies that are relevant to this presentation. The following discussion will therefore focus on learning more about these controlled vocabularies and on discovering how they facilitate information retrieval in an online environment.

THESAURI

Thesauri are described by Mamassion (2010, p. 103) as “controlled vocabularies that contains index terms that are used to describe the contents of a document”. Currás (2010, p. 72) extends this definition when he defines a thesaurus as a “controlled and dynamic vocabulary of terms that share semantic and generic relationships, and that are applied in a particular field of knowledge.” From a functional point of view, he views a thesaurus as “an instrument for the control of terminology, used to transmit, in a more strict language, the language used in documents.”

Based on these definitions, Currás (2010, p. 72) identified certain conditions that a thesaurus must fulfil:

• It must be a specialised language
• Must be normalised in a post-controlled process
• It must be possible to convert the indexing terms in a thesaurus into keywords which could assist in determining the theme of the document
• The keywords have a hierarchical relationship
• The indexing languages are terminological
• They must allow for the introduction or suppression of terms so that the thesaurus can be updated
• They must convert natural language into a controlled vocabulary
• They must serve as a nexus of union between the document and the user.

The conditions identified by Currás (2010) are supported by Burger (2005, p. 160) when she identifies the following characteristics of thesauri:

• formally structured (e.g. alphabetically or hierarchically)
• indicate a variety of indexing terms. These terms are either preferred terms (UF – used for) or non-preferred terms (U – Use) in the thesaurus
• refer indexers/users from the terms not suitable for indexing to the preferred terms in the thesaurus
• indicate relationships with other terms by means of special codes such as BT (broader terms), NT (narrower terms) and RT (related terms).

The hierarchical relationships among terms in a thesaurus are based on a logical progression from broader terms to narrower terms. For example:

BT Transport
NT Road transport
   Air transport
   Shipping

In turn, the narrower terms can also be broader terms for other terms. For example:

BT Road transport
NT Buses
   Motor cars
   Trucks

In addition to having a hierarchical relationship with each other, thesaurus terms can also have an “equal” relationship with other terms. In the above example, buses, motor cars and trucks are three different types of motorised vehicles that are used in road transportation. Although all three terms have a hierarchical relationship with road transport, they are also related terms as all three appear on the same hierarchical level.
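The broader/narrower structure described here can be sketched as a small lookup table. This is only an illustrative sketch in Python: the terms come from the two examples above, while the names NARROWER, broader_of and same_level_terms are assumptions made for this illustration and are not taken from any thesaurus software.

```python
# Narrower-term (NT) links copied from the two transport examples.
# The dictionary layout is an assumption made for this sketch.
NARROWER = {
    "Transport": ["Road transport", "Air transport", "Shipping"],
    "Road transport": ["Buses", "Motor cars", "Trucks"],
}


def broader_of(term):
    """Return the broader term (BT) of `term`, or None for a top term."""
    for parent, children in NARROWER.items():
        if term in children:
            return parent
    return None


def same_level_terms(term):
    """Return the terms that share a broader term with `term`."""
    parent = broader_of(term)
    if parent is None:
        return []
    return [t for t in NARROWER[parent] if t != term]


print(broader_of("Buses"))        # Road transport
print(same_level_terms("Buses"))  # ['Motor cars', 'Trucks']
```

A term such as "Road transport" appears both as a narrower term of "Transport" and as a broader term of "Buses", which is how the same table expresses more than one level of the hierarchy.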

Furthermore, not all terms that are listed in a thesaurus have some form of relationship with other terms.

Another feature of thesauri which has not been previously mentioned is the inclusion of scope notes. Not all terms in a thesaurus include a scope note, but scope notes are used to clarify terms which could be misinterpreted or used wrongly.

Thesauri, according to Currás (2010, p. 74), started with the increase in themes that emerged from the literature, where neither hierarchical nor faceted systems could provide adequate responses to the demand for information. What everybody thought was a new idea was in fact something already in use. The earliest thesauri emerged from the use of concepts taken from existing documents that were not necessarily related to each other. The only problem was that no one knew for certain whether they could be applied to information processes. With the introduction of computers, the first indexes were developed, and the first formally constructed thesauri began to appear in the 1960s. Two different classes of thesauri developed: general thesauri and specialised thesauri. The ERIC thesaurus and the ISAP thesaurus are examples of general thesauri. These thesauri are multidisciplinary. Thesauri that were developed to describe specialised collections, such as an art collection or a historical photograph collection, can also cover more than one discipline at different levels of importance.

Currás (2010, p. 26) uses the following figure to illustrate how thesauri developed, first from information and then through the field of computing.


However, thesauri are not only used to describe the contents of a document. They are also used by the users of information retrieval systems to identify terms they could use to retrieve information that is relevant to their individual information needs. For example, some multidisciplinary databases such as ISAP (Index for Southern African Periodicals) allow users to search the database using both keywords (natural language terms) and thesaurus terms. In such a database, the keywords are the most specific terms and the thesaurus terms are the broader terms. The purpose of the thesauri in such databases is to categorise the information sources according to subject, and they therefore narrow an information search. The following record from the ISAP database illustrates the use of keywords and thesaurus terms.

RE 1528
LN Afrikaans
TI Mikrorekenaarmatige persoonlike inligtingstelsels.
AU Burger, M.
AB Refers to the general characteristics and application of personal information systems on microcomputer. Aims to stimulate professionals. Briefly discusses indexing, computer hardware and the advantages of a computerised system.
SO Mousaion series 3
VO 8
IS 1
MO Sep
YE 1990
PA 32-47
SN 0027-2639
TH Information services
KE Computerised information retrieval
   Indexing
   Information management
   Information systems
   Microcomputers
LO 10/12/2013
DD 10/12/2013
PN P13194

Since thesauri are generally developed for specific information systems, they not only narrow an information search, but also ensure that some information will be retrieved when terms listed in the thesaurus are used to search the specific information system for which the thesaurus was developed. Since ISAP is an indexing and abstracting database, users need to request the relevant information from the National Library, or repeat the information search in a different full-text database such as the SA E-Publications database, which is available through Sabinet.
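How a thesaurus term and a keyword work together in a search can be illustrated with a short Python sketch. It assumes that each record stores its thesaurus terms (TH) and keywords (KE) as lists. The first record uses the TH and KE values of the ISAP record shown above. The second record, the RECORDS list and the search function are hypothetical and exist only for this illustration; this is not how the ISAP database itself is implemented.

```python
RECORDS = [
    {   # TH and KE values copied from the ISAP record shown above
        "RE": "1528",
        "TH": ["Information services"],
        "KE": [
            "Computerised information retrieval",
            "Indexing",
            "Information management",
            "Information systems",
            "Microcomputers",
        ],
    },
    {   # hypothetical record, invented for illustration only
        "RE": "X-0001",
        "TH": ["Information services"],
        "KE": ["Library marketing"],
    },
]


def search(records, thesaurus_term=None, keyword=None):
    """Return record numbers that match every criterion that was given."""
    hits = []
    for rec in records:
        if thesaurus_term is not None and thesaurus_term not in rec["TH"]:
            continue
        if keyword is not None and keyword not in rec["KE"]:
            continue
        hits.append(rec["RE"])
    return hits


# The broad thesaurus term selects a subject category ...
print(search(RECORDS, thesaurus_term="Information services"))
# ... and the more specific keyword narrows the result within it.
print(search(RECORDS, thesaurus_term="Information services", keyword="Indexing"))
```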

The Centre for African Studies’s (University of Leiden) thesaurus (http://thesaurus.ascleiden.nl/) is a good example of a thesaurus which was developed to facilitate the organisation and retrieval of documents that form part of the Centre’s collection. The thesaurus has a search box as well as an alphabetical search option. I clicked on “R” and the following thesaurus entry was revealed.

African Studies Thesaurus

rights of the accused         Search catalogue

Scope note
A class of rights that apply to a person in the time period between when they are formally accused of a crime and when they are either convicted or acquitted, generally based on the maxim of 'innocent until proven guilty' and including the right to a fair trial, the right to counsel and the right to communicate.

Used for
defendants' rights
habeas corpus

Broader terms
civil and political rights

Related terms
legal procedure
offenders
presumption of innocence

Subject category
10.05 CRIMINAL LAW, CRIMINAL PROCEDURE

This thesaurus entry shows an example of a scope note, the terms the entry term is used for, and a broader term as well as some related terms. It also places the term within a hierarchical subject category in the library collection. By clicking on the “search catalogue” link, I could retrieve all the documents that were linked to this thesaurus entry in the library catalogue.

Thesauri are not necessarily used to describe the contents of written documents. In her new book, Titangos (2013) described the Santa Cruz Public Library, California’s History Photograph Project (LHPP), which aimed at making more than 979 historical photographs in the Santa Cruz Library collection available online. To organise the photographs in this new digital collection, a Thesaurus for Graphic Materials (TGM) team was appointed. The team came up with a list of terms that could be used for this purpose, but they soon found that more information was needed to make the photographs accessible. One photograph, no. 0125, had three words written on its face: “Laurel bull donkey.” No thesaurus term or subject heading term could adequately describe this photograph. It was then decided to compose a footnote to explain the photograph in more detail. The following illustration shows the database entry for the “Laurel bull donkey”. Note that the assigned keyword resembles a subject heading term. The descriptions that are added to the entries for photographs in this database now support the retrieval of these entities in that they provide the user with more information on the photograph. Based on the additional information, the user can then decide whether the retrieved photograph is relevant to the information search or not.

Title: 0125
Summary: Laurel bull donkey.
Keywords: Industries--Lumber
Description: Laurel bull donkey. A donkey was an engine used to hoist logs onto flatcars bound for the mill. Cables could be extended on spools run through the woods for a distance of up to several miles.
Date: 1890's
Place: Laurel
Sources of Information: Notes on back of photo; Article on this website, see link below
Related Articles:
o Felling the Giants, [Lumbering in the Mountains]
o Industrial Development: Lumber; Lime; Fishing

aa-001
Ansley Kullman Salz holding a sample of the leather made by his company, the A. K. Salz Tannery in Santa Cruz, California.
Subject Headings and Keywords: Industries--Tanneries--Salz Tannery, Portraits--Men, Leather Industry, Leather Garments, Leather Goods, Hides and Skins, Hats, Clothing and Dress, Eyeglasses

aa-002
The tanoak used to dye leather was stored in drying sheds across Highway 9 from the main Salz Tannery complex.
Subject Headings and Keywords: Industries--Tanneries--Salz Tannery, Leather Industry, Drying Sheds, Barns, Storage Facilities, Smokestacks, Equipment

The above two photographs are further examples of how additional information was provided to describe a photograph, and they illustrate the use of subject headings and keywords to describe the photographs.

ONTOLOGIES

Currás (2010, pp. 20-22) cites a number of wide-ranging definitions for ontologies which describe ontologies as catalogues, a means to capture human knowledge based on common sense; as groups of concepts; a general framework which can display coherent organisation; the marriage of symbols used in natural language and the entities that they represent in the real world. However, he found the definition by Marco the most comprehensible: an ontology “is the systematic description of a specific domain in accordance with the entities and processes that allow the description of ‘all’ things and processes”. This definition is supported by Hedden (2010, p. 12) when she describes ontologies as “a level of abstraction of data models, analogous to hierarchical and relational models.”

The following figure is used by Currás (2010, p. 23) to illustrate the differences and similarities between ontologies and thesauri. An analysis of the figure shows that the main difference lies in the structure and the existing relationships among terms. Thesaurus terms are hierarchically arranged whereas the arrangement of terms in ontologies takes certain characteristics and properties of the terms into consideration.


Hedden (2010, p. 12) explains that there can be any number of domain-specific types of relationship pairs. The examples she gives include owns/belongs to, produces/is produced by, and has members/is a member of. Currás (2010, p. 22) uses the MICROKOSMOS system as an example to show how principal and subordinate classes are established:

Objects
o Physical order
o Mental order
o Social order

Events
o Physical order
o Mental order
o Social order

Properties
o Attributes (objects or events)
o Relationships (with each other)

The structure of ontologies is therefore aimed at providing an order and a relation of terms which are based on certain characteristics and properties (Currás, 2010, p. 23).
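The domain-specific relationship pairs mentioned by Hedden can be sketched as typed statements in which each relationship has a named inverse. The three pairs below are the ones quoted above. The entity names ("Acme Ltd", "Widget"), the INVERSE table and the with_inverses function are hypothetical and serve only as an illustration in Python; they do not represent any particular ontology language or tool.

```python
# Relationship pairs quoted from Hedden (2010, p. 12).
INVERSE = {
    "owns": "belongs to",
    "produces": "is produced by",
    "has members": "is a member of",
}
# Make the table usable in both directions.
INVERSE.update({inverse: rel for rel, inverse in list(INVERSE.items())})


def with_inverses(statements):
    """Return the statements plus the inverse of each one."""
    result = set(statements)
    for subject, relation, obj in statements:
        result.add((obj, INVERSE[relation], subject))
    return result


# Hypothetical entities, invented for illustration only.
facts = [("Acme Ltd", "produces", "Widget")]
for statement in sorted(with_inverses(facts)):
    print(statement)
```

Unlike the BT/NT links of a thesaurus, each link here carries its own meaning, which is the structural difference between ontologies and thesauri that the figure is said to illustrate.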

Currás observed that ontologies are useful when they are applied to translation machines since they serve as a nexus between the words of intervening languages in order to find similarities or equivalencies. This view is supported by Hedden (2010, p. 14) when she observes that ontologies are becoming very important in semantic search engine deployment in specialised industries.


TAXONOMIES

The concept “taxonomy” means the science of classifying things. Hedden (2010, pp. 137-138) explains that the concept was traditionally used for the classification of plants and animals, such as the Linnaean classification system. However, it has lately become the preferred term for any hierarchical classification or categorisation system. According to Hedden (2010, p. 138), the main difference between a thesaurus and a taxonomy lies in the hierarchical relationships among terms. For example, a given term in a thesaurus may or may not have a broader/narrower term relationship with another term whereas all terms in taxonomies belong to a single large hierarchy that encompasses all concepts of a certain class, category, or aspect. Furthermore, terms in a thesaurus can have an equal relationship with other terms, e.g. dog breeds and cat breeds. Considering taxonomies’ strict hierarchical structure, there can be no equal relationships in taxonomies.

Hedden (2010, p. 138) explains that taxonomies’ structures are sometimes referred to as “trees” and the terms that are included in the taxonomy as “nodes”.
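The "tree" and "node" terminology can be made concrete with a small Python sketch. It assumes that every node records exactly one parent, so that all terms belong to a single hierarchy. The example terms, the PARENT table and both functions are hypothetical choices made for this illustration and are not taken from Hedden.

```python
# Each node points to its single parent; the root has no parent.
# The terms are hypothetical and chosen only for illustration.
PARENT = {
    "Animals": None,
    "Mammals": "Animals",
    "Birds": "Animals",
    "Dogs": "Mammals",
    "Cats": "Mammals",
}


def path_from_root(node):
    """Return the chain of nodes from the root down to `node`."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return list(reversed(path))


def roots():
    """Return all nodes without a parent; a single hierarchy has one."""
    return [node for node, parent in PARENT.items() if parent is None]


print(path_from_root("Dogs"))  # ['Animals', 'Mammals', 'Dogs']
print(roots())                 # ['Animals']
```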

A Google search for taxonomies revealed the existence of taxonomies in different subject fields. Examples of subject-related taxonomies that were described in Wikipedia include science, education, business and economics, information science and safety.

Science

Two scientific taxonomies that are discussed in Wikipedia are the Linnaean classification system and a more modern biological classification system which is based on the Linnaean system.

(http://en.wikipedia.org/wiki/File:Linnaeus_-_Regnum_Animale_(1735).png)

(Biological_classification_L_Pengo_vflip.svg)

Wikipedia also grouped folk taxonomies under science taxonomies. These taxonomies are vernacular naming systems. They represent the way in which people describe and organise their natural surroundings; they are generated from social knowledge and are used in everyday speech.

Education

In the field of education, Bloom’s Taxonomy of learning in action seems to be an important taxonomy. His taxonomy of “learning in action” standardises learning objectives in an educational environment. These objectives are subdivided into three “domains”: the cognitive, affective and psychomotor domains. Through this division, Bloom had hoped to motivate educators to focus on all three domains in their teaching. There seem to be a number of depictions based on his taxonomy. I found his “wheel”, in which he depicted “learning in action”, quite different and interesting. In his wheel, he listed a number of verbs and grouped them according to different types of assessment.


http://en.wikipedia.org/wiki/Bloom%27s_Taxonomy

Taxonomies in business and economics

In the business and economics field, corporate taxonomies are increasingly being used in business information management systems, especially in their content management and knowledge management systems. According to Wikipedia (2014), corporate taxonomies reflect the hierarchical classification of entities of interest of an enterprise and they are used to classify documents, products, processes, knowledge fields and human groups. Hedden (2010, p. 138) noted that these taxonomies may or may not have the hierarchical structure that is generally associated with traditional taxonomies such as the science taxonomies that were discussed above. She also found that the taxonomies that are found on public websites should not have more than three or four levels. The reason she gives is that users who are unfamiliar with a site typically only have the patience to search through that many levels.

Safety

Safety taxonomies are standardised sets of terminologies which are used by safety and health care workers. These taxonomies aim at standardising the terminology in these fields to avoid confusion among safety and health care workers. Wikipedia (2014) indicates that there exist numerous safety taxonomies which analyse and classify human error and accident causes. One example which is discussed in depth in Wikipedia is the Human Factors Analysis and Classification System (HFACS). This system identifies the human causes of an accident, provides a tool to assist in the investigation process, and is used in accident prevention training. Four different levels of analysis are reflected in the HFACS taxonomy: unsafe acts; preconditions for unsafe acts; unsafe supervision; and organisational influences. Each of these levels is then further subdivided into more categories.

Information and computer science

This is the last category of subject-related taxonomies identified in Wikipedia that will be discussed. Of the four different taxonomies in this category, I found “Taxonomies for search engines” to be extremely relevant to this discussion. Currás (2010, pp. 46-48) refers to taxonomies as virtual taxonomies and cybernetic taxonomies. He describes a virtual taxonomy as an “intelligent agent or an intelligent meta-search engine to be used for web pages”. As explained by Vicient, Sànchez and Moreno (2013), taxonomies, thesauri and concept hierarchies are crucial components of any information retrieval system. Currás (2010, p. 47) uses the following figure to illustrate taxonomies in computing.

Hedden (2010, p. 203) identified two different ways in which controlled vocabularies or taxonomies support Internet searches. These are

• through nonpreferred terms or synonym rings, or
• as browsable taxonomies.

As explained by Hedden (2010, pp. 201-202), ordinary search engines generally don’t make use of taxonomies as it is basically impossible to create and maintain taxonomies that would organise all the information that is available on the web. However, the search engine software for single sites, such as web directories, may incorporate taxonomies. The directory for Microbial Life Educational Resources (http://serc.carleton.edu/microbelife/resources/index.html) is an example of a site which uses a browsable taxonomy.

Refine the Results

Subject: Biology   287 matches General/Other
  Astrobiology   92 matches
  Biogeochemistry   125 matches
  Diversity   141 matches
  Ecology   613 matches
  Evolution   211 matches
  Microbiology   814 matches
  Molecular Biology   174 matches

Resource Type
  Activities   123 matches
  Assessments   12 matches
  Course Information   20 matches
  Datasets and Tools   29 matches
  Audio/Visual   154 matches
  Computer Applications   18 matches
  Pedagogic Resources   61 matches
  Scientific Resources   700 matches
  Biographical Resources   4 matches
  Policy Resources   14 matches

Extreme Environments
  Alkaline   54 matches
  Acidic   56 matches
  Extremely Cold   53 matches
  Extremely Hot   116 matches
  Hypersaline   63 matches
  High Pressure   57 matches
  High Radiation   24 matches
  Anhydrous   32 matches
  Anoxic   66 matches
  Altered by Humans   66 matches

Ocean Environments
  Coastal and Estuarine   170 matches
  Shallow Sea Floor/Continental Shelf   30 matches

This taxonomy uses broad categories to organise website information and then lists the taxonomy terms that are used within each category. The number of matches available for each term is shown next to it as a hypertext link, and clicking on it takes the user to the actual sources of information.

The synonym rings for each concept in a controlled vocabulary include terms that are likely to be searched and terms that are likely to appear in the content. Hedden (2010, p. 203) explains that synonym rings do not display the taxonomy terms in the user interface, whereas taxonomies that do distinguish between preferred and nonpreferred terms usually display the preferred terms. Hedden (2010, p. 204) provides the following example of how a synonym ring supports an information search for a concept:

Users might enter:
Oil industry
Oil & gas industry
Oil & gas industries
Petroleum industry

Synonym ring contains all:
Oil industry
Oil & gas industry
Oil and gas industry
Oil & gas industries
Oil and gas industries
Petroleum industry
Oil companies
Big oil
Oil producers
Petroleum companies

Text may contain:
Oil and gas industry
Oil companies
Big oil
Oil producers
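The way a synonym ring bridges the user's query and the wording of the content can be sketched in a few lines of Python. The ten ring terms are the ones listed in Hedden's example. The function names and the sample sentences are assumptions made for this illustration, and plain substring matching is a simplification, not the method of any particular search engine.

```python
# The ten terms of the synonym ring in Hedden's example.
OIL_RING = [
    "Oil industry", "Oil & gas industry", "Oil and gas industry",
    "Oil & gas industries", "Oil and gas industries",
    "Petroleum industry", "Oil companies", "Big oil",
    "Oil producers", "Petroleum companies",
]


def expand_query(query, rings):
    """Return every term in the ring that contains `query`.

    If no ring contains the query, only the query itself is returned.
    """
    wanted = query.casefold()
    for ring in rings:
        if wanted in {term.casefold() for term in ring}:
            return set(ring)
    return {query}


def matches(text, terms):
    """True if the text contains any of the terms (case-insensitive)."""
    lowered = text.casefold()
    return any(term.casefold() in lowered for term in terms)


terms = expand_query("oil industry", [OIL_RING])
# Hypothetical sentences, invented for illustration only.
print(matches("Big oil reported record profits.", terms))  # True
print(matches("Coal exports fell last year.", terms))      # False
```

No term in the ring is preferred over the others, which is why the user never sees the ring itself: any member entered as a query retrieves content that uses any other member.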

However, due to modern automated indexing technologies, search engines have become more sophisticated. According to Hedden (2010, p. 205), automated indexing technologies generally follow two basic approaches: information extraction and auto-categorisation.

Information extraction, also known as web mining, is a technique that is used by search engine crawlers to collect data from sources and add them to the search engines’ indexes (Web Data Mining.net 2014). The data is typically collected from the metadata in the websites’ headers (these are not visible to the users) and other hyperlinks within a website. The data mining software focuses on identifying which key names, concepts, and data in the metadata and text of the documents are significant in comparison with those with a mere passing mention (Hedden, 2010, p. 205). Hedden (2010, p. 205) compares this process to book indexing as it, according to her, seeks to identify significant names and concepts within chunks of text. She also notes that data extraction or data mining does not necessarily use a taxonomy, but when it does, it usually uses a simple synonym ring.

Auto-categorisation, on the other hand, seeks to categorise each document based on what it is fundamentally about. In order to do so, Rouse (2005) noted that web mining software uses data patterns from the information that was retrieved for a specific query to identify similar data patterns in other sources which could also be relevant to the search. The data patterns identified in this manner then form the parameters which are applied to new information searches. By continuously “learning” from new data patterns that can be linked to data patterns already identified in its databases, search engines build their own web taxonomies. Hedden (2010, p. 205) compares this process of auto-categorisation to database indexing where one or more taxonomy or thesaurus terms are assigned to describe the subject content of a document.
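The idea of assigning one or more taxonomy terms to a document can be illustrated with a deliberately simplified Python sketch. It uses fixed word lists instead of learned data patterns, so it is a rule-based stand-in and not the learning approach described by Rouse or Hedden. The categories, word lists, threshold and function name are all hypothetical.

```python
import re

# Hypothetical categories and indicator words, invented for this sketch.
CATEGORY_TERMS = {
    "Transport": {"bus", "truck", "shipping", "road"},
    "Information services": {"indexing", "retrieval", "thesaurus", "database"},
}


def categorise(text, min_hits=2):
    """Assign every category whose indicator words occur often enough.

    A document with fewer than `min_hits` distinct indicator words for a
    category is treated as a passing mention and is not assigned to it.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {cat: len(words & terms) for cat, terms in CATEGORY_TERMS.items()}
    return sorted(cat for cat, score in scores.items() if score >= min_hits)


print(categorise("Indexing and retrieval in a bibliographic database"))
print(categorise("A bus passed"))
```

The first call assigns "Information services" because three indicator words occur, while the second returns an empty list because a single occurrence of "bus" falls below the threshold.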

Similarities and differences between taxonomies and thesauri

Currás (2010, pp. 51-52) identified some similarities and differences between taxonomies and thesauri. According to him, the most obvious similarities lie in the fact that they are both controlled vocabularies which are used in an information retrieval system. They are also used for the systematisation of knowledge using scientific, logical and coherent methods that are established through a set of predetermined rules. Both thesauri and taxonomies are pre- and post-coordinate systems made up of terms that are derived from documents.

The first and primary difference between thesauri and taxonomies identified by Currás (2010, pp. 51, 53) is that information technologies are almost exclusively used to develop taxonomies whereas thesauri can be constructed manually. In instances where computer programs are used to construct thesauri, an in-depth knowledge of the mechanisms, techniques, theory and practice of thesaurus construction is still needed. These differences can also be seen in the uses that are made of thesauri and taxonomies. Thesauri are used by information specialists whereas taxonomies are almost exclusively used by computer specialists working within the business field.

The existing differences and similarities between taxonomies and thesauri are illustrated in the following figure (Curras, 2010, p. 52).

FOLKSONOMIES

The term “folksonomies” stands for “folk taxonomies”, which suggests that they are created by ordinary information users as opposed to experts in a subject field (Goh, 2012, p. 75). This is why Hedden (2010, p. 193) could explain folksonomies as being created and used by authors and/or users of information content. Stock and Stock (2013, p. 611) contributed to this description when they described “folksonomies” as the free allocation of keywords by anyone and everyone in an information system. Goh (2012, p. 75) and Hassan-Montero and Herrero-Solana (2006) use the concepts “social tagging” and “collaborative tagging” to describe the process of assigning keyword tags to documents. This phenomenon is also known as social bookmarking, collaborative tagging, social classification, social indexing, or ethnoclassification (Hedden, 2010, pp. 194-195).

According to Hassan-Montero and Herrero-Solana (2006), folksonomies are a form of crowdsourced (meta)data which provides a different mode of access to the content in digital libraries. The inclusion of tags in an information system therefore facilitates the taggers’ future access to resources (Macgregor & McCulloch, 2006).

The main differences between thesauri, ontologies, taxonomies and folksonomies lie in how these vocabularies are developed, who creates them, and how they are structured. Whereas thesauri, ontologies and taxonomies are created by experts in the field of information organisation, folksonomies are created by users of information, and the language of the user becomes important in their development. Thesauri are mainly developed by humans, whereas ontologies, modern taxonomies and folksonomies are computer generated. Lastly, folksonomies reflect no hierarchical structure and there are no directly specified parent-child relationships. They are merely a set of terms that are used by a group of people to describe information sources.

Three aspects seem to be important in the development of folksonomies:

the tags (i.e. the words that are used to describe a document);

the documents that need to be described; and

the users who perform the indexing task.

Social tagging and the creation of social networks

Stock and Stock (2013, p. 613) explain that the users and the documents are connected in a social network. The documents are thematically linked if they have been indexed via the same tags, and they are also coupled via shared users. The users are similarly thematically connected if they use the same tags, and are coupled via shared documents. Stock and Stock explain the development of social networks through social tagging as follows:

Documents are generally indexed via several tags, which are assigned with differing frequencies. When two tags co-occur in a single document, they are regarded as interlinked. By using interlinked tags, information systems compile tag clusters which represent networks of folksonomies.
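The idea of interlinked tags can be illustrated with a short sketch. The sample data and the function names cooccurrence and tag_clusters are hypothetical, and treating a cluster as a connected group of co-occurring tags is an assumption made here for illustration; Stock and Stock do not prescribe a particular clustering method.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical sample data: document id -> set of tags assigned to it.
doc_tags = {
    "doc1": {"taxonomy", "thesaurus", "indexing"},
    "doc2": {"taxonomy", "thesaurus"},
    "doc3": {"photo", "holiday"},
}


def cooccurrence(doc_tags):
    """Count how often each pair of tags occurs on the same document."""
    counts = Counter()
    for tags in doc_tags.values():
        for a, b in combinations(sorted(tags), 2):
            counts[(a, b)] += 1
    return counts


def tag_clusters(doc_tags):
    """Group tags that are linked, directly or indirectly, by co-occurrence."""
    neighbours = defaultdict(set)
    for tags in doc_tags.values():
        for tag in tags:
            neighbours[tag] |= tags
    seen, clusters = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        cluster, stack = set(), [start]
        while stack:  # depth-first walk over interlinked tags
            tag = stack.pop()
            if tag in cluster:
                continue
            cluster.add(tag)
            stack.extend(neighbours[tag] - cluster)
        seen |= cluster
        clusters.append(sorted(cluster))
    return clusters


print(cooccurrence(doc_tags)[("taxonomy", "thesaurus")])
print(tag_clusters(doc_tags))
```

In this sample, the pair taxonomy/thesaurus co-occurs on two documents, and the tags fall into two separate clusters because no document links the photo-related tags to the indexing-related ones.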

Personomies develop from indexed documents that were tagged by the same person. These personomies then support folksonomy-based recommender systems, where the information system (search engine) makes recommendations to its users for documents, for users and for tags. In these instances the personomy is chosen as a point of reference and the recommended tags are tags that were previously used by the current user.
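A personomy-based recommendation can be sketched as follows. This is an illustration under stated assumptions rather than a description of any real recommender system: the tagging records, the user names and the functions personomy, recommend_tags and recommend_documents are hypothetical, and ranking tags by how often the user applied them is one simple choice among many.

```python
from collections import Counter

# Hypothetical tagging records: (user, document, tag).
assignments = [
    ("ann", "doc1", "taxonomy"),
    ("ann", "doc2", "taxonomy"),
    ("ann", "doc2", "indexing"),
    ("ben", "doc1", "thesaurus"),
    ("ben", "doc3", "taxonomy"),
]


def personomy(user, assignments):
    """Count how often the given user applied each tag."""
    return Counter(tag for u, _, tag in assignments if u == user)


def recommend_tags(user, assignments, limit=3):
    """Suggest the user's own most frequently used tags."""
    ranked = sorted(
        personomy(user, assignments).items(),
        key=lambda item: (-item[1], item[0]),
    )
    return [tag for tag, _ in ranked[:limit]]


def recommend_documents(user, assignments):
    """Suggest documents the user has not tagged that carry one of their tags."""
    own_tags = set(personomy(user, assignments))
    own_docs = {doc for u, doc, _ in assignments if u == user}
    return sorted(
        {doc for _, doc, tag in assignments
         if tag in own_tags and doc not in own_docs}
    )


print(recommend_tags("ann", assignments))
print(recommend_documents("ann", assignments))
```

Here the user "ann" is offered her own tags, most used first, and is pointed to a document that another user indexed with one of those tags.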

Websites or services that make use of social tagging include social bookmarking management sites such as Delicious (delicious.com), Connotea (www.connotea.org) and Diigo (www.diigo.com), as well as Flickr (www.flickr.com) and Facebook (facebook.com). The option in social library catalogues such as LibraryThing.com and online vendors such as Kalahari.net to review a book or an information resource that you have read or bought is nothing other than a form of social tagging. These reviews are then used to “promote” the book to other possible readers or consumers. However, it is not only commercial and social catalogues that allow for the use of social tagging. The following is an edited version of the entry in the Unisa Library’s catalogue for the book by Hedden:

The accidental taxonomist / Heather Hedden


Printed Material

Hedden, Heather.

Medford, N.J. : Information Today, c2010

Community Tags

Add a Tag

When I clicked on the “Add a Tag” button, the system required me to identify myself before I could tag the record and submit my tag to the system.

Advantages and disadvantages of folksonomies

Stock and Stock (2013, p. 617) listed some of the advantages of folksonomies that were identified by Peterson (2006) and Shirky (2005). These include:

tags represent user-specific interpretations of documents, which in turn allow users to interpret documents from various points of view. These could be scientific, ideological or cultural.

tagging can represent a kind of quality control: the more people tag a document, the more important the content appears to be. When used as a quality control measure, folksonomies support users in two ways: first, by supporting the retrieval of documents that are relevant to an information search and, secondly, by helping them when they browse an information system to see what is available or to exploit serendipitous information discovery.

In addition to the advantages that were identified by Peterson and Shirky, Hedden (2010, p. 195) also identified some advantages folksonomies have over taxonomies. These include:

folksonomies reflect trends, are up to date, and can monitor change and popularity

folksonomies are cheaper and quicker to develop than building and maintaining a taxonomy

they are responsive to user needs

they facilitate democracy (as in votes for popular content and popular tags), the distribution of tasks, and the building of virtual communities of shared interest and knowledge.

The disadvantages of folksonomies listed by Peterson, Shirky and Hedden include:

lack of precision: different word forms and abbreviations are used for the same concept. There is no control of synonyms and homonyms, and typos are frequent.

users have different tasks and approach documents with different motives and the documents are located in different cognitive contexts, but they do not share a common indexing level.

tags can be biased as users may disagree with prior tagging

users may index documents in their own language (e.g. Cape Town versus Kaapstad) without bothering to translate.


homonyms that span different languages are not separated, e.g. Gift in German (poison) and English (present).

users don’t always distinguish between content indexing and formal descriptions.

tags could contain value judgments (e.g. stupid, or nice)

tags could describe planned activities (e.g. to read) rather than the content

syncategorematic tags, e.g. tagging a photo in Facebook with “me”

spam tags which have nothing to do with the contents of the document but which are intended to mislead users.
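The lack of precision named first in the list above can be made concrete with a small sketch. The function names normalise and group_variants are hypothetical, and the normalisation rule (lower-casing and dropping everything except letters and digits) is a deliberately crude assumption, not a technique taken from the sources cited here.

```python
import re
from collections import defaultdict


def normalise(tag):
    """Crude normal form: lower-case and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", tag.lower())


def group_variants(tags):
    """Group tags that share the same normal form."""
    groups = defaultdict(list)
    for tag in tags:
        groups[normalise(tag)].append(tag)
    return dict(groups)


tags = ["Cape Town", "capetown", "cape-town", "Kaapstad"]
print(group_variants(tags))
```

The three spelling variants of Cape Town collapse into one group, but Kaapstad stays separate: surface normalisation of this kind does nothing for tags in different languages, synonyms or homonyms, which is where a controlled vocabulary still has the advantage.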

Peters (2006 in Stock & Stock 2013, p. 618) is of the view that the exclusive use of folksonomies in professional environments such as corporate knowledge management systems cannot be recommended. However, if folksonomies are combined with other methods of knowledge representation, their advantages outweigh the disadvantages. Furthermore, to be effective, social tagging requires mass user involvement.

CONCLUSION

In this presentation, I addressed a number of controlled vocabularies and some not so controlled vocabularies and discussed the role each of these has in information organisation and retrieval. The discussion of modern taxonomies and folksonomies hopefully also helped to create an understanding of how intelligent search engines come up with the suggested sources that could be relevant to an information search.

Bibliography

Benzon, W. (1996). Culture as an evolutionary arena. Journal of Social and Evolutionary Systems, 19(4), 321-362.

Burger, M. (2005). Thesaurus construction. In J. A. Kalley, E. Schoeman, & M. Burger (Eds.), Indexing for southern Africa: a manual compiled in celebration of ASAIB's first decade 1994-2004 (pp. 159-188). Pretoria: University of South Africa.

Curras, E. (2010). Ontologies, taxonomies and thesauri in systems science and systematics. Oxford: Chandos.

Fourie, I., & Burger, M. (2005). Verbal subject description. In J. A. Kalley, E. Schoeman, & M. Burger (Eds.), Indexing for southern Africa: a manual compiled in celebration of ASAIB's first decade 1994-2004 (pp. 53-67). Pretoria: University of South Africa.

Garcia, P. A., Martin-Moncunill, D., Sanchez-Alonso, S., & Garcia, A. F. (2014). A usability study of taxonomy visualisation user interfaces in digital repositories. Online Information Review, 38(2), 284-304.

Goh, D. H. (2012). Collaborative search and retrieval in digital libraries. In G. G. Chowdhury, & S. Foo (Eds.), Digital libraries and information access: research perspectives (pp. 69-82). Chicago: Neal-Schuman.


Hassan-Montero, Y., & Herrero-Solana, V. (2006). Improving tag-clouds as visual information retrieval interfaces. Proceedings of the International Conference on Multidisciplinary Information Sciences and Technologies.

Hedden, H. (2010). Controlled vocabularies, thesauri, and taxonomies. In J. Perlman, & E. L. Zafran (Eds.), Index it right: advice from the experts (pp. 135-154). Medford, N.J.: Information Today.

Hedden, H. (2010). The accidental taxonomist. Medford, NJ: Information Today.

Hwang, S. Y., Yang, W. S., & Ting, K. D. (2010). Automatic index construction for multimedia digital libraries. Information Processing and Management, 46, 295-307.

Macgregor, G., & McCulloch, E. (2006). Collaborative tagging as a knowledge organisation and resource discovery tool. Library Review, 55(5), 291-300.

Mamassion, L. (2010). Through the looking glass: a freelance perspective on database indexing. In J. Perlman, & E. L. Zafran (Eds.), Index it right!: advice from experts. Vol. 2 (pp. 99-110). Medford, NJ: Information Today in association with the American Society for Indexing.

Rouse, M. (2005). Web mining. Retrieved 3 22, 2014, from http://searchcrm.techtarget.com/definition/Web-mining

Stock, W. G., & Stock, M. (2013). Handbook of information science. Berlin: De Gruyter.

Titangos, H. L. (2013). Local community in the era of social media technologies: a global approach. Oxford: Chandos.

Vicient, C., Sanchez, D., & Moreno, A. (2013). An automatic approach for ontology-based extraction from heterogeneous textual resources. Engineering Applications of Artificial Intelligence, 26(3), 1092-1106.

Web data mining. (n.d.). Retrieved 3 22, 2014, from Web Data Mining.net: http://www.web-datamining.net/