key word indexing and their types with example

Keyword Indexing. University of Calcutta. 11/21/2014. Name: Sourav Sarkar. Roll no.: 32

Definition of Keyword Indexing: An indexing system that does not control its vocabulary may be referred to as ‘Natural Language Indexing’ or sometimes as ‘Free Text Indexing’; keyword indexing is known by both names. A ‘keyword’ is a catchword, that is, a significant or subject-denoting word taken mainly from the title, and sometimes from the abstract or text, of a document for the purpose of indexing. Keyword indexing therefore generates index entries from the natural language of the documents themselves, and no controlled vocabulary is required.

Keyword indexing is not new. It existed in the nineteenth century, when it was referred to as ‘catchword indexing’. Computers began to be used to aid information retrieval in the 1950s. H P Luhn and his associates produced machine-generated permuted title indexes and distributed copies at the International Conference on Scientific Information held in Washington in 1958. Luhn named the result the Keyword-In-Context (KWIC) index and reported the method of generating it in a paper. The American Chemical Society established the value of KWIC after adopting it in 1961 for its publication ‘Chemical Titles’.

Uses of Keyword Index: A number of indexing and abstracting services prepare their subject indexes by using keyword indexing techniques, and their indexes are variations of keyword indexing. Some notable examples are: 1. Chemical Titles; 2. BASIC (Biological Abstracts Subjects In Context); 3. Keyword Index of Chemical Abstracts; 4. CBAC (Chemical Biological Activities); 5. KWIT (Keyword-In-Title) of Lawrence Berkeley Laboratory; 6. SWIFT (Selected Words in Full Titles); and 7. SAPIR (System of Automatic Processing and Indexing of Reports).

Types of Keyword Indexing:

1. KWIC (Keyword-In-Context) Index: H P Luhn is credited with the development of the KWIC index. This index was based on the keywords in the title of a paper and was produced with the help of computers. Each entry in a KWIC index consists of the following three parts:

Keywords: Significant or subject-denoting words which serve as approach terms.

Context: The rest of the terms of the title, which accompany each selected keyword and specify the particular context in which it is used in the document.

Identification or Location Code: A code (usually the serial number of the entry in the main part) that gives the address of the document, where its full bibliographic description is available.

KWIC Indexing Process

The KWIC indexing system consists of three steps:

Step I : Keyword selection

Step II : Entry generation

Step III : Filing

The Operational Stages of KWIC Indexing

The operational stages consist of the following:

a) Mark the significant words, or prepare the ‘stop list’ and store it in the computer. The ‘stop list’ is a list of words that are considered to have no value for indexing / retrieval. These may include insignificant words like articles (a, an, the), prepositions, conjunctions, pronouns and auxiliary verbs, together with such general words as ‘aspect’, ‘different’, ‘very’, etc. Each major search system has defined its own ‘stop list’.

b) Select keywords from the title and / or abstract and / or full text of the document.

c) The KWIC routine rotates the title to make it accessible from each significant term. To do this, manipulate the title or title-like phrase so that each keyword in turn serves as the approach term and comes at the beginning (or in the middle) by rotation, followed by the rest of the title.

d) Separate the last word and the first word of the title with a symbol, say a stroke [/] (sometimes an asterisk ‘*’ is used), in an entry. Keywords are usually printed in bold typeface.

e) Put the identification / location code at the right end of each entry; and finally

f) Arrange the entries alphabetically by keyword.

Example of KWIC indexing

Title: Classification of Books in a University Library (with identification code 1279)

Step I: Classification, Books, University, Library

Step II:

CLASSIFICATION of Books in a University Library 1279

BOOKS in a University Library/Classification of 1279

UNIVERSITY Library/Classification of Books in a 1279

LIBRARY/Classification of Books in a University 1279

Step III:

BOOKS in a University Library/Classification of 1279

CLASSIFICATION of Books in a University Library 1279

LIBRARY/Classification of Books in a University 1279

UNIVERSITY Library/Classification of Books in a 1279

The keyword may also be in the centre as follows:

Classification of BOOKS in a University Library 1279

University Library/CLASSIFICATION of Books in a 1279

in a University LIBRARY/Classification of Books 1279

of Books in a UNIVERSITY Library/Classification 1279
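The three KWIC steps (keyword selection, entry generation, filing) can be sketched in a few lines of Python. This is only an illustrative sketch, not Luhn's program: the names STOP_LIST and kwic_entries are hypothetical, the stop list is a tiny assumed sample, and keywords are shown in capitals in place of bold type.

```python
# Hypothetical, deliberately small stop list; real systems define their own.
STOP_LIST = {"a", "an", "the", "of", "in", "on", "for", "and", "to", "with"}


def kwic_entries(title, code, stop_list=STOP_LIST):
    """Return the KWIC entries for one title, filed alphabetically by keyword."""
    words = title.split()
    entries = []
    for i, word in enumerate(words):
        # Step I: words on the stop list never become approach terms.
        if word.lower() in stop_list:
            continue
        # Step II: rotate the title so the keyword leads; "/" marks the point
        # where the end of the title wraps round to its beginning.
        head = " ".join([word.upper()] + words[i + 1:])
        tail = " ".join(words[:i])
        text = f"{head}/{tail}" if tail else head
        entries.append((word.lower(), f"{text} {code}"))
    # Step III: file the entries alphabetically by keyword.
    return [entry for _, entry in sorted(entries)]


for line in kwic_entries("Classification of Books in a University Library", "1279"):
    print(line)
```

Run on the title used above, the sketch prints the four entries in the filed order of Step III, starting with BOOKS and ending with UNIVERSITY.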

2. KWOC (Keyword-Out-of-Context) Index: KWOC is a variant of the KWIC index. Here, each keyword is taken out and printed separately in the left-hand margin, with the complete title in its normal order printed to the right.

Examples

Title: Computerisation of Libraries in India

Format 1

COMPUTERISATION Computerisation of libraries in India 1289

INDIA Computerisation of libraries in India 1289

LIBRARIES Computerisation of libraries in India 1289

Format 2

COMPUTERISATION

Computerisation of libraries in India 1289

INDIA

Computerisation of libraries in India 1289

LIBRARIES

Computerisation of libraries in India 1289

These entries are then filed in an alphabetical sequence in the file of the KWOC index.

It should be noted that the changing of format in KWOC index has provided only limited improvement. Since it follows the same indexing technique there is hardly any difference in its retrieval efficiency.
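A KWOC entry differs from a KWIC entry only in layout, which a short Python sketch makes visible. The function name kwoc_entries and the small stop list are assumptions made for illustration; the output loop imitates Format 1, with the keyword in a left-hand column and the unrotated title to its right.

```python
# Hypothetical, deliberately small stop list; real systems define their own.
STOP_LIST = {"a", "an", "the", "of", "in", "on", "for", "and", "to", "with"}


def kwoc_entries(title, code, stop_list=STOP_LIST):
    """Return (keyword, title, code) triples filed alphabetically by keyword."""
    keywords = sorted(
        {word.upper() for word in title.split() if word.lower() not in stop_list}
    )
    # The title is kept in its normal order; only the extracted keyword changes.
    return [(keyword, title, code) for keyword in keywords]


for keyword, title, code in kwoc_entries("Computerisation of Libraries in India", "1289"):
    # Format 1: keyword in the left margin, full title to the right.
    print(f"{keyword:<16} {title} {code}")
```

Because the keywords are selected exactly as in KWIC, the sketch also shows why the change of format alone does little for retrieval efficiency.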

3. KWAC (Keyword-Augmented-in-Context) Index: KWAC also stands for ‘keyword-and-context’. In many cases the title does not represent the thought content of the document co-extensively, and KWIC and KWOC could not solve the resulting problem of retrieving irrelevant documents. To reduce such false drops, KWAC enriches the keywords of the title with additional keywords taken either from the abstract or from the original text of the document. These are inserted into the title or added at the end to give further index entries. KWAC is also called enriched KWIC or KWOC. CBAC (Chemical Biological Activities) of Chemical Abstracts Service uses a KWAC index in which the title is enriched by another title-like phrase formulated by the indexer.

Example

The title of a document is ‘Expert System’. In this case the title does not clearly express the contents of the document, so the abstract, or even the text itself, may be consulted to find significant words to add to the title to make it expressive. The above title may become, for example, ‘Expert System in Library’; the index is then prepared by either the KWIC or the KWOC method.
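The enrichment step can be sketched as follows. This is a minimal illustration under stated assumptions: the indexer supplies the extra phrase by hand, it is simply added at the end of the title, and the names enrich_title, approach_terms and STOP_LIST are hypothetical.

```python
# Hypothetical, deliberately small stop list; real systems define their own.
STOP_LIST = {"a", "an", "the", "of", "in", "on", "for", "and", "to", "with"}


def enrich_title(title, added_phrase):
    """Add an indexer-supplied phrase at the end of the title."""
    return f"{title} {added_phrase}".strip()


def approach_terms(text, stop_list=STOP_LIST):
    """Return the keywords under which the text would be entered."""
    return sorted({word.upper() for word in text.split() if word.lower() not in stop_list})


title = "Expert System"
enriched = enrich_title(title, "in Library")

print(approach_terms(title))     # ['EXPERT', 'SYSTEM']
print(approach_terms(enriched))  # ['EXPERT', 'LIBRARY', 'SYSTEM']
```

The enriched title gains LIBRARY as an approach term; the entries themselves would then be generated from the enriched title in the usual KWIC or KWOC way.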

4. Key-Term Alphabetical (KEYTALPHA)

In the Key-Term Alphabetical index, keywords are arranged side by side without forming a sentence. Entries are prepared containing only keywords and location excluding the context.

Example

Title: Computerisation of Libraries in India

The KEYTALPHA index entries are:

COMPUTERISATION, INDIA, LIBRARIES 1289

INDIA, LIBRARIES, COMPUTERISATION 1289

LIBRARIES, COMPUTERISATION, INDIA 1289
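The entries above look like cyclic rotations of the alphabetically sorted keywords, and the sketch below generates them on that assumption. The function name keytalpha_entries and the small stop list are hypothetical; a real KEYTALPHA routine may order the keywords differently.

```python
# Hypothetical, deliberately small stop list; real systems define their own.
STOP_LIST = {"a", "an", "the", "of", "in", "on", "for", "and", "to", "with"}


def keytalpha_entries(title, code, stop_list=STOP_LIST):
    """Return keyword-only entries, one per keyword, without any context."""
    keywords = sorted(
        {word.upper() for word in title.split() if word.lower() not in stop_list}
    )
    entries = []
    for i in range(len(keywords)):
        # Assumed rule: rotate the sorted keyword list so each keyword leads once.
        rotated = keywords[i:] + keywords[:i]
        entries.append(f"{', '.join(rotated)} {code}")
    return entries


for line in keytalpha_entries("Computerisation of Libraries in India", "1289"):
    print(line)
```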

Advantages:

1) The principal merit of keyword indexing is the speed with which it can be produced.

2) The production of a keyword index does not involve trained indexing staff. What is required is an expressive title coextensive with the specific subject of the document.

3) It involves minimum intellectual effort.

4) Vocabulary control need not be used.

5) It satisfies the current approaches of users.

Disadvantages:

1) Most of the terms used in science and technology are standardized, but the situation is different in the Humanities and Social Sciences.

2) Related topics are scattered. The efficiency of keyword indexing depends on how reliably the title expresses the subject of the document, as most such indexes are based on titles.

3) A search for a topic may have to be done under several keywords.

4) Search time is high.

5) Searches very often lead to high recall and low precision.

6) It fails to meet the exhaustive approach for a large collection.

Search Strategy for Keyword Indexes

In keyword indexes the significant terms of the titles of documents are arranged alphabetically, each with its context and identification number. There is no vocabulary control and, therefore, related or identical subjects are scattered throughout the index file. There is no reference system to connect or correlate the related or identical topics. These limitations should be kept in mind while formulating a search strategy. The user should search under the synonyms of the words and also under related terms. When titles are improved and supplemented by the editors, the search yields better results. Keyword indexes do not provide for the coordination of two or more search words, and this limitation should also be kept in mind. Users of these indexes should likewise be prepared to search under terms with alternative spellings, singular and plural forms, synonyms and near-synonyms. Because of the uncontrolled vocabulary, the number of search terms is considerably enlarged, necessitating more search effort.
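The extra effort this places on the searcher can be sketched in Python. Everything here is assumed for illustration: the index contents, the location codes other than 1279 and 1289, the synonym table, and the deliberately naive rule that forms a plural by adding or removing a final "s".

```python
def search_variants(term, synonyms=None):
    """Return the forms a searcher might have to try for one search term."""
    synonyms = synonyms or {}
    forms = {term.lower()}
    # Synonyms and alternative spellings must be supplied by the searcher.
    forms.update(s.lower() for s in synonyms.get(term.lower(), []))
    # Naive singular/plural guess: strip or add a trailing "s".
    for form in list(forms):
        forms.add(form[:-1] if form.endswith("s") else form + "s")
    return forms


def search(index, term, synonyms=None):
    """Look every variant up in a keyword -> location codes mapping."""
    hits = set()
    for form in search_variants(term, synonyms):
        hits.update(index.get(form, []))
    return sorted(hits)


# Hypothetical index file: with no vocabulary control, one subject is scattered.
index = {
    "computerisation": ["1289"],
    "computerization": ["1301"],
    "automation": ["1312"],
    "books": ["1279"],
}
synonyms = {"computerisation": ["computerization", "automation"]}

print(search(index, "computerisation"))            # ['1289']
print(search(index, "computerisation", synonyms))  # ['1289', '1301', '1312']
print(search(index, "book"))                       # ['1279']
```

A search under the single term finds one document; only the variant forms bring the scattered entries together, which is the work a controlled vocabulary would otherwise do.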

Conclusion

Despite its deficiencies, the keyword index has been quite popular during the last four decades. A number of evaluation studies have indicated that keyword indexes may offer several advantages over others. The continued growth of machine-readable databases has shown that the use of keyword indexes works well. The problem of unexpressive titles is solved to a considerable extent by editorial intervention. It is true that keyword indexes as such will not facilitate comprehensive search, but the production of any index that supports comprehensive search takes time, money and effort. The keyword index was never envisaged as a comprehensive subject index. It is a mechanism for providing a quick and specific subject approach to information, as Luhn envisaged it to be.