INIS Training Seminar: Subject Analysis, Thesaurus and Computer Assisted Indexing


International Atomic Energy Agency

INIS Training Seminar

Subject Analysis, Thesaurus and Computer Assisted Indexing

23 – 27 November 2009

Vienna, Austria

Alexander Nevyjel
Head, Content Management Group

Introduction to Subject Analysis

• Subject Analysis should be carried out whenever possible by subject specialists with a good knowledge of the subject matter and a familiarity with the subject analysis tools of the respective database (subject categories, thesaurus, subject analysis rules)

• Steps of Subject Analysis

  • subject classification

  • abstracting

  • subject indexing

Subject Classification

• The main topic of the document determines the primary subject category

• If there are other significant topics, one or more secondary subject categories can be assigned in addition

Abstracting

• Each input item should contain an English abstract (exception: short communications)

• Abstracts in other languages are optional

• If an author abstract is available, it should be checked by the subject specialist, and edited, if necessary

• An abstract should be as informative as possible

• Emphasize what is novel about the information in the original document

Thesaurus

“A thesaurus is a terminological control device used in translating from the natural language of documents, indexers or users into a more constrained system language. It is a controlled and dynamic vocabulary of semantically and generically related terms which covers a specific domain of knowledge”

This definition has been adopted by UNESCO: “Guidelines for the establishment and development of monolingual thesauri”, UNESCO, SC/W/255, Paris, September 1973

The Thesaurus and its Structure

Relationship    Symbol    Cross reference
hierarchical    BT        broader term (level 1, 2,...)
hierarchical    NT        narrower term (level 1, 2,...)
affinitive      RT        related term
preferential    UF        used for (reciprocally USE ...)
preferential    UF+       used for multiple (reciprocally USE ... AND ...)
preferential    SF        seen for (reciprocally SEE ... OR ...)
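The cross references listed above could be held in a small data structure. The following is only a minimal sketch: the class and method names are hypothetical, the sample relationships are assumed for illustration and are not taken from the INIS Thesaurus, and UF+ and SF are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One descriptor with its cross references."""
    name: str
    broader: set = field(default_factory=set)    # BT
    narrower: set = field(default_factory=set)   # NT
    related: set = field(default_factory=set)    # RT
    used_for: set = field(default_factory=set)   # UF


class Thesaurus:
    def __init__(self):
        self.terms = {}  # descriptor name -> Term
        self.use = {}    # forbidden term -> list of descriptors (USE)

    def term(self, name):
        return self.terms.setdefault(name, Term(name))

    def add_hierarchy(self, narrower, broader):
        # BT and NT are reciprocal, so both sides are recorded together.
        self.term(narrower).broader.add(broader)
        self.term(broader).narrower.add(narrower)

    def add_related(self, a, b):
        self.term(a).related.add(b)
        self.term(b).related.add(a)

    def add_used_for(self, descriptor, forbidden):
        # UF on the descriptor, reciprocal USE on the forbidden term.
        self.term(descriptor).used_for.add(forbidden)
        self.use.setdefault(forbidden, []).append(descriptor)

    def all_broader(self, name):
        """Collect the broader terms of every level (level 1, 2, ...)."""
        found, stack = set(), [name]
        while stack:
            for bt in self.term(stack.pop()).broader:
                if bt not in found:
                    found.add(bt)
                    stack.append(bt)
        return found


# Illustrative entries only; these relationships are assumed for the sketch.
thesaurus = Thesaurus()
thesaurus.add_hierarchy("KAONS NEUTRAL SHORT-LIVED", "KAONS NEUTRAL")
thesaurus.add_hierarchy("KAONS NEUTRAL", "KAONS")
thesaurus.add_used_for("KAONS NEUTRAL", "neutral kaons")
print(sorted(thesaurus.all_broader("KAONS NEUTRAL SHORT-LIVED")))
```

Storing each relationship together with its reciprocal keeps the word block of a term consistent from either side.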

Subject Indexing

Subject indexing means analysing the information content of a piece of literature and expressing the meaningful information content in the language of the database, using the controlled vocabulary of the Thesaurus

• Understanding of the content --> subject specialist

• Familiarity with Thesaurus and indexing rules

• Select a set of descriptors that describes the subject content of the piece of literature

Procedures for Indexing

• Carefully read the title and abstract and scan the body of the piece of literature

  • Scan the full text (introduction, table of contents, tables, graphs, figures, conclusion) to find information items missing from the abstract or requiring more precision

• Identify the concept(s) about which the piece of literature contains useful information

• Translate the concepts into descriptors

• Avoid overindexing

Proposed Terms (Technical Note 175)

If no suitable descriptor exists in the Thesaurus for the retrieval of a useful concept, make a proposal for a new one, containing the following:

• Proposed term

• Proposed word block of the term (in particular proposed BTs)

• Potential forbidden terms pointing to this proposed descriptor

• Scope note when appropriate

• Explanation and justification for the proposal

• One or more sample records

The purpose of subject indexing is to enable useful retrieval

Computer-assisted Indexing - CAI

• Kick-off Meeting: Jan 2004

• Implementation and Customisation: Jun 2004

• Production Indexing: from Jun 2004, ongoing

• CAI version 1.0 final acceptance: Aug 2004

• Tuning of the system: from Aug 2004, ongoing

• CAI batch processing for Member States: Dec 2004

• CAI online from remote for Member States: Nov 2007

CAI Thesaurus Extension

“Hidden terms” are character patterns representing the different appearances of a concept in the free text, which is indexed by one or more descriptors.

• handled similarly to “forbidden terms”, with one or more USE relations

• CAI internal only

• not exported to INIS production system

• not exported to FIBRE

• not printed in any appearance of the thesaurus

• support identification of descriptors in the free text
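How hidden terms might support the identification of descriptors can be sketched as a simple lookup. This is a hypothetical illustration, not the actual CAI software: the function name and the matching strategy are assumptions, and the sample pairs are taken from the hidden-term tables of this seminar.

```python
import re

# Hidden term (lower-case character pattern) -> descriptor(s) it points to,
# in the manner of a forbidden term with one or more USE relations.
HIDDEN_TERMS = {
    "behaviour": ["BEHAVIOR"],
    "ivory coast": ["COTE D'IVOIRE"],
    "mossbauer effect": ["MOESSBAUER EFFECT"],
    "top quarks": ["T QUARKS"],
}


def suggest_descriptors(text, hidden_terms=HIDDEN_TERMS):
    """Return the descriptors whose hidden terms occur in the free text."""
    lowered = text.lower()
    found = set()
    for pattern, descriptors in hidden_terms.items():
        # Lookarounds stop a pattern from matching inside a longer word.
        if re.search(r"(?<!\w)" + re.escape(pattern) + r"(?!\w)", lowered):
            found.update(descriptors)
    return sorted(found)


print(suggest_descriptors("Behaviour of top quarks in the Ivory Coast"))
```

Because the hidden terms live only in this lookup table, they never need to appear in the thesaurus itself.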

Hidden Terms: Compounds

Descriptor hidden term free text

MAGNESIUM BORIDES MgB_2 MgB2

MAGNESIUM CARBONATES MgCO_3 MgCO3

MAGNESIUM HYDRIDES MgH_2 MgH2

IRON BROMIDES iron dibromide

IRON BROMIDES iron tribromide

ARSENIC IONS As"3"- As3-

ACETYLENE C_2H_2 C2H2

ACETALDEHYDE C_2H_4O C2H4O

ACETIC ACID C_2H_4O_2 C2H4O2

approx. 1400 hidden terms (expected 3000)

Hidden Terms: Isotopes

Descriptor    hidden term     free text
CESIUM 137                    Cesium 137, Cesium-137
              "1"3"7cs        137Cs
              137 caesium     137 Caesium, 137-Caesium
              caesium 137     Caesium 137, Caesium-137
              137 cesium      137 Cesium, 137-Cesium
              137 cs          137 Cs, 137-Cs
              cs 137          Cs 137, Cs-137
              cs"1"3"7        Cs137
              cs137           Cs137
CESIUM 138    "1"3"8"mcs      138mCs
              cs"1"3"8"m      Cs138m

approx. 22,400 hidden terms
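The isotope hidden terms follow a regular pattern, so a list of this size could in principle be generated rather than typed. The sketch below is an assumption about how that might be done, not a description of the CAI tooling; in particular it assumes, from the examples above, that a double quote in front of a character marks it as a superscript. The function name is hypothetical.

```python
def isotope_hidden_terms(names, symbol, mass, isomer=""):
    """Generate spelling variants for one isotope.

    names  -- element spellings, e.g. ["caesium", "cesium"]
    symbol -- element symbol, e.g. "Cs"
    mass   -- mass number, e.g. 137
    isomer -- optional isomer marker, e.g. "m"
    """
    mass_label = f"{mass}{isomer}"                         # "137" or "138m"
    # Assumed notation: each superscript character is preceded by '"'.
    superscript = "".join('"' + ch for ch in mass_label)   # '"1"3"7'
    sym = symbol.lower()
    terms = [
        superscript + sym,        # superscript mass before the symbol
        sym + superscript,        # superscript mass after the symbol
        sym + mass_label,         # plain digits directly after the symbol
        f"{mass_label} {sym}",
        f"{sym} {mass_label}",
    ]
    for name in names:
        name = name.lower()
        terms.append(f"{mass_label} {name}")
        terms.append(f"{name} {mass_label}")
    return terms


for term in isotope_hidden_terms(["caesium", "cesium"], "Cs", 137):
    print(term)
```

The generated list also contains the spelling of the descriptor itself (here "cesium 137"), which would have to be filtered out before loading the variants as hidden terms.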

Hidden Terms: Elementary Particles

Descriptor hidden term free text

B QUARKS bottom quarks

T QUARKS top quarks

ELECTRON NEUTRINOS #nu#_e νe

MUON NEUTRINOS #nu#_#mu# νμ

TAU NEUTRINOS #nu#_#tau# ντ

RHO-770 MESONS #rho#-770 ρ-770

OMEGA-782 MESONS #omega#-782 ω-782

KAONS NEUTRAL K"0 K0

KAONS NEUTRAL SHORT-LIVED K"0_S K0S

KAONS NEUTRAL LONG-LIVED K"0_L K0L

approx. 300 hidden terms

Hidden Terms: UK/US Spellings

Descriptor                   hidden term
A CENTERS                    a centres
ACTIVITY METERS              activity metres
ANALOG COMPUTERS             analogue computers
ANESTHESIA                   anaesthesia
ARCHAEOLOGY                  archeology
AUSTRIAN ORGANIZATIONS       austrian organisations
BALLISTIC MISSILE DEFENSE    ballistic missile defence
BAYARD-ALPERT GAGES          bayard-alpert gauges
BEAM ANALYZERS               beam analysers
BEHAVIOR                     behaviour
CATALOGS                     catalogues

approx. 800 hidden terms

Hidden Terms: Diacritics and Countries

Descriptor                  hidden term

Diacritics:
BAECKLUND TRANSFORMATION    backlund transformation
BRUECKNER MODEL             bruckner model
BRUNSBUETTEL REACTOR        brunsbuttel reactor
MOESSBAUER EFFECT           mossbauer effect

Country Names:
CAMBODIA                    kampuchea
COTE D'IVOIRE               ivory coast
GREECE                      hellas
MYANMAR                     burma
SYRIA                       syrian arab republic
THAILAND                    siam

approx. 250 hidden terms
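The diacritic hidden terms correspond to what remains of a name once its accents are dropped. A minimal sketch of that folding step, assuming the free text arrives as Unicode (the function name is hypothetical and this is not necessarily how CAI normalises text):

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks, e.g. turn 'ö' into 'o'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for phrase in ["Mössbauer effect", "Bäcklund transformation", "Brunsbüttel reactor"]:
    print(strip_diacritics(phrase).lower())
```

Letters without a canonical decomposition, such as "ß" or "ø", pass through unchanged and would still need their own entries.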

Hidden Terms: Other Spellings

Descriptor                  hidden term

Singular/Plural:
FUNGI                       fungus
FUNGI                       funguses
G MATRIX                    g matrices
G MATRIX                    g matrixes

Reverse Sequence:
ATOM-MOLECULE COLLISIONS    atom-molecule scattering
ATOM-MOLECULE COLLISIONS    molecule-atom scattering
ATOM-MOLECULE COLLISIONS    atom-molecule reactions
ATOM-MOLECULE COLLISIONS    molecule-atom reactions
ATOM-MOLECULE COLLISIONS    atom-molecule interactions
ATOM-MOLECULE COLLISIONS    molecule-atom interactions

approx. 900 hidden terms

CAI Thesaurus Extension

• Thesaurus

  • Valid Descriptors 21,826

  • Forbidden Terms 9,009

• CAI

  • Hidden Terms 34,381

• Total 65,216

Terminological Knowledge Base

Further Improvements Necessary

• “+” and “-” signs

  • K+ --> KAONS PLUS, KAONS MINUS, POTASSIUM IONS

• Case sensitivity

  • TiN --> TIN (instead of TITANIUM NITRIDES)

  • gas --> GALLIUM SULFIDES

  • “…who is the …” --> WHO (World Health Organization)

• Verbs versus nouns

  • “… this leads us to …” --> LEAD

  • “… this leaves it ….” --> LEAVES

• Homographic terms

  • Solutions --> SOLUTIONS or MATHEMATICAL SOLUTIONS

• Nuclear reactions, e.g. 14N(γ,α)10B

  • Targets

  • Beams

  • Reactions
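The case-sensitivity problem can be illustrated with a two-pass match. This is only one conceivable mitigation, sketched under assumptions: formula-like patterns are matched with exact case first, and ordinary words are matched without regard to case on what is left. The names `FORMULAS`, `WORDS` and `suggest` are hypothetical and nothing here reflects the actual CAI implementation.

```python
import re

# Formula-like patterns: the case carries meaning, so match it exactly.
FORMULAS = {"TiN": "TITANIUM NITRIDES", "GaS": "GALLIUM SULFIDES"}
# Ordinary words: case is irrelevant.
WORDS = {"tin": "TIN"}

LEFT, RIGHT = r"(?<![A-Za-z0-9])", r"(?![A-Za-z0-9])"


def suggest(text):
    hits = set()
    # Pass 1: exact-case formulas; matched spans are blanked out so that
    # "TiN" is not found a second time as the word "tin".
    for pattern, descriptor in FORMULAS.items():
        regex = re.compile(LEFT + re.escape(pattern) + RIGHT)
        if regex.search(text):
            hits.add(descriptor)
            text = regex.sub(" ", text)
    # Pass 2: case-insensitive words on the remaining text.
    for pattern, descriptor in WORDS.items():
        regex = re.compile(LEFT + re.escape(pattern) + RIGHT, re.IGNORECASE)
        if regex.search(text):
            hits.add(descriptor)
    return hits


print(sorted(suggest("TiN coatings on tin cans filled with gas")))
```

Text that is written entirely in capitals, such as some titles, would defeat this approach, so it reduces the problem rather than solving it.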

CAI-Workflow

[Figure: CAI workflow diagram. Labels: Input from Member States; Electronic Records from Publishers; Full Indexing; Proposed Terms/No Indexing; Conventional Processing; Interactive CAI Processing; CAI Interactive; Batch Mode; CAI Offline/Batch; Records with CAI-suggested Descriptors; INIS Subject Analysis Module; Training of CAI; Records with Full Indexing; INIS Verification and Production System]


CAI Batch and Online Processing

• Input: MemSt-CC-yymmdd-xxxxxxxxxxx

  • MemSt is a standard prefix (meaning “member state”)

  • CC is the country code

  • yymmdd is the date when the file was generated

  • xxxxxxxxxxx is any additional identification

• Examples

  • MemSt-AR-041203-thisismytestfile

  • MemSt-FR-041212-fileidentification
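The naming convention above can be checked mechanically before a file is submitted. The following is a minimal sketch; the function name is hypothetical, and the pattern assumes a two-letter upper-case country code, which the examples suggest but the slide does not state.

```python
import re
from datetime import datetime

# MemSt-CC-yymmdd-xxxxxxxxxxx (two-letter upper-case CC is an assumption)
INPUT_NAME = re.compile(r"^MemSt-(?P<cc>[A-Z]{2})-(?P<date>\d{6})-(?P<ident>.+)$")


def parse_input_name(name):
    """Split an input file name into country code, date and identification."""
    match = INPUT_NAME.match(name)
    if match is None:
        raise ValueError(f"not a valid CAI input file name: {name!r}")
    # strptime also rejects impossible dates such as 041332.
    generated = datetime.strptime(match["date"], "%y%m%d").date()
    return match["cc"], generated, match["ident"]


print(parse_input_name("MemSt-AR-041203-thisismytestfile"))
```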

CAI Batch Processing

• Output: _MemSt-CC-yymmdd-xxxxxxxxxxx

• These files will carry the CAI suggested descriptors in tag 800, preceded by the string ##CAI suggestions##;

• Example: 800^##CAI suggestions##; DESCRIPTOR1; DESCRIPTOR2; DESCRIPTOR3; …….

• The files are sent back to the Member State for reviewing
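A reviewer's script could separate the marker from the suggested descriptors as sketched below. The function name is hypothetical, and the sketch assumes, from the example alone, that "^" separates the tag number from its value and that ";" separates the descriptors.

```python
MARKER = "##CAI suggestions##;"


def parse_tag_800(line):
    """Return (is_cai_suggestion, descriptors) for a tag 800 line, else None."""
    tag, separator, value = line.partition("^")
    if tag.strip() != "800" or not separator:
        return None
    value = value.strip()
    is_suggestion = value.startswith(MARKER)
    if is_suggestion:
        value = value[len(MARKER):]
    # Split on ";" and drop empty pieces left by trailing separators.
    descriptors = [part.strip() for part in value.split(";") if part.strip()]
    return is_suggestion, descriptors


print(parse_tag_800("800^##CAI suggestions##; DESCRIPTOR1; DESCRIPTOR2; DESCRIPTOR3"))
```

A line that still carries the marker has not yet been reviewed, which makes the flag useful for spotting unfinished records.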

CAI Batch and Online Processing: Reviewing Process

• Delete all suggested descriptors which are too general

• Add relevant descriptors which were not found

  • numerical values, e.g. pressure ranges, temperature ranges,...

  • nuclear reactions

  • chemical compounds, alloys, etc.

• CAI cleans up BT/NTs in its own suggestions; clean up BT/NTs arising from manual additions

• Clean up suggestions from homographic terms

CAI Batch and Online Processing: Finalisation Process

CAI batch

• When reviewing of the record is completed: delete “##CAI suggestions##”

• When reviewing of all records is completed: submit the file to the “INIS Input Box”

CAI online

• When reaching the last record: press the “export and exit” button

• The file goes directly to the INIS production system or, if required, is sent back to the Member State for reviewing

CAI Production Statistics (01-06-2004 until 31-08-2009)

             2004     2005     2006     2007     2008     2009    Total
          Jun-Dec                                      Jan-Aug
AIP         19859    17827    19557     9657     8249     4108    79257
ANS                             813     1256                       2069
Elsevier     3124    23809    35716    32175    26993    18625   140442
IOPP         3291     8751     8059     7973    10526     8355    46955
IAEA         2131     2171     3984     4445     4843     2532    20106
Springer                                         6113     1000     7113
MemSt                           660       65     3045     3105     6875
Total       28405    52558    68789    55571    59769    37725   302817

CAI Batch Processing Statistics (2005 until 31-08-2009)

             2005     2006     2007     2008 2009/1-8    Total
AR            141        4       53                        198
AU            224                                          224
BG             32               199      151       43      425
CN            299     2319     2314     2959     3059    10950
DE            363      644     1019      879      607     3512
ET                            13017     9186     4062    26265
FR            138      721                                 859
JP             11                         32                43
LT                      39       69                        108
MY            133      270      205      112       61      781
US                      97       46                        143
UZ            359      396       43                        798
VN              8       16                83       82      189
others        306      105                                 411
Total        2014     4611    16965    13402     7914    44906

CAI online for Member States (introduced in July 2007)

• Tested by

  • China

  • Germany

  • France

  • India

  • Japan

  • Switzerland

  • Uruguay

• Regularly in use by

  • Argentina

  • Brazil

  • China

  • Czech Republic

  • Japan

  • Switzerland

CAI online and CAI batch are now regular services for Member States