November 2005 INIS Training Seminar 1
INIS Training Seminar
14-18 November 2005
Subject Analysis, Thesaurus and Computer-assisted Indexing
Alexander Nevyjel
Database Production and Development Group
INIS Unit, INIS&NKM Section, IAEA
Introduction to Subject Analysis
Subject analysis should be carried out, whenever possible, by subject specialists with a good knowledge of the subject matter and a familiarity with the subject analysis tools of the respective database (subject categories, thesaurus, subject analysis rules)
Steps of Subject Analysis
  subject classification
  abstracting
  subject indexing
Subject Classification
The main topic of the document determines the primary subject category
If there are other significant topics, one or more secondary subject categories can be assigned in addition
Abstracting
Each input item should contain an English abstract (exception: short communications)
Abstracts in other languages are optional
If an author abstract is available, it should be checked by the subject specialist and edited if necessary
An abstract should be as informative as possible
Emphasize what is novel about the information in the original document
Thesaurus
What is a Thesaurus?
“A thesaurus is a terminological control device used in translating from the natural language of documents, indexers or users into a more constrained system language. It is a controlled and dynamic vocabulary of semantically and generically related terms which covers a specific domain of knowledge”
This definition has been adopted by UNESCO
“Guidelines for the establishment and development of monolingual thesauri”, UNESCO, SC/W/255, Paris, September 1973
The Thesaurus and its Structure
Relationship   Symbol  Cross reference
hierarchical   BT      broader term (level 1, 2, ...)
hierarchical   NT      narrower term (level 1, 2, ...)
affinitive     RT      related term
preferential   UF      used for (reciprocally USE ...)
preferential   UF+     used for multiple (reciprocally USE ... AND ...)
preferential   SF      seen for (reciprocally SEE ... OR ...)
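The reciprocal nature of these cross references can be illustrated with a small sketch. It covers only BT/NT, RT and UF/USE; the multi-target UF+ and SF relations are left out. The term names are placeholders invented for illustration and are not taken from the INIS Thesaurus, and the data layout is an assumption, not the way the Thesaurus is actually stored.

```python
from collections import defaultdict

# Reciprocal pairs from the table above: BT <-> NT, RT <-> RT, UF <-> USE.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "UF": "USE", "USE": "UF"}


def build_word_blocks(statements):
    """Expand (term, relation, target) statements into reciprocal word blocks."""
    blocks = defaultdict(lambda: defaultdict(set))
    for term, relation, target in statements:
        blocks[term][relation].add(target)
        # Every cross reference is mirrored on the term it points to.
        blocks[target][RECIPROCAL[relation]].add(term)
    return blocks


# Placeholder terms, invented for illustration only.
statements = [
    ("TERM A1", "BT", "TERM A"),       # TERM A is broader than TERM A1
    ("TERM A1", "RT", "TERM B"),       # TERM B is related to TERM A1
    ("TERM A1", "UF", "old name a1"),  # "old name a1" is a forbidden term
]

blocks = build_word_blocks(statements)
print(sorted(blocks["TERM A"]["NT"]))        # narrower terms of TERM A
print(sorted(blocks["old name a1"]["USE"]))  # where the forbidden term points
```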
Subject Indexing
Subject indexing means analysing the information content of a piece of literature and expressing the meaningful information content in the language of the database, using the controlled vocabulary of the Thesaurus
Understanding of the content --> subject specialist
Familiarity with Thesaurus and indexing rules
Select a set of descriptors that describes the subject content of the piece of literature
Procedures for Indexing
Carefully read the title and abstract and scan the body of the piece of literature
  scan the full text (introduction, table of contents, tables, graphs, figures, conclusion) to find information items missing from the abstract or requiring more precision
Identify the concept(s) about which the piece of literature contains useful information
Translate the concepts into descriptors
Avoid overindexing
Proposed Terms (Technical Note 175)
If no suitable descriptor exists in the Thesaurus for the retrieval of a useful concept, make a proposal for a new one, containing the following:
  Proposed term
  Proposed word block of the term (in particular proposed BTs)
  Potential forbidden terms pointing to this proposed descriptor
  Scope note when appropriate
  Explanation and justification for the proposal
  One or more sample records
The purpose of subject indexing is to enable useful retrieval
Computer-assisted Indexing
Kick-off Meeting                    Jan 2004
Implementation and Customisation    Jun 2004
Production Indexing                 from Jun 2004, ongoing
CAI version 1.0 final acceptance    Aug 2004
Tuning of the system                from Aug 2004, ongoing
CAI version 1.10 kick-off           Dec 2004
CAI version 1.10 acceptance         Apr 2005
RetrievalWare pilot                 Aug 2005
CAI Thesaurus extension             planned Jan 2006
CAI Thesaurus extension
“Hidden terms” are character patterns representing the different appearances of a concept in the free text, which is indexed by one or more descriptors.
handled similarly to “forbidden terms”, with one or more USE relations
CAI internal only
  not exported to INIS production system
  not exported to FIBRE
  not printed in any appearance of the thesaurus
support identification of descriptors in the free text
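A minimal sketch of how hidden terms could support the identification of descriptors in free text. The term pairs are taken from the hidden-term slides of this seminar; the matching strategy (whole-word, case-insensitive search) is an assumption made for illustration and is not a description of the CAI software.

```python
import re

# Hidden terms point to descriptors through USE relations.
# The sample pairs come from the hidden-term slides of this seminar.
HIDDEN_TERMS = {
    "bottom quarks": ["B QUARKS"],
    "behaviour": ["BEHAVIOR"],
    "burma": ["MYANMAR"],
    "mossbauer effect": ["MOESSBAUER EFFECT"],
}


def suggest_descriptors(text):
    """Return descriptors whose hidden terms occur as whole words in text."""
    lowered = text.lower()
    found = []
    for pattern, descriptors in HIDDEN_TERMS.items():
        # \b keeps a pattern from matching inside a longer word.
        if re.search(r"\b" + re.escape(pattern) + r"\b", lowered):
            for descriptor in descriptors:
                if descriptor not in found:
                    found.append(descriptor)
    return found


print(suggest_descriptors("Behaviour of bottom quarks in heavy-ion collisions"))
# prints ['B QUARKS', 'BEHAVIOR']
```

Because each hidden term carries a list of descriptors, a pattern with several USE relations fits the same structure.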
Hidden Terms: Compounds
Descriptor            hidden term      free text
MAGNESIUM BORIDES     MgB_2            MgB2
MAGNESIUM CARBONATES  MgCO_3           MgCO3
MAGNESIUM HYDRIDES    MgH_2            MgH2
IRON BROMIDES         iron dibromide
IRON BROMIDES         iron tribromide
ARSENIC IONS          As"3"-           As3-
ACETYLENE             C_2H_2           C2H2
ACETALDEHYDE          C_2H_4O          C2H4O
ACETIC ACID           C_2H_4O_2        C2H4O2
approx. 1400 hidden terms (expected 3000)
Hidden Terms: Isotopes
Descriptor  hidden term   free text
CESIUM 137                Cesium 137, Cesium-137
            "1"3"7cs      137Cs
            137 caesium   137 Caesium, 137-Caesium
            caesium 137   Caesium 137, Caesium-137
            137 cesium    137 Cesium, 137-Cesium
            137 cs        137 Cs, 137-Cs
            cs 137        Cs 137, Cs-137
            cs"1"3"7      Cs137
            cs137         Cs137
CESIUM 138  "1"3"8"mcs    138mCs
            cs"1"3"8"m    Cs138m
approx. 22,400 hidden terms
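The free-text spellings listed for CESIUM 137 follow a regular pattern (name or symbol, mass number, optional space or hyphen, either order). A minimal sketch of how such variants could be enumerated is given below; the function name and approach are illustrative assumptions, not the procedure actually used to build the isotope hidden terms.

```python
def isotope_variants(names, symbol, mass):
    """Generate free-text spellings of an isotope, as listed on the slide."""
    variants = set()
    for label in list(names) + [symbol]:
        for sep in (" ", "-"):
            variants.add(f"{label}{sep}{mass}")  # Cesium 137, Cs-137
            variants.add(f"{mass}{sep}{label}")  # 137 Cesium, 137-Cs
    variants.add(f"{mass}{symbol}")              # 137Cs
    variants.add(f"{symbol}{mass}")              # Cs137
    return variants


variants = isotope_variants(["Cesium", "Caesium"], "Cs", 137)
print(len(variants))  # prints 14
```

For cesium-137 this yields the 14 distinct spellings that appear in the free-text column above.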
Hidden Terms: Elementary Particles
Descriptor                  hidden term    free text
B QUARKS                    bottom quarks
T QUARKS                    top quarks
ELECTRON NEUTRINOS          #nu#_e         νe
MUON NEUTRINOS              #nu#_#mu#      νμ
TAU NEUTRINOS               #nu#_#tau#     ντ
RHO-770 MESONS              #rho#-770      ρ-770
OMEGA-782 MESONS            #omega#-782    ω-782
KAONS NEUTRAL               K"0            K0
KAONS NEUTRAL SHORT-LIVED   K"0_S          K0S
KAONS NEUTRAL LONG-LIVED    K"0_L          K0L
approx. 300 hidden terms
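The hidden-term column on the compound, isotope and particle slides appears to mark subscripts with `_`, superscripts with `"` and Greek letters as `#name#`. That reading is inferred from the examples only. Under this assumption, the sketch below flattens a hidden term into the plain string shown in the free-text column; the function name and the small Greek table are hypothetical.

```python
import re

# Assumed reading of the hidden-term notation, inferred from the examples:
#   _x      subscript x        (MgB_2  -> MgB2)
#   "x      superscript x      (K"0    -> K0)
#   #name#  Greek letter name  (#nu#   -> ν)
GREEK = {"nu": "ν", "mu": "μ", "tau": "τ", "rho": "ρ", "omega": "ω"}


def flatten_hidden_term(term):
    """Turn a hidden term into the flat string seen in the free-text column."""
    # Replace #name# with the Greek letter; unknown names are left untouched.
    term = re.sub(r"#([a-z]+)#", lambda m: GREEK.get(m.group(1), m.group(0)), term)
    # Drop the subscript and superscript markers.
    return term.replace("_", "").replace('"', "")


for hidden in ["MgB_2", 'As"3"-', "#nu#_#mu#", 'K"0_S']:
    print(hidden, "->", flatten_hidden_term(hidden))
```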
Hidden Terms: UK/US Spellings
Descriptor                  hidden term
A CENTERS                   a centres
ACTIVITY METERS             activity metres
ANALOG COMPUTERS            analogue computers
ANESTHESIA                  anaesthesia
ARCHAEOLOGY                 archeology
AUSTRIAN ORGANIZATIONS      austrian organisations
BALLISTIC MISSILE DEFENSE   ballistic missile defence
BAYARD-ALPERT GAGES         bayard-alpert gauges
BEAM ANALYZERS              beam analysers
BEHAVIOR                    behaviour
CATALOGS                    catalogues
approx. 800 hidden terms
Hidden Terms: Diacritics and Countries
Descriptor                  hidden term
Diacritics:
BAECKLUND TRANSFORMATION    backlund transformation
BRUECKNER MODEL             bruckner model
BRUNSBUETTEL REACTOR        brunsbuttel reactor
MOESSBAUER EFFECT           mossbauer effect
Country Names:
CAMBODIA                    kampuchea
COTE D'IVOIRE               ivory coast
GREECE                      hellas
MYANMAR                     burma
SYRIA                       syrian arab republic
THAILAND                    siam
approx. 250 hidden terms
Hidden Terms: Other Spellings
Descriptor                  hidden term
Singular/Plural
FUNGI                       fungus
FUNGI                       funguses
G MATRIX                    g matrices
G MATRIX                    g matrixes
Reverse Sequence
ATOM-MOLECULE COLLISIONS    atom-molecule scattering
ATOM-MOLECULE COLLISIONS    molecule-atom scattering
ATOM-MOLECULE COLLISIONS    atom-molecule reactions
ATOM-MOLECULE COLLISIONS    molecule-atom reactions
ATOM-MOLECULE COLLISIONS    atom-molecule interactions
ATOM-MOLECULE COLLISIONS    molecule-atom interactions
approx. 900 hidden terms
CAI Thesaurus Extension
Thesaurus
  Valid Descriptors  21,953
  Forbidden Terms     9,411
CAI Hidden Terms     29,237
Total                60,601
Terminological Knowledge Base
Further Improvements under Development
“+” and “-” signs
  K+ --> KAONS PLUS, KAONS MINUS, POTASSIUM IONS
Case sensitivity
  TiN --> TIN (instead of TITANIUM NITRIDES)
  gas --> GALLIUM SULFIDES
  “… who is the …” --> WHO (World Health Organization)
Verbs versus Nouns
  “… this leads us to …” --> LEAD
  “… this leaves it …” --> LEAVES
Homographic terms
  Solutions --> SOLUTIONS or MATHEMATICAL SOLUTIONS
Nuclear Reactions, e.g. 14N(γ,α)10B
  Targets
  Beams
  Reactions
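The case-sensitivity problem can be illustrated with a two-table lookup: patterns that must match with exact capitalisation are tried first, and only then a case-insensitive table. This is one possible approach sketched for illustration; it is not a description of the improvement actually under development, and the table contents are limited to the examples on the slide.

```python
# Patterns that only count when the capitalisation matches exactly.
CASE_SENSITIVE = {
    "TiN": "TITANIUM NITRIDES",
    "GaS": "GALLIUM SULFIDES",
    "WHO": "WHO",
}
# Patterns that match regardless of capitalisation (stored in lower case).
CASE_INSENSITIVE = {
    "tin": "TIN",
}


def suggest(token):
    """Look up a single token, trying the exact-case table first."""
    if token in CASE_SENSITIVE:
        return CASE_SENSITIVE[token]
    # Fall back to the case-insensitive table; returns None when the
    # lower-cased token is not listed there (e.g. "gas", "who").
    return CASE_INSENSITIVE.get(token.lower())


for token in ["TiN", "tin", "GaS", "gas", "WHO", "who"]:
    print(token, "->", suggest(token))
```

With this split, "gas" and "who" in running prose no longer produce GALLIUM SULFIDES or WHO, while "TiN" is kept apart from the metal TIN.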
CAI-Workflow
[Figure: workflow diagram. Boxes: Input from Member States (Full Indexing; Proposed Terms / No Indexing); Electronic Records from Publishers (Proposed Terms / No Indexing); INIS Subject Analysis Module; CAI Interactive; Training of CAI; CAI Offline/Batch; Records with CAI-suggested Descriptors; Records with Full Indexing; INIS Verification and Production System. Paths: Interactive CAI Processing, Batch Mode, Conventional Processing.]
CAI Batch Processing Statistics
Nov 2004 – November 2005
Country          Records  Files
AR Argentina         133      7
AU Australia         443      2
BD Bangladesh          2      1
BG Bulgaria           27      1
BR Brazil             10      1
CH Switzerland        58      4
CN China             294      3
DE Germany           363     11
FR France            243      3
JP Japan               6      1
MK Macedonia         107      1
MY Malaysia          125      3
SE Sweden             27      1
TH Thailand           15      1
UZ Uzbekistan        144      2
Total               1997     42
CAI Batch Processing
Input: MemSt-CC-yymmdd-xxxxxxxxxxx
Output: _MemSt-CC-yymmdd-xxxxxxxxxxx
MemSt is a standard prefix (meaning “member state”)
CC is the country code
yymmdd is the date when the file was generated
xxxxxxxxxxx is any additional identification
Examples
MemSt-AR-041203-thisismytestfile
MemSt-FR-041212-fileidentification
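The naming convention above can be checked mechanically before a file is submitted. The sketch below is a hypothetical helper, not part of any INIS tool; it assumes a two-letter upper-case country code and a non-empty identification part, which matches the examples but is not stated as a rule on the slide.

```python
import re
from datetime import datetime

# MemSt-CC-yymmdd-xxxxxxxxxxx, as described above.
# Assumption: CC is two upper-case letters and the identification is non-empty.
NAME_PATTERN = re.compile(r"^MemSt-([A-Z]{2})-(\d{6})-(.+)$")


def parse_input_name(name):
    """Split an input file name into country code, date and identification."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a CAI batch input name: {name!r}")
    country, yymmdd, ident = match.groups()
    # strptime raises ValueError for impossible dates such as 041332.
    generated = datetime.strptime(yymmdd, "%y%m%d").date()
    return country, generated, ident


def output_name(name):
    """The output file carries the input name with a leading underscore."""
    parse_input_name(name)  # validate before deriving the output name
    return "_" + name


print(parse_input_name("MemSt-AR-041203-thisismytestfile"))
print(output_name("MemSt-FR-041212-fileidentification"))
```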
CAI Batch Processing
Output: _MemSt-CC-yymmdd-xxxxxxxxxxx
These files will carry the CAI-suggested descriptors in tag 800, preceded by the string ##CAI suggestions##;
Example:
800^##CAI suggestions##; DESCRIPTOR1; DESCRIPTOR2; DESCRIPTOR3; …….
sent back to the member state for reviewing
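A reviewer's script could pull the suggested descriptors out of a tag 800 line as sketched below. The line layout (tag, `^`, marker, semicolon-separated descriptors) is taken from the example above; the function name is hypothetical, and the rest of the record format is not covered.

```python
MARKER = "##CAI suggestions##"


def parse_tag_800(line):
    """Extract the suggested descriptors from a tag 800 line."""
    tag, _, value = line.partition("^")
    if tag != "800" or MARKER not in value:
        return []
    parts = [part.strip() for part in value.split(";")]
    # Drop the marker itself and any empty fields left by trailing semicolons.
    return [part for part in parts if part and part != MARKER]


line = "800^##CAI suggestions##; DESCRIPTOR1; DESCRIPTOR2; DESCRIPTOR3"
print(parse_tag_800(line))
# prints ['DESCRIPTOR1', 'DESCRIPTOR2', 'DESCRIPTOR3']
```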
CAI Batch Processing
Reviewing Process
Delete all suggested descriptors which are too general
Add relevant descriptors which were not found
  numerical values, e.g. pressure ranges, temperature ranges, ...
  nuclear reactions
  chemical compounds, alloys, etc.
CAI is cleaning up BT/NTs --> clean up BT/NTs from manual additions
Clean up suggestions from homographic terms
Delete “##CAI suggestions##”
Submit file to “INIS Input Box”
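Two of the review steps lend themselves to a small sketch: cleaning up BT/NTs and deleting the marker string. The sketch assumes that "cleaning up BT/NTs" means dropping a descriptor when one of its narrower terms is also assigned; that interpretation, the placeholder terms and the broader-term lookup are all assumptions made for illustration. Judging which descriptors are too general or homographic remains a task for the subject specialist.

```python
MARKER = "##CAI suggestions##"

# Hypothetical broader-term lookup: descriptor -> all of its broader terms.
# A real review would take these from the Thesaurus word blocks.
BROADER = {
    "NARROW TERM": {"BROAD TERM", "TOP TERM"},
    "BROAD TERM": {"TOP TERM"},
}


def drop_redundant_bts(descriptors):
    """Remove descriptors that are broader terms of another one in the list."""
    redundant = set()
    for descriptor in descriptors:
        redundant |= BROADER.get(descriptor, set())
    return [d for d in descriptors if d not in redundant]


def finish_review(descriptors):
    """Build the tag 800 value for submission: no marker, no redundant BTs."""
    cleaned = [d for d in drop_redundant_bts(descriptors) if d != MARKER]
    return "; ".join(cleaned)


reviewed = [MARKER, "TOP TERM", "NARROW TERM", "OTHER TERM"]
print(finish_review(reviewed))
# prints NARROW TERM; OTHER TERM
```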