Condorcet Final Report

28 January 2000

Bas van Bakel, Reinier T. Boon, Nicolaas J.I. Mars, Erik Oltmans

Vossius Laboratory
Department of Computer Science

University of Twente
Enschede, the Netherlands

Technical Report CTIT TR-00-02

This report is the last of four annual reports on Condorcet, an information retrieval (IR) project carried out at the Vossius Laboratory, Department of Computer Science, University of Twente, the Netherlands. The research project started officially on 1 September 1995, and formally ended on 31 December 1999.

In this report we present the overall results of the Condorcet project. This report should be read together with the Ph.D. thesis of Erik Oltmans, A knowledge-based approach to robust parsing, defended at the University of Technology on 28 January 2000.

The research carried out in the Condorcet project has been funded by the Dutch Technology Foundation STW, grant number TIF.3441.

Enschede, 28 January 2000

Nicolaas J.I. Mars

Addresses

All authors but one have now moved; they can be reached at the addresses given below. All inquiries for general information may be directed to Nicolaas J.I. Mars:

Department of Computer Science
University of Twente
P.O. Box 217, 7500 AE Enschede
The Netherlands

E-mail: [email protected]

The other authors can be reached by e-mail:
Bas van Bakel: [email protected]
Reinier T. Boon: [email protected]
Erik Oltmans: [email protected]


Contents

1 Introduction
  1.1 Introduction
  1.2 The contents and organization of this report
2 Evaluation of the indexing process
  2.1 Introduction
  2.2 Comparison of indexing
    2.2.1 Document 88008547
    2.2.2 Document 88008549
    2.2.3 Document 88008554
    2.2.4 Document 88009075
    2.2.5 Document 88009078
    2.2.6 Document 88018585
    2.2.7 Document 88018590
    2.2.8 Document 88019360
    2.2.9 Document 88021951
    2.2.10 Document 88100203
    2.2.11 Document 88104710
    2.2.12 Document 88174154
3 Conclusions and recommendations
  3.1 Conclusions
  3.2 Recommendations
A Publications
  A.1 Archival publications
  A.2 Miscellaneous
B About the authors


Chapter 1

Introduction


1.1 Introduction

The Condorcet¹ project was funded by the Dutch Technology Foundation (STW) through the Werkgemeenschap Informatiewetenschap.² The project was carried out at the Vossius Laboratory of the University of Twente.

The main objective of the Condorcet project was to design, build and evaluate a prototype automated indexing system, supporting Information Retrieval from large, reality-level volumes (tens of thousands) of documents. The prototype has been tested on some 400 documents from two scientific domains: mechanical properties of engineering ceramics as a field of engineering, and epilepsy as a subfield of medicine.

The Condorcet system is concerned with semi-automatic indexing of documents, thus producing document representations, to be used in matching user requests to these representations. More specifically, the Condorcet system indexes scientific documents by mapping titles and abstracts³ (henceforth referred to as descriptions) of the documents to concepts and relations, defined in modern versions of classical indexing thesauri, i.e. ontologies. It does so by using three kinds of knowledge: domain knowledge, linguistic knowledge and indexing knowledge.

¹ The Condorcet project is named after Marie Jean Antoine Nicolas Caritat, Marquis de Condorcet (1743-1794). Condorcet, a French mathematician and social philosopher, designed a system for classifying scientific results. He rejected Linnaeus’ approach of exhaustive enumeration and used instead a system in which relations are defined implicitly. He also envisaged mechanical means to manipulate the system. The approach to structured indexing concepts advocated in the Condorcet project is in many respects similar to Condorcet’s ideas.

² The original proposal was dated 28 January 1994. The project was formally approved by STW on 23 January 1995, Decision 3852.

³ Title and abstract are parts of a Document Description. Descriptions of documents also contain information on author(s), source, year of publication, etc. To improve readability, we will use the term description when we speak of title and abstract.


1.2 The contents and organization of this report

Simultaneously with the publication of this report, a Ph.D. thesis entitled A knowledge-based approach to robust parsing has been published by Erik Oltmans. That thesis contains a detailed description of the aims of the Condorcet project and of the system architecture developed over the project’s course, as well as a systematic evaluation of the linguistic component of the Condorcet system.

As it appeared wasteful to repeat all that information here, we have chosen to write this final report as a complement to the thesis. In particular, the overall evaluation of the Condorcet system with respect to its task of supporting Information Retrieval, a topic not covered in the thesis, will be found here.

The reader is advised to read Oltmans’ thesis first, and then refer to the present report for the overall evaluation, conclusions and recommendations. The thesis and this report together document the final insights obtained in the project. Some of the material in the older annual reports has become outdated because of the progress in knowledge obtained.

In the next chapter of this report we discuss the overall evaluation process of the Condorcet system. After that, we draw general conclusions about the success of the Condorcet system, and present recommendations for the use and possible further development of the system.


Chapter 2

Evaluation of the indexing process

2.1 Introduction

The goal of the Condorcet project is to develop a semi-automatic method for assigning structured index concepts to scientific documents, on the basis of their title and abstract. In this chapter we discuss how well we have achieved this goal, and how we have determined that.

An important obstacle in evaluating Condorcet’s performance is the lack of an accepted standard for indexing. Human indexers have been shown to be highly idiosyncratic in their work, and are thus not suitable as a reference. Automated indexing systems are obviously more consistent but very hard to come by.

On closer examination, we encounter difficulties on two levels. The first is that there is no agreement on which index terms (or concepts) should be assigned to a given document. The second is that there are no standards on what are acceptable levels of average performance. Neither problem is surprising: depending on the goal of an indexing (or better: a retrieval) system, different performance levels may be required.

We have tried to solve both problems at the same time by comparing Condorcet’s output with three alternatives: the indexing performed by Elsevier in its Embase database, the indexing performed by the National Library of Medicine for Medline, and finally the indexing by a human indexer who benefitted from knowledge of the indexing performed by the other systems; this human will be called the Oracle for obvious reasons. In a few cases, the Oracle has also looked at the full article, to determine whether the restriction in the Condorcet system of using title and abstract only is an important one.

To avoid the problem of the lack of standards for the performance, and to make the comparison as informative as possible, we decided to perform an in-depth experiment on a limited number of documents, and to discuss the results obtained for each document.


In the following sections, you will therefore find for each of a dozen¹ documents the following items:

• The title and abstract of the document, as used as input to the Condorcet indexing process

• The indexing assigned by Condorcet without using natural-language analysis, illustrating the contribution of the natural-language analysis component

• The indexing assigned by Condorcet in its full glory, forming the result to be evaluated

• The Elsevier indexing

• The Medline indexing

• The indexing by the Oracle

• A discussion of the differences between the various indexing systems
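As a starting point for such a discussion, the term-level differences between two or more indexings can be listed mechanically. The following is only a minimal sketch under stated assumptions: the function name `compare_indexings` and the exact-string comparison after lowercasing are illustrative choices, not part of the Condorcet system. The sample terms are a subset of the simple concepts listed for document 88008547 in section 2.2.1.

```python
def compare_indexings(indexings):
    """Return the terms shared by all sources and, per source, the terms no other source assigned.

    Terms are compared as exact strings after lowercasing; this is an
    illustrative simplification, not the procedure used in the report.
    """
    normalised = {name: {term.strip().lower() for term in terms}
                  for name, terms in indexings.items()}
    shared = set.intersection(*normalised.values())
    unique = {}
    for name, terms in normalised.items():
        others = set().union(*(t for n, t in normalised.items() if n != name))
        unique[name] = terms - others
    return shared, unique


# Subset of the simple concepts listed for document 88008547.
indexings = {
    "condorcet": ["gaba", "taurine", "rats", "seizures",
                  "inferior colliculus", "inspiratory capacity"],
    "oracle": ["gaba", "taurine", "rats", "epileptic seizures",
               "inferior colliculus", "chromatography"],
}

shared, unique = compare_indexings(indexings)
print(sorted(shared))
print({name: sorted(terms) for name, terms in unique.items()})
```

Such a listing only exposes literal differences. Whether seizures and epileptic seizures should count as the same concept is exactly the kind of judgement the per-document discussions address.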

After discussing these twelve documents, we can draw our overall conclusions.

The notation used to present the various indexing procedures should be self-explanatory, apart from a few minor remarks:

• In some cases the UMLS, used as the ontology for our indexing, has disambiguated identical terms referring to different concepts; these are indicated by an integer in angle brackets;

• In the Oracle indexing, those simple concepts that also occur as arguments in a structured concept (and are therefore redundant) are given in parentheses.
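The two conventions can be made concrete with a small sketch. It assumes that each index entry is available as a separate string; the helper names `parse_entry`, `split_sense` and `mark_redundant` are hypothetical and chosen for illustration only. Concept names that themselves contain a comma (such as tomography, x-ray computed) would need a more careful argument splitter than the one used here.

```python
import re

# A structured concept looks like "relation( arg1, arg2 )"; the relation name is
# directly followed by "(", which distinguishes it from simple concepts such as
# "kindling (neurology)".
STRUCTURED = re.compile(r"^(\w+)\(\s*(.+)\s*\)$")
# A disambiguated term carries an integer in angle brackets, e.g. "glutamate <1>".
SENSE = re.compile(r"^(.*?)\s*<(\d+)>$")


def parse_entry(entry):
    """Return (relation, [arguments]) for a structured concept, None for a simple one."""
    match = STRUCTURED.match(entry.strip())
    if not match:
        return None
    relation, args = match.groups()
    return relation, [arg.strip() for arg in args.split(",")]


def split_sense(concept):
    """Split 'term <n>' into (term, n); return (concept, None) if there is no sense number."""
    match = SENSE.match(concept)
    return (match.group(1), int(match.group(2))) if match else (concept, None)


def mark_redundant(entries):
    """Parenthesise simple concepts that also occur as an argument of a structured concept."""
    arguments = set()
    for entry in entries:
        parsed = parse_entry(entry)
        if parsed:
            arguments.update(parsed[1])
    return [f"({e})" if parse_entry(e) is None and e in arguments else e
            for e in entries]


entries = ["epilepsy", "nafimidone", "rats", "seizures", "treats(nafimidone, epilepsy)"]
print(mark_redundant(entries))
print(split_sense("glutamate <1>"))
```

Applied to these five entries, the sketch parenthesises epilepsy and nafimidone and leaves the other entries unchanged.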

¹ For one document, no indexing by the National Library of Medicine is available.


2.2 Comparison of indexing

2.2.1 Document 88008547

ID: 88008547
TI: Increased levels of amino acid neurotransmitters in the inferior colliculus of the genetically epilepsy-prone rat
AB: Previous studies have shown an increase in the number of GABAergic and total neurons in the inferior colliculus (IC) of the genetically epilepsy-prone rat (GEPR). Amino acid analysis of the central nucleus of the IC, as well as cerebellum, sensorimotor, temporal, and occipital cerebral cortices of GEPRs with high pressure liquid chromatography showed significant increases in the levels of GABA, taurine and glutamate. The IC of GEPR displayed a 2.3-fold increase in GABA as compared to that of non-epileptic rats, a 2.4-fold increase of taurine, and a 1.9-fold increase of glutamate. In addition, taurine and glutamate were increased in the sensorimotor and temporal cortex, respectively. These results are consistent with previous anatomical data on the GABAergic system in the IC and provide additional information. The increase in taurine and glutamate in the IC indicates that other neurotransmitters could be involved in the mechanisms of seizure activity.

Indexing according to Condorcet without NLE

affects( amino acids, inspiratory capacity )
affects( analysis, inspiratory capacity )
affects( chromatography: liquid, inspiratory capacity )
affects( gaba, inspiratory capacity )
affects( glutamate <1>, inspiratory capacity )
affects( glutamate <1>, seizures )
affects( glutamates, inspiratory capacity )
affects( glutamates, seizures )
affects( inspiratory capacity, rats )
affects( inspiratory capacity, seizures )
affects( neurotransmitters, inspiratory capacity )
affects( neurotransmitters, seizures )
affects( seizures, inspiratory capacity )
affects( taurine, inspiratory capacity )
affects( taurine, seizures )
amino acid neurotransmitters
central
cerebellum
cerebral cortex
high
increased
inferior colliculus
involved
neurons
nucleus: nos
pressure


Indexing according to Condorcet

amino acid neurotransmitters
amino acids
analysis
cerebellum
cerebral cortex
chromatography: liquid
gaba
glutamate <1>
glutamates
increased
inferior colliculus
inspiratory capacity
neurons
neurotransmitters
nucleus: nos
rats
seizures
taurine

Indexing according to Elsevier (copyright Elsevier)


Indexing according to NLM (copyright NLM)

Animal
Brain/metabolism/physiopathology
Epilepsy/genetics/*metabolism
GABA/*metabolism
Glutamates/*metabolism
Inferior Colliculus/*metabolism/physiopathology
Rats
Rats, Mutant Strains/*metabolism/physiology
Support, Non-U.S. Gov’t
Support, U.S. Gov’t, P.H.S.
Taurine/*metabolism


Indexing according to the Oracle

(epilepsy)
(neurotransmitters)
affects(epilepsy, neurotransmitters)
cerebellum
cerebral cortex
chromatography
epileptic seizures
gaba
glutamate <1>
glutamates
inferior colliculus
rats
taurine


The article is not easy to index using structured concepts, as the authors are not explicit about causal mechanisms; they mainly describe their observations.

The relations found using Condorcet without NLE (in essence all permutations of the concepts found in a sentence that survive type checking) are vague and wrong. The abbreviation IC is wrongly expanded into inspiratory capacity, a problem that should be preventable given the use of the correct expansion inferior colliculus in the text.
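The behaviour of the version without NLE can be illustrated with a minimal sketch: every ordered pair of concepts found in one sentence is proposed as a relation, and only a type check filters the candidates. The semantic types and the relation signature below are hypothetical stand-ins, not the actual UMLS semantic network, and `candidate_relations` is an illustrative name rather than a Condorcet function.

```python
from itertools import permutations

# Hypothetical semantic types for a few concepts (assumption, not taken from UMLS).
SEMANTIC_TYPE = {
    "taurine": "substance",
    "glutamates": "substance",
    "seizures": "finding",
}

# Hypothetical signature: which (subject type, object type) pairs a relation accepts.
SIGNATURES = {
    "affects": {("substance", "finding")},
}


def candidate_relations(concepts):
    """Propose every ordered concept pair whose types fit some relation signature."""
    candidates = []
    for subject, obj in permutations(concepts, 2):
        for relation, allowed in SIGNATURES.items():
            if (SEMANTIC_TYPE[subject], SEMANTIC_TYPE[obj]) in allowed:
                candidates.append(f"{relation}( {subject}, {obj} )")
    return sorted(candidates)


# Concepts found in the last sentence of the abstract above.
print(candidate_relations(["taurine", "glutamates", "seizures"]))
```

Because nothing but co-occurrence and type compatibility is considered, the number of candidates grows quadratically with the number of concepts in a sentence, which matches the long lists of affects-relations in the listing without NLE.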

The Elsevier indexing does not explicitly mention epilepsy (but the specialized term audiogenic seizure should help remedy that omission). The term audiogenic seizure is not to be found in the abstract, suggesting that information from the full article has been used.

The NLM indexing best expresses the essential elements of this article, as clearly expressed (in this case) by its title.

Condorcet does rather well; the concept increased is unfortunately present in the UMLS, but not very informative in isolation. The concept epilepsy is sorely missing (but implied by seizures).
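The wrong expansion of IC noted in this discussion could in principle be prevented by collecting the abbreviations that the abstract itself defines. The sketch below is one simple heuristic under stated assumptions: it only recognises the pattern "long form (ABBR)" in which the initials of the preceding words spell the abbreviation. It is not the mechanism used in Condorcet, and `find_abbreviations` is a hypothetical helper.

```python
import re


def find_abbreviations(text):
    """Map each abbreviation defined as 'long form (ABBR)' to its long form.

    Heuristic: the initials of the n words before the parenthesis must spell
    the n-letter abbreviation.
    """
    found = {}
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        abbreviation = match.group(1)
        words = text[:match.start()].split()
        candidate = words[-len(abbreviation):]
        initials = "".join(word[0].upper() for word in candidate)
        if initials == abbreviation:
            found[abbreviation] = " ".join(candidate).lower()
    return found


text = ("Previous studies have shown an increase in the number of GABAergic and total "
        "neurons in the inferior colliculus (IC) of the genetically epilepsy-prone rat (GEPR).")
print(find_abbreviations(text))
```

On this sentence the heuristic finds IC but misses GEPR, because epilepsy-prone contributes two letters from a single hyphenated word; a practical solution would need a more tolerant matching rule.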


2.2.2 Document 88008549

ID: 88008549
TI: The anticonvulsant action of nafimidone on kindled amygdaloid seizures in rats
AB: The anticonvulsant effectiveness of nafimidone (1-[2-naphthoylmethyl]imidazole hydrochloride) was evaluated in the kindled amygdaloid seizure model in rats. Nafimidone (3.1-120 mg/kg i.p.) was evaluated at 30 min in previously kindled rats using both threshold (20 mu A increments) and suprathreshold (400 mu A) paradigms. Nafimidone (25-50 mg/kg) significantly reduced suprathreshold elicited afterdischarge length and seizure severity only at doses with some prestimulation toxicity. The maximum anticonvulsant effectiveness for the 25 mg/kg i.p. dose of nafimidone was seen between 15 and 30 min utilizing a suprathreshold kindling paradigm. Nafimidone did not significantly elevate seizure thresholds at the doses tested; however, nafimidone (3.1-50 mg/kg) reduced the severity and afterdischarge duration of threshold elicited seizures in a non-dose response manner. Drug-induced electroencephalographic spikes were seen in both cortex and amygdala in most kindled rats receiving 100-120 mg/kg i.p. within 30 min of dosing before electrical stimulation. The frequency of spike and wave complexes increased in most of these animals leading to drug-induced spontaneous seizures and death in approximately 25% before electrical stimulation. This study has demonstrated that although nafimidone can modify both threshold and suprathreshold elicited kindled amygdaloid seizures, it lacks significant specificity in this model of epilepsy.

Indexing according to Condorcet without NLE

affects( death <1>, animals )
affects( death <1>, seizures )
affects( electric stimulation, death <1> )
affects( electric stimulation, seizures )
affects( epilepsy, seizures )
affects( nafimidone, epilepsy )
affects( nafimidone, kindling (neurology) )
affects( nafimidone, seizures )
affects( nafimidone, toxicity )
affects( seizures, animals )
affects( seizures, death <1> )
affects( seizures, epilepsy )
affects( seizures, toxicity )
affects( toxicity, seizures )
amygdaloid body
anticonvulsants
death <2>
frequency
increased
manifestation_of( epilepsy, seizures )
manifestation_of( seizures, epilepsy )
model
rats
seen


Indexing according to Condorcet

affects( nafimidone, seizures )
affects( nafimidone, toxicity )
amygdaloid body
animals
anticonvulsants
death <1>
death <2>
electric stimulation
epilepsy
frequency
increased
kindling (neurology)
model
rats
seen
severity
specificity

Indexing according to Elsevier (copyright Elsevier)


Indexing according to NLM (copyright NLM)

Amygdaloid Body/physiopathology
Animal
Anticonvulsants/*therapeutic use
Dose-Response Relationship, Drug
Electroencephalography
Imidazoles/*therapeutic use
Kindling (Neurology)/*drug effects
Male
Naphazoline/*therapeutic use
Naphazoline/analogs & derivatives
Rats
Rats, Inbred Strains
Seizures/*drug therapy
Seizures/physiopathology
Time Factors

Indexing according to the Oracle

(epilepsy)
(nafimidone)
electroencephalography
kindling (neurology)
rats
seizures
treats(nafimidone, epilepsy)


One could argue that the Oracle has done too much interpretation here: there is no explicit mention of a therapeutic effect of nafimidone on epilepsy, but only a suppressing effect on kindled seizures. Condorcet expresses that nicely through its first affects-relation (kindled seizures, unfortunately, are not present in the UMLS ontology, preventing the even better description affects( nafimidone, kindled seizures )).

There is general agreement between the four serious attempts at indexing here.


2.2.3 Document 88008554

ID: 88008554
TI: Multietiological determinants of psychopathology and social competence in children with epilepsy
AB: The purpose of this investigation was to inquire into the multietiological determinants of psychopathology and social competence in children with epilepsy. The relationship between behavioral functioning as assessed by the Child Behavior Checklist and a variety of biological, psychosocial, medication and demographic risk factors was investigated in a sample of 183 children with epilepsy aged 6-16. Several risk factors were found to be related to each behavioral measure. The results are discussed both in terms of their implications for models of psychopathology in epilepsy as well as their relationship to previous findings in the epilepsy/psychopathology field.

Indexing according to Condorcet without NLE

child behavior
competence
domain
findings
measures
occurs_in( epilepsy, child )
occurs_in( epilepsy, psychopathology )
occurs_in( psychopathology, child )
result_of( epilepsy, epilepsy )
result_of( epilepsy, psychopathology )
result_of( psychopathology, epilepsy )
risk factors

Indexing according to Condorcet

child behavior
competence
findings
measures
occurs_in( epilepsy, child )
occurs_in( psychopathology, child )
risk factors

Indexing according to Elsevier (copyright Elsevier)



Indexing according to NLM (copyright NLM)

Adolescence
Age Factors
Child
Child Behavior Disorders/*etiology
Child Behavior Disorders/psychology
Epilepsy/*psychology
Epilepsy/complications
Epilepsy/drug therapy
Family Characteristics
Female
Human
Male
Risk Factors
Social Environment

Indexing according to the Oracle

behavior <1>
behavior <2>
child
epilepsy
psychopathology


This is a mainly descriptive study, and thus not easy to index using structured concepts.

Condorcet ’beats’ Elsevier here by explicitly stating that the disease epilepsy studied here occurs in children. The weak typing of the UMLS ontology is reflected in the second occurs-relation in the Condorcet indexing; obviously the two occurrences of occurs in are quite different.

The NLM indexing has (as in several other cases) rather uninformative combinations of terms, like male and female here.


2.2.4 Document 88009075

ID: 88009075
TI: Cocaine-induced place conditioning: Importance of route of administration and other procedural variables
AB: It has been shown that pretreatment with dopamine (DA) receptor blockers disrupts the effect of intravenously (IV) and intracerebrally (ICV), but not intraperitoneally (IP) administered cocaine on place preference conditioning (PPC). The present study was undertaken to further evaluate possible differences between IV and IP cocaine PPC. To this end, several factors which may differentially influence IV and IP cocaine PPC were examined. Firstly, dose-response effects were studied. Intravenous cocaine produced PPC within a narrow dose range (0.5-2.5 mg/kg). Animals receiving IV injections of 5 and 10 mg/kg cocaine experienced convulsions and did not show PPC. For IP cocaine a 10-fold increase in dose (10 mg/kg) and twice the number of training trials was required in order to obtain PPC equal in magnitude to that with IV cocaine (0.5 mg/kg; two trials). Cocaine PPC was retained at least 1 month. Following IV cocaine preference developed for the side associated with the drug regardless of whether the conditioning was to the least or most preferred side. After IP cocaine, preference developed for the drug side only when the drug was paired with the least preferred side. Rats trained with IV, but not IP, cocaine significantly preferred the drug familiar side to a novel compartment. Preference for the IV or IP cocaine side developed regardless of whether testing was carried out in the drugged or undrugged state, excluding possible state-dependent effects as an explanation of the cocaine PPC. The results show PPC procedure to be a valid test for evaluating rewarding properties of IV cocaine. However, they fail to show rewarding effects of IP cocaine.

Indexing according to Condorcet without NLE

administration <1>
administration <2>
affects( cocaine, conditioning (psychology) )
affects( dopamine, conditioning (psychology) )
animals
convulsions
disrupts( dopamine, conditioning (psychology) )
drugs
equal
injections
iv
procedures <1>
rats
test
testing
training <1>
training programs
two



Indexing according to Condorcet

administration <1>
administration <2>
affects( cocaine, conditioning (psychology) )
animals
convulsions
disrupts( dopamine, conditioning (psychology) )
drugs
injections
iv
procedures <1>
rats
test
testing

Indexing according to Elsevier (copyright Elsevier)


Indexing according to NLM (copyright NLM)

Animal
Cocaine/*pharmacology
Cocaine/administration & dosage
Conditioning, Operant/*drug effects
Cues/drug effects
Dose-Response Relationship, Drug
Injections, Intraperitoneal
Injections, Intravenous
Male
Rats
Rats, Inbred Strains

Indexing according to the Oracle

(cocaine)
affects(cocaine, conditioning (psychology))
animal <1>
animals
conditioning (psychology)




Even the Oracle could not express the essence of this article using the limited expressiveness of UMLS and its relations.

Elsevier has included epilepsy, even though the term does not occur in the abstract; this may be due to jumping to conclusions: not all seizures indicate epilepsy.

The NLM term male does not come from the abstract either.

The administration procedures (IV, ICV, IP) are insufficiently represented in UMLS to allow expressing their differential effect.


2.2.5 Document 88009078

ID: 88009078
TI: Decreased susceptibility to local anesthetics-induced convulsions after paradoxical sleep deprivation
AB: The effect of 50% convulsant (CD-50) and 50% lethal (LD-50) doses of lidocaine, bupivacaine and pentylenetetrazol were determined in mice stressed by paradoxical sleep deprivation (PSD). The results showed a reduced mortality after high doses of local anesthetics and decreased seizure susceptibility induced by bupivacaine after 48 h and 72 h of PSD. On the other hand this stressful manipulation increased the susceptibility to pentylenetetrazol-induced convulsions. These data may suggest a different mechanism of action for these drugs. Possible alterations in drug metabolism or on aminergic transmission after PSD may be involved in the protection to the toxic effect of the local anesthetics.

Indexing according to Condorcet without NLE

affects( anesthetics: local, metabolism <1> )
affects( anesthetics: local, metabolism <2> )
affects( anesthetics: local, seizures )
affects( bupivacaine, seizures )
affects( bupivacaine, sleep: rem )
affects( convulsants, sleep: rem )
affects( convulsions, sleep: rem )
affects( lidocaine <1>, sleep: rem )
affects( lidocaine <2>, sleep: rem )
affects( manipulation: nos, convulsions )
affects( pentylenetetrazole, sleep: rem )
affects( sleep: rem, convulsions )
affects( sleep: rem, mice )
causes( anesthetics: local, seizures )
causes( bupivacaine, seizures )
decreased
disease susceptibility
drugs
hand
high
increased
induced
involved
local
mortality <1>
mortality <2>
pharmacology <2>
sleep deprivation
transmission


Indexing according to Condorcet

affects( manipulation: nos, convulsions )
anesthetics: local
causes( bupivacaine, seizures )
convulsants
disease susceptibility
drugs
hand
involved
lidocaine <1>
lidocaine <2>
metabolism <1>
metabolism <2>
mice
mortality <1>
mortality <2>
pentylenetetrazole
pharmacology <2>
sleep deprivation
sleep: rem
transmission

Indexing according to Elsevier (copyright Elsevier)


Indexing according to NLM (copyright NLM)

Anesthetics, Local/*pharmacology
Animal
Bupivacaine/pharmacology
Bupivacaine/toxicity
Convulsions/*physiopathology
Lidocaine/pharmacology
Lidocaine/toxicity
Male
Mice
Pentylenetetrazole/pharmacology
Pentylenetetrazole/toxicity
Sleep Deprivation/*physiology
Sleep, REM/*physiology
Support, Non-U.S. Gov’t


Indexing according to the Oracle

(sleep deprivation)
affects(sleep deprivation, seizures)
bupivacaine
epileptic seizures
lidocaine <1>
lidocaine <2>
mice
pentylenetetrazole


The three substances studied are duly noted by all indexing procedures, as are the sleep deprivation and the experimental animal.

The essence of the article (well expressed in the title) requires more than a binary relation to express.
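To illustrate what "more than a binary relation" could mean, the sketch below allows a relation to take another relation as an argument, so that the influence of sleep deprivation on drug-induced convulsions can be written down. This is purely a hypothetical representation: neither the UMLS relations nor the Condorcet notation used in this report provide such nesting, and the `Relation` class is introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A relation whose arguments are concept names (str) or nested Relation objects."""
    name: str
    args: tuple

    def __str__(self):
        return f"{self.name}( {', '.join(str(arg) for arg in self.args)} )"


# Binary relation of the kind available in the indexings above.
induced = Relation("causes", ("anesthetics: local", "convulsions"))

# Hypothetical nested form: sleep deprivation affects the causal relation itself.
essence = Relation("affects", ("sleep deprivation", induced))

print(essence)
# affects( sleep deprivation, causes( anesthetics: local, convulsions ) )
```

Even this nested form does not capture the direction of the effect (decreased susceptibility), which would require yet another argument or a more specific relation name.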


2.2.6 Document 88018585

ID: 88018585
TI: Computed tomographic scans in posttraumatic epilepsy
AB: The occurrence of posttraumatic epilepsy was studied in 219 patients who had had a computed tomographic (CT) scan within three days after a civilian head trauma. Posttraumatic epilepsy was observed in 13 patients. All of them had focal brain damage shown by CT scan. The predicting power of both clinical risk factors and CT scans was analyzed by multiple logistic regression. Only an intracerebral hemorrhage and intracerebral hemorrhage plus satellite extracerebral hematoma proved significantly associated with posttraumatic epilepsy. This result has important implications in the design of posttraumatic prophylaxis trials.

Indexing according to Condorcet without NLE

associated_with( cerebral hemorrhage, cerebral hemorrhage )
associated_with( cerebral hemorrhage, epilepsy )
associated_with( cerebral hemorrhage, hematoma )
associated_with( epilepsy, cerebral hemorrhage )
associated_with( epilepsy, hematoma )
associated_with( hematoma, cerebral hemorrhage )
associated_with( hematoma, epilepsy )
brain injuries
clinical
focal
logistic regression
multiple
occurrence
occurs_in( epilepsy, patients )
occurs_in( head injuries, patients )
power (psychology)
prevention & control
risk factors

Indexing according to Condorcet

associated_with( cerebral hemorrhage, epilepsy )
brain injuries
head injuries
hematoma
logistic regression
occurs_in( epilepsy, patients )
power (psychology)
prevention & control
risk factors


Indexing according to Elsevier (copyright Elsevier)


Indexing according to NLM (copyright NLM)

*Tomography, X-Ray Computed
Brain/*radiography
Cerebral Hemorrhage/complications
Cerebral Hemorrhage/radiography
Epilepsy, Post-Traumatic/*radiography
Epilepsy, Post-Traumatic/etiology
Human
Regression Analysis
Risk Factors

Indexing according to the Oracle

(epilepsy)
(head injuries)
causes(head injuries, epilepsy)
cerebral hemorrhage
tomography, emission-computed
tomography, radionuclide-computed
tomography, transmission computed
tomography, x-ray computed


The essence (CT scans are a better predictor of posttraumatic epilepsy than are clinical risk factors) requires a deeper representational structure than is available to us.

Overall, there is good agreement between the indexing procedures.


2.2.7 Document 88018590

ID: 88018590
TI: Ambulatory cassette electroencephalography of psychiatric patients
AB: Indications and results of ambulatory cassette electroencephalography obtained on 133 hospitalized psychiatric patients were reviewed. Interictal epileptiform abnormalities were detected in 15 patients (11%), of whom six had an established diagnosis of epilepsy. Actual seizures were recorded in two patients, of whom one had an established diagnosis of epilepsy. Subclinical seizure activity was detected in only one patient, who also experienced overt seizures. Routine screening of psychiatric patients with ambulatory cassette electroencephalography does not seem to be justified, although the test can provide useful adjunctive evidence to support the diagnosis of epilepsy and clarify the nature of suspicious clinical episodes in selected patients.

Indexing according to Condorcet without NLE

clinical
diagnosis <1>
diagnosis <2>
electroencephalography
indications
occurs_in( abnormalities <1>, patients )
occurs_in( abnormalities <2>, patients )
occurs_in( epilepsy, patients )
occurs_in( seizures, patients )
psychiatric
routine
screening <1>
screening <2>
test
two

Indexing according to Condorcet

diagnosis <1>
diagnosis <2>
electroencephalography
epilepsy
indications
occurs_in( abnormalities <1>, patients )
occurs_in( abnormalities <2>, patients )
occurs_in( seizures, patients )
routine
screening <1>
screening <2>
test


Indexing according to Elsevier (copyright Elsevier)


Indexing according to NLM (copyright NLM)

Adolescence
Adult
Aged
Child
Child, Preschool
Electroencephalography/*methods
Epilepsy/diagnosis
Epilepsy/physiopathology
Female
Hospitalization
Human
Male
Mental Disorders/*physiopathology
Middle Age
Seizures/diagnosis
Seizures/physiopathology

Indexing according to the Oracle

(diagnosis <1>)
(diagnosis <2>)
(electroencephalography)
(epilepsy)
diagnoses(electroencephalography, epilepsy)
patients
psychiatric
seizures


This example shows that no matter how elaborate a predefined ontology may be, there will always be a need to extend it. In this case, electroencephalography as recorded by ambulatory cassette recorders is the issue, but this concept is not present in the UMLS.

The NLM indexing shows the uninformative conjunction of child, preschool child, adolescence, adult, middle age and aged, as well as male and female.


2.2.8 Document 88019360

ID: 88019360
TI: Dependence as a limiting factor in the clinical use of minor tranquillizers
AB: The recognition that all minor tranquillizers carry the risk of dependence has had a significant impact in their prescription over the years. But it has only recently had the same impact on the prescribing of benzodiazepines because their dependence risks were not recognized until late. Approximately one third of all patients prescribed a benzodiazepine regularly for six weeks or longer will experience withdrawal symptoms if the drug is withdrawn suddenly after this time. Even if the drug is withdrawn gradually withdrawal symptoms may still lead to demands for further prescription. The major change in prescribing has been towards shorter and intermittent treatment so that tolerance is reduced and withdrawal symptoms avoided. This is appropriate for acute anxiety reactions but more difficult for longer term anxious and depressive neurotic disorders, which have a much longer natural history. Continuing evidence that other drugs not specifically marketed for the relief of anxiety, particularly the antidepressants, are effective in relieving this anxiety has led to increased prescription of antidepressants. Some patients may also be helped by treatment with beta-blocking drugs and new agents such as buspirone which have no significant dependence potential. There has also been a move away from drug treatment to psychological treatments for anxiety as a consequence of concern over dependence. For some conditions, particularly medical ones such as spasticity and epilepsy, benzodiazepines may be considered for long-term treatment. They may also be regarded as necessary for more severe psychiatric disorders, usually as an adjunct to other therapy. In such instances the dependence risk is acknowledged but the benefits of treatment are considered to outweigh them. There may also be patients who are dependent on benzodiazepines but the alternative of withdrawing the drug may lead to dependence on a more dangerous drug such as alcohol. In such cases it is reasonable to regard continued prescription of the benzodiazepine as the least dangerous course of action. It is important to maintain a perspective of dependence on minor tranquillizers, particularly as attitudes are in danger of being distorted by excessive media attention. To date there is no evidence that dependence on benzodiazepines leads to any dangerous long term sequelae although there is concern over their effects on higher cognitive function. Nevertheless, the dangers of barbiturates, alcohol and nicotine are so much greater that it would be unfortunate if public concern led to excessive restrictions on the use of benzodiazepines. Newer compounds that are as effective as the benzodiazepines and do not carry the same dependence risks are needed urgently but to date there are none that satisfy these requirements. It is salutary for clinicians to remind themselves that all drugs effective in the treatment of anxiety have proved to be major drugs of dependence and it seems unlikely that new compounds are going to escape the same fate.

Indexing according to Condorcet without NLE

acute
affects( antidepressive agents, anxiety )
affects( benzodiazepines, physiology <2> )
affects( benzodiazepines, sequelae )
affects( drugs, anxiety )
affects( physiology <2>, sequelae )
affects( prescriptions, anxiety )
affects( sequelae, physiology <2> )
affects( therapy <1>, anxiety )
affects( therapy <1>, tolerance <2> )
affects( therapy <1>, tolerance: nos )
affects( treatment <1>, anxiety )
affects( treatment <1>, tolerance <2> )
affects( treatment <1>, tolerance: nos )
alcohols
attention
attitude
barbiturates
benzodiazepine
buspirone
clinical
effective
epilepsy
increased
lead
leads
long
major
medical
medication prescribing
mental disorders
minor
natural history
neurotic disorders
nicotine <1>
nicotine <2>
occurs_in( anxiety, anxiety )
patients
potential
risk
severe
spasticity
third
time
tunica media
utilization
withdrawal symptoms

Indexing according to Condorcet

alcohols
antidepressive agents
anxiety
attention
attitude
barbiturates
benzodiazepine
benzodiazepines
buspirone
drugs
effective
epilepsy
lead
leads
medication prescribing
mental disorders
natural history
neurotic disorders
nicotine <1>
nicotine <2>
patients
physiology <2>
potential
prescriptions
risk
sequelae
spasticity
therapy <1>
time
tolerance <2>
tolerance: nos
treatment <1>
tunica media
utilization
withdrawal symptoms

Indexing according to Elsevier (copyright Elsevier)


Indexing according to NLM (copyright NLM)

Animal
Anti-Anxiety Agents/*adverse effects
Human
Substance-Related Disorders/*psychology


Indexing according to the Oracle

alcohols
antidepressive agents
barbiturates
benzodiazepine
buspirone
drug dependence
drugs
epilepsy
mental disorders
spasticity
tranquilizing agents
withdrawal symptoms


This, obviously, should not count as an abstract, but as a full article. Nevertheless, the various substances mentioned explicitly are well detected by all procedures.
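One rough way to put numbers on such a comparison is to take the Oracle's list as a reference and count how many of its concepts also occur in Condorcet's output. The sketch below does this for document 88019360, with both concept lists copied from the listings for this document. The function name `compare` and the use of exact string matching are assumptions made for illustration; they are not part of the Condorcet system or of the evaluation procedure used in this report.

```python
# Concept lists for document 88019360, copied from the listings in this section.
CONDORCET = {
    "alcohols", "antidepressive agents", "anxiety", "attention", "attitude",
    "barbiturates", "benzodiazepine", "benzodiazepines", "buspirone", "drugs",
    "effective", "epilepsy", "lead", "leads", "medication prescribing",
    "mental disorders", "natural history", "neurotic disorders",
    "nicotine <1>", "nicotine <2>", "patients", "physiology <2>", "potential",
    "prescriptions", "risk", "sequelae", "spasticity", "therapy <1>", "time",
    "tolerance <2>", "tolerance: nos", "treatment <1>", "tunica media",
    "utilization", "withdrawal symptoms",
}
ORACLE = {
    "alcohols", "antidepressive agents", "barbiturates", "benzodiazepine",
    "buspirone", "drug dependence", "drugs", "epilepsy", "mental disorders",
    "spasticity", "tranquilizing agents", "withdrawal symptoms",
}


def compare(candidate: set, reference: set) -> dict:
    """Compare two sets of index concepts by exact string match (illustrative only)."""
    return {
        "shared": sorted(candidate & reference),
        "missed": sorted(reference - candidate),
        "extra": sorted(candidate - reference),
    }


result = compare(CONDORCET, ORACLE)
print(len(result["shared"]), "of", len(ORACLE), "Oracle concepts found")
print("missed:", result["missed"])
```

Exact matching is deliberately naive: it treats near-synonyms as unrelated and says nothing about whether the extra concepts are informative, which is the issue the case commentaries focus on.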


2.2.9 Document 88021951

ID: 88021951
TI: Evaluation of the anticonvulsant and biochemical activity of CGS 8216 and CGS 9896 in animal models
AB: CGS 8216, a benzodiazepine-receptor ligand with inverse agonistic properties, and CGS 9896, which possesses partial agonistic or mixed agonist-antagonist properties were compared in a number of epilepsy models. The effect of CGS 9896 on the decrease in GABA levels induced by isoniazid was also investigated. CGS 9896 inhibited the kindling process in rats in that it delayed the development of overt seizures and the increase in the duration of after-discharges. In a genetic rat model characterized by absence-like EEG patterns, CGS 9896 dose-dependently suppressed these spontaneously occurring discharges, while CGS 8216 had no effect. However, CGS 8216 antagonized the anticonvulsant action of CGS 9896. CGS 9896 protected mice against seizures induced by beta-vinyllactic acid, whereas CGS 8216 shortened the latency period before convulsions occurred. CGS 9896 retarded the onset of convulsive fits caused by isoniazid without preventing the decrease in GABA levels produced by that drug. These results confirm the anticonvulsant activity of CGS 9896 and demonstrate the inverse agonistic activity of CGS 8216. The profile of CGS 9896 in the above tests suggests that it might be an effective anticonvulsant, primarily in absence-type seizures.

Indexing according to Condorcet without NLE

affects( anticonvulsants, seizures )
affects( cgs 9896, development )
affects( cgs 9896, kindling (neurology) )
affects( cgs 9896, seizures )
affects( development, kindling (neurology) )
affects( development, rats )
affects( development, seizures )
affects( kindling (neurology), development )
affects( kindling (neurology), rats )
affects( kindling (neurology), seizures )
affects( seizures, development )
affects( seizures, kindling (neurology) )
affects( seizures, rats )
animal <1>
animals
biochemical
causes( acids, convulsions )
causes( acids, seizures )
causes( cgs 8216, convulsions )
causes( cgs 8216, seizures )
causes( cgs 9896, convulsions )
causes( cgs 9896, seizures )
delayed
discharge
drugs
effective
electroencephalography
epilepsy
evaluation <1>
evaluation <2>
evaluation <3>
gaba
induced
interacts_with( cgs 8216, cgs 9896 )
interacts_with( cgs 9896, cgs 8216 )
isoniazid
latency period (psychology)
ligands
manifestation_of( seizures, development )
manifestation_of( seizures, kindling (neurology) )
mice
mixed
model

Indexing according to Condorcet

acids
animal <1>
animals
anticonvulsants
cgs 8216
cgs 9896
convulsions
delayed
development
discharge
drugs
electroencephalography
epilepsy
evaluation <1>
evaluation <2>
evaluation <3>
gaba
induced
isoniazid
kindling (neurology)
latency period (psychology)
ligands
mice
model
rats
seizures

Indexing according to Elsevier (copyright Elsevier)



Indexing according to NLM (copyright NLM)

Animal
Anticonvulsants/pharmacology/*therapeutic use
Brain/drug effects/*metabolism/physiopathology
Diazepam
Electric Stimulation
Epilepsy/chemically induced/*drug therapy/metabolism
GABA/*metabolism
Male
Pyrazoles/pharmacology/*therapeutic use
Rats
Rats, Inbred Strains

Indexing according to the Oracle

(cgs 9896)
animal <1>
animals
anticonvulsants
benzodiazepine
beta-vinyllactic acid
cgs 8216
gaba
isoniazid
kindling (neurology)
mice
model
rats
seizures
seizures, absence
treats(cgs 9896, epilepsy)


Two substances, indicated both by their code and by their functional mechanism, are mentioned. It is far beyond the state of machine interpretation to decipher the precise working of the two from this abstract.

The different customer populations aimed at by Elsevier and NLM become clear in this example: Elsevier provides the proper chemical names for the two substances, while NLM omits them.


2.2.10 Document 88100203

ID: 88100203
TI: Effects of zonisamide in children with epilepsy
AB: The effects of zonisamide (1,2-benzisoxazole-3-methanesulfonamide: AD-810) were studied in 50 children with epilepsy, ranging in age from 3 months to 20 years (mean, 10.5 years). The types of epilepsy were primary generalized in one case, secondary generalized in 32, and partial in 17. The initial dose was 1-6 mg/kg/day and the dose was increased to 1.5-15 mg/kg/day. Four cases (8%) showed a complete disappearance of seizures and thirteen patients (26%) had a disappearance rate of 50% or more of seizures. Disappearance or improvement of seizures was obtained in 31% of the cases of generalized epilepsy and in 41% of the cases of partial epilepsy. Zonisamide was effective in 39% of cases of Lennox-Gastaut syndrome. Seizures completely disappeared in three of the four new cases. Spike discharges disappeared or significantly decreased in 22% of the cases that had undergone electroencephalograms. The blood levels of zonisamide were 10.8-18.8 mu g/ml in the three new cases when the seizures were controlled. Side effects such as drowsiness, ataxia, and salivation were observed in 42% of the children, more particularly in children receiving polypharmacy.

Indexing according to Condorcet without NLE

adverse effects
affects( epilepsy: generalized, epilepsy: partial )
affects( epilepsy: generalized, seizures )
affects( epilepsy: partial, epilepsy: generalized )
affects( epilepsy: partial, seizures )
affects( seizures, epilepsy: generalized )
affects( seizures, epilepsy: partial )
affects( seizures, seizures )
affects( zonisamide, epilepsy )
affects( zonisamide, epilepsy: petit mal: myoclonic astatic )
age
blood <1>
blood <2>
complete
controlled
decreased
discharge
effective
electroencephalogram
generalized
increased
manifestation_of( ataxia, salivation )
manifestation_of( drowsiness, ataxia )
manifestation_of( drowsiness, salivation )
manifestation_of( seizures, seizures )
occurs_in( ataxia, child )
occurs_in( epilepsy, child )
patients
polypharmacy
result_of( seizures, seizures )
secondary <1>
secondary <2>

Indexing according to Condorcet

adverse effects
affects( zonisamide, epilepsy )
affects( zonisamide, epilepsy: petit mal: myoclonic astatic )
age
blood <1>
blood <2>
controlled
decreased
discharge
drowsiness
electroencephalogram
epilepsy: generalized
epilepsy: partial
generalized
increased
occurs_in( ataxia, child )
occurs_in( epilepsy, child )
patients
polypharmacy
salivation
seizures

Indexing according to Elsevier (copyright Elsevier)


Indexing according to NLM (copyright NLM)

*** This document is not indexed by Medline ***


Indexing according to the Oracle

(ataxia)
(drowsiness)
(epilepsy)
(salivation)
(zonisamide)
child
complicates(zonisamide, ataxia)
complicates(zonisamide, drowsiness)
complicates(zonisamide, salivation)
epilepsy, generalized
epilepsy, partial
epilepsy, petit mal, myoclonic astatic
epileptic seizures
treats(zonisamide, epilepsy)


This is probably the document where Condorcet has performed best. The structured concept affects(zonisamide, epilepsy) (and the more specific variant) expresses an important topic of this article. The Oracle would have preferred the even more specific relation complicates rather than affects. However, the index concept adverse effects is an acceptable second-best way of expressing the relation.

Judging by the amount of text in the abstract, the curative aspect of zonisamide is even more important, as expressed by the Oracle in treats(zonisamide, epilepsy). Affects, of course, could mean treats as well as complicates.
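The remark that affects covers both treats and complicates can be read as a small specialization hierarchy over relation names. A minimal sketch follows, assuming a hypothetical `PARENT` table that records only the two specializations named in the text; the relation hierarchy actually used by Condorcet is not reproduced here, and the function names are illustrative.

```python
from typing import Optional, Tuple

# Hypothetical parent table: only the two specializations named in the text.
PARENT = {"treats": "affects", "complicates": "affects"}

Triple = Tuple[str, str, str]  # (relation, first argument, second argument)


def generalizes(general: str, specific: str) -> bool:
    """True if `general` equals `specific` or is one of its ancestors."""
    current: Optional[str] = specific
    while current is not None:
        if current == general:
            return True
        current = PARENT.get(current)
    return False


def approximates(assigned: Triple, preferred: Triple) -> bool:
    """True if `assigned` has the same arguments as `preferred`
    and an equal or more general relation name."""
    return assigned[1:] == preferred[1:] and generalizes(assigned[0], preferred[0])


condorcet = ("affects", "zonisamide", "epilepsy")
oracle = ("treats", "zonisamide", "epilepsy")
print(approximates(condorcet, oracle))  # True: affects is more general than treats
print(approximates(oracle, condorcet))  # False: treats is not more general than affects
```

Under this reading, Condorcet's affects(zonisamide, epilepsy) is a correct but less specific approximation of the Oracle's relation, which matches the assessment given in the text.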

The rather ad-hoc composition of the UMLS used as our ontology gives rise to a number of rather useless index concepts: age, blood, controlled, decreased, increased, patients.

All side-effects of the drug specified in the abstract are present in Condorcet’s output: ataxia (as argument in the relation), drowsiness, salivation. As has been said before, the relation occurs_in is generally not very informative.

The Elsevier indexing expresses the three side effects well, but for unclear reasons completely misses the drug name zonisamide.

For this document, the Oracle also studied the full text of the article. This has given rise to only two additional remarks. The first remark is that the text shows the article to be the (alleged) first to discuss the use of zonisamide in children rather than in adults or animals. There is no obvious way to express this fact clearly through the indexing concepts; child does express the fact that children are studied but does not emphasize the contrast made between them and adult patients. The second remark is that the list of side effects given in the abstract is a subset of a longer list in the article. After ordering these side effects in order of decreasing frequency of occurrence, the three most frequent ones were deemed to deserve mentioning in the abstract (an arbitrary but defendable decision).

In summary: Condorcet performs here as well as possible given the limitations of the ontology used.


2.2.11 Document 88104710

ID: 88104710
TI: Primidone and phenobarbital during lactation period in epileptic women: Total and free drug serum levels in the nursed infants and their effects on neonatal behavior
AB: A total of 35 newborns whose mothers had been treated with either primidone (PMD), phenobarbital (PB) or a combination of one of these drugs with other antiepileptic drugs were included in this study. Fetal/maternal serum concentration ratios at birth, milk/serum concentration ratios and neonatal half-lives were determined for PMD, PB and phenylethylmalondiamide (PEMA). Steady-state serum levels of PMD in 2 nursed infants were 2.5 and 0.7 mu g/ml, respectively, for PEMA values of 1.4 and 0.4 mu g/ml. PB steady-state concentrations ranged between 2.0 and 13.0 mu g/ml (6 infants). Maternal PB serum protein binding did not change during and after pregnancy. Neonatal free-fraction values at birth were similar to maternal values: 63.2 ± 17.2% (n = 11). In the postnatal period, however, PB free-fraction values rose to more than 90% in some infants. In one case, neonatal free concentrations of PB were even higher during the 1st week after birth than the corresponding maternal values. Symptoms of sedation were observed in these neonates for which elevated free-fraction values of PB could be responsible. Behavior problems, such as withdrawal symptoms, were observed in neonates who eliminated PMD with short half-lives, while other infants with longer PMD half-lives or slower elimination because of nursing showed no such symptoms.

Indexing according to Condorcet without NLE

affects( behavior <1>, lactation )
affects( behavior <1>, women )
affects( drugs, behavior <1> )
affects( drugs, lactation )
affects( lactation, behavior <1> )
affects( lactation, women )
affects( phenobarbital, behavior <1> )
affects( phenobarbital, lactation )
affects( primidone <1>, behavior <1> )
affects( primidone <1>, lactation )
affects( primidone <2>, behavior <1> )
affects( primidone <2>, lactation )
behavior <2>
birth
concentration
elevated
manifestation_of( behavior problems, excretion: functional )
manifestation_of( symptoms, behavior problems )
manifestation_of( symptoms, excretion: functional )
manifestation_of( withdrawal symptoms, behavior problems )
manifestation_of( withdrawal symptoms, excretion: functional )
milk
mothers
nursing <1>
occurs_in( behavior problems, infant )
occurs_in( behavior problems, infant: newborn )
phenylethylmalondiamide
pregnancy
protein binding
result_of( behavior problems, excretion: functional )
result_of( behavior problems, nursing <2> )
result_of( excretion: functional, behavior problems )
sedation
serum
short
values

Indexing according to Condorcet

behavior <1>
behavior <2>
birth
concentration
drugs
elevated
excretion: functional
infant
lactation
milk
mothers
nursing <1>
nursing <2>
occurs_in( behavior problems, infant: newborn )
phenobarbital
phenylethylmalondiamide
pregnancy
primidone <1>
primidone <2>
protein binding
sedation
serum
symptoms
values
withdrawal symptoms
women

Indexing according to Elsevier (copyright Elsevier)



Indexing according to NLM (copyright NLM)

ADDITIONAL MESH HEADINGS:
Behavior/*drug effects
Epilepsy/*metabolism
Female
Half-Life
Human
Infant, Newborn
Milk, Human/metabolism
Phenobarbital/*pharmacokinetics
Phenobarbital/blood
Phenobarbital/pharmacology
Primidone/*pharmacokinetics
Primidone/blood
Primidone/pharmacology
Substance Withdrawal Syndrome/metabolism

Indexing according to the Oracle

(drug withdrawal symptoms)
adverse effects
affects(primidone, behavior <1>)
affects(primidone, behavior <2>)
behavior <1>
behavior <2>
blood serum
causes(primidone, drug withdrawal symptoms)
drug combinations
epilepsy
infant, newborn
lactation
milk, human
phenobarbital
phenylethylmalonamide
primidone


The article (at least according to the abstract) is mainly a description of observations and measurements in a specific clinical setting; little interpretation is given. A number of anti-epileptic drugs are mentioned, and these are duly noted in all indexing systems.

The relation studied in this article is that between the anti-epileptic drugs taken by a mother and the drug levels (and the levels of metabolites) and the side-effects caused by them in the newborn babies. That relation is too complex to be expressed by any of the indexing systems. All have to make do with approximating the relation by giving a set of concepts indicating lactation, newborn infant, and epilepsy. The Oracle assigns one relation, causes(primidone, drug withdrawal symptoms), which does not make clear that the drug and the symptoms occur in different persons.

In the Condorcet indexing, the cluster of concepts lactation, milk, mothers, women and nursing obviously contains redundancy. The concept women could have been eliminated by use of an ontology showing that all mothers are women; nursing and lactation are almost synonymous, and milk should have been recognized as the essential stuff of lactation. In this case we clearly see the limitations of using binary relations only. An obvious next step would be to allow a complex structure like causes(occurs_in(antiepileptic drugs, mothers), occurs_in(drug withdrawal symptoms, new-born infants)).
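To make the difference between a flat binary relation and the suggested nested structure concrete, the following sketch models an index term as either a plain concept or a relation whose arguments may themselves be relations. The class `Relation` and the helper `depth` are hypothetical names introduced for illustration; Condorcet itself, as described in this report, only produces relations of depth 1.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Relation:
    """A binary relation whose arguments are concepts or, recursively, relations."""
    name: str
    left: "Term"
    right: "Term"

    def __str__(self) -> str:
        return f"{self.name}({self.left}, {self.right})"


Term = Union[str, Relation]  # a plain concept name or a (possibly nested) relation


def depth(term: Term) -> int:
    """Nesting depth: 0 for a concept, 1 for a flat binary relation, and so on."""
    if isinstance(term, str):
        return 0
    return 1 + max(depth(term.left), depth(term.right))


flat = Relation("causes", "primidone", "drug withdrawal symptoms")
nested = Relation(
    "causes",
    Relation("occurs_in", "antiepileptic drugs", "mothers"),
    Relation("occurs_in", "drug withdrawal symptoms", "new-born infants"),
)
print(flat, depth(flat))
print(nested, depth(nested))
```

The nested form states explicitly that the drug and the symptoms occur in different persons, which the flat relation cannot express. The cost is that the linguistic analysis would have to recover two levels of structure instead of one.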

Looking at the full text of this article, the Oracle agrees with Elsevier that pharmacokinetics is an important issue in this case, necessitating the additional keyword. It also becomes clear that assigning indexing terms is always done (by humans) with a particular user group in mind. In this case, this is illustrated by the lack of indexing terms referring to the analytical methods and tools used and mentioned in the article. Whereas medical bibliographic databases aim to include all names of drugs mentioned even in passing in an article, the same does not (yet?) apply to equipment types and brands. A Condorcet-like system could, of course, easily be extended to provide this service.

In summary: Condorcet performs rather well in this case.


2.2.12 Document 88174154

ID: 88174154
TI: Neuroendocrine effects of chronic administration of sodium valproate in epileptic patients
AB: In 10 untreated epileptic patients, we evaluated the functional integrity of the hypothalamic-pituitary axis before and during chronic treatment with sodium valproate, a gamma-aminobutyric acid-mimetic compound. The GH response to L-dopa (250-500 mg po) was absent in 3 and severely impaired in 2 of the 10 patients though being, on the average, only slightly lower in the epileptic subjects than in normal controls. Conversely, the GH rise following GHRH (0.5 mu g/kg body weight, iv) was normal in 9 of the patients. A significant blunting of the GH response to L-dopa occurred in the 7 patients initially responsive after 6 months of sodium valproate (P < 0.05). The GH response to GHRH also underwent an evident though not significant attenuation. The ACTH and the ACTH/cortisol elevations elicited by metyrapone (35 mg/kg body weight infused over 4 h), and CRH (1 mu g/kg body weight, iv), respectively, normal before treatment, were significantly impaired (P < 0.05, < 0.01) during antiepileptic therapy. Prolactin and TSH dynamics following metoclopramide (0.1 mg/kg body weight, iv) and TRH (200 mu g iv) remained normal over the whole study period. Growth arrest ensued in 1 patient after 6 months of sodium valproate and disappeared after drug withdrawal. These observations point to a defective hypothalamic control of GH secretion in some epileptic patients. They also indicate that chronic administration of sodium valproate, hence activation of central gamma-aminobutyric acid system, results in a blunting of the stimulated GH and ACTH secretion. Occasionally, a reversible arrest of skeletal growth may also ensue.

Indexing according to Condorcet without NLE

administration <1>
administration <2>
affects( gaba, secretion )
affects( growth <1>, substance withdrawal syndrome )
affects( growth <2>, substance withdrawal syndrome )
affects( sodium valproate, growth <1> )
affects( sodium valproate, growth <2> )
affects( sodium valproate, secretion )
affects( sodium valproate, substance withdrawal syndrome )
affects( substance withdrawal syndrome, growth <1> )
affects( substance withdrawal syndrome, growth <2> )
arrest
axis
body weight
central
chronic
compound
control
corticotropin
iv
levodopa
metoclopramide
metyrapone
normal
occurs_in( substance withdrawal syndrome, patients )
prolactin
reversible
state of impairment
therapy <1>
thyrotropin
treatment <1>

Indexing according to Condorcet

administration <1>
administration <2>
arrest
axis
body weight
compound
control
corticotropin
gaba
growth <1>
growth <2>
iv
levodopa
metoclopramide
metyrapone
normal
patients
prolactin
secretion
sodium valproate
state of impairment
substance withdrawal syndrome
therapy <1>
thyrotropin
treatment <1>

Indexing according to Elsevier (copyright Elsevier)



Indexing according to NLM (copyright NLM)

Adolescence
Child
Corticotropin/blood
Corticotropin/secretion
Epilepsy/*drug therapy
Epilepsy/blood
Epilepsy/physiopathology
Female
Human
Hypothalamo-Hypophyseal System/physiopathology
Levodopa/administration & dosage
Male
Prolactin/blood
Prolactin/secretion
Somatotropin-Releasing Hormone/administration & dosage
Somatotropin/blood
Somatotropin/secretion
Support, Non-U.S. Gov’t
Thyrotropin/blood
Thyrotropin/secretion
Valproic Acid/*administration & dosage

Indexing according to the Oracle

(neurosecretory systems)
(sodium valproate)
adverse effects
affects(sodium valproate, neurosecretory system)
corticotropin
corticotropin-releasing hormone
epilepsy
gaba
growth retardation
hydrocortisone
hypothalamus
levodopa
metoclopramide
metyrapone
pituitary gland
prolactin


A large number of drugs are mentioned in the abstract, and all indexing services mention most of them, in some cases after normalizing their names.

Medline provides a number of not very informative keywords, including adolescence, child, human, male, female. The important side-effect of growth retardation mentioned in the article is missed by Medline, present in Embase, and approximated in Condorcet by growth.

Condorcet in this case provides a number of indeed applicable but not very useful index concepts like administration, arrest, axis, compound, control, normal, therapy, treatment; these could, according to the Oracle, be eliminated from the ontology, or at the very least be restricted to usage in relations only.
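The second option, restricting such concepts to usage in relations only, amounts to a simple post-filter on the generated index. The sketch below illustrates the idea under stated assumptions: the set `RELATION_ONLY` contains exactly the eight concepts listed in the text, index entries are plain strings in the notation of the listings in this section, and the helper names are hypothetical. It is not a description of how Condorcet selects its output.

```python
import re
from typing import List

# The concepts named above as applicable but not very useful on their own.
RELATION_ONLY = {
    "administration", "arrest", "axis", "compound",
    "control", "normal", "therapy", "treatment",
}

# A relation is written as name( ... ); the name is directly followed by "(".
# This keeps concepts such as "kindling (neurology)" from being read as relations.
_RELATION = re.compile(r"^[a-z_]+\(.*\)$")


def is_relation(entry: str) -> bool:
    return bool(_RELATION.match(entry.strip()))


def base_name(concept: str) -> str:
    """Strip a trailing sense tag such as ' <1>' from a concept name."""
    return re.sub(r"\s*<\d+>$", "", concept.strip())


def filter_index(entries: List[str]) -> List[str]:
    """Drop bare concepts from RELATION_ONLY; keep relations and all other concepts."""
    return [e for e in entries
            if is_relation(e) or base_name(e) not in RELATION_ONLY]


sample = [
    "administration <1>",
    "arrest",
    "sodium valproate",
    "affects( therapy <1>, anxiety )",
    "therapy <1>",
]
print(filter_index(sample))
```

With this filter, therapy <1> disappears as an isolated concept but survives as an argument of affects( therapy <1>, anxiety ), where the context makes it informative.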

In summary: Condorcet gives a quite acceptable description of the aboutness of this article, comparable to Embase and better than Medline.


Chapter 3

Conclusions and recommendations

3.1 Conclusions

In the first annual report we defined the IR problem as “that of realizing an IR system that performs at better overall costs than existing systems”. Two approaches to this problem were distinguished, referred to as the ‘classical’ approach (in which index terms are taken from a thesaurus or classification list) and ‘approach B’ (which derives the document representation from the document itself by stemming and stop word removal).

Condorcet has adopted the classical approach, and proposed to combat the high costs of indexing by partly automating the indexing process, which has been one of the main research themes. Another major theme was to employ modern versions of the traditional sources of index terms, thesauri and classification systems, which are called ontologies. Condorcet’s approach to IR can thus be summarized as follows:

• index concepts rather than terms are assigned to documents;

• the concepts can be (and very often will be) structured; and

• assignment of indexing concepts to documents is done semi-automatically on the basis of analysis of the document’s title and abstract, by making intensive use of domain knowledge, linguistic knowledge and indexing knowledge.

The first two items aim to offer a better indexing language. The index concepts are taken from pre-defined concept systems: the ontologies mentioned above. They are structured in a way that goes considerably beyond the possibilities offered by current classification systems like thesauri. The third item intends to combat the high costs of indexing that characterize current classical information retrieval projects.

Condorcet has concentrated on the indexing process, under the (tacit) assumption that matching is relatively simple (or that matching is a research topic in its own right). We are now in a position to determine whether we have achieved the goals envisioned at the beginning of the Condorcet project.

As has been said before, evaluation can be performed on several levels:

• on the linguistic level

• on the indexing level

• on the document retrieval level

• on the user level

Each successive level in this list requires the preceding one for its successful operation, but cannot be reduced to it. Furthermore, there is a (steep) increase in the difficulty of evaluating performance in this ordering of levels.

The Condorcet project has been restricted to the first two levels. The linguistic level has been discussed exhaustively in Oltmans’ thesis. The second level, the indexing level, will be further discussed here, on the basis of the comparison of various indexing systems presented in the preceding chapter.

The approach used by Condorcet depends critically on the availability of an all-encompassing, detailed and well-thought-out set of indexing concepts: an ontology. The quality of this ontology has an immediate effect on the quality of the indexing process.

For materials science, one of the original two test domains for Condorcet, we found that the existing thesauri in this domain were not suitable for use in Condorcet. For medicine, the situation was different, thanks to the vision of the National Library of Medicine in developing its Unified Medical Language System (UMLS).

However, even the UMLS is not without its problems. One problem is most likely caused by its history. UMLS was originally developed by merging and integrating several existing collections of keywords used for indexing medical literature or case records. Even though explicit distinctions are made between concepts and terms (and variants of terms), the resulting collection is not sufficiently well-defined. In the case studies, we identified several occurrences of rather uninformative concepts like increase. In isolation, such terms have little value. In context with other concepts (say, blood pressure), they are more informative, but then the precise relation between the two concepts should be made explicit.

The set of relations given in UMLS (of which a subset is used in Condorcet) is insufficiently mature. Much more work will need to be done on identifying which relations are important to express the aboutness of a medical publication.

On the basis of the case studies reported in the previous chapter (and other comparisons not reported here) we can summarize our conclusions as follows:


• For assigning unstructured keywords (as is commonly done in successful bibliographic databases like Medline and Embase), Condorcet performs at least as well as manual indexers;

• Condorcet is more consistent than human indexers, a property that is of value where all substances used in a study are to be identified;

• Condorcet currently lacks the knowledge to determine which of the keywords identified in the linguistic procedure should be kept and which ones are insufficiently informative to merit presentation;

• Condorcet does a reasonable job in assigning structured keywords;

• The expressiveness of the system of binary structured keywords is less than we had hoped for.

It is impossible to evaluate the cost-effectiveness of the Condorcet approach at this time (but we will still attempt to do so). Constructing the Condorcet system has taken several person-years. However, that high cost could (and should) be amortized over several applications. Thanks to its modular structure, the Condorcet system can be adapted rather easily to other domains. The main cost in such a modification will be in adapting or creating the ontology needed.

The cost of actually using the completed Condorcet system to index documents is trivial (see the run times reported in Oltmans’ thesis). Thus, as envisaged, the cost-effectiveness of Condorcet will be better the larger the collection of documents to be indexed becomes.
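The amortization argument can be written down as a one-line cost model: a fixed construction cost spread over the number of documents indexed, plus a small marginal cost per document. The sketch below is only an illustration of that reasoning. The function name and all numbers are hypothetical, chosen in arbitrary cost units; no actual cost figures are available from the project.

```python
def cost_per_document(fixed_cost: float, marginal_cost: float, n_documents: int) -> float:
    """Average cost of indexing one document when a one-off fixed cost
    (system and ontology construction) is spread over n_documents."""
    if n_documents <= 0:
        raise ValueError("n_documents must be positive")
    return fixed_cost / n_documents + marginal_cost


# Hypothetical figures in arbitrary units, for illustration only.
FIXED = 1000.0    # one-off cost of building the system and the ontology
MARGINAL = 1.0    # cost of running the system on one document

for n in (100, 1_000, 10_000):
    print(n, cost_per_document(FIXED, MARGINAL, n))
```

As the collection grows, the average cost approaches the marginal cost, which is the sense in which cost-effectiveness improves with collection size.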

3.2 Recommendations

In this section, we will indicate ways in which our approach could be extended or otherwise modified to match its performance with the goals of an organization desiring to use automated indexing.

In the previous section, we emphasized the importance of a good ontology of the domain to be indexed. Even for medicine, which, thanks to the availability of UMLS, is way ahead of other fields, the existing material leaves much to be desired. However, the approach followed in developing UMLS, with gradual iterative improvement through international collaboration, has much to commend it. Such collaboration will help in achieving the goal of an internationally agreed-upon, standardized ontology for a domain. The fact that an ontology is a set of concepts rather than of terms will help to eliminate (though not completely) national differences in the ontologies. Standards like XML can contribute to easy exchange and reuse.

An important innovation in the Condorcet project has been the use of structured sets of concepts, rather than isolated concepts. As we mentioned in the case studies, the expressiveness of the binary relations allowed in Condorcet is not always sufficient to indicate the aboutness of documents. On the other hand, allowing a much richer representation (for instance by allowing recursion) will most likely lead to insurmountable problems in the linguistic analysis. Whether a carefully selected set of a priori defined more complex structures could help remains an important research issue.

On a more practical level, some improvements to the indexing process can be obtained by refining the lexicon. In particular, the handling of abbreviations should be strengthened. No theoretical difficulties are foreseen with that task.

Condorcet is currently limited to analyzing one sentence at a time. This sometimes leads to missed concepts, although not quite as often as we had originally feared (due to the high redundancy, even of abstracts). The obvious next step would be to analyze whole abstracts, resolving the references between the sentences. An intermediate step, well worth considering, would be to analyze pairs of neighboring sentences: if most anaphoric references in a sentence are to the immediately preceding or following sentence, this simpler type of analysis should be sufficient.
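The intermediate step of analyzing pairs of neighboring sentences can be sketched as a sliding window over the sentences of an abstract. The helper `neighbor_pairs` below is a hypothetical name, and the sketch covers only the windowing; the actual analysis of each pair, including anaphora resolution, is left out. The example sentences are taken from the abstract of document 88174154, where "They" in the second sentence refers back to "These observations" in the first.

```python
from typing import Iterator, List, Tuple


def neighbor_pairs(sentences: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield every pair of adjacent sentences, in document order.
    An abstract with a single sentence yields no pairs."""
    return zip(sentences, sentences[1:])


abstract_tail = [
    "These observations point to a defective hypothalamic control of GH "
    "secretion in some epileptic patients.",
    "They also indicate that chronic administration of sodium valproate, "
    "hence activation of central gamma-aminobutyric acid system, results in "
    "a blunting of the stimulated GH and ACTH secretion.",
    "Occasionally, a reversible arrest of skeletal growth may also ensue.",
]

for first, second in neighbor_pairs(abstract_tail):
    # A pairwise analyzer would receive both sentences together here.
    print(first[:40], "|", second[:40])
```

Each sentence other than the first and last appears in two windows, once with its predecessor and once with its successor, so references in either direction fall inside some pair.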

In a few cases, we have seen that the aboutness of a document is not completely ascertainable from its title and abstract. Short of the remedy of improving the quality of titles and abstracts, analyzing the full article could be attempted. However, the objections to that approach given in our original proposal six years ago still stand: such material is much more difficult to analyze, being less to the point and containing illustrations, formulae, and even jokes. We believe that analyzing this material using the Condorcet approach is doomed to fail.

A final problem requiring further work is that of defining the goal of indexing. The existing bibliographic databases have evolved to serve a certain type of use and of user, but the effect this should have on the type of indexing is not made explicit. An example of this occurred in one of the case studies in the previous chapter, where a number of chemical substances were mentioned in the abstract. Embase identified all of them; Medline much less so, correctly reflecting their intended user groups. Another, more complicated example would be the difference between wishing to retrieve an article about a disease, versus any article using a certain methodology.

Any organization desiring to invest substantially in developing a semi-automatic indexing system à la Condorcet should first define as precisely as possible what types of retrieval are to be supported, and what the goals for precision and recall are. If these goals are both set high, and the collection of documents is very large, and the domain is (conceptually) stable, Condorcet’s approach is worth considering.


Appendix A


In the course of the Condorcet project, a number of publications have appeared. A complete bibliography of the project is given here.

An archival publication is defined here as being a publication that has been fully refereed on the basis of the full text and published in a publicly available source (thus excluding workshop notes and the like), or a paper on invitation for a conference with widely distributed proceedings. All other public material has been assembled under “Miscellaneous”, section A.2.

Appendix B

About the authors

The Condorcet project started in September 1995 and was completed on 28 January 2000. The project team at one time consisted of six researchers; the four authors of this report continued till the end of the project.

Dr. B.J. (Bas) van Bakel (1964) obtained his M.A. degree in General Linguistics at the University of Nijmegen in 1989, writing M.A. theses in the fields of Generative Grammar and Computational Linguistics.

From 1990 until 1994 he worked as a Ph.D. student at the Department of Language & Speech of the University of Nijmegen, on a research project aimed at developing a Natural Language Processing system for automatic semantic analysis of English texts. The result, the English Language Semantic Analyzer (ELSA), is described in Van Bakel’s Ph.D. thesis entitled A Linguistic Approach to Automatic Information Extraction.

In Condorcet, he was project manager. His research was concerned with semantic analysis and index concept generation.

Since August 1999, he has been with Daidalos BV in Zoetermeer.

Ing. R.T. (Reinier) Boon (1969) graduated in engineering at the Higher Informatics Education of the Enschede Polytechnic in 1993, with a final-year assignment on lexicon management at the Knowledge-Based Systems Group of the University of Twente.

He spent approximately one year on an ESPRIT project on object-oriented distributed multi-media databases as a member of the database group of the University of Twente. After fulfilling his civil service, he rejoined the Knowledge-Based Systems Group to become scientific programmer on the Condorcet project.

N.J.I. (Nicolaas) Mars (1949) obtained his M.Sc. degree in Electrical Engineering and his Ph.D. degree in Technical Sciences in 1974 and 1982, respectively, both at the University of Twente. His Ph.D.-thesis was entitled Computer-augmented analysis of electroencephalograms in epilepsy.

From 1972 until 1976 he worked as a statistician/computer programmer at Buro Goudappel en Coffeng in Deventer. From December 1976 until November 1980 he was with the Bio-informatics Group, Department of Electrical Engineering, University of Twente. From November 1980 until October 1984 he was a Researcher at the Medical Informatics Group, University of Leiden. From October 1985 until October 1989 he was a ZWO Constantijn en Christiaan Huygens Fellow. This fellowship allowed him to move into the field of Artificial Intelligence. As part of this re-direction, he spent the 1985/86 academic year at Yale University, New Haven, CT, USA. On 1 September 1986 he became a part-time, and on 1 November 1989 a full-time Professor of Computer Science at the University of Twente, in expert systems and in knowledge-based systems, respectively. Since November 1990, he has also been Advisory Professor of Computer Science at East-China Normal University, Shanghai, P.R. China. Since September 1996, he has also been Deputy Director of the Netherlands Institute for Scientific Information Services (NIWI) in Amsterdam. Professor Mars carries final responsibility for the project. His research interests in Condorcet include upscaling, machine-readable lexica, and ontological engineering.

Dr. J.A.E. (Erik) Oltmans (1967) obtained his M.A. degree in Computational Linguistics at the University of Nijmegen in 1994. His M.A. thesis was concerned with updating the AMAZON-parser and transforming it into the AGFL-formalism. In 1995 Oltmans was an employee at the Department of Informatics of the University of Nijmegen. During that time he generated the 324,000-word AMAZON-lexicon. In Condorcet, Oltmans was researcher in linguistic engineering. His research is concerned with syntactic and semantic analysis of English, as well as investigating techniques for robust Natural Language Processing. The work Oltmans did in the Condorcet project resulted in his Ph.D.-thesis A knowledge-based approach to robust parsing.

Since January 1, 2000, Oltmans has been with the Telematics Institute in Enschede, The Netherlands.