
Extract from the Revue Informatique et Statistique dans les Sciences humaines XXIV, 1-4, 1988. C.I.P.L. - Université de Liège - All rights reserved.

Expert Systems for Document Retrieval: Problems in Capturing Synonym Relations from the Experts*

Robert L. OAKMAN

A key problem in designing information retrieval systems is making the information contained in the system available quickly and easily, with a minimum of training or preparation required by the user. New users may be experts in their fields and have no problems with the mechanics of using the computer; yet they may still have difficulty in retrieving information because they are not familiar with the particular concept structures used in the information system, such as basic terms and synonym relations. In some cases, the requestor may fail to find information simply because he or she refers to a concept different from the ones included in the system.

Instead of requiring the user to learn the terminology and structure of the retrieval system, it seems preferable to make the system capable of matching the terms in the user's query with those already in the bibliographical database. Techniques derived from artificial intelligence and knowledge-based systems are now being used to design user-friendly natural-language analysis programs for information retrieval systems in an attempt to solve these problems. Software packages such as LIFER and CLOUT notify the researcher whenever the system does not recognize a word or phrase included in a query. They prompt the person to define the new expression in terms of keywords already in the system. When a satisfactory definition is reached, the new word or phrase is added to the system vocabulary, so that it can be freely used in the future. The researcher

* This paper grows out of a short report, "Constructing Synonym and Implication Relations of Concept Terms: Empirical Considerations", written by C. F. Baker, G. Biswas, J. C. Bezdek, and me and included in the Proceedings of the Third University of South Carolina Computer Science Symposium, March 31 - April 1, 1986. I wish to acknowledge the contribution and helpful collaboration of these colleagues in this research.


gradually tailors the system to his or her individual vocabulary, and the added vocabulary may be stored either in an individual profile for each user or in a common vocabulary list. However, most users really want to spend time getting answers from the database, not constructing it.

This approach seems most appropriate for systems in which a few major users with a clear idea of the types and organization of the data within the system will be using the system regularly. In such cases, effort spent in customizing the system for its few users will pay for itself in better results.

Such solutions are not so successful for information retrieval systems in a general library, with a large number of users, none of whom use the system very regularly. This is especially true for users who are seeking knowledge in an unfamiliar field. Unacquainted with either the vocabulary or the structure of knowledge in a field, they will be unable to supply correct terminology in a bibliographical query and are likely to use incorrect terms or to make incorrect assumptions about subordination and superordination of one concept to another in the unfamiliar field. If naïve users enter incorrect definitions or synonyms into the system, they will cause problems for themselves in the future and will make the system work worse for others, if their mistakes are included in common vocabulary lists.

"Expert systems", based on knowledge gathered from people about how they make decisions, may offer computer assistance in controlling the quality of bibliographical searches for the general case. Scholars in the humanities and social sciences often complain that current computerized bibliographical services like DIALOG or BRS require search requests which are too specific and crisp to match the more subjective nature of their requests; consequently, the results are often disappointing.¹ In Computer Science at South Carolina we believe that an "expert-systems" approach, which one writer has called "an automated copy of human expertise",² may be more productive. Like all expert systems, our program must contain information concerning how people think about the problem (the expert's knowledge or "knowledge base" of facts and relations among them) and how they use their knowledge to make decisions (the "inference engine"; methods of using relations for problem solving).

For our computer-based document retrieval, we are trying to address the subjective nature of language with the insights of "fuzzy logic" and computational linguistics. We are working on a prototype system which

1 P. CHAMPLIN (1985), "The Online Search: Some Pitfalls and Perils", Research Quarterly, Winter Issue, pp. 213-17.

2 B. D'AMBROSIO (1985), "Expert Systems - Myth or Reality?", Byte, January, p. 282.


accepts a bibliographical query as an English sentence and parses it into a search request that is then submitted to the database of bibliographical information. Currently the grammar is being upgraded from rather primitive pattern matching using an augmented transition network to a case grammar based on the semantic clues embodied in English prepositional phrases.

The program also accepts such "fuzzy" subjective language as a request for "five important, recent articles on euthanasia" by treating recent and important as fuzzy variables on a scale of 0 to 1. Five is a crisp variable defining clearly how many references are being sought. But recent is "fuzzy" and will be weighted in the search request by dates as follows: articles from 1987 or 1986 (significance 1.0), 1985 (significance .9), 1984 (significance .8), 1983 (significance .7), etc. Similarly, important will be weighted by the value of the article as perceived by the person who abstracted it for the database. By following through the bibliographical database of concepts, dates, number of articles sought, relevance, etc., the system will produce a set of bibliographical items for the user's perusal. He or she may then modify the request by making it more general or specific, depending on what the program originally found.
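The date weighting described above can be sketched as a simple fuzzy membership function. The continuation of the scale below 1983 is an assumption extrapolated from the examples given in the text:

```python
def recency_weight(year, reference_year=1987):
    """Fuzzy significance of 'recent' on a 0-to-1 scale, per the scheme
    in the text: 1987/1986 -> 1.0, 1985 -> .9, 1984 -> .8, 1983 -> .7.
    """
    age = reference_year - year
    if age <= 1:
        return 1.0
    # assumed continuation of the stated pattern: drop .1 per extra year
    return max(0.0, 1.0 - 0.1 * (age - 1))

print(recency_weight(1985), recency_weight(1983))
```

A search request could then multiply this weight into each candidate article's overall relevance score before ranking.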

It is the other end of our system, the capturing of expert knowledge for inclusion in the bibliographical data bank, that I want to focus on here. For example, we are trying to find out how experts in a field decide that a request for euthanasia would want to pick up articles cataloged under mercy killing and probably even medical ethics about death and dying. We are working on ideas about synonyms and how one verbal request implies another in a person's mind.

It seems best to get initial vocabulary items for the database from bona fide experts and to build accurate definitions of terms, a reasonable range of possible synonyms, and correct information on hierarchical and implicational semantic relations into the knowledge base from the beginning. The system can then be designed to use this knowledge to bridge the gap between the vocabulary of the user's query and the concepts actually appearing in the knowledge base.

Let us look more closely at an example. A linguist who wants papers on transformational grammar would probably also like to see titles dealing with deep structure, the base level of transformational language theory. If a paper was not marked for the keyword transformational grammar but was noted to be strongly about deep structure (for instance, 0.7 on a scale of 0 to 1), a requestor looking for transformational grammar would probably want to consider this article. Synonym matrices between related technical concepts and terms need to be constructed so that a user's request for transformational grammar may imply to some extent a request for deep structure as a concept term and


search for it accordingly. Recent research by colleagues at South Carolina³ on automated document retrieval with fuzzy logic has been promising in this regard, especially in the chaining of fuzzy numerical relations for synonyms and semantic implications among related concepts.
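Such chaining of fuzzy implication relations can be sketched with max-min composition, a standard rule in fuzzy relation theory. The matrix entries below are invented for illustration, not values from the South Carolina system:

```python
# Hypothetical fuzzy implication degrees: R[a][b] is the degree to which
# a request for term a implies term b.
R = {
    "transformational grammar": {"deep structure": 0.7},
    "deep structure": {"surface structure": 0.4},
}

def implication(a, b):
    """Direct degree, or the best one-step chain a -> m -> b (max-min rule)."""
    direct = R.get(a, {}).get(b, 0.0)
    chained = max(
        (min(w, R.get(m, {}).get(b, 0.0)) for m, w in R.get(a, {}).items()),
        default=0.0,
    )
    return max(direct, chained)

print(implication("transformational grammar", "surface structure"))  # 0.4
```

Here a query for transformational grammar reaches surface structure only through the intermediate term deep structure, at the minimum strength along the chain.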

Several authors have suggested the need to get tables of numerical values for synonym or implicational relations among terms, but never quite explain how these values are to be obtained from natural language. The founder of fuzzy logic, L.A. Zadeh, suggested in 1971 that "fuzziness of meaning can be treated quantitatively, at least in principle".⁴ Burghard Rieger notes, however, that this foundation has been laid more theoretically than practically:

    The empirical side of it, concerning questions of how the meaning of a term described as a fuzzy set may be detected, or how the membership-grades may be ascertained and associated with the elements of a descriptor set in a particular case, these questions have not even been touched upon.

What we need is a way to get experts to develop the concepts and fuzzy implication relations for inclusion in our bibliographic database, which can then be analyzed by our system of numerical chaining among the fuzzy relations. To test the practicality of asking experts in a field to specify these synonym and implication relations, we set up a small statistical experiment with a set of similar linguistic terms, such as grammar, deep structure, and surface structure. For each pair of terms, a group of nine linguists, eight students in a graduate linguistics course and their teacher, were asked to choose a numerical coefficient to define the degree of implication. Specifically, the linguists were asked on the questionnaire to "indicate the extent to which the term in column A could be replaced in a bibliographic query by the term in column B, on a scale of 0 to 100, where a 100 indicates that replacement would always be acceptable, and a 0 that it would never be". One of the respondents returned the questionnaire with two 100's and the rest of the items left blank, and this response had to be discarded. The replies given by the remaining eight, together with the range, mean, and standard deviation for each item and for each respondent, are shown in Table 1.

The results are noticeably inconsistent and often opposite from one person to another for the same word pair (see, for example, syntax and surface structure). Of the 30 pairs of terms, those interviewed agreed within 50 points

3 G. BISWAS, J.C. BEZDEK, M. MARQUES, and V. SUBRAMANIAN (1986), "Knowledge-Assisted Document Retrieval", Journal of ASIS, in press.

4 Quoted in B. RIEGER (1979), "Fuzzy Structural Semantics: On a Generative Model of Vague Natural Language Meaning", in Progress in Cybernetics and Systems Research, eds. R. Trappl, F. Hanika, and F. Pichler, Vol. V, pp. 495-503.


Table 1
Results of Survey on Replaceability of Linguistic Terms Using Numerical Responses
(Respondents E1 and S1-S7; some pairs were rated by only four respondents.)

Term pair   Responses                        Range    Mean    Std.Dev.
gr-syn       50  20  80  40                  20- 80   47.50   21.65
gr-ds        10  10   0  10                   0- 10    7.50    4.33
syn-gr       50  20  90  80                  20- 90   60.00   27.39
syn-ss       90 100   0  90                   0-100   70.00   40.62
syn-ds       50  10   0  10                   0- 50   17.50   19.20
tran-ds       0  60   0  90                   0- 90   37.50   38.97
ps-ss         0   0   0  70                   0- 70   17.50   30.31
ss-ds         0   0   0   0                   0-  0     .00     .00
ds-tran       0  10   0  90                   0- 90   25.00   37.75
ds-ss         0   0   0   0                   0-  0     .00     .00
tran-syn     80  50  50  30                  30- 80   52.50   17.85
gr-ss        95   0  60  20                   0- 95   43.75   36.64
syn-tran     90   0  50  30                   0- 90   42.50   32.69
syn-ps       20  50  60  30                  20- 60   40.00   15.81
gr-tran      80  10  50  20                  10- 80   40.00   27.39
gr-ps        20  10  50  20                  10- 50   25.00   15.00
tran-ps      20   0  60   0                   0- 60   20.00   24.49
ds-gr        20  50  40  40                  20- 50   37.50   10.90
ss-ps         0   0  80   0                   0- 80   20.00   34.64
ps-gr        20  10  50  30                  10- 50   27.50   14.79
tran-ss       0  40   0   0  30  10  50   0   0- 50   16.25   19.32
ss-gr        10  80   0  90  95  25  80   0   0- 95   47.50   39.69
ss-syn       60 100   0  80  90  50  60  40   0-100   60.00   29.58
ss-tran       0  40   0  20  30  75  60  30   0- 75   31.88   24.74
ps-ds       100  20  60  70 100 100  10   0   0-100   57.50   39.61
ps-syn       30  50  60  80  20  50  50  30  20- 80   46.25   17.98
tran-gr       5  10  20  50  50  25  50  40   5- 50   31.25   17.46
ps-tran       0  30   1  90   0  25  50   0   0- 90   24.50   30.26
ds-syn       80  10   0  10  30  50  40  40   0- 80   32.50   24.37
ds-ps       100  30  60  80 100   0  20   2   0-100   48.75   39.19

By respondent (E1, S1-S7):
Min         0      0      0      0      0      0     10      0
Max       100    100     90     90    100    100     80     40
Mean    31.75     32  18.55   52.5   49.5   29.5     51     20
Std.Dev. 36.44  30.59  30.67  35.76  35.81  27.83  15.78  15.81

Abbreviations: gr = grammar, syn = syntax, ds = deep structure, ss = surface structure, ps = phrase structure, tran = transformation. For example, "ss-ps" represents replacing the term "surface structure" with the term "phrase structure" in a query.
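The summary columns of Table 1 can be reproduced from the raw responses; the printed figures match the population standard deviation, as this sketch for the gr-syn row shows:

```python
import statistics

def summarize(responses):
    """Range, mean, and population standard deviation, as tabulated."""
    return (min(responses), max(responses),
            statistics.fmean(responses),
            statistics.pstdev(responses))

lo, hi, mean, sd = summarize([50, 20, 80, 40])  # the gr-syn row of Table 1
print(f"{lo}-{hi}  {mean:.2f}  {sd:.2f}")  # 20-80  47.50  21.65
```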


of each other only on 9 pairs, while 4 pairs were given ratings all the way from 0 to 100. On the 100-point scale, standard deviations are greater than 30 for 11 of the 30 pairs. The highest and lowest means for individuals, 18.6 and 52.5, show that there is also considerable variation from person to person in overall perception of the strength of implication relations.

Granted its limited nature, the experiment suggests that experts find it difficult to define numerical degrees of synonymity or content implications among such sets of related terms, at least in linguistics, with any level of consistency. In fact, our results corroborate exactly those of earlier researchers like Szolovits and Pauker⁵, whose work we found later. In 1978 they wrote: "... while people seem quite prepared to give qualitative estimates of likelihood, they are often notoriously unwilling to give precise numerical estimates to outcomes". Remember that one linguistics student was so intimidated by the numerical requirement of the survey that she only attempted two pairs and quit.

Perhaps a better solution would be to try a weighted scale of linguistic variables like those used in opinion surveys and psychological questionnaires. For instance, the Likert scale, which typically uses five responses, represents a continuum from "strongly agree" to "strongly disagree". R. Beyth-Marom suggests the use of a similar measure: "... A verbal scale of probability expressions is a compromise between people's resistance to the use of numbers and the necessity to have a common numerical scale".⁶ P.P. Bonissone and K.S. Decker have recently proposed a nine-item verbal scale of probabilities, which can then be mapped onto fuzzy numbers on the [0, 1] interval using trapezoidal functions.⁷ A graph of the membership functions and their semantic values suggested by Bonissone is shown in Figure 1.
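A trapezoidal membership function of this kind can be sketched as follows; the corner points below are illustrative placeholders, not Bonissone and Decker's published parameters:

```python
def trapezoid(x, a, b, c, d):
    """Membership function that rises on [a, b], is 1 on [b, c],
    and falls on [c, d]; 0 outside [a, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# A hypothetical fuzzy number for a verbal label near the top of the scale:
print(trapezoid(0.80, 0.6, 0.75, 0.85, 0.95))  # 1.0
```

Each of the nine verbal labels would get its own four corner points, so neighboring labels overlap smoothly on the [0, 1] interval.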

To test this new approach, we set up a second experiment, using a nonnumerical method of eliciting synonym relations. Like Likert, we employed a continuum of five levels of replaceability, from "completely equivalent" to "unrelated", with the three intervening linguistic choices of "very similar", "similar", and "not very similar". Two months after the first experiment, we surveyed again seven of the eight people who turned in full questionnaires before, using the same term pairs as data. The directions were the same as those

5 P. SZOLOVITS and S.G. PAUKER (1978), "Categorical and Probabilistic Reasoning in Medical Diagnosis", Artificial Intelligence Journal, Vol. 11, pp. 115-44.

6 R. BEYTH-MARoM (1982), "How Probable is Probable? A Numerical TaxonomyTranslation of Verbal Probability Expressions", Journal of Forecasting, Vol. 1, pp. 257~69.

7 P.P. BONISSONE and K.S. DECKER (1985), "Selecting Uncertainty Calculi and Granularity: An Experiment in Trading Off Precision and Complexity", Technical Report, General Electric Research Laboratories, Schenectady, New York.


Figure 1
Membership functions and their semantic values for the verbal term-set, plotted over the interval 0.0 to 1.0.

of the first survey except for the substitution of the five phrases for numbers as responses. Table 2 summarizes the second set of results after we quantified the five phrases from .00 ("unrelated") to 1.00 ("completely equivalent"), with intermediate values of .25, .50, and .75.
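The quantification of the five verbal responses can be written directly as a lookup table:

```python
# The paper's mapping of the five replaceability levels to [0, 1].
SCALE = {
    "unrelated": 0.00,
    "not very similar": 0.25,
    "similar": 0.50,
    "very similar": 0.75,
    "completely equivalent": 1.00,
}

def quantify(answers):
    """Turn a respondent's verbal answers into the values used in Table 2."""
    return [SCALE[a] for a in answers]

print(quantify(["very similar", "unrelated", "similar"]))  # [0.75, 0.0, 0.5]
```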

This time we found much more similarity among respondents. For the 30 pairs of linguistics terms, 18 were within a numerical range of .50 from each other, which means that they were within two linguistic gradations of one another; before, only half that many met this criterion. No set of word pairs was marked over the full spectrum of "equivalent" to "unrelated", whereas this situation had occurred with the numerical questionnaire. Similarly, standard deviations from mean values dropped appropriately. In only one case did the responses to a word pair have a standard deviation greater than .30 (for the terms phrase structure/surface structure); earlier there had been 11 of the 30 pairs with these high variations among the responses.

On the basis of our small statistical surveys, we can see that it is better to ask experts to decipher similarities or differences between word groups or concepts using a choice along a linguistic continuum. Clearly this yields more consistent and apparently accurate implicational relations. We may posit that the five levels of similarity and difference might be quantified as fuzzy functions similar to Bonissone's approach for the implication matrices needed in our fuzzy retrieval database.

Yet questions remain about the ability of the experts to make really consistent synonym and implication tables, even with a sliding linguistic scale. Our sample tests were modest; yet they verified the views of earlier scholars about employing linguistic rather than numerical estimates from the experts. But how reliable were our second set of relations? For example, only 3 of the 8 pairs of terms given to all 7 people had mean scores of similarity greater than .50 (phrase structure/deep structure, phrase structure/syntax, and deep structure/phrase structure). I think all linguists can perceive the


Table 2
Second Survey (Using Verbal Responses)
(Respondents E1 and S1-S7; some pairs were rated by only three or four respondents.)

Term pair   Responses                        Range      Mean   Std.Dev.
gr-syn      .50 .50 .25                      .25- .50   .417   .118
gr-ds       .50 .50 .25                      .25- .50   .417   .118
syn-gr      .50 .50 .50                      .50- .50   .500   .000
syn-ss      .25 .75 .25                      .25- .75   .417   .236
syn-ds      .50 .50 .25                      .25- .50   .417   .118
tran-ds     .25 .25 .75                      .25- .75   .417   .236
ps-ss       .00 .75 .25                      .00- .75   .333   .312
ss-ds       .25 .25 .00                      .00- .25   .167   .118
ds-tran     .25 .50 .50                      .25- .50   .417   .118
ds-ss       .00 .25 .00                      .00- .25   .083   .118
tran-syn    .50 .25 .50 .75                  .25- .75   .500   .177
gr-ss       .00 .50 .75 .25                  .00- .75   .375   .280
syn-tran    .00 .25 .50 .75                  .00- .75   .375   .280
syn-ps      .50 .25 .50 .50                  .25- .50   .438   .108
gr-tran     .25 .25 .50 .25                  .25- .50   .313   .108
gr-ps       .00 .25 .50 .25                  .00- .50   .250   .177
tran-ps     .00 .00 .25 .50                  .00- .50   .188   .207
ds-gr       .00 .50 .25 .75                  .00- .75   .375   .280
ss-ps       .00 .25 .50 .50                  .00- .50   .313   .207
ps-gr       .00 .25 .50 .50                  .00- .50   .313   .207
tran-ss     .25 .50 .00 .25 .25 .25 .75      .00- .75   .321   .220
ss-gr       .25 .75 .00 .50 .50 .75 .25      .00- .75   .429   .258
ss-syn      .25 .75 .00 .75 .50 .50 .50      .00- .75   .464   .247
ss-tran     .25 .25 .00 .25 .25 .50 .75      .00- .75   .321   .220
ps-ds       .75 .25 .75 .50 .50 .25 .50      .25- .75   .500   .189
ps-syn      .25 .75 .25 .75 .25 .50 .75      .25- .75   .500   .231
tran-gr     .25 .25 .25 .00 .25 .25 .75      .00- .75   .286   .208
ps-tran     .00 .25 .00 .50 .25 .50 .75      .00- .75   .321   .258
ds-syn      .50 .50 .00 .50 .50 .50 .75      .00- .75   .464   .208
ds-ps       .75 .25 1.00 .50 .75 .25 .50     .25-1.00   .571   .258

Mean range: .52

By respondent (E1, S1-S7):
Min       .00   .25   .00   .00   .00    -    .25   .25
Max       .75   .75  1.00   .75   .75    -    .75   .75
Mean     .325  .463  .175  .375  .338  .450  .563  .384
Std.Dev. .211  .198  .286  .230  .163  .150  .192  .238

semantic relation among these pairs for bibliographical purposes. On the other hand, with a mean score of .464 for surface structure/syntax, a person whose bibliographical query asked about surface structure would probably not retrieve any references categorized under syntax. To me these distinctions, when quantified, seem too subtle and open to criticism. One would certainly


never get any references to transformation if one asked for phrase structure (mean score of .321); yet we all know that phrase structures are the building blocks of transformational grammar.

Last year I presented preliminary results of our survey at a conference on expert systems attended by a number of computer scientists working in this field. I commented as a computational linguist on the positive nature of our results using the linguistic Likert scale. Then I closed with my view that semantic reference in language is still too complex to put great faith in the method of the expert questionnaire for building synonym and implication tables. One computer scientist building another very large expert system for document retrieval disagreed strongly. He argued that our results could have been really consistent if we had asked real experts. Six of the seven respondents of survey 2 were only graduate students in linguistics, along with one professor. According to the computer scientist, we should have asked only professors to do the questionnaire; then we would have got clean and verifiable results. Synonyms would have been readily identified, and words without implication for each other would not have been confused. Believing in the ethos of the expert system so strongly, he had to believe it is easy to capture the expert's knowledge for feeding into the computer. As a linguist, I disagree.

Computer scientists and engineers are usually people of technique more than philosophy. They are trained to believe that any issue can be treated as a problem capable of solution. Given the problem, they set out to solve it with the tools of their trade, sometimes without an adequate sense of the subject itself. They know how to use their marvelously flexible and powerful machine but do not know enough about a subject to know the right questions. In my view, the computer scientist's belief in the clear efficacy of the computer model of the expert system ignores the issue of whether the expert knowledge, in this case content terms for document retrieval, can be readily captured. Our experiments suggest that current methods are not as successful as we would like.

In document retrieval with an expert system, we at South Carolina are left with several interesting problems. Not only must we work to improve the syntactic analysis of users' queries in order to build search strategies for the database, but more importantly we must be creative in the knowledge acquisition of bibliographical information from experts in a subject field.

Contrary to the views of my computer science critic, I suspect that this may be the greater challenge. Based on our experiments, there is much that remains to be done. Indeed, thinking in terms of a working expert computer model of language understanding is opening new realms for creative linguistic thought for computer scientists and linguists alike.