Approaches to Locating Expertise Using Corporate Knowledge · IAM Group, University of Southampton,...

Copyright © 2003 John Wiley & Sons, Ltd. Int. J. Intell. Sys. Acc. Fin. Mgmt. 11, 185–200 (2002)



International Journal of Intelligent Systems in Accounting, Finance & Management

Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/isaf.232


Approaches to Locating Expertise Using Corporate Knowledge

Richard Crowder,* Gareth Hughes and Wendy Hall
IAM Group, University of Southampton, Southampton, UK

ABSTRACT In many organizations people need to locate colleagues with knowledge and information to resolve a problem. Computer-based systems that assist users with finding such expertise are increasingly important to industrial organizations. In this paper we discuss the development of Expertise Finders suitable for use within the engineering design environment, as illustrated through the use of a scenario. A key feature of this work is that the Expertise Finder returns both recommended contacts and supporting documentation. The Expertise Finder bases its results on information held within the organization, e.g. on-line publications repositories, human resource records, and not on individually compiled Curriculum Vitaes or other forms of user-maintained records. The recommendations are presented to the user with due regard to the social context, and are supported by the documents used to make the recommendation.

* Correspondence to: R. Crowder, IAM Group, Department of Electronics and Computer Science, University of Southampton, Southampton, UK. E-mail: [email protected]

Contract/grant sponsor: UK Engineering and Physical Sciences Research Council; Contract/grant number: GR/M83582.

INTRODUCTION

In the course of most activities, people face problems that they cannot solve alone. Their natural response is to study past experiences and re-use previously acquired knowledge, either from their own experiences or from resources within their organization. Goa et al. (1998) estimated that 90% of industrial design activity is based on variant design, whereas in a redesign activity 70% of the information is re-used from previous solutions (Khadilkar and Stauffer, 1996). For many problems, access to documentation through hypermedia or similar systems may give adequate solutions (Crowder et al., 1998). However, for many problems people need to have specific information above that given by documents alone to resolve the issue. In this paper, the term expertise assumes the embodiment of knowledge and skills within individuals. This definition distinguishes expertise from an expert. An individual may have different levels of expertise about different topics. Expertise can be topical or procedural and is arranged and valued within the organization. In some cases, expertise can be captured from a person and used to populate a database. This works very well when the problem is restricted to a very specific domain, e.g. robot maintenance (Auriol et al., 1999). However, for many problems the required expertise can only be accessed through a social network.

To solve a specific problem people want to find other people with the required expertise quickly. In many organizations, key personnel (managers, senior employees, information concierges; McDonald and Ackerman, 2000) will facilitate the contacts. Recommender systems are one approach to automate this process, by augmenting and assisting the natural expertise-locating behaviour within an organization. A recommender system that suggests people who have some expertise with a problem holds the promise to provide, in a small way, a service similar to these key personnel. Expertise recommender systems can also reduce the load on people in these roles and provide alternative recommendations when these people are unavailable.

In the recommendations provided by the Expertise Finder, trust is important; this can be achieved by showing why people were not recommended or why a document was not considered so important. A document might seem relevant based on a full text search but it is actually 20 years old; this is an important factor in some situations, but not in others. The provision of evidence for its decisions in the form of a list of documents and other data is considered a key Expertise Finder output. This approach contrasts with a number of reported systems where Web-based information is used to provide the recommendation (Bollacker et al., 1998; Becerra-Fernandez, 2000; Chandrasekaran and Joshi, 2001). Answer Garden 2 (Ackerman and McDonald, 1996) has an explicit expertise-location engine and provides computer-mediated communications mechanisms to find others with a range of expertise, though the mechanisms are not very elaborate. A different approach was taken by McDonald and Ackerman (1998), who used software developed by employees to identify their expertise in various aspects of software development.

ENGINEERING DESIGN ENVIRONMENT

Our work with Expertise Finders has been targeted towards use within the engineering design environment. The design environment is currently undergoing rapid changes with social and technical drivers. In general, the design environment is highly distributed in nature and is characterized by a large number of information sources, which, together with the designers, form a complex sociotechnical system. The paper by Wallace et al. (2001) discussed an outline for the future vision of the engineering design environment, and concluded that a range of knowledge management tools would be required to support their vision. One of the objectives of our work has been to define a future engineering design environment, with particular emphasis on the social and technical systems that will support designers in their day-to-day activities (Crowder et al., 2003).

In order to determine the future requirements of the engineering design environment, we gathered a considerable amount of information through a range of techniques, including interviews with designers and their managers, allocation of function exercises (Clegg et al., 2000), and the analysis of the current design practice. The results were developed into a detailed scenario which was evolved through discussions with the current design community. The scenario adopts a sociotechnical perspective in dealing with the capture, sharing and re-use of knowledge within the design context (Crowder et al., 2003). As such, the ideas and practices outlined throughout the scenario promote a complementary social and technical approach to managing knowledge in the future engineering design process.

The Scenario

The scenario was not intended to be the detailed script for future work, but rather a resource for discussing the design process, for planning the research activities to realize the vision and for clarifying the interests and concerns that motivate it. In the scenario it is assumed that the technical elements of the future design environment have been embodied in an application termed KTfD (Knowledge Tools for Designers). The proposed architecture, Figure 1, shows that all the information required by the designer can be accessed through the KTfD desktop. It is recognized that populating the databases and the associated links is a key issue, but outside the scope of this paper. All objects within the KTfD database are version controlled and the system is configured so that all documents are stored locally, ensuring that they are available even if the original source is irretrievable. As shown in Figure 1, KTfD is able to access information from anywhere in the design office, including, if required, through the local wireless network. The flexibility gives a degree of pervasive computing to the user, as it permits active reconfiguration as a function of location. It is the widespread integration of corporate systems through the KTfD environment that is a cornerstone to the operation of an Expertise Finder. It is worth noting that the KTfD is not only for knowledge management, it also has access to the full range of office and data analysis tools. While the full scenario covers all aspects of the designers’ activities during the design process and the subsequent interaction with the knowledge cycle, this paper will only consider those activities that relate directly to Expertise Finders.

In the scenario developed, a design engineer has been tasked to resolve a problem relating to the bonding of components in a gas turbine. Using the KTfD desktop, the Team Leader initiates the activity by retrieving the debonding problem report from the Product Data Management (PDM) system and defining it as a new KTfD prime issue to be resolved, by opening a new design folder. As part of the definition process a new job number is issued, together with links to the background information.

Obtaining Previous Work and Background Information

From the KTfD workspace the designer is able to locate final reports relating to previous projects across the company, using a conventional search engine and the PDM system. On opening the reports and browsing the documents, the designer is able to identify the options considered and the reasons for the choices made. The designer does not immediately recognize any of the people involved, but sees that there is a short audio-visual item provided by the original designer of a discussion with an adhesives expert. Selecting the expert present in the discussion, KTfD opens an information window that informs the designer that the original designer is now a manager in an adjacent area; the designer makes contact with the manager and arranges a meeting. One of the documents reveals that there was a debonding problem on a previous product. Browsing the documents reveals a primary cause, and the designer decides to contact the engineer identified against that item through KTfD, but first reviews the presentations given at various design review meetings.

Figure 1 The KTfD concept

The people who are contacted by the designer may be considered experts, and they are located by past information in the design files, and information provided by previous users. In many cases the information provided may be incomplete, or selective, depending on the editing history of the documents.

Searching for Colleagues with Prior Knowledge

During a review of the original design reports it quickly becomes apparent that this problem has been looked at before. The designer enters a query into the Expertise Finder through the KTfD desktop that searches for colleagues that have prior knowledge of using adhesives to bond metals within a high-temperature environment. The results reveal a number of possible contacts across the organization.

As envisaged, the Expertise Finder will allow the user to find people primarily, not documents, i.e. answering the ‘who knows about . . .’ questions. The system will use design output, stored documents and personnel information to generate a technical profile of an individual. The solution will exploit linking within the information space, e.g. between documents, workflow information within PDM, and an individual’s job profile within the human resources databases. The general requirement for such a system is that the architecture should be modular and flexible and allow queries from multiple viewpoints to be resolved. Our proposed systems will be discussed in the section ‘Expertise Finding: Problem Definition and Context’.

Other Design Activities

It rapidly becomes clear that the problem involves the choice of adhesive and the loads to which the bond is subjected. To clarify a number of points, the designer arranges for tests to be undertaken on a number of samples. The results are downloaded directly to the workstation, allowing their rapid analysis. In the design process, a large number of elements are drawn together; hence, KTfD provides facilities to prioritize tasks. As the work progresses, the results of discussions, contacts and queries are added to the record of the design process; these could include test results from a materials laboratory, requests to manufacture test components and e-mails to external suppliers for support and quotations. As this information is processed and stored, KTfD will add any required metadata, allowing the information to be used subsequently by the designer, and during its retrieval by any search queries, including those from the Expertise Finder.

At the regular review meeting, the team is able to review the current situation regarding the resolution of the problem with senior staff. It is clear that a number of key sub-issues have yet to be resolved; as the meeting progresses it is clear that a slight change of emphasis is required. As the decisions are made, the designer edits the design rationale, by rejecting a number of issues, and defining a number of other issues that require more information. At the conclusion of the review meeting all the work-package modifications and queries are distributed immediately to the team. In addition, the system will identify any issue or activity that has slipped with reference to the initial time plan. The use of KTfD in the context of a meeting brings a number of significant advantages, namely all design documentation can be presented electronically at the meeting and annotated immediately as required. Feedback to the design team is instantaneous, as the meeting notes are produced on the fly and are fully cross-referenced to the design item. Experienced designers routinely attend design reviews; this information can be used by the Expertise Finder algorithm to rank people’s experience in a particular subject. Part of their role is to coach less-experienced designers in good approaches to design problems. This face-to-face approach enables experience that is hard to document, or which has not been documented, to be transferred.

The final report describes the problem being considered and the details of the solution chosen, with the analysis of the problem, and the alternatives explored as linked rationale graphs. Exporting a snapshot of the active documents in the KTfD database generates the final version of the final report to be submitted for approval. The designer(s) creates the final report by linking in appropriate pieces of text and graphics contained in it to a wider design rationale graph. This consists of a network of issues, proposed answers, arguments and quantitative selection criteria, with attached e-mails, faxes, models, test result files, etc. The on-line final report and design rationale are easily browsed, with the ability to follow links in either direction between the two documents, and again can be used to locate expertise.

Feedback on the Scenario’s View of Expertise Finding

In our discussions with practising designers and managers, the overall impressions of the scenario were very positive about the concepts and proposed implementation of the Expertise Finder. It was clear that there were a number of reservations regarding its implementation across a major multinational company, in particular with regard to the accuracy and freshness of the electronic data. Although this is a critical issue, it is more a reflection of the company’s Knowledge Management policy than on the limitation of an Expertise Finder. It should be recognized that, within the KTfD concept, automated capture of information will resolve this problem for current, but not for legacy, designs. In any case, if the information and knowledge is ‘hoarded’ by an individual there is no way it can be used by an Expertise Finder, or by any other information tool.

In reviews, current designers were particularly supportive of the notion of supplying both experts and supporting documents. This ensures that the user can make informed, justified decisions about who to contact.

EXPERTISE FINDING: PROBLEM DEFINITION AND CONTEXT

When attempting to find an answer to a problem people will tend to use the social network around them. It is natural to first ask people nearby if they know the answer or if they can recommend someone else who may know the answer. Thus, a chain of connections is made utilizing the experienced members of an organization. As people are now being moved around organizations at a faster rate and organizations are becoming increasingly distributed, this model is starting to fail. There may be no social connection between spatially separated groups even though they work on similar problems. Our approach to Expertise Finders attempts to alleviate this by using the company’s own resources to recommend people to contact. It does not replace the social network, rather it attempts to speed up the connection-making process.

The work reported in this paper presents details of two Expertise Finders. The key problem that is being addressed is summarized in Figure 2(a): How does a person located in Site A locate the best expertise to solve a specific problem? The person’s local network will, in all probability, only extend to within the site; therefore, expertise in other sites cannot be accessed. It should be remembered that sites can share common problems, but not necessarily be easily accessible to each other. For example, within the academic community, a question on robotics could easily draw on expertise from either a Department of Electrical Engineering or a Department of Cognitive Physiology. Although the sites may not form a cohesive social network, they do share common sets of resources, including e-mail, phone books, publication and report repositories, Figure 2(b). In our approach to Expertise Finder systems, these information repositories are used to identify the required expert, Figure 2(c).

How do we Identify an Expert?

The identification of an expert can be considered to be a function of a number of social and technical factors. For example, in an academic organization an expert will be the person who has the most publications, largest number of grants, and extensive experience, either with the current or similar organization. In addition, they will tend to hold senior posts.

However, when a person wishes to contact an expert there are additional social factors that need to be taken into account. For example, within the academic environment, without social factors the single expert will be swamped with queries from everyone ranging from undergraduates to Vice-Chancellors. In practice, the appropriate person depends on the query and the user’s requirements. Typically, the peer-to-peer approach is considered best in the first instance; however, the person requiring the expertise needs to be free to make a valued judgement about whom to approach. It is for this reason that we make all the sources used for the recommendation available for review.

Figure 2 Overview of the Expertise Finder



As discussed by McDonald and Ackerman (2000), the details matter in successful expertise location. The heuristics used to select the expert are bound to the organizational environment. Systems that augment expertise locating must be capable of handling a large number of details that depend on the specific context and problem.

In practice, an Expertise Finder system is designed to mimic the reality of an organization in terms of its social structures and information infrastructure. We have developed two approaches to Expertise Finders: one is based on agent technology (discussed in the next section), and the other is based on Java servlets technology (discussed in the section after next).

IMPLEMENTATION OF AN AGENT-BASED EXPERTISE FINDER

The implementation of the agent-based Expertise Finder consists of a number of Distributed Information Management (DIM) Agents operating within the Southampton Framework for Agent Research (SoFAR) (Moreau et al., 2000). SoFAR was developed at the University of Southampton as an agent framework designed to address the problems of distributed information management. On each occasion that the Expertise Finder system is deployed the sources of data available to be used and their structures will be different. There will be commonalities due to the use of standards, such as being able to access a database using Structured Query Language (SQL) or the use of protocols such as LDAP. There will still be subtle differences that require the customization of the system. Therefore, it is apparent that the high-level steps that any system should take to identify an expert will be unique on each occasion.

In order to communicate with each other, agents use a shared understanding of a domain called an ontology. Ontologies are a conceptualization of a domain into a form which can be understood both by humans and computers. A well-known definition is that ‘an ontology is an explicit specification of a conceptualisation’ (Gruber, 1993). Ontologies provide a mechanism to allow communication and interaction about a real-world domain. They remove ambiguity from language through careful design. Pragmatically, they allow us to concentrate on high-level concepts rather than spend time on the implementation details, such as communications and data representation. It follows, therefore, that the design of the ontology is crucial to implementing an Expertise Finder, and careful work was required to understand and map the real-world situation correctly into the ontological vocabulary. Further technical details of how ontologies are implemented and used in the SoFAR framework can be found in Moreau et al. (2000). The ontologies used in the design of the agent-based Expertise Finder were designed previously within the IAM Group at Southampton but extended for this application. They represent the activities and people in our research group. A detailed explanation of their design and implementation can be found in Weal et al. (2001).

Figure 3 shows the system architecture of the implementation of the agent-based Expertise Finder. The Expertise Finder system consists of a main agent, the Expertise Finder Agent, which uses a set of simpler source agents in some algorithm to determine a list of people and documents to recommend to a user. The Expertise Finder Agent builds an answer as XML before transforming that to HTML for delivery to the user via the Web server agent. The use of XML allows the Expertise Finder Agent to be reused in other systems and its results transformed as required. In the diagram we show all of the agents we have at our disposal, but here we concentrate on the core interactions between those outlined in solid lines.

The source agents (the Academic Publications and Directory Services agents) are designed to represent sources of information and data within the organization. These can range from the simple (an agent that understands the data stored in the internal phone book) to more complex knowledge (such as an agent interface to a publications database). In Figure 3 we include examples of some of the ontological predicates that the agents support. For instance, the Directory Services Agent can answer queries about the location of people or return all of the people with a certain phone number.

The agent-based Expertise Finder application was based on a previous agent application, the Dynamic CV (Weal et al., 2001). This application used the notion of query recipes to construct an on-line Curriculum Vitae dynamically. For instance, for the CV query, a general information page about a person, it would find and use agents to obtain telephone number, office location, and e-mail address. The answers were combined into a Web page in which links to new queries were automatically added, and thus a user could navigate around the information space. Figure 4 shows the result of a CV query.

The key weakness of the Dynamic CV application was that the main agent would gather information from source agents following the instructions of a query template. It would extract the data and place it onto the Web page with no understanding of the results. The Expertise Finder Agent is a total redesign of this, with the express intention not only of supporting the types of query performed by Dynamic CV but also of performing complex interactions with Source Agents in order to build towards a final answer. In the Expertise Finder the Source Agents have been radically improved and the services they provide have been expanded considerably.

Implementation

The current version of the agent-based Expertise Finder has been used to locate people using the scientific publication repository within the authors’ Department. The goal is to aid people to find experts on a topic amongst the people in the department. A user enters a query on a research subject into a Web search page. This query is given to the Expertise Finder Agent by the Web Agent. The Expertise Finder Agent first asks the Publications Agent to find publications using the search terms. The Publications Agent takes the query terms from the predicate and uses them to form an SQL query. The query is run on the department publications database. The publication database lists authors by a list of full names and a corresponding parallel list of full e-mail addresses. Hence, some understanding of this and some data translation must be performed. The Publications Agent uses the Directory Services Agent to help identify authors. It then uses the results of the query to build new Creates predicates and returns them to the Expertise Finder Agent. The Expertise Finder Agent will maintain a record of the authors’ details, saving duplication of queries, and begin to count the number of times each person appears in the returned publications. The Expertise Finder Agent will also maintain a list of people not identified, Figure 5.

Figure 3 Expertise Finder architecture. Typical predicates used are given below the respective agents

The final results page is made up of the returned publications, the list of authors found with a count of their occurrences and their status within the department. The list of unknown authors is also returned to allow users to decide for themselves the usefulness of such information. In the context of this application this list consists of people who have left the department or external collaborators, and is less useful to the user.
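The counting step described above can be illustrated with a short sketch. It is written in Python for brevity and is not the SoFAR agent code; the directory contents, the e-mail addresses and the function name `rank_authors` are hypothetical, and the directory look-up stands in for a query to the Directory Services Agent.

```python
from collections import Counter

# Hypothetical directory data: e-mail address -> (full name, status in department).
DIRECTORY = {
    "a.smith@example.org": ("A. Smith", "Lecturer"),
    "b.jones@example.org": ("B. Jones", "Research Fellow"),
}

def rank_authors(publications, directory):
    """Count appearances of each identified author; collect unidentified ones."""
    counts = Counter()
    details = {}     # cached directory answers, so each person is looked up once
    unknown = set()  # authors the directory could not identify
    for pub in publications:
        for email in pub["author_emails"]:
            if email in unknown:
                continue
            if email not in details:
                entry = directory.get(email)
                if entry is None:
                    unknown.add(email)
                    continue
                details[email] = entry
            counts[email] += 1
    ranked = [(details[e][0], details[e][1], n) for e, n in counts.most_common()]
    return ranked, sorted(unknown)

publications = [
    {"title": "Paper 1", "author_emails": ["a.smith@example.org", "x@elsewhere.example"]},
    {"title": "Paper 2", "author_emails": ["a.smith@example.org", "b.jones@example.org"]},
]
ranked, unknown = rank_authors(publications, DIRECTORY)
```

Here `ranked` lists identified authors with their status and occurrence count, most frequent first, while `unknown` holds the addresses that could not be resolved, mirroring the two lists shown on the results page.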

JAVA SERVLETS-BASED EXPERTISE FINDER

Figure 4 The Dynamic CV agent system found agents to fill in query templates. There was no attempt to understand or use the information that is returned

The difficulties we faced in improving the original Expertise Finder system, particularly with regard to scaling, led to a new design emerging. The system was implemented using Java servlets and is interfaced via a simple search-form Web page. As in the agent-based approach, the user enters a query and the system will return a ranked list of documents plus a ranked list of people to contact. On entering a technical query the system’s goal is to find documents and people pertinent to that query according to some predefined strategy. Figure 6 shows the system architecture of our current implementation of the Java servlets-based Expertise Finder.

When the system is invoked by a user’s submission a predefined Expertise finding strategy will be activated. This is a piece of code that decides which components to use and how to manipulate the results they return. The strategy will employ various other components in a sequence. The first types of component are termed generators. These perform the search and will produce lists of documents or people. The second set of components are termed scorers. These will take the existing results and give them new scores based on a predefined algorithm. The strategy keeps all of the scores that each component produces. The system then computes a final score for each document or person. This is done using a ratings system for each of the scores. The ratings can come from the strategy itself or from an individual’s preferred ratings.
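The sequence of a generator followed by a scorer can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Java implementation: the function names, the word-count relevancy measure and the title-based scorer are all hypothetical, chosen only to show a strategy that runs components in order and keeps every score they produce.

```python
def search_generator(query, corpus):
    """Generator: create results, each with an integer relevancy score (0..10)."""
    terms = query.lower().split()
    found = {}
    for doc_id, text in corpus.items():
        words = text.lower().split()
        hits = sum(words.count(t) for t in terms)
        if hits:
            found[doc_id] = {"relevancy": min(10, hits)}
    return found

def title_scorer(query, results, titles):
    """Scorer: add a further score to existing results; it creates no new results."""
    terms = query.lower().split()
    for doc_id, scores in results.items():
        title = titles[doc_id].lower()
        scores["title"] = 10 if any(t in title for t in terms) else 0

def run_strategy(query, corpus, titles):
    """Strategy: decide which components run, in which order, and keep all scores."""
    results = search_generator(query, corpus)
    title_scorer(query, results, titles)
    return results

corpus = {
    "r1": "adhesive bonding of metals at high temperature",
    "r2": "gear tooth wear",
}
titles = {"r1": "Adhesive selection", "r2": "Gear wear"}
results = run_strategy("adhesive bonding", corpus, titles)
```

Only `r1` is generated for this query, and it carries two scores, one from each component, which a later step could combine using the ratings.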

Figure 5 The results of the prototype Expertise Finder, the first 10 publications used to rank the experts are given as an unranked list

Figure 6 Java servlets-based Expertise Finder

Figures 7 and 8 show the final results of a query; in the current implementation these are shown on the same Web page. The Web page shows a ranked list of documents followed by a ranked list of people. Each is given a score that is represented by a horizontal bar graph. The score is the result of a series of queries to find and score documents and people from various sources. The final score is an accumulation of all of these scores according to an interchangeable strategy that can be influenced by the user. Therefore, the final score is relative and not normalized across strategies or users.

Figure 7 List of recommended documents; score is given by a bar graphic

The servlet component is responsible for generating the finished pages and providing the necessary responses to the Web browser. It passes the query to the Expertise Manager. This acts as the central hub and is responsible for starting the various components and running the requisite search strategy.

In the architecture the generators produce new results, either documents or people. They typically do this by interfacing with existing systems within an organization, such as databases or document repositories. Generators will produce lists of results from a query and may provide a score for each result. For instance, a generator based on a search engine might return a list of documents and the relevancy score given by the search engine. In the system discussed in this paper all scores are integers between zero and ten. The final scores can be any value; this is in keeping with the nature of the system, that the final scores are only relative and not normalized. Scorers take results produced by generators and give a new score for each result depending on some other algorithm. For instance, the same list of documents found by the full text search engine could be given a score based on their age. New documents might be given a higher score than older documents. This design implies an implementation relationship between scorers and generators and that each component is not totally independent.

The Expertise Manager runs a particular strategy to solve the query. The strategy manages the execution of the components in a certain order and manages the manipulation of the data they return. For instance, a scorer might need certain information about certain documents or can only function on a subset of results. Again, the strategy component will be highly specific to the application and the organization within which it is run. The Expertise Finder Manager chooses which strategy to run at query time.

For every result found, the system keeps all the scores for that particular entry as well as recording the origin of that score; hence, for each document or person located a result object is created. In the API, a result object is comprised of a source object plus an array of score objects. The source object refers to the document or person and the score objects are the scores given by each component in the current strategy.
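A result object of this shape, together with an age-based scorer of the kind mentioned above, might look like the following Python sketch. The class and field names are assumptions made for illustration and do not reproduce the actual Java API; the rule of losing one point for every two years of age is likewise invented.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Score:
    origin: str  # name of the component that produced the score
    value: int   # integer between 0 and 10

@dataclass
class Result:
    source: str  # reference to the document or person
    scores: List[Score] = field(default_factory=list)

def age_scorer(results, years, current_year):
    """Scorer: newer documents score higher (hypothetical rule: -1 per 2 years)."""
    for r in results:
        age = current_year - years[r.source]
        r.scores.append(Score("age", max(0, 10 - age // 2)))

results = [
    Result("report-A", [Score("fulltext", 7)]),
    Result("report-B", [Score("fulltext", 9)]),
]
age_scorer(results, {"report-A": 2002, "report-B": 1983}, current_year=2003)
```

Each result retains its original full text score alongside the new age score, so the origin of every contribution to the final ranking remains visible. Note that the scorer relies on publication years being available for the generated results, which illustrates why scorers and generators are not fully independent.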

The final list of results is given to the ScoreCompiler, which is responsible for producing the final score. To produce this final score, a list of ratings or multiplication factors is required. The ratings are a list of values used as multiplication factors for each type of score. The ScoreCompiler performs the simple calculation of multiplying each score by its factor and then summing the results. It adds this to each result as a new score entry with the origin name ‘Final’. The strategy provides the ratings values but can also obtain or alter them by using the preferences system and hence allow for individual user intervention in the final result.
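The calculation performed by the ScoreCompiler is a weighted sum, which can be sketched in a few lines of Python. The function name and the rating values are hypothetical, and treating a score with no rating as having a factor of zero is an assumption not stated in the text.

```python
def compile_final_score(scores, ratings):
    """Multiply each score by its rating, sum, and record the sum as 'Final'."""
    # Assumption: a score whose origin has no rating contributes nothing.
    final = sum(value * ratings.get(origin, 0) for origin, value in scores.items())
    compiled = dict(scores)
    compiled["Final"] = final
    return compiled

ratings = {"fulltext": 3, "age": 1}  # hypothetical multiplication factors
doc_a = compile_final_score({"fulltext": 7, "age": 10}, ratings)  # 7*3 + 10*1
doc_b = compile_final_score({"fulltext": 9, "age": 0}, ratings)   # 9*3 + 0*1
```

With these ratings the newer document `doc_a` outranks `doc_b` despite its lower full text score; a user who set the age rating to zero would reverse the order, which is why final scores are only meaningful relative to one strategy and one set of ratings.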

The preferences system is a general-purpose way of storing preferences for each user. A simple API maintains a unique ID for each user, managed using cookies, and allows the storage of any key-value pair in a database against this ID. Hence any part of the system can store and retrieve arbitrary data for an individual user. This can be used to override default values for the ratings given by the strategy.
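A key-value preferences store of this kind can be sketched as below. This Python example is an illustration only: it uses an in-memory SQLite database in place of the real database, a random hexadecimal string in place of the cookie value, and hypothetical class, table and key names such as `Preferences` and `rating.age`.

```python
import sqlite3
import uuid

class Preferences:
    """Stores arbitrary key-value pairs against a per-user ID."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE prefs (user_id TEXT, pref_key TEXT, pref_value TEXT, "
            "PRIMARY KEY (user_id, pref_key))"
        )

    def new_user_id(self):
        # In a Web deployment this value would be sent to the browser as a cookie.
        return uuid.uuid4().hex

    def store(self, user_id, key, value):
        self.db.execute(
            "INSERT OR REPLACE INTO prefs VALUES (?, ?, ?)",
            (user_id, key, str(value)),
        )

    def retrieve(self, user_id, key, default=None):
        row = self.db.execute(
            "SELECT pref_value FROM prefs WHERE user_id = ? AND pref_key = ?",
            (user_id, key),
        ).fetchone()
        return row[0] if row else default

def ratings_for(user_id, prefs, defaults):
    """Start from the strategy's default ratings and apply any user overrides."""
    return {
        origin: float(prefs.retrieve(user_id, "rating." + origin, default))
        for origin, default in defaults.items()
    }

prefs = Preferences()
uid = prefs.new_user_id()
prefs.store(uid, "rating.age", 5)
user_ratings = ratings_for(uid, prefs, {"fulltext": 3, "age": 1})
other_ratings = ratings_for("someone-else", prefs, {"fulltext": 3, "age": 1})
```

The first user's stored preference replaces the default age rating, while a user with no stored preferences simply receives the strategy's defaults.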

The finished results are ranked and then used to produce the final display. This service is provided by a renderer component. The current component produces HTML, but other renderers could provide other types of data, such as XML.

The data source manager controls access to underlying database connections in order to



Each technical report database has been placed into MySQL and a generator written to perform full text searches of these resources. This has required careful design, as there are a total of 300 000 entries in the two databases. Unless care is taken, the time to query such resources can be considerable and the number of results returned can easily overwhelm the rest of the system. We currently use the full text search capabilities of MySQL and use the relevancy scores it produces.

We use the phone book database as a people generator. Again, the raw data we received was placed into a MySQL table. For each document found by the document generators an attempt is

Figure 8 List of people with contact information; score is given by a bar graphic

help cope with large-scale deployments of the system.

Implementation

The current system is built around data supplied by a large manufacturing company. The major source comes in the form of databases of internal publications. These databases list details on internal technical reports produced at individual company sites. We also have a source of phone numbers and locations for all the employees of the company; this database has over 20 000 entries.

The system we have developed comprises two document generators and a number of scorers.



made to match the author against the phone book. This entails understanding which fields hold the author data, that the formatting of the fields does not match, and that the author fields of the document databases are free text and highly inconsistent. This is a prime example of where the strategy code is required to perform raw manipulation of the data in order to perform this matching process.

If the author is matched and the department they have given in the report database matches that in their phone book entry, then the people generator deems this a strong match for that person, because they have not moved department since the report was written, and returns a high score. If the person is matched but is not in the same department then the score is lower. If there is no match then no score is returned. This crude algorithm can be improved, but further data sources are required to aid in tracing and matching people. As such resources become available, this generator could be improved or replaced without altering the rest of the system. At such a time the scoring algorithm may need rebalancing to take account of the improved accuracy of this component.
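The matching rule described above can be sketched as follows. This is an illustrative Python sketch, not the code of the system: the score values 10 and 5, the reduction of names to surname and first initial, and the phone book record layout are all assumptions. In particular, the name handling only copes with the forms ‘Surname, F.’ and ‘First Surname’, far fewer than real free-text author fields contain.

```python
import re

STRONG_MATCH = 10  # assumed value for "same person, same department"
WEAK_MATCH = 5     # assumed value for "same person, different department"


def normalise(text):
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())


def name_key(name):
    """Reduce a free-text name to (surname, first initial), or None."""
    if "," in name:
        surname, _, rest = name.partition(",")
        surname, rest = normalise(surname), normalise(rest)
    else:
        parts = normalise(name).split()
        if not parts:
            return None
        surname, rest = parts[-1], " ".join(parts[:-1])
    if not surname:
        return None
    return (surname, rest[:1])


def score_author(author, department, phone_book):
    """Return (phone book entry, score), or (None, None) when unmatched."""
    key = name_key(author)
    if key is None:
        return None, None
    for entry in phone_book:
        if name_key(entry["name"]) == key:
            same = normalise(entry["department"]) == normalise(department)
            return entry, (STRONG_MATCH if same else WEAK_MATCH)
    return None, None


# Hypothetical phone book data.
phone_book = [
    {"name": "Jane Smith", "department": "Turbine Design", "phone": "1234"},
]
strong = score_author("Smith, J.", "turbine design", phone_book)
weak = score_author("Smith, J.", "Compressors", phone_book)
missing = score_author("A. Jones", "Turbine Design", phone_book)
```

Matching on surname and initial alone will confuse colleagues who share both, which is one reason further data sources are needed to improve on this approach.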

We have developed a number of demonstrator scoring components. One examines the date of the documents and gives them a score based on their age. Recent documents are given higher scores. The databases contain data for approximately 20 years of work, so this is an important way to re-rank the data. Again, there have been issues with attempting to work with free text fields and trying to understand the format of the document dates.
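A date scorer of this kind might look like the sketch below. The list of accepted date formats and the linear decay from ten to zero over 20 years are assumptions made for illustration; the paper states only that recent documents score higher and that the date fields are free text.

```python
from datetime import date, datetime

# Assumed formats; real free-text date fields would need many more.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y", "%B %Y", "%Y")
MAX_AGE_YEARS = 20  # the databases span roughly 20 years of work


def parse_date(text):
    """Try each known format in turn; return None if none fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def date_score(text, today):
    """Integer score from 10 (written today) down to 0 (20+ years old).

    Returns None when the date cannot be understood, so that the
    document simply receives no score from this component.
    """
    written = parse_date(text)
    if written is None:
        return None
    age_years = (today - written).days / 365.25
    age_years = min(max(age_years, 0.0), MAX_AGE_YEARS)
    return round(10 * (1 - age_years / MAX_AGE_YEARS))


today = date(2003, 1, 1)
scores = {
    text: date_score(text, today)
    for text in ("2003-01-01", "March 1993", "1983", "sometime last year")
}
```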

The new Expertise Finder framework has been designed with the specific aim of allowing search strategies and ranking algorithms to be interchanged. This allows the system to be heavily customized to the particular social system found within an organization. This is achieved by breaking down the process into steps performed by interchangeable components.

During the building of this new system it was realized that there cannot be general purpose ontologies through which all components communicate. There needs to be highly application-specific code in the system to deal with the low-level interchange and manipulation of data

from the various sources in the system. The strategy code is the way of dealing with this problem. The code in this component is totally specific to a certain application of the framework and manipulates raw data from sources. Therefore, it is reliant on certain sources being available to work.

It is unavoidable that the strategy code is interlinked with the available components found at a particular site. It is not possible to design a single system that can be installed at a site with no additional work. Components need to be written to match available sources and the strategy needs to be written to match the social situation.

Instead of data exchange via ontologies the new system has a simple way to represent data. The data produced by one generator or scorer may or may not be useful or recognizable to another. Therefore, the data are represented simply by a unique ID and the source of that ID. If another component wishes to make use of that data then it needs to obtain it from the original component. This produces yet more interdependencies in the system design, but clear APIs to each component help here. For instance, a scoring component might rank documents based on their date. Therefore, the generator that supplied those documents needs to be able to supply dates of documents via its API.
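The following sketch illustrates this arrangement under stated assumptions: a result is passed around as nothing more than an ID and the name of its source, and a date-based ranking step asks the originating generator for each date through its API. The names `Ref`, `ReportGenerator`, `date_of` and `newest_first`, and the sample records, are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Ref:
    """All that is exchanged between components: an ID and its source."""
    result_id: str
    source: str


class ReportGenerator:
    """Stand-in for a generator over one technical report database."""

    def __init__(self, name, records):
        self.name = name
        self._records = records  # result_id -> record held by this source

    def search(self, term):
        """Return bare references, not the records themselves."""
        return [
            Ref(result_id, self.name)
            for result_id, record in self._records.items()
            if term.lower() in record["title"].lower()
        ]

    def date_of(self, result_id):
        """API call that lets other components obtain a document's date."""
        return self._records[result_id]["date"]


def newest_first(refs, generators):
    """Order references by date, fetched from each originating generator."""
    return sorted(
        refs,
        key=lambda ref: generators[ref.source].date_of(ref.result_id),
        reverse=True,
    )


gen = ReportGenerator("reports_a", {
    "r1": {"title": "Turbine blade cooling", "date": date(1995, 6, 1)},
    "r2": {"title": "Blade fatigue study", "date": date(2001, 2, 14)},
    "r3": {"title": "Gearbox lubrication", "date": date(1999, 9, 9)},
})
generators = {gen.name: gen}
refs = gen.search("blade")
ordered = newest_first(refs, generators)
```

The ranking step depends on `date_of` being present, which is exactly the kind of interdependency between a scorer and a generator noted above.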

By dealing with the absolute minimum of data the system can handle relatively large numbers of results. In the previous system, large amounts of data were generated as ontologies were instantiated and exchanged. A great deal of this data was often redundant to the final requirements. The decision to provide minimum results but full provenance is a trade-off between results that are interchangeable and results that allow for speed and low overheads. The default usage of the system is to provide only a limited number of results, for instance the top 20 documents and the top 20 people; therefore, it is much faster just to extract the required data for those 40 results at the end of the run.
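As a sketch of this deferred extraction, the function below selects the best-scoring references first and only then fetches their full details. The names `top_results` and `fetch_details` and the sample data are assumptions; in the system described here the details would come from the originating generator.

```python
import heapq


def top_results(final_scores, fetch_details, limit=20):
    """Pick the `limit` best references, then fetch details for those only.

    final_scores maps a (result_id, source) reference to its final score.
    """
    best = heapq.nlargest(limit, final_scores.items(), key=lambda item: item[1])
    return [(ref, score, fetch_details(ref)) for ref, score in best]


# Hypothetical run: 100 scored documents, details fetched for only 20.
final_scores = {("doc-%d" % i, "reports_a"): float(i) for i in range(100)}
fetched = []


def fetch_details(ref):
    fetched.append(ref)  # record each lookup to show how few are made
    return {"title": "Report " + ref[0]}


top = top_results(final_scores, fetch_details, limit=20)
```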

DISCUSSION

This paper reports the development of two Expertise Finders, the initial version based on agents and the subsequent version using a more



simplistic architecture based around the need to adapt the search strategy heavily to the organization. The current servlet-based Expertise Finder produces results of limited value to designers due to problems associated with the data used. Two particular problems are worth noting. First, in nominally identical databases supplied by two company sites, the fields were similar but not exactly the same. In addition, neither the publication nor the phone book data could be guaranteed to be correctly maintained, nor could the entries in any fields be guaranteed to be ‘correct’. For instance, people could edit their own entries in the phone book to give themselves any title or enter any name for their particular work group. Although this information was stored correctly in other sources, those sources were not available for this work. These problems did demonstrate the dangers of individuals configuring information without regard to the overall knowledge management requirements of the organization.

Even with these limitations our work did demonstrate the infrastructure in use and successfully allowed the strategy to utilize various document generators and scorers before computing a score. We are currently working on producing more meaningful scoring components.

The notion of trust in the results has been a key point made by our industrial partners at every step of the design process. For instance, in our first Expertise Finder implementation the agents tried to match author names in a publications database against the live staff database. Those names that failed to match were returned to the user along with the names of those which did match. This crude example showed users exactly what was happening within the system and taught us a great number of implementation lessons.

The current system attempts to solve this problem with its transparency. Developers and users can see what scores are being given to each result and can see how these scores are brought together. All results are kept, along with their origin, for precisely this reason. It allows users to evaluate for themselves the whole process and to go into more detail on what one particular component has produced. This allows for possibilities such as saving the results of a query for caching and recalculating.

The process of finalizing the details of a particular strategy will happen in two phases. There will need to be an initial requirements capture phase in which the users, or specific users with experience, are asked to explain the sources and methods that they would use to find an expert. From this, a strategy component can be built that reflects a best attempt to turn this experience into algorithms and scoring schemes. We will then require an evaluation loop to refine this strategy.

One tool to help this process would be to be able to visualize all of the score sets for each result. For instance, a Web page output could list all of the scores for each result as well as listing the ratings given to each contributing score. This will allow users to look deeper into the results, spot anomalies and have a much greater trust in the final results.

As expected, our experiences have clearly shown that producing an Expertise Finder to support designers within a manufacturing organization is a non-trivial undertaking. The main problem is the sheer scale and complexity of the organization we have been working with. Initially, it took a considerable amount of time before we felt we understood our partner and their actual processes. The second problem is the age and quality of the data we have been able to access. In this particular organization the knowledge is kept for many years, in many forms. Each area of the company uses and produces different types of information, so no single solution will be appropriate organization-wide, a major factor in the evolution of our design.

In conclusion, our work has shown that it is possible to develop Expertise Finders that rely on information currently available to organizations. With our approach, companies do not need to undertake a knowledge audit across the organization to ‘frontload’ the system. In practice, the viability of our approach is dependent on the ability to access all the information available in electronic format. With many organizations this can be rather spasmodic, as the IT infrastructure has evolved over time with a range of conflicting priorities. Even with these caveats we believe that the introduction of this approach is feasible, though the challenges may be more of a social than technical nature.



ACKNOWLEDGEMENTS

This work was funded in part by the University Technology Partnership for Design, which is a collaboration between Rolls-Royce, BAE SYSTEMS and the Universities of Cambridge, Sheffield and Southampton. The UK Engineering and Physical Sciences Research Council funded the work under grant number GR/M83582. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing official policies or endorsements, either expressed or implied, of the EPSRC or any other member of the UTP.

References

Ackerman M, McDonald D. 1996. Answer garden 2: merging organisational memory with collaborative help. In Proceedings of ACM 1996 Conference on Computer Supported Cooperative Work, CSCW 96. ACM: New York; 97–105.

Auriol E, Crowder RM, McKendrick RJ, Rowe R, Knudsen T. 1999. Integrating case-based reasoning and hypermedia documentation: an application for the diagnosis of a welding robot at Odense steel shipyard. In Proceedings of the International Conference on Case-Based Reasoning (ICCBR’99).

Becerra-Fernandez I. 2000. The role of artificial intelligence technologies in the implementation of people-finder knowledge management systems. In Proceedings of the 2000 American Association for Artificial Intelligence Spring Workshop, AAAI, March.

Bollacker K, Lawrence S, Giles C. 1998. Citeseer: an autonomous Web agent for automatic retrieval and identification of interesting publications. In Proceedings of the Second International Conference on Autonomous Agents, Sycara K, Wooldridge M (eds). ACM: 116–123.

Chandrasekaran P, Joshi A. 2001. An expertise recommender using Web mining. In Proceedings of AAAI 14th Annual International Florida Artificial Intelligence Research Symposium.

Clegg C, Gray M, Waterson P. 2000. The charge of the byte brigade and a socio-technical response. International Journal of Human–Computer Studies 52(2): 235–251.

Crowder R, Wills G, Heath I, Hall W. 1998. Hypermedia information management: a new paradigm. In 3rd International Conference on Managing Innovation in Manufacture, University of Nottingham; 329–334.

Crowder R, Bracewell R, Clegg C, Hall W, Hughes G, Kerr M, Moss M, Knott D, Wallace K, Waterson P. 2003. A future vision for the engineering design environment: a future sociotechnical scenario. In Proceedings of ICED 03, Stockholm.

Goa Y, Zeid I, Bardez T. 1998. Characteristics of an effective design plan to support re-use in case-based mechanical design. Knowledge-based Systems 10: 337–350.

Gruber TR. 1993. Toward principles for the design of ontologies used for knowledge sharing. Technical report KSL-93-04, Knowledge Systems Laboratory, Stanford University.

Khadilkar D, Stauffer L. 1996. An experimental evaluation of design information reuse during conceptual design. Journal of Engineering Design 7(4): 331–339.

McDonald D, Ackerman M. 1998. Just talk to me: a field study of expertise location. In Proceedings of ACM 1998 Conference on Computer Supported Cooperative Work, CSCW 98. ACM: New York; 315–324.

McDonald D, Ackerman M. 2000. Expertise recommender: a flexible recommendation system and architecture. In ACM 2000 Conference on Computer Supported Cooperative Work. ACM: New York; 231–240.

Moreau L, Gibbins N, DeRoure D, El-Beltagy S, Hall W, Hughes G, Joyce D, Kim S, Michaelides D, Millard D, Reich S, Tansley R, Weal M. 2000. SoFAR with DIM agents: an agent framework for distributed information management. In Proceedings of PAAM00, Manchester, UK.

Wallace K, Clegg C, Keane A. 2001. Visions for engineering design: a multidisciplinary perspective. In Proceedings of ICED 01, Glasgow.

Weal M, Hughes G, Millard D, Moreau L. 2001. Open hypermedia as a navigational interface to ontological information. In Proceedings of Hypertext’01, Davis H, Douglas Y, Durand D (eds). ACM: 227–236.
