Knowledge portals and the emerging digital knowledge workplace


by R. Mack, Y. Ravin, and R. J. Byrd

A fundamental aspect of knowledge management is capturing knowledge and expertise created by knowledge workers as they go about their work and making it available to a larger community of colleagues. Technology can support these goals, and knowledge portals have emerged as a key tool for supporting knowledge work. Knowledge portals are single-point-access software systems intended to provide easy and timely access to information and to support communities of knowledge workers who share common goals. In this paper we discuss knowledge portal applications we have developed in collaboration with IBM Global Services, mainly for internal use by Global Services practitioners. We describe the role knowledge portals play in supporting knowledge work tasks and the component technologies embedded in portals, such as the gathering of distributed document information, indexing and text search, and categorization; and we discuss new functionality for future inclusion in knowledge portals. We share our experience deploying and maintaining portals. Finally, we describe how we view the future of knowledge portals in an expanding knowledge workplace that supports mobility, collaboration, and increasingly automated project workflow.

All human work, even the most physical labor, involves cognitive capabilities, but the hallmark of human work in the latter part of the twentieth century is knowledge work: solving problems and accomplishing goals by gathering, organizing, analyzing, creating, and synthesizing information and expertise. Knowledge work is performed by individuals who belong to communities of interest, where knowledge is shared and accumulated. Knowledge management (KM) refers to the methods and tools for capturing, storing, organizing, and making accessible knowledge and expertise within and across communities. Communities of interest may be scientific, academic, business-oriented, or government-based. We focus here on the corporate environment, since this is where KM is most self-consciously addressed, and where supporting technologies are expanding most rapidly.

At the broadest level (to paraphrase Prusak1), KM refers to all the tools, technologies, practices, and incentives deployed by an organization to “know what it knows” and to make this knowledge available to people who need to know it when they need to know it. At the individual or team level, the KM flow is a cycle in which solving a problem leads to new knowledge, initially tacit (that is, known but unexpressed), and then made explicit when experiences are documented, distributed, and shared (via databases, e-mail, or presentations). Once explicit, the knowledge is used by others for solving new problems.2,3 The application of the explicit knowledge to a new problem creates new tacit knowledge, with the potential of initiating a new KM cycle. In this general cycle lie a host of technical, social, and human-computer interaction issues. In this paper we focus on the technology and, specifically, on what have come to be called knowledge portals.

© Copyright 2001 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.

IBM SYSTEMS JOURNAL, VOL 40, NO 4, 2001 0018-8670/01/$5.00 © 2001 IBM MACK, RAVIN, AND BYRD 925

From information portals to knowledge portals

The term “portal” is used quite ambiguously, especially because it evolved over time and became commonplace. Portals started as applications, typically Web-based, providing a single point of access to distributed on-line information, such as documents resulting from a search, news channels, and links to specialized Web sites. To facilitate access to large accumulations of information, portals quickly evolved to include advanced search capabilities and organizing schemes, such as taxonomies. Because of their emphasis on information, these first-generation portals are often called information portals. Information portals provide a valuable service on the Internet, by selecting, organizing, describing, and sometimes evaluating, useful sites. Yahoo!4 was one of the first and is still one of the most popular public-domain, Web-based portals. The recent proliferation of portals may seem to undermine the original intent of single access, but in fact, this circumstance emphasizes that portals are defined with respect to a community of users who share common tasks and interests. (Consider, for example, the viewpoint of a shopping consumer versus that of a professional engaged in researching a topic for a report.) This is especially true for internal corporate portals, where different functional and organizational groups and lines of business may have substantially different needs for information access and organization. Examples include sales and marketing, best practices, competitive intelligence, research and development, and general corporate resources. Specialized portals in the corporate sector are sometimes called vortals, for vertical portals, since they provide in-depth capabilities that are highly focused on a vertical segment of an organization or field.5

We refer to information portals used by knowledge workers as knowledge portals (or K Portals for short) to differentiate this KM role and usage from other portal roles, such as consumer shopping or business-to-business commerce. K Portals are rapidly evolving into broad-based platforms for supporting a wide range of knowledge worker (KW) tasks. We refer to the broad-based platforms as the knowledge workplace to draw attention to the importance of supporting the full range of knowledge work tasks within an integrated and unified context of use. In the next three sections of this paper, we focus on the information accessing and organizing role of portals and on how this role relates to the broader spectrum of knowledge work tasks. We describe component technologies and end-user functions, drawing heavily on our experience building a platform for portal systems. The section succeeding those (“Knowledge portals in an expanding knowledge workplace”) discusses the evolution of K Portals and evokes themes that are covered in other papers in this issue.

Knowledge work and the role of portals

Portals serve tasks performed by knowledge workers, and we depict these tasks in the high-level view in Figure 1. Most broadly described, KWs gather information relevant to a task, organize it, search it, analyze it, synthesize solutions with respect to specific task goals, and then share and distribute what has been learned with other KWs. The tasks are illustrated concretely in Table 1, which describes a “day in the life” of a consultant involved in the initial steps of engaging with a customer in a marketing or a consulting context. We use this scenario as the basis for discussing portal technologies throughout this paper.

The consultant (and KW) goes through several task steps, starting with gathering information about the customer, the industry, or the business field relevant to the engagement, as well as about the products and the services available to meet the customer’s needs. References might also be sought to colleagues who might have useful expertise to share. Using several tools, the KW searches internal and Web information resources for electronic and nonelectronic artifacts, often generated as a result of previous projects and distributed in a variety of ways. Searching is done both explicitly, using the portal search functions, and possibly implicitly, by creating or modifying a profile of current interests that is used to automatically find, and notify users of, potentially relevant information. An explicit search can involve formulating a query, reviewing search results, requesting “more documents like this,” or browsing taxonomies that organize documents into topics. Over time, the KW acquires information relevant to the customer engagement and may create a dedicated project workplace in which to collect and organize these resources. This workplace supports further project activities, such as creating presentations or carrying out analyses needed as input for proposals, budgets, and project timelines. These activities involve soliciting information from other colleagues and experts via e-mail, scheduling meetings and teleconferences, distributing various artifacts, and saving information in a project workplace (e.g., Lotus Notes** TeamRoom) for review and coauthorship.


Figures 2 and 3 show screen shots of a K Portal built for internal use in IBM Global Services. This portal provides software support for several of the high-level KW tasks we have just described. The home page in Figure 2 is divided into several sections, some typical of Web pages (e.g., an identifying header or a footer with links to other pages on the Web site) and others specific to this portal. The left frame has a text entry field for conducting a search and filters for restricting search to a selected category shown in the center of the Web page (i.e., “Intellectual Capital and Assets,” “Biographies,” etc.). It also includes a document filter whose value represents the dominant type of document content (e.g., “projects” or “people”). In the central area of the home page are four high-level taxonomies that organize documents in ways relevant to KWs in IBM Global Services. Below the taxonomy area are bulletin board entries and top documents accessed by colleagues.

KWs can carry out free-text searches, navigate down one or more of the taxonomies, or combine a search with category and document-type restrictions. For example, Figure 3 shows a list of documents obtained by navigating to “Engagement Models,” a subcategory branch of “Intellectual Capital and Assets,” “Finance and Insurance,” and “Financial Markets Solutions.” (The subcategory path is shown at the top of the middle frame.) Documents such as those returned in Figure 3 can also result from inputting text terms expressing topics of interest into the search field in the upper left corner to initiate a search. A text search can be restricted to a specific category. This capability is important because many of the categories contain thousands of documents. Each document returned is summarized with a title, a link to the full document, an abstract, and indicators of document size and type. Abstract and size are useful for mobile users who may not want to download large documents without knowing more about their content. An attachment indicator is useful too, since many Lotus Notes documents contain minimal text and serve as containers for attached documents.

Figure 1 Knowledge work tasks, with examples of supporting technology (panels: knowledge work tasks; technology)

Once a KW has gathered a set of documents relevant to a task, other tasks come into play, requiring support beyond searching and browsing, for authoring presentations and collaboration. Authoring and collaboration tools are not currently launchable from the K Portal described in Figures 2 and 3. Another IBM KM tool supporting both portal functions and certain types of collaboration is shown in Figures 4 and 5. The intellectual capital management (ICM) AssetWeb is a Notes-based application, originally developed for internal use within IBM Global Services.6 It is now also available externally and has garnered acclaim as a KM tool in industry reviews.7 The ICM AssetWeb uses Notes categories and TeamRooms (group document databases) to organize documents manually. Figure 5 shows an example of a TeamRoom. The left panel lists options for viewing documents in the repository by categories, chronology, or authorship, like any information portal, but because the ICM AssetWeb is built on Lotus Notes, it has access to the larger application context of Notes, with tools for collaboration and communication, including e-mail and calendars.

Portals support KWs as a community. Earlier prototype versions of the IBM Global Services K Portal shown in Figure 2, specialized for smaller “e-business” communities, listed references to news items, names of new hires, and icons pointing to feature stories, all of which were of potential interest to the practitioner community served by the portal, helping to build and support members of that community. The community is scattered across geographies, with many practitioners working from home offices or on the road. Featuring new employees electronically fulfills an important social role, serving as a virtual welcome and introduction to the rest of their colleagues. Links to resources and biographical information help familiarize KWs with the community and are of special value to new hires. Bulletin boards, frequently accessed documents (shown in Figure 2), highlighted news, and success stories help shape the corporate culture and values, giving recognition and acknowledgment to successful employees, while creating models for others. These features are particularly important in a highly competitive, geographically dispersed profession with high turnover. Similarly, portals are known to be very important in merger and acquisition situations because they can bring together different corporate cultures to a single point of access.

Table 1 Knowledge work scenario: Consultant involved in initial steps of a customer engagement

Step  Action

1  Customer representative calls knowledge worker (KW) named “Karen,” a consulting services practitioner for “KnowledgeAdventures,” informing her of a customer’s interest in developing a document management system based on her company’s products and services. Karen begins the first steps in a Customer Engagement. Karen and the customer representative schedule a meeting with customer CIO and technical staff to understand what they need.

2  Karen uses the portal to find documents relating to the customer, by looking into the “Engagement Life Cycle” taxonomy and navigating down the category path (Engagement Life Cycle → document management systems → customer references). She also finds a few digitized marketing videos on the company’s product line, from a trade show. She downloads these documents to her workstation. Karen also modifies her profile to add the new customer name and descriptions of the problem to solve, and requests notification of information from external Internet sources on these topics.

3  Karen also needs to get help from experts in the company who know about the customer and document management systems. The portal returns resumes of other practitioners (in the Biography category) who cite document management as an expertise. Karen does not know most of these people (due to the high turnover in the services organization) and is unsure about their level of experience. She needs advice from colleagues.

4  Karen creates a project workspace, and fills in a project template with a set of categories representing phases of the project. This space will contain various project artifacts she anticipates gathering or creating, such as information about competitive products and technologies, skills and resources, existing assets based on prior engagements in the document management product domain, statements of work, etc. She transfers documents from her workstation to the workspace, to the appropriate categories (customer reference, competitive product information, etc.).

5  Karen sends off an e-mail note soliciting advice and interest from a set of colleagues that she either knows personally or has found via the resumes she gathered. She schedules a teleconference; her assistant establishes a call-in number, and checks the schedules of the people contacted. Karen notices that two of them are at a company site where she plans to be next week, and she wants to find out if they are available to meet in person.

6  In preparation for beginning the project, Karen drafts a presentation describing the customer’s needs, the company’s document management products and services, and outlines a plan. She puts the presentation in the project workspace, alerts colleagues that it exists, and schedules a conference to review it.

To remain vital and current, both the IBM Global Services K Portal and the ICM AssetWeb require a variety of knowledge and content management processes (also discussed in the fifth section under “Portal management”). These processes include oversight of document gathering, indexing, and categorization. The reliance of a KW on the information available through the portal raises important concerns about the coverage and quality of the information sources. Higher-level KM processes include dedicated “core teams” that evaluate the quality of intellectual capital submitted to the portal. The document management process of the ICM AssetWeb includes review, classification, and certification of documents by dedicated teams of subject-matter experts from the appropriate IBM Global Service lines of business. Security issues involved in accessing documents are also of concern. Access to documents is controlled by the document repositories themselves. Users need to enter a user ID (identifier) and password to see certain documents.

Figure 2 Example knowledge portal home page, showing bulletin board, frequently accessed documents, and links to multiple taxonomies and text search input field

The IBM Global Services K Portal and the ICM AssetWeb systems complement each other. The portal is totally Web-based, lightweight, and focused on search and categorization. The middleware supporting it allows easy integration of new exploratory functions. The ICM AssetWeb, in contrast, includes collaboration and communication tools, some level of workflow to manage the content, and application development tools. The new generation of IBM KM product platforms, including the Lotus Discovery Server**8,9 and WebSphere* Portal Server, integrates an expanded search function (including finding expertise, i.e., knowledgeable colleagues and potential team members), taxonomy generation tools, and more easily customized collaboration spaces (Lotus QuickPlaces**). Other vendor offerings, such as those from Plumtree Software10 or Autonomy11, offer similar capabilities. But we contend that there is room for more research and development to improve the quality of specific features, such as search, categorization, and support for collaboration, as well as for the effective integration of these features. Achieving these goals will lead to a much richer and more supportive knowledge workplace. We return to this point after we review in more depth the component technologies that we have outlined.

Figure 3 Example knowledge portal showing a branch of a taxonomy, with categorized documents, showing document meta-data


Knowledge portal technologies

Here we delve more deeply into the technologies integrated in portals, in some cases discussing issues and capabilities that exist only in research prototypes or in competitive products. We discuss topics roughly in the order of the high-level tasks schematized in Figure 1.

Figure 4 ICM AssetWeb showing a taxonomy branch of IBM Global Services “Knowledge Network” documents

Capture and gather. Documents created in the course of performing knowledge work are typically stored in multiple places: file systems on individual workstations, Web sites on network servers, and document management systems such as Lotus Notes. In order to make content accessible to the portal base technologies and ultimately to users, documents need to be automatically gathered by the system, registered, managed, and analyzed. Documents are extracted via a process called crawling, which starts from a given URL (uniform resource locator) or another specific address, and then automatically and recursively follows all the links in each document. Content analyzers extract text and meta-data from each document as it is “crawled” and handle the particulars of different document formats. The IBM Global Services K Portal uses a specific technology called Grand Central Station (GCS), originally developed at the IBM Almaden Research Center,12 to crawl documents in Lotus Notes databases and on Web sites. In both cases, GCS extracts text and meta-data from documents in multiple formats, such as Lotus word processing and business graphics applications, and the corresponding Microsoft Office applications. For Lotus Notes documents, information is also extracted from attached documents. Extracted text and meta-data are encoded in a standard XML (Extensible Markup Language) format across document types and made available for subsequent indexing and analysis processes.
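The crawl-and-extract step described above can be illustrated with a minimal sketch. It is not the GCS implementation, whose internals are not given here. The `PAGES` dictionary, the `notes://` addresses, and the XML element names (`document`, `meta`, `text`) are all hypothetical, and a queue replaces literal recursion so that each address is visited only once.

```python
from collections import deque
import xml.etree.ElementTree as ET

# Hypothetical in-memory repository: address -> (meta-data, text, outgoing links).
PAGES = {
    "notes://db/home": ({"title": "Home", "author": "admin"},
                        "Welcome page", ["notes://db/a", "notes://db/b"]),
    "notes://db/a": ({"title": "Proposal", "author": "karen"},
                     "Document management proposal", ["notes://db/b"]),
    "notes://db/b": ({"title": "Reference", "author": "lee"},
                     "Customer reference", ["notes://db/home"]),
}

def crawl(start, fetch):
    """Follow links breadth-first from start, visiting each address once."""
    seen, queue, records = {start}, deque([start]), []
    while queue:
        address = queue.popleft()
        page = fetch(address)
        if page is None:          # unreachable or unknown address
            continue
        meta, text, links = page
        records.append((address, meta, text))
        for link in links:
            if link not in seen:  # avoid cycles and duplicate visits
                seen.add(link)
                queue.append(link)
    return records

def to_xml(address, meta, text):
    """Encode one crawled document in a simple, assumed XML layout."""
    doc = ET.Element("document", {"address": address})
    meta_el = ET.SubElement(doc, "meta")
    for name, value in sorted(meta.items()):
        ET.SubElement(meta_el, name).text = value
    ET.SubElement(doc, "text").text = text
    return ET.tostring(doc, encoding="unicode")

records = crawl("notes://db/home", PAGES.get)
xml_docs = [to_xml(*record) for record in records]
```

A real crawler would replace `PAGES.get` with format-specific content analyzers; the uniform XML output is what lets later indexing and analysis ignore the original document format.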

There are at least two reasons for aggregating electronic information using a crawler. First, aggregating data makes it easier to create a centralized search index for a collection, enabling a search over all documents using a common search approach. Second, many useful methods for analyzing documents require analyzing the properties of document aggregates, as we discuss in the next subsection.

Figure 5 Notes TeamRoom with document and discussion categories

However, it is not always possible to carry out full-scale, automatic crawling. For example, a repository of documents, such as Dow Jones Interactive**, may be stored in a proprietary database system with an interface that controls access, preventing systematic crawling of its contents. It may not be possible for an external portal to access the information in the repositories systematically, as required for creating a search index within the portal. In this case, an alternative federated search strategy may be needed to create unified access to information across multiple repositories. In a federated search, a query specification created by a user is sent to multiple search engines, and the results are aggregated. Distributing the search and combining results in this way is technically challenging for several reasons (see Reference 13). A related situation arises where a single central index may be too large. In this case, the central index can be structured in multiple indices to allow more efficient parallel processing of smaller groups of document statistics. Once again, technology exists for distributing a query to all the indices in parallel, and then collecting the results and merging them.14
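As a rough illustration of the distribute-and-merge idea described above, the sketch below sends one query to several engines and merges the hit lists. The engine functions and their scores are hypothetical stand-ins. Dividing each engine's scores by its best score is only one naive way to make scores comparable, and it does not address the difficulties discussed in Reference 13.

```python
def federated_search(query, engines, top_k=5):
    """Send query to every engine and merge hits on a normalized score."""
    merged = []
    for name, search in engines.items():
        hits = search(query)              # list of (doc_id, raw_score)
        if not hits:
            continue
        best = max(score for _, score in hits)
        for doc_id, score in hits:
            # Scale each engine's scores to [0, 1] so they can be compared.
            normalized = score / best if best > 0 else 0.0
            merged.append((normalized, name, doc_id))
    # Highest normalized score first; ties broken by engine name, then id.
    merged.sort(key=lambda hit: (-hit[0], hit[1], hit[2]))
    return merged[:top_k]

# Hypothetical engines with incompatible raw score ranges.
def notes_engine(query):
    return [("notes-17", 40.0), ("notes-3", 10.0)]

def web_engine(query):
    return [("web-9", 0.9), ("web-2", 0.45)]

results = federated_search(
    "document management", {"notes": notes_engine, "web": web_engine}
)
```

The same merge step applies when a single large index is partitioned into several smaller ones, except that scores from partitions of one index are usually easier to compare than scores from unrelated engines.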

Access control to information is an important issue. Some of the information may have to be restricted to specific communities. Portals could accommodate this situation by simply not including restricted information in the search index or in categorized subcollections. However, this limitation would undermine the rationale of portals, which is to inform users of what information is available. One way to handle access restrictions is to provide summary information of sensitive documents but control access to the full content. In the IBM Global Services K Portal, search results return document titles and abstracts, including a link to the document in the repository where it is stored, with access subject to the access protocol of the repository, which may require users to log in to the repository with a password. An icon next to the document title in a search hit list indicates whether access is restricted and saves the user the annoyance of trying to access the document when it is not available. In some circumstances, even a document title may be too sensitive. Human resource documents may contain titles or abstracts that identify people and personal issues, and exposing them would violate business policies and possibly privacy laws. In these cases, it is necessary to establish clear access policies. It may be possible to create sanitized summaries of sensitive documents, sufficient to alert users to the existence of this information, while still protecting it.
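The policy described above (show a summary, flag restricted documents, and sanitize summaries that are themselves sensitive) might be sketched as follows. The `Doc` fields, the `locked` flag standing in for the icon, and the placeholder title are illustrative assumptions, not the portal's actual data model.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    abstract: str
    link: str
    restricted: bool = False  # full text requires a repository login
    sensitive: bool = False   # even the title and abstract must be hidden

def render_hit(doc):
    """Build the hit-list entry a user is allowed to see for one document."""
    if doc.sensitive:
        # Sanitized summary: reveals that something exists, nothing more.
        return {"title": "[restricted document]", "abstract": "",
                "link": doc.link, "locked": True}
    # Normal summary; "locked" drives the restricted-access icon.
    return {"title": doc.title, "abstract": doc.abstract,
            "link": doc.link, "locked": doc.restricted}

hits = [render_hit(doc) for doc in [
    Doc("Engagement model overview", "Phases of an engagement.", "notes://ic/1"),
    Doc("Pricing guide", "Internal rates.", "notes://ic/2", restricted=True),
    Doc("HR case file", "Personal details.", "notes://hr/3",
        restricted=True, sensitive=True),
]]
```

In this sketch the portal never enforces access itself; following the link still goes through the access protocol of the repository that stores the document.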

Document analysis: Text analysis and feature extraction. Once the documents have been gathered, they must be analyzed so that their content is available for subsequent organization, retrieval, and use by the system and by KWs. In subsequent subsections, we present text analysis operations performed by the system, involving various forms of clustering, categorization, searching, navigation, and visualization of documents. Here we discuss the document analysis required in preparation for these operations.

As documents enter the portal system, they are stored for later retrieval and display. However, it is not useful to simply put the documents away in their raw form. Systems typically analyze the document content and store the results of that analysis so that subsequent use of the documents by the system and users will be more effective and efficient.

In order to operate on documents, we extract document features that give an indication of what documents are “about.” Since documents contain text, the portal applies text analysis in order to extract textual features, which characterize the documents. At the lowest level, these features are characters and words. However, when it is important to manage the conceptual content of documents, we need to identify the entities referred to in the text (the things, people, places, organizations, dates, prices, etc.) that are specific to the domain from which the documents are drawn and that will make useful features for subsequent organization, search, and browsing operations. Certain operations will also require features consisting of relationships among these entities.

In addition to the textual features, which are intrinsic to the document (i.e., drawn from within it), there are also extrinsic features, whose source is outside the document. These features, also called meta-data features, include information about creation date, author, category assignment within a classification scheme, confidentiality, etc. Often, this meta-data information is gathered by the crawling process, and the crawled content is represented in XML format, with the meta-data features encoded by XML tags within the XML files. For some operations, the distinction between intrinsic and extrinsic features is irrelevant. Hence, in what follows, we will often use the word “feature” to refer to both textual features and meta-data features.

Since document text is a form of human language, a wide variety of linguistic analysis techniques can be used to find vocabulary and other language expressions that refer to domain entities and their relations. These expressions and the concepts they refer to reflect the conceptual content of the document collection. These expressions provide the features used for organizing and finding documents in portal systems. The simplest and most widespread type of feature used in current systems is simply the words in the text. These words are easy to obtain with simple tokenization technology. With the addition of straightforward processing such as morphological processing (e.g., combining plural nouns with singular ones) and stop-word processing (i.e., ignoring common words), word-based systems perform well in some operations, such as a document search.
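The word-based pipeline just described (tokenize, fold plurals into singulars, drop stop words) can be sketched in a few lines. The stop-word list and the suffix rules below are deliberately tiny, assumed examples; production morphological analyzers rely on lexicons and handle many more cases.

```python
import re
from collections import Counter

# A very small, assumed stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "are"}

def singularize(word):
    """Naive plural folding based on suffixes; not a real morphological analyzer."""
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"            # taxonomies -> taxonomy
    if (len(word) > 3 and word.endswith("s")
            and not word.endswith(("ss", "us", "is"))):
        return word[:-1]                  # portals -> portal
    return word

def word_features(text):
    """Return word features with their counts for one document."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())   # simple tokenization
    return Counter(singularize(t) for t in tokens if t not in STOP_WORDS)

features = word_features(
    "The portals organize documents and the taxonomies of a portal."
)
```

Here `features` maps "portal" to 2, because the plural and singular forms are combined, while "the", "and", "of", and "a" are ignored.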

However, in knowledge-based applications, such as taxonomy generation and navigation, it is important to identify features that reflect the domain-specific conceptual content of documents more accurately than simple words can. We have built a system, called Textract,15–18 that can automatically process a document collection and identify expressions that are proper names (of people, places, organizations, etc.), domain terms, abbreviations, and various types of expressions such as dates and amounts of money. Further, domain experts can customize Textract so that it will also recognize various types of domain-specific entity references, such as telephone numbers and document IDs. The techniques that Textract uses depend on the analysis of patterns of symbols in text. This analysis capitalizes on the conventions and redundancy that are characteristic of the use of human language in documents. Thus, for example, over a large document collection, Textract is able to determine that the expressions “Senator Clinton,” “Hillary Rodham Clinton,” “Hillary,” and even “the senator” are all references to the same person. Such information enables text analysis systems to better determine the topics of documents and to gauge the importance of entities that are referred to across the collection.

Beyond entity references, document analysis should also identify relationships among the entities. Textract uses the contexts in which expressions occur to find both statistical and lexical relations between the domain entities. The lexical relations (such as: ⟨Hillary Rodham Clinton, senator, New York State⟩) are found by doing a deeper linguistic analysis of the phrases and clauses in the text of the documents. Note that both the relations and the names of the relationships that link entities are discovered during document analysis. Statistical relationships among entities are found using various measures of the frequency with which they occur.17
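The text does not say which frequency measures are used for statistical relations. As one common possibility, and purely as an assumption, the sketch below scores entity pairs by pointwise mutual information (PMI) over document-level co-occurrence: a positive score means two entities appear together more often than their individual frequencies would predict. The entity lists are invented.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_relations(docs_entities):
    """Score each co-occurring entity pair by PMI over document co-occurrence."""
    n_docs = len(docs_entities)
    single = Counter()   # number of documents mentioning each entity
    pair = Counter()     # number of documents mentioning both entities
    for entities in docs_entities:
        unique = sorted(set(entities))
        single.update(unique)
        pair.update(combinations(unique, 2))
    # PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) )
    return {
        (a, b): math.log2(n_ab * n_docs / (single[a] * single[b]))
        for (a, b), n_ab in pair.items()
    }

# Hypothetical entity references already extracted from four documents.
docs = [
    ["Hillary Rodham Clinton", "New York State"],
    ["Hillary Rodham Clinton", "New York State", "IBM"],
    ["IBM", "Lotus Notes"],
    ["IBM"],
]
scores = pmi_relations(docs)
```

With this data the pair ("Hillary Rodham Clinton", "New York State") scores 1.0, while ("Hillary Rodham Clinton", "IBM") scores below zero. Unlike the lexical relations, such a score says that two entities are associated but gives the relationship no name.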

Having entity references and relationships as textual features for characterizing the content of document collections is a considerable advantage when constructing knowledge portals, because these features enable operations that portals must perform. The following subsection discusses organization operations (clustering and categorization). Later subsections discuss search, query refinement, relevance feedback, and lexical navigation. Other operations such as summarization, glossary extraction, and question answering also depend crucially on the conceptual content of documents that these features reflect.

Document organization: Clustering and categorization. When the crawler has finished its gathering task, most often the result is an undifferentiated set of documents. As the number of documents under management grows, it becomes increasingly important to gather similar documents into smaller groups and to name the groups. This operation is clustering. All automatic clustering methods use features to determine when two documents are similar enough to be put into the same cluster. A typical approach taken is to represent a document as a vector of the features it contains and to compare the vectors for different documents. Variants of this approach optimize performance by ignoring features that occur too seldom, too often, or with distributions that do not allow them to effectively distinguish one document from another. For example, the feature for “IBM” would not be useful for clustering documents in an IBM internal portal.
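A minimal sketch of the vector approach described above follows. It assumes raw feature counts, cosine similarity, and document-frequency cutoffs (`min_df`, `max_df_ratio`) for discarding features that occur too seldom or too often. These particular choices are illustrative and are not attributed to any system named in this paper.

```python
import math
from collections import Counter

def select_features(docs, min_df=2, max_df_ratio=0.9):
    """Keep features that are neither too rare nor too common across docs."""
    df = Counter(f for doc in docs for f in set(doc))   # document frequency
    limit = max_df_ratio * len(docs)
    return {f for f, n in df.items() if min_df <= n <= limit}

def vectorize(doc, vocabulary):
    """Represent a document as a sparse vector of retained feature counts."""
    return Counter(f for f in doc if f in vocabulary)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[f] * v[f] for f in u if f in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

docs = [
    ["ibm", "portal", "search", "taxonomy"],
    ["ibm", "portal", "search", "index"],
    ["ibm", "crawler", "notes", "xml"],
    ["ibm", "crawler", "notes", "gather"],
]
vocab = select_features(docs)
vectors = [vectorize(doc, vocab) for doc in docs]
```

Because "ibm" occurs in every document it is dropped from `vocab`, which matches the observation that such a feature cannot distinguish documents in an IBM internal portal. The first two vectors are then nearly identical, and the first and third share no features.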

It is almost impossible for the portal administrator (and domain expert) to know ahead of time how many clusters or which clusters are implied by the available documents. Nevertheless, there needs to be some way to control the operation of the clustering engine. Perhaps the most important control point is the choice of which documents are presented to the clusterer. For example, an administrator might choose to include formal documents such as reports or press releases, while excluding informal documents such as e-mail messages or chat room transcripts. The rationale for such decisions might be that the formal documents contain a more reliable account of the conceptual content of the domain, whereas the informal documents can be added to the resulting clusters later using a different technique, such as categorization. Depending on the system, clusterers can also accept parameters to control the sizes of clusters, the sensitivity of the similarity metric, or the total number of clusters. An important additional control point is the selection of features and their weights. Recall that the set of features available includes meta-data features such as document date, author, and assigned keywords. These can also affect the resulting set of clusters. In fact, one powerful use of extrinsic features might be to allow the clusterer to preserve some aspects of a previously existing category system by including category information among the features of the documents.

Rather than a flat space of clusters, some clustering engines are capable of building hierarchical structures containing clusters and subclusters. One approach is to accumulate similar documents into a cluster until some critical size is reached and then to split the cluster into two or more subclusters. Control points for such clustering engines include the critical size, the intracluster similarity metric, and the number of subclusters to build.


Once the clusterer has finished its work, the clusters must be named. Cluster labeling is the operation of inspecting the final cluster contents and choosing the best features to serve as names. The features used as labels are not necessarily the same as those used in the similarity metric. The requirement for labels is that they be easily understood by human users of the portal, evocatively characterize the documents in a cluster, and clarify the distinctions among neighboring clusters in a hierarchy.

An adequately labeled set of hierarchically organized clusters for a document collection is usually called a taxonomy, and the labeled clusters in the taxonomy are called nodes. It is a tall order for a clustering engine and labeler to get everything right totally automatically. As a consequence, systems that attempt to do automatic taxonomy generation usually incorporate a taxonomy editor so that the portal administrator or some other domain expert may craft a high-quality taxonomy based on the work of the automatic system components. Operations supported by a taxonomy editor include moving documents from one cluster to another, splitting or combining clusters, and manually assigning labels to clusters. The Lotus Discovery Server9 provides a taxonomy generation tool based on the IBM Almaden Research Center’s SABIO clustering technology.19,20 An additional useful feature within taxonomy editing tools is document summarization. As the domain expert inspects document assignments to clusters and moves documents from cluster to cluster, it must be easy to discern the conceptual content of groups of documents without needing to read them in their entirety. Summarizers such as those described in the later subsection “Find” can produce sentential, keyword, or topic summaries that are suitable for this task.

Because document collections are not static, portals must provide some form of taxonomy maintenance. As new documents are added, they must be added to the taxonomy at appropriate places, using the classification technology described below. As the clusters grow, and especially as the conceptual content of the new documents changes over time, it may become necessary to subdivide clusters or to move documents from one cluster to another. Although less common, document deletions may also occur. For these reasons, it becomes appropriate to periodically reassess the taxonomy. As with taxonomy generation, this reassessment may be accomplished using both automatic and manual procedures. The automatic part, perhaps based on the same technology that spawned subclusters during taxonomy generation, can suggest when and how a cluster that has grown too large must be split. A portal administrator, using the taxonomy editor, can monitor and implement these suggestions and, in general, can periodically assess the health and appropriateness of the current taxonomy and document assignments within it.

As exemplified in the “Intellectual Capital/Finance and Insurance/ . . . Engagement Models/” taxonomy branch in Figure 3, a document classification scheme provides a powerful way for portal users to navigate through the document collection in their search for documents relevant to their information needs. Whether a classification scheme is based on an automatically generated taxonomy (e.g., one derived from the documents in the portal) or on an externally imposed taxonomy (e.g., one imposed by corporate management), it is crucial to be able to accurately assign documents to the taxonomy nodes. Such accuracy is important so that when users navigate to a node and access documents through it, they can expect that all the documents found are appropriate to the node and belong together. Clearly, in the case of automatic taxonomy generation, the clustering technology should meet this expectation, at least for the initial set of documents. However, for documents added to the portal after taxonomy generation, and for all documents in a portal with an externally imposed taxonomy, another mechanism is needed. Document categorization technology provides that mechanism.

The job of a document categorization system is to assign documents to categories, which are equivalent to the nodes in a taxonomy. In its simplest terms, a document categorization system operates in two steps. In the first step, the training step, the system inspects a set of previously categorized documents (the training set) and extracts a characterization of the documents in each category. This characterization, invariably based on the features found in the documents, is formatted and stored in a model. In the second step, the categorization step, the system processes one uncategorized document at a time. It extracts features from the document and compares them to the features stored for each category in the model. (Various optimization schemes can make these comparisons efficient to perform.) The result is a list of one or more categories to which the system thinks the new document should be assigned.

Extensive descriptions of a wide variety of approaches to categorization can be found in Baeza-Yates and Ribeiro-Neto.21 The major differences among categorization systems concern the types of features they use, the way in which they represent the features associated with categories, and the way in which they compare document features with category features. For example, in the IBM Text Analyzer system, the features are words; they are associated with a category by means of “if-then” rules corresponding to a decision tree. Document features are compared to category features by means of a decision tree processor. In contrast, the IBM Global Services K Portal uses a K nearest neighbor approach, in which the comparison between document and category is done with a standard search engine. The categorization procedure uses features from the uncategorized document as a query against the set of training documents. The result of the search is a hit list of training documents. The category chosen for the uncategorized document is the one associated with the majority of the highly ranked training documents on the hit list. The categorization system in IBM’s original Intelligent Miner* for Text product15 uses a centroid approach, in which the features are vocabulary items produced by Textract; the categories are represented by vectors consisting of their most salient features (one vector per category). This representation is similar to the feature vectors described above for document clustering engines. In the centroid approach, the comparison is essentially a vector-space comparison between a document feature vector and the category vectors.
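The K nearest neighbor procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the K Portal implementation: the function name, the value of `k`, and the training data are invented, and ranking by simple feature overlap stands in for the standard search engine mentioned in the text.

```python
from collections import Counter

def knn_categorize(doc_features, training_set, k=3):
    """training_set is a list of (features, category) pairs."""
    query = set(doc_features)
    # Stand-in for the search engine: rank training documents by feature overlap.
    hits = sorted(training_set, key=lambda item: len(query & set(item[0])), reverse=True)
    # Majority vote among the k highest-ranked training documents.
    votes = Counter(category for _, category in hits[:k])
    return votes.most_common(1)[0][0]

# Invented training documents and categories.
training_set = [
    (["claims", "policy", "premium"], "Insurance"),
    (["policy", "underwriting", "risk"], "Insurance"),
    (["loan", "interest", "branch"], "Banking"),
    (["deposit", "branch", "teller"], "Banking"),
]
category = knn_categorize(["policy", "risk", "premium", "renewal"], training_set)
```

With `k=3`, two of the three top-ranked training documents belong to “Insurance,” so that category wins the vote.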

These clustering and classification methods differ in their underlying algorithms, in how the tools associated with them are used, and in their effectiveness for given document domains. When discussing taxonomy generation, we pointed out the need for a taxonomy editor with which domain experts can review and repair decisions made by the automatic clustering and labeling machinery. These tools may require users to find training documents or define if-then rules, or do some combination of these two tasks. Similarly, categorization engines are not perfect, and some are more effective for some types of documents than others, e.g., Web documents versus documents produced by office productivity tools, versus news articles, which tend to be relatively unstructured. Fortunately, most categorization systems produce a rank associated with their category suggestions for a document. These ranks represent the degree of match between the features of the document and those of the categories of the model, and they correlate with the degree of confidence a user should have in the assignment of the document to the category.
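To make the ranked category suggestions concrete, the following Python sketch implements a simple centroid categorizer that returns a score for every category. It is an assumption-laden illustration: each category vector here sums all training features rather than only the most salient ones, cosine similarity is assumed as the vector-space comparison, and the names and data are invented.

```python
import math
from collections import Counter, defaultdict

def train_centroids(training_set):
    """Training step: sum the feature counts of each category into one vector."""
    centroids = defaultdict(Counter)
    for features, category in training_set:
        centroids[category].update(features)
    return centroids

def cosine(u, v):
    dot = sum(u[f] * v[f] for f in u if f in v)
    norm = math.sqrt(sum(x * x for x in u.values()) * sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def rank_categories(doc_features, centroids):
    """Categorization step: return (category, score) pairs, best match first."""
    doc = Counter(doc_features)
    scored = [(cat, cosine(doc, vec)) for cat, vec in centroids.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Invented training documents and categories.
training_set = [
    (["claims", "policy", "premium"], "Insurance"),
    (["policy", "underwriting", "risk"], "Insurance"),
    (["loan", "interest", "branch"], "Banking"),
    (["deposit", "branch", "teller"], "Banking"),
]
ranked = rank_categories(["policy", "premium", "branch"], train_centroids(training_set))
```

The scores give a user or administrator a rough basis for deciding whether to accept the top suggestion or review the assignment by hand.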

To conclude, clustering and classification are very important organizing tools for portals, but it is clear that no one technique is best and that all techniques need domain expertise and some degree of administrative skill.

Find: Basic and advanced search. Once information is gathered and categorized, users can search it to find what they need using various techniques, from a basic text search to document result browsing interfaces on the Web, to more sophisticated search and browsing tools that we describe below.

The basic technique for retrieving documents by means of a query became widespread starting in the 1980s.22 The process begins before search, when documents are scanned to produce an inverted index, a kind of dictionary that lists all the words appearing in the documents together with their locations. The index is the repository searched when a query is processed. Early systems tended to index only keywords, selected from the title or other meaningful fields in the documents. However, in the last 20 years, with more memory and cheaper storage, systems typically have full-text indexing of all the words and all occurrences. A query formulated by the user is usually lightly processed (e.g., stop words are removed) and sent to the search engine to be matched against the index. Many search algorithms are used for this matching. Most typically, the query is analyzed into a list of query terms, and the index is searched for documents that contain the query terms. The underlying assumption is that the user is interested in documents that contain the query terms and, more specifically, that documents containing frequent mentions of the query terms are more relevant. Several ranking algorithms for computing and sorting relevant documents have been developed. Many are based on a tf/df (term frequency divided by document frequency) formula, standing for the ratio between the frequency of a term in the document and the number of documents in the repository in which the term appears. This means that the contribution of a term to the document relevance is higher the more times the term is mentioned in the document, but this contribution is reduced if the term occurs in many other documents as well. Many systems refine this basic formula by normalizing for the length of the document, taking the order and proximity of the terms into account, or allowing for minor term variations (such as plural and singular forms of nouns), among other strategies.
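The indexing and ranking steps just described can be sketched in Python as follows. This is a bare illustration of an inverted index and the basic tf/df ratio, without the refinements mentioned above (length normalization, proximity, term variants); the function names, the stop-word list, and the sample documents are assumptions.

```python
from collections import defaultdict

def build_index(docs):
    """Inverted index: term -> {doc_id: [positions of the term in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def search(query, index, stop_words=frozenset({"the", "a", "of"})):
    """Score each document by summing tf/df over the query terms it contains."""
    scores = defaultdict(float)
    for term in query.lower().split():
        if term in stop_words or term not in index:
            continue
        postings = index[term]
        df = len(postings)                      # documents containing the term
        for doc_id, positions in postings.items():
            scores[doc_id] += len(positions) / df   # tf divided by df
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# Invented sample collection.
docs = {
    "d1": "portal search portal index",
    "d2": "portal taxonomy",
    "d3": "taxonomy editor",
}
index = build_index(docs)
results = search("the portal index", index)
```

Here “portal” contributes less per mention than “index” because it occurs in two documents rather than one, which is the damping effect the formula is meant to provide.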

Basic searching, as described here, is the most common method for finding information on line, yet often users do not find the information they are looking for. There are a number of reasons for this. The ever-increasing size of document collections increases the pool of potentially relevant documents. If the collection is heterogeneous, query words may be ambiguous: the same words may refer to different concepts in different domains. Finally, if the query is short (the average Web query is under three words long), it contains fewer terms and thus matches more documents. To compensate for these factors, we are developing advanced search techniques called prompted query refinement and relevance feedback.

Prompted query refinement (PQR), as the name indicates, is a technique for assisting the user in interactively refining the query, until a satisfactory set of focused and relevant documents is returned. Often, users start with a short and general query, such as the word “Java.” When concentrating on their specific information need within their context, they may be (and should be!) unaware of the potential ambiguity of their query terms. (Java could refer to a virus, an island, a type of coffee, or a programming language.) Even if users are aware of this ambiguity, generating the terms necessary to appropriately restrict the query is difficult.23,24 PQR is a tool that suggests to users candidate terms to add to their queries, as shown in Figure 6. The leftmost object in the figure shows a query, “cable news,” and a list of terms related to the query. End users can select one or more of these related terms, and add them to the query specification. Since PQR exploits the features extracted by Textract during the document analysis stage, it only offers terms that actually occur in the collection, in contrast to a general-purpose thesaurus. In Figure 6, many of the related terms are names of cable-related companies described in the documents indexed. (Below we further discuss the lexical network shown on the right.)

During document analysis, the frequency of occurrence of each feature in each document is recorded. A special process combines information about the features and their contexts in the entire collection and creates a special search engine, called a context thesaurus (CT). When a user issues a query, it is matched against the CT index, and the hit list returned is not one of document titles but one of terms occurring in documents and ranked by relevance to the query. CT uses an idea inspired by the Phrase Finder.25 For each feature, it builds and indexes a virtual document, consisting of all the contexts (two to three sentences) in which the feature occurs throughout the collection. When a query matches the virtual document for term X, it is because the query text is sufficiently similar to contexts in which X appears in the collection. The PQR system infers that X is related to the query.
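The virtual-document idea can be illustrated with a small Python sketch. It is a toy under several assumptions, not the CT implementation: each context is a single sentence rather than two to three, features are single lowercase words, relevance is a raw count of query words in the pooled contexts, and the sentences are invented.

```python
from collections import Counter, defaultdict

def build_context_thesaurus(sentences, features):
    """For each feature, pool the words of every sentence that mentions it."""
    virtual_docs = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for feature in features:
            if feature in words:
                virtual_docs[feature].update(w for w in words if w != feature)
    return virtual_docs

def related_terms(query, virtual_docs):
    """Return features whose pooled contexts contain the query words, best first."""
    q = query.lower().split()
    scores = {f: sum(ctx[w] for w in q) for f, ctx in virtual_docs.items()}
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [feature for feature, score in ranked if score > 0]

# Invented sentences; the features mimic the company names visible in Figure 6.
sentences = [
    "cnn broadcasts cable news worldwide",
    "hbo is a cable movie channel",
    "msnbc delivers cable news online",
    "thinkpad is a laptop",
]
virtual_docs = build_context_thesaurus(sentences, ["cnn", "hbo", "msnbc", "thinkpad"])
terms = related_terms("cable news", virtual_docs)
```

The hit list consists of terms rather than documents, and only terms that occur in the collection can ever be suggested.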

Another advanced search function relevant to portal search is informally referred to as “more documents like this,” or more formally as relevance feedback. When users find one (or more) relevant documents in the returned hit list, they can submit this feedback to the engine and request to see more such documents. Under the covers, this is achieved by an analysis module (such as Textract, discussed earlier) that extracts salient features from the document selected and turns them into queries that retrieve new documents on the user’s behalf. Automatically formulated queries of this kind work very well since they involve feedback from the user. Their other advantage is that they are longer than user-formulated queries and therefore more focused. Finally, they select terms for the new query from the unified context of a single document and therefore reduce ambiguity. In fact, automatically formulated queries have been proven so useful21 that several search engines now employ them even without user feedback. In a mechanism called “automatic relevance feedback,” the engine simply generates and executes queries from the first few documents it returns.
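A minimal Python sketch of the query-generation step follows. It assumes that salience is measured with the tf/df ratio introduced earlier and that features are whitespace-separated words; a real analysis module such as Textract extracts richer features, and the function name and sample collection are invented.

```python
from collections import Counter

def feedback_query(selected_doc, collection, n_terms=3):
    """Build a new query from the most salient terms of a document the user liked."""
    df = Counter(term for doc in collection for term in set(doc.lower().split()))
    tf = Counter(selected_doc.lower().split())
    salience = {term: count / df[term] for term, count in tf.items() if df[term]}
    # Highest salience first; ties are broken alphabetically for determinism.
    ranked = sorted(salience, key=lambda t: (-salience[t], t))
    return ranked[:n_terms]

# Invented sample collection built around the ambiguous word "java".
collection = [
    "java island coffee exports",
    "java programming language java compiler",
    "java virtual machine programming",
    "coffee prices",
]
new_query = feedback_query(collection[1], collection)
```

All the generated terms come from the single selected document, so the new query leans toward the programming sense of “java” rather than the island or the coffee.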

PQR and relevance feedback are two examples of tools that help users find relevant documents through interaction. However, interaction is not always possible. With the increased use of pervasive devices for searching, there is a need to improve search results on the first iteration, particularly the results at the top of the returned list. Recent search algorithms have achieved significant improvements in the search results by ranking documents according to other (nonquery) factors and combining the query-based and nonquery-based scores. Web pages are ranked high (and called “authority pages”) if they are frequently pointed to by other documents. (See References 26 and 27 for many more references to this research.) The success of this method is evidenced by the popularity of the Google Web search service,28 which first put it into production. Other nonquery-based scores include other measures of document quality, such as number and frequency of accesses and updates and other users’ recommendations. A recent example of the use of such extrinsic information to rank documents appears in the system of metrics used in the Lotus Knowledge Discovery Server.
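One simple way to combine query-based and nonquery-based scores is a weighted sum, sketched below in Python. The in-link count used here is a deliberately crude stand-in for the link-analysis algorithms discussed in References 26 and 27, and the `weight` parameter, function name, and data are assumptions made for illustration.

```python
def combined_ranking(query_scores, links, weight=0.5):
    """query_scores: {doc: score in [0, 1]}; links: {doc: [docs it points to]}."""
    inlinks = {doc: 0 for doc in query_scores}
    for source, targets in links.items():
        for target in targets:
            if target in inlinks and target != source:
                inlinks[target] += 1
    max_in = max(inlinks.values(), default=0) or 1   # avoid division by zero
    final = {
        doc: (1 - weight) * score + weight * inlinks[doc] / max_in
        for doc, score in query_scores.items()
    }
    return sorted(final, key=final.get, reverse=True)

# Invented scores and link structure.
query_scores = {"a": 0.8, "b": 0.7, "c": 0.2}
links = {"a": ["b"], "c": ["b"], "b": ["a"]}
ranking = combined_ranking(query_scores, links)
```

Document “b” has a slightly lower query score than “a” but is pointed to by two other documents, so it moves to the top of the combined list.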

Figure 6 Prompted query refinement for the query “cable news” (left) and a lexical network of related terms (right), with named relations (e.g., “Time Warner”, “subsidiary (of)”, “Turner Broadcasting System”)

A query is the commonly accepted expression of a user’s information need. However, a user may have a focused question in mind that requires a succinct, factual answer. The traditional search paradigm needs to be specialized for this question-answering model. Users ask full natural-language questions, such as “How much does a laptop cost?” Natural language analysis determines the question focus, or the intended answer type, in this case, a price. It also attempts to determine the question goal (shopping as opposed to requesting technical specifications for the machine). Lookup in general and domain-specific ontologies determines the concepts involved (here, specific laptop models). Based on this analysis, the question is translated into a query and processed by the search engine. In this example, “cost” and “how much” are removed; “laptop” is expanded to include synonyms such as ThinkPad*, and a special token (MONEY) is added. To ensure a good match, similar processing is done on the document collection to identify and index semantic concepts (such as monetary amounts) prior to searching. Finally, the ranking algorithm is manipulated to return short passages instead of full documents. Frequency of occurrence is not important, and ranking is determined based on the presence of all query terms in close proximity.
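The laptop example can be traced with a small Python sketch. Everything in it is an assumption made for illustration: the lookup tables stand in for the natural language analysis and ontologies, the regular expression recognizes only dollar amounts, and a passage is accepted if it contains every term group, without the proximity ranking described above.

```python
import re

# Hypothetical lookup tables standing in for the ontologies and focus rules.
SYNONYMS = {"laptop": ["laptop", "thinkpad"]}
FOCUS_CUES = {("how", "much"): "MONEY"}
DROPPED = {"how", "much", "does", "a", "the", "cost"}

def question_to_query(question):
    """Translate a question into groups of alternative terms plus an answer-type token."""
    words = re.findall(r"[a-z]+", question.lower())
    query = [SYNONYMS.get(w, [w]) for w in words if w not in DROPPED]
    for cue, token in FOCUS_CUES.items():
        if all(c in words for c in cue):
            query.append([token])
    return query

def annotate(passage):
    """Index-time step: mark monetary amounts with the same special token."""
    return re.sub(r"\$\d[\d,]*(?:\.\d+)?", lambda m: m.group(0) + " MONEY", passage)

def passage_matches(query, passage):
    """A passage qualifies only if every term group is represented in it."""
    tokens = set(re.findall(r"[A-Za-z]+", annotate(passage)))
    tokens |= {t.lower() for t in tokens}
    return all(any(alt in tokens for alt in group) for group in query)

query = question_to_query("How much does a laptop cost?")
# The passage below is a made-up example, not real pricing data.
hit = passage_matches(query, "A ThinkPad is priced at $1,999 in this made-up example.")
miss = passage_matches(query, "The laptop is light.")
```

The second passage mentions a laptop but no monetary amount, so it is rejected even though it shares a term with the question.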

Automatic question answering is a new area of research, combining traditional information retrieval, state-of-the-art natural language processing, and knowledge representation for a deeper understanding of a particular domain. Its coverage is limited at present, but it is driving our research agenda into the next generation of portal technologies.29

The search methods we have been describing imply that KWs create and execute search specifications. An alternative is for the system to automatically generate searches on some basis and present results to users. Personalized search methods push information to users based on descriptions of users’ interests. For example, users may want to be alerted or notified about new documents related to a customer or product technology they are currently focused on. These interests may be explicitly expressed in profiles created by users, mentioning customers and product topics. Or user interests may be inferred from analyzing documents that KWs browse on the portal30 or from analyzing e-mail content, or discussion forums for correlations between topics and people who discuss them.31

Personalization information can be used in more than one way. In prototype versions of the IBM Global Services K Portal, we extract the categories associated with the documents browsed by the KW and use this information to automatically augment user queries by either restricting the search to these categories or assigning higher weights to documents from those categories.30 Keywords in browsed documents, or keywords derived from profiles, can also be used to create or augment search specifications. The system can identify other users with similar patterns of usage and can recommend them as members of communities of interest. Personalization can also be based on analyzing query and query results, as done by the knowledge agents advertised by portal vendors such as Plumtree Software.10 This is an active area of research at the IBM Research Laboratory in Haifa, Israel.32
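The two ways of using browsed categories, restriction and weighting, can be sketched in Python as follows. This is not the K Portal prototype code; the function name, the `mode` and `boost` parameters, and the hit list are assumptions made for illustration.

```python
def personalize(hits, browsed_categories, mode="boost", boost=2.0):
    """hits: list of (doc_id, category, score) tuples from the search engine."""
    if mode == "restrict":
        # Keep only documents from categories the user has been browsing.
        hits = [h for h in hits if h[1] in browsed_categories]
    else:
        # Keep everything, but give browsed categories a higher weight.
        hits = [(d, c, s * boost if c in browsed_categories else s) for d, c, s in hits]
    return sorted(hits, key=lambda h: h[2], reverse=True)

# Invented hit list and browsing history.
hits = [("d1", "Banking", 0.9), ("d2", "Insurance", 0.6), ("d3", "Insurance", 0.4)]
boosted = personalize(hits, {"Insurance"})
restricted = personalize(hits, {"Insurance"}, mode="restrict")
```

Weighting keeps out-of-profile documents visible lower in the list, whereas restriction removes them entirely; which is preferable depends on how much the inferred profile can be trusted.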

Find: Browsing and navigation. Browsing and navigation are knowledge work activities that go hand in hand with the search function. Since information retrieval is an iterative process, it often consists of a query-based search that returns some initial information, followed by browsing of the contents of the returned hits to learn more about the topic. This action often produces a reformulation of the query, which initiates another search. Since portals are built to assist users with large quantities of information, they need to include summarization tools that extract the most important information from documents and display it to the user. Unlike human-generated abstracts, automatic summaries consist of a collection of sentences (or sentence parts) extracted from the document, with no new text generated. The quality of these excerpts is not as good as human-generated prose (they may seem choppy and are usually not as concise), but they are nevertheless quite useful. There are several kinds of summaries. Longer informative summaries (about 20 to 25 percent of the document length) can capture all the main points of a document. Shorter indicative summaries (one to three sentences long) are usually sufficient for determining whether the document is relevant and should be accessed, read, or translated. Studies have shown that indicative summaries are sufficient for humans to complete tasks without having to read the entire document, thereby saving considerable time and effort.33 A third kind of summary is query-based summaries. They are typically very short and consist of the most important sentences where the query terms are mentioned. A fourth kind of summarization, keyword summaries, presents KWs with a simple list of technical terms, corresponding to salient names and phrases automatically extracted using an analysis tool such as Textract.

Document summarization works by ranking the sentences in the document for importance and then displaying as many of them as the requested length permits in their original order. The rank of a sentence consists of several factors. One is how many salient textual features it contains, calculated according to the tf/df formula explained earlier, with extra weight given to features that occur in the title and headings. In addition to textual features, the structure of the document also plays an important part, according a higher score to sentences in prime locations (such as document initial or final). For longer summaries, a technique called topic segmentation is also used to select summary sentences. This technique examines the distribution of words in the document and identifies break points (at the end of sentences or paragraphs) where the topic changes. Topic shifts are usually marked by a change in the distribution of words, since different words are associated with different topics. To ensure that all topics are covered, the summary includes at least one sentence from each topic segment.34
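The sentence-ranking procedure can be sketched in Python under simplifying assumptions: salience is plain term frequency within the one document (no document-frequency component, since there is no collection here), title words count double, first and last sentences receive a fixed bonus, and topic segmentation is omitted. The weights and sample text are invented.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def summarize(title, text, n_sentences=2):
    """Pick the highest-scoring sentences and return them in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tf = Counter(w for s in sentences for w in tokenize(s))
    title_words = set(tokenize(title))
    scores = []
    for i, sentence in enumerate(sentences):
        words = tokenize(sentence)
        # Average salience per word, with double weight for title words.
        score = sum(tf[w] * (2 if w in title_words else 1) for w in words) / max(len(words), 1)
        if i == 0 or i == len(sentences) - 1:
            score *= 1.5                      # bonus for prime locations
        scores.append(score)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]

# Invented three-sentence document.
text = ("Knowledge portals organize documents. The weather was mild. "
        "Portals rank documents for knowledge workers.")
summary = summarize("Knowledge portals", text)
```

The off-topic middle sentence scores lowest and is dropped, and the two selected sentences are shown in the order in which they appear in the document.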

Navigation can be described as a form of browsing, not of a single document, but over a group of documents. We have developed a technique for multidocument summarization (MDS)35 that blends summarization and navigation. MDS captures the content of a group of related documents, such as the first 100 documents on a search hit list, or the documents in a cluster formed by automatic taxonomy generation. It shows the subtopics that can be identified within the group in various ways: the terms that characterize each subtopic, a few sentences that best represent each subtopic, and the relationship of each document to each subtopic. This categorization allows the user to grasp the different aspects of a topic discussed in the documents without having to read any document in its entirety and to easily navigate from one subtopic to another using a graphical interface. The interface also provides a means to gauge the relative importance of these aspects by examining how many documents are close to each subtopic, and to position the cursor on a document to see at a glance its position with respect to each subtopic.

Like browsing, navigation is also complementary to searching. Both methods get the user to information that is relevant. Navigation is partially controlled by the user, who chooses where to go next. It is also constrained to a large extent by the organization of the information in the portal, which is designed by the administrator, as well as by technologies such as categorization, lexical navigation, and active markup, described below.

Category navigation is navigating along the taxonomy that groups documents by categories (as described earlier) and is most closely related to searching. The search function selects a group of documents that resemble the query, whereas category navigation selects a group of documents that resemble each other. Combining the two is very powerful. As users of Web search services such as Yahoo! have all experienced, getting to relevant information is usually the result of interleaving search and category navigation. In the IBM Global Services K portal, this capability is provided so that a user can choose a category first, and then issue a query against the documents in the category. Choosing a category first creates a more homogeneous collection to search within, and therefore can yield more focused results. Even if a category was not preselected, the K portal middleware will allow documents resulting from a search to be organized into the categories to which they belong (analogous to the Northern Light search service36). Within each category returned, documents are ranked with respect to the query.
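Both directions of combining search with category navigation can be sketched in a few lines of Python. This is an illustration only, not the K portal middleware: the function names, the “Uncategorized” fallback, and the data are assumptions, and `search_within` filters an existing hit list rather than running the query against a category-specific index.

```python
from collections import defaultdict

def group_by_category(hits, doc_categories):
    """hits: ranked list of (doc_id, score); doc_categories: {doc_id: [categories]}."""
    grouped = defaultdict(list)
    for doc_id, score in hits:                # hits arrive in rank order,
        for category in doc_categories.get(doc_id, ["Uncategorized"]):
            grouped[category].append((doc_id, score))   # so each group stays ranked
    return dict(grouped)

def search_within(category, hits, doc_categories):
    """Keep only the hits that belong to a preselected category."""
    return [(d, s) for d, s in hits if category in doc_categories.get(d, [])]

# Invented hit list and category assignments.
hits = [("d1", 0.9), ("d2", 0.7), ("d3", 0.4)]
doc_categories = {"d1": ["Finance"], "d2": ["Insurance"], "d3": ["Finance", "Insurance"]}
grouped = group_by_category(hits, doc_categories)
insurance_hits = search_within("Insurance", hits, doc_categories)
```

A document assigned to several categories appears under each of them, and within every category the documents remain ordered by their query score.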

We have also developed a technique called lexical navigation17 to allow users to navigate among salient concepts that have been identified in the collection and represented as textual features. These concepts are linked to one another in two different types of relations. Unnamed relations, based on co-occurrence, indicate that two concepts are related in some unknown way. Named relations,37 based on linguistic patterns identified in the text, indicate the precise nature of the relationship (e.g., location, kinship, or employment), as described earlier. These concepts and relations form a network, with concepts as nodes and relations as links. Once the user has entered the network, for example, by using the prompted query refinement mechanism to select one or more concepts that are relevant to the query, he or she can then follow relations and navigate to other concepts in an unconstrained way. Figure 6 shows both PQR and an example of a lexical network. We believe that this form of navigation is potentially helpful for the novice, who is trying to become familiar with the scope of a collection of documents. The advantage of the graphical display is that users can focus on a particular neighborhood of interesting terms and easily observe the interconnections among several terms at once. However, when the networks become very large, graph layout can become difficult, and users risk losing intuitions about their location in concept space (see Conklin38 for a classic analysis of usability issues pertaining to complex hyperlink graphs).
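A lexical network of this kind can be represented as a labeled graph, as in the following Python sketch. The class and method names are hypothetical, links are stored as undirected (which discards the direction of a named relation), and the unnamed link between Turner Broadcasting System and CNN is included purely as an illustration.

```python
from collections import defaultdict

class LexicalNetwork:
    """Concepts are nodes; each link carries a relation name, or None if unnamed."""

    def __init__(self):
        self.links = defaultdict(list)

    def relate(self, a, b, name=None):
        """Add a link; name=None marks an unnamed (co-occurrence) relation."""
        self.links[a].append((b, name))
        self.links[b].append((a, name))

    def neighbors(self, concept, named_only=False):
        """List (concept, relation name) pairs reachable in one step."""
        return [(c, n) for c, n in self.links.get(concept, []) if n or not named_only]

net = LexicalNetwork()
net.relate("Time Warner", "Turner Broadcasting System", name="subsidiary (of)")
net.relate("Turner Broadcasting System", "CNN")   # illustrative unnamed relation
step = net.neighbors("Turner Broadcasting System")
named = net.neighbors("Turner Broadcasting System", named_only=True)
```

Navigation then amounts to repeatedly calling `neighbors` on whichever concept the user selects next.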

Finally, we put all of these navigation modes together with a technique we have prototyped called Active Markup, which links summaries, documents, and concepts.39 Figure 7 shows an example. When a document is accessed, its short keyword summary appears at the top of the page. Each sentence or initial word in the summary is an active link to the same sentence in the body of the document. Thus, the summary serves as a launching point to the part of the document that is of interest. The keyword summary also supports navigation. Each keyword is a link to other concepts related to it, as well as to other documents containing it. This form of navigation is less structured and more associative by nature than category navigation. It provides a means for browsing the space of documents without having to choose a category or formulate and rephrase queries. We sometimes refer to navigation with active markup as “query-free searching.”

Supporting knowledge workers’ analysis, synthesis, and authoring of information. Broadly construed, knowledge work involves solving problems. This definition implies human analysis of information, synthesis of new information expressing implications and solutions, and authoring of new artifacts to communicate solutions to colleagues. For example, in a customer engagement, presentations are prepared, proposals and development plans are generated, project teams formed, roles and responsibilities defined and negotiated, budgets developed, and so on. Searching and browsing are a first step, but the information returned needs to be analyzed by KWs in task-oriented ways. Many of the software tools used to support this task of human analysis have been developed outside KM contexts as standard office productivity applications, such as word processors, presentation graphics (e.g., Lotus Freelance Graphics**, Microsoft PowerPoint**), spreadsheets, business graphics, project management software (e.g., Microsoft Project), and document templates that represent forms and outlines for documentation.

Figure 7 Active markup, where automatically extracted technical terms are turned into hyperlinks to related terms and to other documents that contain the same terms


In addition to these tools, new tools specialized for KM are emerging for analyzing and synthesizing information. We have already described tools for browsing search results such as multidocument summarization and lexical navigation. Creating such tools is an active area of research in computer science, information retrieval, and cognitive psychology, and much more can be found in Card, Mackinlay, and Shneiderman.40 These tools exist as research prototypes, and have not yet appeared in commercial applications. Other KM tools such as GrapeVINE2,41,42 and the former Knowledge X41 have emerged in the commercial domain. They are intended to help KWs generate relationship maps or graphic visualizations of entities and relationships.

Some examples are schematized in Figure 8. These visualizations express organizational structures, connections among people, and project-related topics and artifacts. The goal of these tools is to provide a heterogeneous and open-ended workplace for representing objects and relationships and to help KWs discover potential new relationships. Representations of entities and relations are integrated to some extent with databases containing information describing entities, such as organizational, personnel, and project-related databases.

Figure 8 Visualizing knowledge resources: a relationship map integrated with corporate databases and other information sources

Project collaboration is another focus in research and commercial domains. An example of the former is TeamSpace,43 a prototype that provides real-time distributed meeting support using shared workplaces, telephony, and video conferencing; in addition, the tool archives meeting artifacts such as presentations and video and audio recordings. Users can browse collaboration events on a graphic timeline and select meeting artifacts, such as audio and video records, to browse and play back. Tools like Babble44,45 enhance real-time chat-like communication among collaboration teams, presenting graphic representations of topic groups, including the identity of discussion participants, and even visual indication of their real-time participation. Issue-Based Information Systems (IBIS) capture team design and problem-solving using text-oriented outlines or visual maps to represent discussion topics and the issues related to them. (Conklin38 is a somewhat dated, but still definitive discussion of IBIS in the context of hypertext computer systems; Conklin and Begeman46 discuss a graphic version called gIBIS.) A version of the IBIS system47 is used by the ICM AssetWeb.

Specialized productivity tools have been developed to support KWs in call centers. These workers, known as customer support representatives (CSRs), need fast access to specialized information as they attempt to identify a solution to a customer problem during a live telephone conversation. The DataCase system, developed at the IBM T. J. Watson Research Center for assisting IBM help-line CSRs, involves manual creation of decision trees that are traversed by the CSR to identify the nature of a technical problem.41,48 The TAKMI project, at IBM’s Tokyo Research Laboratory,49 classifies and describes customers’ problem inquiries in terms of the key concepts they express, using enhancements of the text mining tools discussed earlier. It correlates the textual features extracted from documents with other meta-data features, such as the date, or manufacturing location, to spot trends in customer problems, and the products and features associated with them. These feature correlations can also be shown in a variety of information outlining views,50 e.g., timelines for trend analysis or event distributions plotted against geographical locations.

These tools emphasize visualization techniques, with some degree of automated generation and update of visualizations, intended to help users discover new facts and implications of information. The rationale for visual techniques is based on the fact that humans are highly visual, and much human reasoning and problem solving is facilitated by visual metaphors and techniques, as evidenced by the widespread use of presentation graphic artifacts in office productivity applications and the great care taken in producing them (see Tufte51 and Card et al.40 for an extensive review of information visualization techniques). What existing office productivity tools lack is an automatic relation between the representations of entities and the data that they represent. In some cases, this relationship may not be possible to obtain automatically, because too much human intelligence was involved in the synthesis and conceptualization that created it (reflect on the complexity of typical presentation graphic slides). However, in other simpler cases, such as representing simple connections among project team members, customers, and project artifacts, it may be possible to automatically link information about these entities as stored in a database to their visual representations. Such updating still requires a level of information and application integration that is not yet commonplace. Keeping seemingly straightforward artifacts such as on-line Web pages, resumes, and personal information databases current is a difficult task. Moreover, the conceptual structures created by standard office productivity tools are more complicated than relationship maps. These structures require a great effort to create and maintain, and typically require formal human explanation to understand and draw implications from, involving intensive human communication and presentation skills. More innovation is needed to automate the generation and update of these kinds of structures and to support the discovery of implications based on them.

Once KWs have analyzed information and synthesized a solution, they need to communicate it. Several innovations in authoring are emerging. Collaborative authoring allows multiple authors to keep track of multiple contributions, annotate contributions of co-authors, and merge multiple edits. Collaborative annotation allows annotation by readers at large, enriching documents with comments and additional perspectives.52,53 Robertson and Reese54 describe a corporate research-desk prototype where research results related to a topic are organized in hyperlinked briefs for reuse in future inquiries related to the same topic, and internal versions of such research briefs are provided as a service through an IBM Global Services research desk organization. Smart documents use a portal-like search to automatically retrieve relevant information for the document at hand. These tools analyze what the author is composing and suggest collateral information that might be of use. They look up references, make sure citations are accurate, and provide example passages from other documents. The SOALAR (Solution Architecture Logic and Reuse) project at the IBM T. J. Watson Research Center55 enhances a document management system specialized for creating contracts and proposals by retrieving potentially reusable document components from prior documents, in the appropriate contexts. Since the tool is aware of the structure of such documents, it also checks internal consistency and completeness of the current document and captures the resulting new document as a reusable asset back into the repository.

The relevance to portals is at least twofold. First, the artifacts created by these tools contain useful information that can become part of a K Portal, if the crawling and content analysis methods can access and process them, which is not the case today. Second, keeping these tools for analysis, synthesis, and authoring updated with relevant information provided by search and text mining capabilities can make them more useful. We believe that as portals evolve into more broad-based knowledge workplaces, these functions will become increasingly interwoven with other tools that support analysis, authoring, and project execution. The results of searching in a new generation of analysis and synthesis tools will be integrated into targeted information flow and into a broader range of information visualization structures, thereby helping KWs apply their human intelligence to become aware of and discover new relationships among information elements. We discuss the innovations implied by this scenario in the sixth section of this paper.

Distribute, share, and collaborate. The last high-level knowledge work task in Figure 1 is sharing expertise. The raison d’être of portals is dissemination of knowledge captured in electronic form. KWs can distribute information by submitting documents to repositories accessed by the portal crawling infrastructure. KM practices may be needed to evaluate the quality of documents and to assign meta-data attributes so that documents can be categorized or handled in a standard way in a portal infrastructure. The progress of a document within some electronic dissemination processes (e.g., certification, category meta-data, and authorization) can be managed by workflow software. Within a project team, project-relevant information may also be distributed via shared document repositories such as Lotus Notes TeamRooms, or via electronic mail with attachments. Portals support sharing of documents and collaboration among KWs by giving them access to summaries of persons’ resumes and areas of expertise and by publishing documents. KM tools like the ICM AssetWeb exist in a workstation environment that includes tools for collaboration, such as electronic mail, calendar, real-time meeting support with shared applications that are integrated with telephony, instant messaging and awareness,8,56 and video exchange. Beyond these portal connections, collaboration support is a broad area of research and product technology.45,57

Building and maintaining knowledge portals

Knowledge portal technologies are typically associated with a set of administrative tasks, requiring the exercise of various kinds of expertise. Figure 9 describes a general portal architecture that captures the integration of technologies and human intervention.

Portal application architecture and implementation. Figure 9 indicates the major K Portal components we have developed, beginning along the top of the figure with “Gather/Analyze Content.” This crawling component gathers and extracts text and meta-data content from collections of documents distributed in multiple repositories over a network. The extracted content is rendered in a standard XML format, which allows its exploitation by various text analysis and indexing processes, identified in the lower left box in Figure 9. The XML meta-data are loaded into relational database (RDB) tables, as are the category features. The text content of the documents is indexed in a searchable text index, and the documents are automatically categorized. Search and navigation functions in the application client (UI) are based on real-time access of text search engines and RDB tables by a set of run-time classes. If personalization functions are used, such as we described earlier, they may require storage of user profiles and records of portal usage, aggregated for the identification of communities of users.
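The flow from crawler output to run-time search can be illustrated with a minimal sketch. It is not the K Portal implementation: the XML element and attribute names, the `doc_meta` table, and the functions `load` and `search` are all hypothetical, and a small in-memory inverted index stands in for a real text search engine.

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical crawler output; element and attribute names are illustrative only.
CRAWLER_XML = """
<documents>
  <document id="d1">
    <meta author="Lee" category="proposals" date="2000-05-01"/>
    <text>Proposal for a customer knowledge portal rollout</text>
  </document>
  <document id="d2">
    <meta author="Kim" category="methods" date="2000-06-12"/>
    <text>Method for building a taxonomy for a portal</text>
  </document>
</documents>
"""

def load(xml_source):
    """Load meta-data into an RDB table and text content into an inverted index."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE doc_meta "
        "(doc_id TEXT PRIMARY KEY, author TEXT, category TEXT, date TEXT)"
    )
    index = defaultdict(set)  # term -> set of document ids
    for doc in ET.fromstring(xml_source).iter("document"):
        meta = doc.find("meta")
        db.execute(
            "INSERT INTO doc_meta VALUES (?, ?, ?, ?)",
            (doc.get("id"), meta.get("author"),
             meta.get("category"), meta.get("date")),
        )
        for term in doc.findtext("text", default="").lower().split():
            index[term].add(doc.get("id"))
    db.commit()
    return db, index

def search(db, index, term, category=None):
    """Combine a text-index lookup with an optional meta-data filter."""
    hits = sorted(index.get(term.lower(), set()))
    if category is None:
        return hits
    rows = db.execute(
        "SELECT doc_id FROM doc_meta WHERE category = ?", (category,)
    ).fetchall()
    allowed = {row[0] for row in rows}
    return [doc_id for doc_id in hits if doc_id in allowed]

db, index = load(CRAWLER_XML)
```

Calling `search(db, index, "portal", category="methods")` restricts the text hits to documents whose meta-data row carries that category, which mirrors the combined use of the text index and the RDB tables at run time.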

In our K Portal, we developed a set of object-oriented portal abstractions for run-time support of applications, captured in Java** classes that correspond to familiar entities such as documents, categories, and queries. This middleware provides a higher-level programming interface aimed at improving application development, and it offers more powerful ways to manipulate the results of text and RDB searches during run-time processing of user interactions. For example, the application client may allow end users to enter queries using a simpler syntax than the one the underlying search engine accepts, or it may define sets of default parameters transparent to the user. Application enabling middleware then parses user queries and transforms them into search specifications appropriate for one or more search engines. A new generation of application enabling middleware will allow results to be merged and manipulated in federated searches across multiple heterogeneous databases. Customizable application clients render backend text and meta-data features flexibly. Typically, the middleware and application client software operate in a rich Web infrastructure that includes a variety of techniques for managing Web access, performance, and generation of Web pages with dynamic data (e.g., search results).
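The query translation and federated merging described above can be sketched as follows. The sketch is in Python rather than the Java of the K Portal middleware, and every name in it is hypothetical: the simplified end-user syntax (`-term` for exclusion, `field:value` for meta-data filters), the `SearchSpec` class, and the boolean target syntax. The merge step also assumes that scores from different engines have already been normalized to a common scale, which is a significant simplification.

```python
from dataclasses import dataclass, field

@dataclass
class SearchSpec:
    """Engine-neutral search specification (hypothetical structure)."""
    required: list = field(default_factory=list)
    excluded: list = field(default_factory=list)
    filters: dict = field(default_factory=dict)
    max_results: int = 20  # a default the end user never has to supply

def parse_query(text, defaults=None):
    """Parse the simplified end-user syntax into a SearchSpec."""
    spec = SearchSpec(**(defaults or {}))
    for token in text.split():
        if ":" in token:
            key, _, value = token.partition(":")
            spec.filters[key.lower()] = value
        elif token.startswith("-") and len(token) > 1:
            spec.excluded.append(token[1:].lower())
        else:
            spec.required.append(token.lower())
    return spec

def to_boolean_syntax(spec):
    """Render the spec for a hypothetical engine with explicit AND / NOT."""
    parts = [" AND ".join(spec.required)] if spec.required else []
    parts += [f"NOT {term}" for term in spec.excluded]
    parts += [f"{key}={value}" for key, value in sorted(spec.filters.items())]
    return " AND ".join(parts)

def merge_results(*result_lists):
    """Federated merge: keep each document's best score, highest score first."""
    best = {}
    for results in result_lists:
        for doc_id, score in results:
            best[doc_id] = max(score, best.get(doc_id, score))
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))
```

For instance, `to_boolean_syntax(parse_query("portal taxonomy -draft author:lee"))` yields a single boolean expression, while `merge_results` combines ranked lists returned by two engines into one list without duplicates.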

Knowledge workers typically do not have to concern themselves with the implementation and maintenance mechanisms of portals, although they can experience the impact of these on performance and integration of end-user functions. For example, the middleware may play a role in managing user registration and controlling access to documents. The impact on the user is manifested in log-on procedures and document access limitations. Application integration affects how easily new functions can be implemented, how easily code can be customized and modified, and how seamlessly data objects in one tool can be used by another. In our experience, the path from prototype functions to availability in a supported K Portal application can be quite lengthy, in part as a result of these considerations.

Figure 9 Technology architecture for an example knowledge portal, from gathering document information, to indexing and categorization, and run-time generation of Web pages

[Figure 9 labels: Application Manager; Query Interpreter; Notes Database; Web Sites; Controls; K-Maps (XML); KNN Categorizer and Rules; Index Text Content; Load Category Navigation and Document Meta-data; Knowledge Map; Core Technologies]

Figure 9 (upper left quadrant) also alludes to a Knowledge Map editor (K-Map) tool used to create specifications that control crawling and categorization. Note that “K-Map” is rapidly becoming a generic term with slightly different meanings in different contexts, e.g., in the IBM Global Services K Portal and the Lotus Knowledge Discovery System** (see Davenport and Prusak2 for still more uses of the term “knowledge map”). However, most uses refer to the capability of building and editing taxonomies. From a knowledge administrator’s point of view, K-Maps are intended to specify what repositories to access for the portal and how to categorize documents. K-Maps are implemented as XML descriptions that can be interpreted by the K Portal indexing, analysis, and categorization programs in order to control their behavior. K-Maps are high-level tools used to create taxonomies. They have the look and feel of a directory navigator (e.g., Microsoft Windows Explorer**), and allow users to drag and drop training documents into subcategories. Other capabilities under development include forms for capturing rules that specify what repositories to crawl, or alternative rule-based methods for categorizing documents.

From a programming viewpoint, K-Maps are intended to facilitate the maintenance of K Portal administration programs with declarative specifications for how to organize information and manage the interoperability of software components. For example, in the IBM K Portal context, K-Maps might be used to control the crawling process (specifying sources and crawling parameters), to control how crawler output is to be used in text indexing and categorization processes, and to specify how users view and navigate taxonomies in the K Portal Web client user interface. K-Maps are a major step toward a new information and software architecture where software components are services that interact in a standard XML protocol, and where a declarative set of attributes represents implied rules for the operation of each service, its required input, and the results it produces. The use of XML is also changing the nature of the analysis processes: text analysis and information extraction are now XML enabled. New search engines are being developed to allow searches on XML structures that represent both textual features and meta-data.58 As a consequence, component applications do not need to be compiled together, but can interact in simple, standard ways, based on simple Web-based client-server protocols.59
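To make the idea of a declarative specification concrete, the sketch below parses a small XML description and derives from it a crawl plan and the training sets for a categorizer. The actual K-Map schema is not given in this paper, so the element names, attributes, repository locations, and document identifiers here are invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical K-Map-like description; this is not the actual K-Map schema.
KMAP_XML = """
<kmap name="engagement-portal">
  <sources>
    <source type="notes" location="teamroom.nsf" depth="2"/>
    <source type="web" location="http://example.com/methods" depth="1"/>
  </sources>
  <taxonomy>
    <category name="Engaging customers">
      <category name="Proposals">
        <training doc="d1"/>
        <training doc="d7"/>
      </category>
      <category name="Contracts">
        <training doc="d4"/>
      </category>
    </category>
  </taxonomy>
</kmap>
"""

def crawl_plan(kmap):
    """Return (repository type, location, crawl depth) for each declared source."""
    return [
        (src.get("type"), src.get("location"), int(src.get("depth", "1")))
        for src in kmap.iter("source")
    ]

def training_sets(node, path=()):
    """Walk the taxonomy, yielding (category path, training document ids)."""
    for cat in node.findall("category"):
        cat_path = path + (cat.get("name"),)
        docs = [t.get("doc") for t in cat.findall("training")]
        if docs:
            yield "/".join(cat_path), docs
        yield from training_sets(cat, cat_path)

kmap = ET.fromstring(KMAP_XML)
```

Because the crawler and the categorizer each read only the part of the description that concerns them, neither needs to be compiled against the other; `crawl_plan(kmap)` and `dict(training_sets(kmap.find("taxonomy")))` are independent consumers of the same declarative source.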

Portal management. Ideally, the technology components described in Figure 9 will run virtually automatically, minimizing the role of human management. Although this situation is increasingly the case for many K Portal tasks, there are still aspects of portal operation that require human involvement and oversight. These aspects include managing the process of crawling, indexing, and running categorizers. Other tasks involving content management will likely never be automated. These tasks include, for example, developing and maintaining taxonomies, assessing the quality of search and categorization, and maintaining news channels and highly dynamic sources of information.

Gathering and extracting information requires identifying relevant repositories and specifying crawling rules to gather relevant information and ignore irrelevant information. Web sites and repositories such as Lotus Notes can pose various difficulties to crawling and data extraction. Access rights may have to be negotiated with owners. Dictionaries may need to be defined to map differences in meta-data terminology from one repository to another. Documents may be corrupt, and Web sites may have idiosyncrasies. These problems diminish over time but can be challenging early in deploying portal infrastructures, requiring system administration expertise.

Building or installing a K Portal infrastructure typically requires a range of software engineering skills, such as database and system administration skills and some level of programming where Web clients need to be customized. These administration tasks should be supported by high-level tools. State-of-the-art Web generation software, such as JavaServer Pages**, is also critical to rapid customization and iteration on client user interfaces. Once the K Portal is running, the skills needed are more in line with KW goals and expectations and are less system-related. Domain experts need to develop taxonomies, identify new sources valuable to the community, manage certification, and possibly classify new intellectual capital.

How much quality control should be exercised in accepting assets into the portal repositories is an open question. Better quality requires great effort on the part of a few authors and editors but minimizes the frustration of many end users and maximizes their efficiency. One approach to the issue of varying quality is to create a process for submitting and qualifying documents. This role can be assumed by specialists who are domain experts (the “core teams” mentioned earlier). Although it increases the value of portal assets, quality control can have disadvantages. It can lead to bottlenecks in getting information into the portal repository in a timely manner, which may discourage KWs from both submitting information and using the portal for business-critical decisions. A consequence we have observed is that, in some communities, informal portals that are exempt from the formal quality control requirements proliferate. This occurs when portals are built and supported in partisan ways by small organizations.

There is an inherent tension between capturing information as quickly and broadly as possible and ensuring its quality. Organizations grapple with this issue when they establish policies for managing quality. We have seen stipulated regulations requiring a practitioner to verify that all the intellectual assets associated with an engagement have been submitted to the portal before the engagement can be closed. Organizational incentives also play a role. Authors may be acknowledged or compensated if their documents are valued by others. (An excellent discussion of these issues can be found in books by Davenport and Prusak2 and Stewart.60) Automation can play some role here. For example, click logs can determine how many times a document is accessed. Documents with links can be analyzed for their connections to and from other documents (as we discussed in an earlier section; see also Chakrabarti et al.26). Another promising approach involves algorithms for detecting “useless” documents that do not have much content or value.61 To identify such documents, we have asked users to skim documents and judge them useful, somewhat useful, or useless. We then used machine learning techniques to train on these documents in order to recognize similar documents, presumed useless, and eliminate them from the repository. This technique works well for very obvious cases, such as documents that contain mostly standard template verbiage with little additional content. Still, it leaves open the issue of subtle quality problems, such as poor style or inaccurate information, as well as the problem of improving and correcting these documents.
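The two automated signals mentioned above, access counts from click logs and the share of template verbiage in a document, can be combined in a simple screening pass. The sketch below is a hand-written heuristic offered only as an illustration; it is not the trained machine learning classifier described in the text, and the log format, thresholds, and function names are assumptions.

```python
from collections import Counter

def access_counts(click_log):
    """Count accesses per document from (user, doc_id) click-log records."""
    return Counter(doc_id for _user, doc_id in click_log)

def template_ratio(text, template):
    """Fraction of a document's non-blank lines that occur verbatim in the template."""
    template_lines = {line.strip() for line in template.splitlines() if line.strip()}
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return 1.0  # an empty document carries no content beyond the template
    return sum(line in template_lines for line in lines) / len(lines)

def review_candidates(docs, template, click_log, max_clicks=0, min_ratio=0.8):
    """Flag rarely accessed documents that consist mostly of template text."""
    clicks = access_counts(click_log)
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if clicks[doc_id] <= max_clicks
        and template_ratio(text, template) >= min_ratio
    )
```

Flagged documents would be candidates for human review rather than automatic deletion, since neither signal says anything about subtler problems such as inaccurate content.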

In our experience, developing taxonomies and ensuring the accuracy of categorizing documents is a difficult task. We discussed technology for clustering and categorizing documents and the skills needed for this task in an earlier section. Building taxonomies requires a domain expert who understands how users would like the collections organized and what terminology will be intuitive for naming categories. In our experience, domain experts need to understand the users who make up the community (novices and experts alike) and be able to produce a coherent organization of the domain that will be suitable for them. In the IBM Global Services experience, developing taxonomies has turned into a methodology that has become an integral part of internal K Portal deployments. The expert should also know how to use tools, such as K-Maps referred to earlier, which allow easy creation and editing of taxonomies, finding of training documents, and assignment of documents to categories by dragging and dropping. Search tools can help identify training documents, allowing users to search for documents that contain terminology relevant to the taxonomy name or description. These tools assist in the building of the taxonomy but still leave the burden of evaluating its quality to the human.

Tools have been developed to calculate metrics that can help in this evaluation. The e-Classifier tool developed at the IBM Almaden Research Center is a taxonomy-editing application that produces quantitative measures to characterize a categorization scheme. It comes with visualization tools for analyzing the distribution of documents in categories. The user can see, for example, how big each category is, how similar its member documents are to one another, and how well differentiated one category is from another. If a category is too big, or not coherent enough (documents in a category are not similar enough in some sense), the category can be broken into smaller categories, and documents can be distributed appropriately (see References 16, 19, and 20 for an overview of publications on text analysis including categorization and clustering; see also discussion of aspects of this technology in the Lotus Knowledge Discovery System).
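The three kinds of measures named above (category size, similarity among member documents, and differentiation between categories) can be approximated with term-frequency vectors and cosine similarity. The sketch below is one common way to compute such measures and is not a description of how e-Classifier computes them; the function names and the particular definitions of "cohesion" and "distinctness" are assumptions.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector for a document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Sum of member vectors; cosine ignores scale, so no averaging is needed."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return total

def category_report(categories):
    """Size, cohesion, and distinctness for each category.

    cohesion: mean cosine similarity of member documents to the category centroid.
    distinctness: 1 minus the highest centroid similarity to any other category.
    """
    vecs = {name: [vectorize(d) for d in docs] for name, docs in categories.items()}
    cents = {name: centroid(v) for name, v in vecs.items()}
    report = {}
    for name, members in vecs.items():
        cohesion = (sum(cosine(d, cents[name]) for d in members) / len(members)
                    if members else 0.0)
        others = [cosine(cents[name], c) for other, c in cents.items() if other != name]
        distinctness = 1.0 - max(others) if others else 1.0
        report[name] = {"size": len(members), "cohesion": cohesion,
                        "distinctness": distinctness}
    return report
```

A category with low cohesion is a candidate for splitting, and a pair of categories with low distinctness is a candidate for merging or for clearer naming; the thresholds would have to be chosen by the taxonomy editor.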

A final requirement for portal development is customization and personalization. In addition to the fundamentals of text search and category navigation, portals provide information targeted at a specific user community. For example, bulletin board items (shown in Figure 2) and news items (not shown) can be customized and maintained by staff in a support organization. Some of the customization burden can be alleviated by allowing individuals to personalize their own portal. My Yahoo!, for example, allows users to create their own portal around a core set of Yahoo! functions, with other information and services of interest specified. Beyond news channels, a trend in K Portals is to enable applications to appear in windows within the portal context. Plumtree Software calls these windows gadgets. The Lotus K-station** portal application calls these windows portlets. An example is a business graphic tool that pops up in the context of a spreadsheet to display numerical data. These miniapplications have great potential in our view for transforming K Portals into more broad-based KM workplaces. The key issue is how well these miniapplications actually integrate with one another in task-relevant ways.

Portal in a box? The administrative and support responsibilities discussed here call into question the feasibility of a portal “right out of the box,” as advertised by some vendors. This ideal is certainly plausible and one that motivates the development of tools such as the K-Map discussed earlier. We should differentiate between easy deployment of the technology and the effort required for integrating it into a heterogeneous software environment and operating it day to day. Even for deployment, the ease of installation should be assessed against the size of the communities, the scope, diversity, and quality of information resources to gather and index, and the skills available for the roles and responsibilities outlined above (e.g., taxonomy development). There is room for more systematic competitive evaluation practices to assess these trade-offs. Proprietary evaluations are often undertaken, but the results are rarely available in the public domain.

Knowledge portals in an expanding knowledge workplace

Knowledge portals represent a combination of technologies and practices that serve key knowledge work tasks. As we noted, however valuable portals are, they nonetheless represent only a portion of the support KWs need to be effective. Other tasks are not yet well integrated in the broader knowledge workplace. Figure 10 suggests the dimensions of this larger electronic knowledge work context and how K Portals need to evolve. We depict in schematic form the variety of tools and collaboration contexts in which KWs operate (right side of Figure 10). These stand in contrast to the tasks, artifacts, and information needs KWs have as they carry out project tasks (left side of the figure). In our view, the digital knowledge workplace of the future will be driven by a more intelligent and task-oriented infrastructure than the one enabled by current KM technology. This emerging knowledge workplace will support targeted knowledge work tasks more directly and integrally, with reference to specific project roles and responsibilities in a collaborative work environment. This workplace will emerge even as the KW’s software environment expands and becomes more distributed and varied.

The expanding knowledge workplace. Knowledge work is increasingly carried on in mobile and pervasive computing environments, in the field or on the road, outside of offices, where KWs must use notebook computers or even handheld devices, and without access to high-bandwidth network connections. These constraints influenced design decisions for the IBM Global Services K Portal. For example, the portal needed to be a lightweight Web application, with no Java servlets, thus limiting the interactive sophistication of applications. To accommodate traveling users with a low-bandwidth connection, we display meta-data indicating document size in bytes and attachments. Techniques such as summarization and question answering will also play an important role in adapting search results for display on small devices. But mobility requirements run much deeper than this, of course. For example, information created or available in one application context (e.g., e-mail) needs to be available in other application contexts (e.g., phone mail or a handheld device). The input and output modalities and properties of these different devices vary widely in physical and human-computer interaction terms. The challenge is to map inputs and outputs appropriately from one device to another, where in some cases, the mapping may entail a deep restructuring of information.62,63

Most of the knowledge we have been discussing in this paper is expressed in formal electronic documents that take considerable effort to produce. However, KWs express their expertise in other situations as well, such as in meetings or phone conversations, and these situations provide potential new sources of knowledge and expertise. The effort of producing formal documents often inhibits their creation, since the focus of practitioners’ work is on customers, not debriefing. However, if KWs do not always have the time or incentive to compose formal documents, they may be willing to dictate reports using speech capture, or allow conversations on the phone or in meetings to be captured and analyzed using speech recognition. Work done in the IBM T. J. Watson and Almaden Research Centers suggests that speech capture from audio or video recording and text analysis of recognized speech text may provide rich new sources of potential knowledge.64 Analyzing other sources, such as discussion groups, e-mail communications, project artifacts stored in project workplaces, and Web usage, can also provide information that can be used to find expertise, identify communities of interest, and find new relevant information.30,43

Deep task support: Information architecture and intelligent software infrastructure. KWs who use the portal technologies described earlier find them useful for discovering information relevant to projects, but they also express the need for better integration of these technologies with the other tools that they use in projects. This applies to all the KM-related tools we have been discussing, including project management, collaboration support, information analysis, document generation, and data management. The information they produce needs to be readily usable by other databases or tools. If the motivation for K Portals is unification and integration, it is reasonable to pursue these goals across the full spectrum of KM tools. In this subsection, we sketch the broad outlines of the knowledge workplace we see emerging with such integration.

If we elaborated on the “day-in-the-life” scenario in Table 1 to describe in more depth the pragmatic and temporal structure of project tasks, the roles and responsibilities implied by them, information flow, and dependencies, we would note the big gaps in the electronic workflow that supports them. These gaps are too often “left to the user to integrate,”65 that is, filled by more or less ad hoc practices, workarounds, and, in our opinion, nonoptimal, nonelectronic integration activities developed by KWs. Although current project management tools help KWs represent project execution, the resulting representation is not itself an active driver of application information flow or task execution. Providing deeper, more integrated support for KW projects will require technical innovation in information architecture, that is, in the representation of the structure, attributes, and roles of information associated with tasks. Exploiting a richer, broader information architecture also implies intelligent software infrastructure: software processes and tools that can act on the representation of task information and processes and support and even automate aspects of task execution for KWs.

Figure 10 Expanding the knowledge workplace

[Figure 10 labels: Knowledge Workers; Notes; E-mail; Calendar; Project; Portals; TeamRooms; Applications]

Currently, information architecture of tasks is implied in the activities and expertise of KWs but not tightly coupled to software architecture. For example, the information architecture we developed in an early prototype of the portal described in Figures 1 and 2 was intended to categorize documents in terms of the processes and subtasks involved in “engaging customers.” However, this relates only indirectly to the temporal and pragmatic steps of engaging with customers, and it is dependent on the skill of KWs to understand and exploit. There is no automated workflow implied in this category representation, no constraints or structure to guide KWs in project fulfillment, no explicit representation of milestones or ordering of subtasks, and little relation to the tools and applications used to fulfill elements of the project over time. What is missing is active project support, that is, automatic or semiautomatic accessing, organizing, and presentation of project information within the temporal context of day-to-day project management. For example, a KW who is responsible for summarizing a customer visit would benefit from seeing similar reports written by others for customers of similar size, industry, and deliverables. In addition, it would be useful to access and present these similar reports automatically, based on an understanding of the KW’s project role and of where he or she is in a temporal sequence of project tasks.

The information architecture necessary (but not sufficient) for enabling active task support is a richer representation of task structure, expressed, for example, in terms of scripts or schemata.66–68 These structures are intended to represent structural elements of tasks organized in time, including project roles and responsibilities, information dependencies and flow, and conventions for handling human interactions, including exceptions and breakdowns in communication (e.g., see Flores et al.69 and Medina-Mora et al.70). We envision a new form of K-Maps to represent not only content-based taxonomies, but richer, task-based workflow. End users would experience the value of this architecture because it would enable them to operate in more task-oriented terms, using active project templates.

The smart documents project mentioned earlier provides an example. Currently, IBM Global Services KWs work on documents called statements of work (SOW) in the isolated context of a word processor, searching for relevant planning information in a separate portal context, and then using some manual effort to extract relevant information and incorporate it into the SOW. In contrast, a smart document would enable KWs to initiate a context-sensitive search from within segments of the SOW, returning similar segments from other SOWs, based on the attributes of the SOW segment. Carrying this notion further, we can envision other smart project management tools that embody the engagement life cycle in an active way. For example, a new project might start as a timeline view (as in TeamSpace [43] or based on an enhanced project management tool), with links from the timeline to appropriate templates for the documents and analyses that need to be created at each point in the life cycle. Agents would analyze, infer, and gather information that is contextually relevant for each subtask of a project. An approximation of this process is suggested by automated, profile-based search and integration techniques for creating personalized news [71] or topic-related reports [72]. Automatic search and information integration could fulfill the promise of the visualization tools discussed earlier. It would entail more effectively channeling information into visual representations, expanding the kinds of structures and relations that can be captured and expressed, and expanding the kind of implications that can be made. Collaboration support could be enhanced by this infrastructure as well. Meeting mining projects at the IBM Watson and Almaden Research Centers [64] are exploring how to retrieve information dynamically during meetings by using search agents that listen to discussions as they occur. Active Calendar, a project at the IBM Almaden Research Center, is exploring an intelligent method for correlating information in calendars, profiles, and e-mail so that implications related to time management can be drawn for KWs. For example, a KW who schedules a business trip to San Francisco may be notified of local events of personal or business interest occurring during the visit period [73].
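The context-sensitive search described for smart documents can be illustrated with a small sketch. It is not the smart documents implementation; it assumes, purely for illustration, that each SOW segment carries a dictionary of attributes and its text, and that similarity is an equal-weight blend of attribute agreement and word overlap. The names `Segment`, `similarity`, and `similar_segments`, and the 0.5 weights, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One segment of a statement of work (hypothetical structure)."""
    sow_id: str
    attributes: dict  # e.g. {"section": "deliverables", "industry": "banking"}
    text: str


def tokens(text):
    """Lowercased word set with surrounding punctuation stripped."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return {w for w in words if w}


def similarity(a, b):
    """Blend attribute agreement with Jaccard word overlap (assumed weights)."""
    if a.attributes:
        matched = sum(1 for k, v in a.attributes.items() if b.attributes.get(k) == v)
        attr_score = matched / len(a.attributes)
    else:
        attr_score = 0.0
    ta, tb = tokens(a.text), tokens(b.text)
    union = ta | tb
    text_score = len(ta & tb) / len(union) if union else 0.0
    return 0.5 * attr_score + 0.5 * text_score


def similar_segments(query, corpus, top_k=3):
    """Return the top_k segments from other SOWs most similar to the query."""
    candidates = [s for s in corpus if s.sow_id != query.sow_id]
    ranked = sorted(candidates, key=lambda s: similarity(query, s), reverse=True)
    return ranked[:top_k]
```

In such a sketch the search is launched from the segment being edited, so the KW never leaves the document to formulate a query; a production system would presumably replace the word-overlap score with the portal's own indexing and text analysis components.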

These scenarios exploit a simple but well-known cognitive psychological principle, namely, that it is easier to recognize and select information than it is to recall and generate it [74]. (This is why, for example, menu-based user interfaces are more likely to be usable than command-based interfaces.) KWs would reuse and manipulate relevant information, adding only what is uniquely required for the specific task at hand. Portal-like functions of search and analysis would be distributed and interwoven throughout electronic workflows and within artifacts and involved in the context of creation and editing.

The automation or semiautomation of these tasks is often based on intelligent agents, which are programs that run autonomously in some electronic environment (such as the Internet), monitor events, respond to them, and carry out actions, based on rules defined over representations of some task domain [75]. Intelligent agents also raise the question of the general role of artificial intelligence (AI) in replacing or augmenting human intelligence in knowledge work. The role of AI is a complicated story, with successes in specialized domains where expertise can be expressed in rules, e.g., medical diagnosis or mineral prospecting. Examples of promising research on intelligent support of cooperative work [76–78] suggest potentially broader KM roles for AI technology. However, in our opinion, AI has not yet provided a reliable, broad-based foundation for intelligent information or software architectures in the domain of KM technology (see Davenport and Prusak [2] and Smith and Farquhar [79] for more discussion).
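The agent pattern described above (monitor events, match rules, carry out actions) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the design of Active Calendar or of any cited agent system; the event fields (`type`, `kw`, `city`, `dates`) and the names `Rule`, `Agent`, and `trip_rule` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A condition-action pair defined over event dictionaries."""
    name: str
    condition: Callable[[dict], bool]  # does the rule apply to this event?
    action: Callable[[dict], str]      # the agent's response, as a message


class Agent:
    """Observes events and fires every rule whose condition holds."""

    def __init__(self, rules):
        self.rules = list(rules)
        self.actions_taken = []

    def observe(self, event):
        responses = [r.action(event) for r in self.rules if r.condition(event)]
        self.actions_taken.extend(responses)
        return responses


# Hypothetical rule modeled on the business-trip scenario in the text.
trip_rule = Rule(
    name="local-events-for-trip",
    condition=lambda e: e.get("type") == "trip_scheduled"
    and {"kw", "city", "dates"} <= e.keys(),
    action=lambda e: f"notify {e['kw']}: events in {e['city']} during {e['dates']}",
)
```

The sketch also shows why such agents work best in narrow domains: everything the agent can do must be anticipated in a rule, which is the limitation the discussion of AI above points to.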

Methods of knowledge capture will have to evolve, as will richer representational schemes for describing tasks in rules and schemata and for capturing the flexibility characteristic of human problem solving and planning. Structure and constraints in task objects and processes need to be balanced against the flexibility and open-endedness required for creative organization of new tasks and information objects. In the real world, tasks do not always unfold in a linear sequence. Task flows must allow for activities to proceed in parallel, reflecting activities by multiple participants, and they should also allow for unanticipated difficulties. Although temporal ordering and ordering by information dependency are important, contingencies are needed to reflect breakdowns and repairs in executing goals and subgoals.
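The requirements just listed (dependency ordering, parallel activities, and repair after breakdown) can be made concrete with a small sketch. It assumes a task flow expressed as a dependency graph and uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later); the function `run_flow`, the `repairs` mapping, and the retry-once policy are illustrative assumptions rather than a representation proposed in this paper.

```python
from graphlib import TopologicalSorter


def run_flow(dependencies, execute, repairs=None):
    """Run a task flow and return the batches in the order they were started.

    dependencies: maps each task to the set of tasks it depends on.
    execute:      callable(task) -> True on success, False on breakdown.
    repairs:      optional map from a task to a repair task that is run
                  once before the failed task is retried (assumed policy).
    """
    repairs = repairs or {}
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    history = []
    while sorter.is_active():
        # Every task in this batch has its prerequisites met, so the
        # tasks could be carried out in parallel by different people.
        ready = sorted(sorter.get_ready())
        history.append(ready)
        for task in ready:
            ok = execute(task)
            if not ok and task in repairs:
                history.append([repairs[task]])
                execute(repairs[task])
                ok = execute(task)
            if not ok:
                raise RuntimeError(f"task {task!r} failed and could not be repaired")
            sorter.done(task)
    return history
```

Each inner list in the returned history is a set of activities that could proceed concurrently, and a repair task appears as its own entry wherever a breakdown occurred. A fixed graph of this kind still cannot express tasks invented midway through a project, which is the open-endedness the paragraph above asks representations to accommodate.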

The knowledge workplace might best be viewed as an active project management tool enhanced with routing, task lists, approval management, and flexible access control to allow participation by people with different roles, including project members, as well as customers and partners.

Broad-based KM product platforms, such as the Lotus Discovery Server and content-management components of the IBM WebSphere Portal Server, are beginning to provide middleware and application development tools that can be used to build workflow processes moving in these directions. However, in our view, more innovation is needed to develop an information architecture and intelligent infrastructure as we have defined them. The capabilities of existing portal and KM tools and applications need to be organized in a more fine-grained and modular Web services-oriented architecture [5, 59], transforming monolithic applications into systems of lighter-weight Web-based components, organized around electronic workflows, and interoperating via lightweight information exchange protocols. In this context, portal functions likewise become distributed components of a larger and more task-oriented electronic workplace.

Ensuring quality and effectiveness of knowledge portals

In various parts of this paper, we have appealed to anecdotal evidence for various claims about the usefulness of KM technologies. The fundamental motivation for KM technologies such as knowledge portals is to make KWs more productive. How can we find hard evidence for claims about utility and usability? We have also suggested that understanding knowledge work will be the driver of more intelligent software infrastructures. Can we discover how KWs use KM technology so that we can capture it in models and rules?

Human-centered evaluation and analysis practices are as important to developing and innovating KM technology as are the hard disciplines of computer science and software engineering. Professional research and applied disciplines in usability engineering, human factors, and human-computer interaction draw on cognitive and behavioral sciences, including anthropology, to provide behavioral methods for analyzing knowledge work, identifying requirements, evaluating system use, and analyzing feedback to guide design. Examples of organizations that do this work are the ACM (Association for Computing Machinery) Special Interest Groups, including SIGCHI for computer-human interaction, SIGIR for information retrieval and text analysis, SIGGROUP for computer-supported collaborative work, or groupware, and SIGKDD for data mining and KM-related research. User-centered design methodologies can be found in Helander [80] and in the contextual design methodology of Beyer and Holtzblatt [81]. Case studies can be found in KM-related areas such as collaboration [82, 83].

A research agenda

The emerging knowledge workplace as we have described it will be driven by a co-evolution of three research initiatives: an evolving understanding of how KWs accomplish tasks, technical innovation in component technologies, and innovation in application integration linking tasks and technology.

First, understanding KW tasks and needs means applying human-centered behavioral methods to discover and analyze knowledge work and to guide the development of new technology with user requirements and user-centered design and evaluation methods. Second, technological innovation at the level of the text-based and search-based components is required. The research challenges continue to be improving how KWs find relevant information, assisting them in seeing implications and connections of seemingly unrelated facts, and helping them accomplish elements of project plans and artifacts. We have suggested that a prerequisite for these innovations is deeper understanding of the semantics of document content and structure and of project task structure.

Finally, a better understanding of knowledge work practices can also contribute to the third area we believe is essential for effective KM tools, which is how the subsystems and components implied by innovation can be integrated into a cohesive knowledge workplace. The challenge is for all the tools to work together smoothly and efficiently, with the ultimate goal of making knowledge workers more effective and productive.

Acknowledgments

Much of the work described in this paper was developed with numerous colleagues. We would like to especially acknowledge our partners in IBM Global Services, notably Ko-Yang Wang and Charles (Rod) Cowan, who have supported collaboration with Research for several years, and who provided many insightful discussions over the years about topics discussed. In particular, our discussion of “knowledge workplace” owes much to Rod’s and Ko-Yang’s own initiatives for supporting the internal IBM Global Services community. We also thank several colleagues in IBM Research, including Wayne Niblack, Eric Brown, Edward So, James Cooper, Mary Neff, Bran Boguraev, Herb Chong, Aaron Kershenbaum, Anni Coden, and John Prager, with whom we have collaborated on the technologies discussed in this paper (some deployed in the portals described). In particular, Ed So, Eric Brown, and Wayne Niblack were key contributors to the IBM Global Services knowledge portal discussed in this paper. Finally, we thank James Cooper, Alan Marwick, Rod Cowan, and three anonymous reviewers for providing comments that improved this paper.

*Trademark or registered trademark of International Business Machines Corporation.

**Trademark or registered trademark of Lotus Development Corporation, Dow Jones & Company, Inc., Microsoft Corporation, or Sun Microsystems, Inc.

Cited references

1. L. Prusak, “Introduction to Knowledge in Organizations,” L. Prusak, Editor, Knowledge in Organizations, Butterworth-Heinemann, Boston, MA (1997).

2. T. H. Davenport and L. Prusak, Working Knowledge: How Organizations Manage What They Know, Harvard Business School Press, Boston, MA (1998).

3. A. D. Marwick, “Knowledge Management Technology,” IBM Systems Journal 40, No. 4, 814–830 (2001, this issue).

4. Yahoo!, Yahoo, Inc., http://www.yahoo.com (2001).

5. M. Carroll, “Beneath the Vortals,” Web Techniques, 59–65 (February 2000).

6. K.-T. Huang, “Capitalizing on Intellectual Assets,” IBM Systems Journal 37, No. 4, 570–582 (1998).

7. IBM Global Services Wins GIGA Global Excellence Award for Second Straight Year, IBM Corporation, http://www-4.ibm.com/software/data/knowledge/download/GIGApr.htm (2001).

8. Knowledge Management: The Mind of Many, Lotus Development Corporation, http://www.lotus.com/home.nsf/welcome/km (2001).

9. Knowledge Discovery Server, Lotus Development Corporation, http://www.lotus.com/home.nsf/welcome/discoveryserver (2001).

10. Corporate Portals: A Simple View of a Complex World, digital White Paper, Plumtree Software, Inc., available from http://www.plumtree.com (2001).

11. Autonomy Corporation, http://www.autonomy.com/autonomy/ (2001).

12. Grand Central Station, http://www.research.ibm.com/topics/popups/smart/network/html/gcs.html, Almaden Research Center, IBM Corporation (2001); (also http://www.research.ibm.com/resources/magazine/1997/issue_3/grandcentral397.html).


13. Garlic Project, Almaden Research Center, IBM Corporation, http://www.almaden.ibm.com/cs/garlic/ (2001).

14. E. Brown, “Parallel and Distributed IR,” Chapter 9, in Modern Information Retrieval, R. Baeza-Yates and B. Ribeiro-Neto, Editors, ACM Press, New York (1999), pp. 229–256.

15. Intelligent Miner for Text, IBM Corporation, http://www-4.ibm.com/software/data/iminer/fortext/downloads.html (2001).

16. Talent: Text Analysis and Language Engineering Tools, IBM Corporation, http://www.research.ibm.com/irgroup/talent.html (2001).

17. J. W. Cooper and R. J. Byrd, “Lexical Navigation: Visually Prompted Query Expansion and Refinement,” Proceedings of the 2nd ACM International Conference on Digital Libraries, Philadelphia, PA (July 25–28, 1997), pp. 237–246.

18. Y. Ravin and N. Wachholder, Extracting Names from Natural Language Text, Research Report 20338, IBM Corporation (1996); also available at: http://www.research.ibm.com/people/r/ravin.

19. “A New Computer Program Classifies Documents Automatically,” Almaden Research Center, IBM Corporation, http://www.research.ibm.com/resources/magazine/2000/number_2/selection200.html (2001).

20. Text and Information, Almaden Research Center, IBM Corporation, http://www.almaden.ibm.com/cs/k53/ir.html (2001).

21. R. Baeza-Yates and B. Ribeiro-Neto, Editors, Modern Information Retrieval, ACM Press, New York (1999).

22. An Introduction to Modern Information Retrieval, G. Salton and M. McGill, Editors, McGraw-Hill, New York (1993).

23. G. W. Furnas, T. K. Landauer, L. M. Gomez, and S. T. Dumais, “The Vocabulary Problem in Human-System Communication,” Communications of the ACM 30, No. 11, 964–971 (November 1987).

24. S. Dumais, “Textual Information Retrieval,” Handbook of Human-Computer Interaction, M. Helander, Editor, Elsevier Science Publishers, North-Holland (1988), pp. 673–700.

25. Y. Jing and W. B. Croft, “An Association Thesaurus for Information Retrieval,” Proceedings of RIAO’94 (Recherche d’Informations Assistee par Ordinateur) (1994), pp. 146–160.

26. S. Chakrabarti, B. E. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J. Kleinberg, “Mining the Web’s Link Structure,” Computer 32, No. 8, 60–67 (August 1999).

27. C. Boyer, “Building Community Online,” IBM Research 36, No. 4 (1998); also at http://www.research.ibm.com/resources/magazine/1998/issue_4/babble498.html (2001).

28. Google, Inc., http://www.google.com.

29. J. Prager, E. Brown, A. Coden, and D. Radev, “Question-Answering by Predictive Annotation,” Proceedings of SIGIR 2000, Athens, Greece (2000).

30. C. Aggarwal, J. Wolf, K.-L. Wu, and P. Yu, “The Intelligent Recommendation Analyzer,” IEEE ICDCS 2000 Workshop on Knowledge Discovery and Data Mining (2000).

31. M. Ackerman and T. Malone, “Answer Garden: A Tool for Growing Organizational Memory,” Proceedings of the ACM Conference on Office Information Systems (1990), pp. 31–39.

32. Y. Aridor, D. Carmel, R. Lempel, A. Soffer, and Y. Maarek, “Knowledge Agents on the Web,” unpublished manuscript, IBM Corporation (2000).

33. T. F. Hand, “A Proposal for Task Based Evaluation of Text Summarization Systems,” Intelligent Scalable Text Summarization Proceedings of a Workshop Sponsored by the Association of Computational Linguistics (July 1997), pp. 31–38.

34. B. K. Boguraev and M. S. Neff, “Lexical Cohesion, Discourse Segmentation and Document Summarization,” Proceedings of RIAO’2000 (Recherche d’Informations Assistee par Ordinateur), Paris (April 12–14, 2000).

35. R. Y. Ando, B. Boguraev, R. Byrd, and M. Neff, “Multidocument Summarization by Visualizing Topic Content,” Proceedings of ANLP/NAACL Workshop on Automatic Summarization (2000), pp. 79–88.

36. Northern Light, Northern Light Technology, Inc., http://www.northernlight.com.

37. R. Byrd and Y. Ravin, “Identifying and Extracting Relations in Text,” NLDB’99 Conference, Klagenfurt, Austria (1999).

38. J. Conklin, “Hypertext: An Introduction and Survey,” Computer-Supported Cooperative Work: A Book of Readings, I. Grief, Editor, Morgan Kaufmann Publishers, San Mateo, CA (1988), pp. 423–475.

39. M. S. Neff and J. W. Cooper, “ASHRAM: Active Summarization and Markup” (abstract), Proceedings of the 32nd Annual Hawaii International Conference on Systems Science, Maui, HI (1999), p. 83.

40. S. Card, J. Mackinlay, and B. Shneiderman, Readings in Information Visualization: Using Vision to Think, Morgan Kaufmann Publishers, San Francisco, CA (1999).

41. G. E. Bock, “Knowledge Management Frameworks,” Patricia Seybold’s Workgroup Computing Report 20, No. 2 (February 1997).

42. D. Tkach, IKM Technology: Knowledge Representation Effectiveness, unpublished White Paper, IBM Corporation (October 7, 1998).

43. W. Geyer, S. Daijavad, T. Frauenhofer, H. Richter, K. Truong, L. Fuchs, and S. Poltrock, “Virtual Meeting Support in TeamSpace,” Demonstration at CSCW’2000, ACM Conference on Computer-Supported Cooperative Work, Philadelphia, PA (December 2–6, 2000); see also IBM Research Web site at http://www.research.ibm.com/teamspace.

44. T. Erickson, D. N. Smith, W. A. Kellogg, M. R. Laff, J. T. Richards, and E. Bradner, “Socially Translucent Systems: Social Proxies, Persistent Conversation, and the Design of ‘Babble’,” Proceedings of Computer-Human Interaction 99, Pittsburgh, PA (May 15–20, 1999), pp. 72–79.

45. J. C. Thomas, W. A. Kellogg, and T. Erickson, “The Knowledge Management Puzzle: Human and Social Factors in Knowledge Management,” IBM Systems Journal 40, No. 4, 863–884 (2001, this issue).

46. J. Conklin and M. L. Begeman, “gIBIS: A Hypertext Tool for Exploratory Policy Discussion,” Proceedings of the Conference on Computer-Supported Cooperative Work (CSCW’88) (1988), pp. 140–152.

47. Corporate Portals: Case Studies and White Papers, IBM Corporation, http://www-4.ibm.com/software/data/knowledge/corporate/details.html (2001).

48. S. Hantler, personal e-mail communication (February 14, 2001).

49. T. Nasukawa and T. Nagano, “Text Analysis and Knowledge Mining System,” IBM Systems Journal 40, No. 4, 967–984 (2001, this issue).

50. M. Morohashi, K. Takeda, H. Nomiyama, and H. Maruyama, “Information Outlining—Filling the Gap Between Visualization and Navigation in Digital Libraries,” Proceedings of the International Symposium on Digital Libraries (1995), pp. 151–158.

51. E. Tufte, Visual Explanations: Images and Quantities, Evidence and Narrative, Graphics Press, Cheshire, CT (1997).

52. J. Cadiz, A. Gupta, and J. Grudin, “Using Web Annotations for Asynchronous Collaboration Around Documents,” CSCW’2000, ACM Conference on Computer-Supported Cooperative Work, Philadelphia, PA (December 2–6, 2000).


53. I. Ovsiannokov, M. Arbib, and T. McNeill, “Annotation Technology,” International Journal of Human-Computer Studies 50, 329–362 (1999).

54. S. Robertson and K. Reese, “A Virtual Library for Building Community and Sharing Knowledge,” International Journal of Human-Computer Studies 51, 663–685 (1999).

55. D. Ferrucci, “Knowledge Management Through Interactive Document Configuration,” The Second International Conference and Exhibition on the Practical Application of Knowledge Management (PAKeM99), London (April 21–April 23, 1999); see also http://www.practical-applications.co.uk/PAKeM99/.

56. Sametime–Real-Time Collaboration That’s Fit for Business, Lotus Development Corporation, http://www.lotus.com/home.nsf/welcome/sametime.

57. Computer-Supported Cooperative Work: A Book of Readings, I. Grief, Editor, Morgan Kaufmann Publishers, San Mateo, CA (1988).

58. D. Carmel, Y. Maarek, and A. Soffer, “XML and Information Retrieval: A SIGIR 2000 Workshop,” ACM Conference on Information Retrieval, SIGIR 2000 (July 28, 2000).

59. D. Platt, “Web Services: Building Reusable Web Components with SOAP and ASP.NET,” MSDN Magazine, 100–114 (February 2001).

60. T. A. Stewart, Intellectual Capital: The New Wealth of Organizations, Currency and Doubleday Publishing, New York (1997).

61. J. W. Cooper and J. M. Prager, “Anti-Serendipity: Finding Useless Documents and Similar Documents” (abstract), Proceedings of the 33rd Hawaii International Conference on Systems Science, Maui, HI (2000), p. 57.

62. C. Schmandt, N. Marmasse, S. Marti, N. Sawhney, and S. Wheeler, “Everywhere Messaging,” IBM Systems Journal 39, Nos. 3&4, 660–677 (2000).

63. M. Viveros and D. Wood, unpublished project reports on mobility, knowledge management, and collaboration, IBM Corporation.

64. E. W. Brown, S. Srinivasan, A. Coden, D. Ponceleon, J. W. Cooper, and A. Amir, “Toward Speech as a Knowledge Resource,” IBM Systems Journal 40, No. 4, 985–1001 (2001, this issue).

65. R. Cowan, internal presentation, IBM Corporation (April 17, 2001).

66. R. Schank and H. Abelson, Scripts, Plans, Goals and Understanding, Erlbaum, Hillsdale, NJ (1977).

67. J. Sowa, Conceptual Structures: Information Processing in Mind and Machine, Addison-Wesley Publishing Co., Reading, MA (1984).

68. A. S. Gordon, “The Representational Requirements of Strategic Planning,” Fifth Symposium on Logical Formalization of Commonsense Reasoning, New York University, New York (May 20–22, 2001).

69. F. Flores, M. Graves, B. Hartfield, and T. Winograd, “Computer Systems and the Design of Organizational Interaction,” ACM Transactions on Office Information Systems 6, No. 2, 153–172 (April 1988).

70. R. Medina-Mora, T. Winograd, R. Flores, and F. Flores, “The Action Workflow Approach to Workflow Management Technology,” ACM 1992 Conference on Computer-Supported Cooperative Work, Toronto (October 31–November 4, 1992), pp. 281–297.

71. S. Elo Dean and L. Weitzman, “SuperNews: Multiple Feeds for Multiple Views,” IBM Systems Journal 39, Nos. 3&4, 633–645 (2000).

72. Atomica Web service, Atomica Corporation, http://www.atomica.com (2001).

73. Active Calendar, Almaden Research Center, IBM Corporation, http://www.almaden.ibm.com/cs/activecal.html (2001).

74. B. Shneiderman, User Interface Design: Strategies for Effective Human-Computer Interaction, Addison-Wesley Publishing Co., Reading, MA (1999).

75. P. Maes, “Agents That Reduce Work and Information Overload,” Readings in Human-Computer Interaction: Toward the Year 2000, 2nd Edition, R. M. Baecker, J. Grudin, W. A. S. Buxton, and S. Greenberg, Editors, Morgan Kaufmann Publishers, San Francisco, CA (1995), pp. 811–821.

76. K. Abbot and S. Sarin, “Experience with Workflow Management: Issues for the Next Generation,” Proceedings of the ACM Conference on Computer-Supported Cooperative Work, Chapel Hill, NC (October 22–26, 1994), pp. 113–120.

77. K. Lai, T. Malone, and K.-C. Yu, “Object Lens: A Spreadsheet for Cooperative Work,” ACM Transactions on Office Information Systems 6, No. 4, 1–26 (1990).

78. T. Winograd, “A Language/Action Perspective on the Design of Cooperative Work,” Human Computer Interaction 3, 3–30 (1987).

79. R. G. Smith and A. Farquhar, “The Road Ahead for Knowledge Management: An AI Perspective,” AI Magazine (American Association for Artificial Intelligence), 17–40 (Winter 2000).

80. Handbook of Human-Computer Interaction, M. Helander, Editor, Elsevier Science Publishers, North-Holland (1988).

81. H. Beyer and K. Holtzblatt, Contextual Design: A Customer-Centered Approach to System Design, Morgan Kaufmann Publishers, San Francisco, CA (1997).

82. J. Grudin, “Groupware and Social Dynamics: Eight Challenges for Developers,” Readings in Human-Computer Interaction: Toward the Year 2000, 2nd Edition, R. M. Baecker, J. Grudin, W. A. S. Buxton, and S. Greenberg, Editors, Morgan Kaufmann Publishers, San Francisco, CA (1995), pp. 762–774.

83. R. M. Baecker, D. Nastos, I. Posner, and K. Mawby, “The User-Centered Iterative Design of Collaborative Writing Software,” Readings in Human-Computer Interaction: Toward the Year 2000, 2nd Edition, R. M. Baecker, J. Grudin, W. A. S. Buxton, and S. Greenberg, Editors, Morgan Kaufmann Publishers, San Francisco, CA (1995), pp. 775–782.

Accepted for publication June 1, 2001.

Robert Mack IBM Research Division, Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, New York 10598 (electronic mail: [email protected]). Dr. Mack is manager of the Information Design and Access Group. He joined IBM Research in 1981 in a postdoctoral position, and then as a research staff member, after completing his Ph.D. in cognitive and experimental psychology at The University of Michigan in 1981. He received a B.A. degree in physics from Oakland University and an M.A. degree in experimental psychology from Michigan State University. He has worked on a wide range of software systems, both as a user interface and human-computer interaction specialist, and as a project lead for prototype systems and middleware for applications relating to digital libraries, digital video archive and search, and knowledge portals. All of these projects have involved collaboration with and technology transfer to the IBM Software Group or IBM Global Services. Dr. Mack and his team are currently working on projects related to using user task context to enhance the search function, and applying text mining to support knowledge discovery in the life sciences.


Yael Ravin IBM Research Division, Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, New York 10598 (electronic mail: [email protected]). Dr. Ravin is manager of the Knowledge Structures Group. She received a B.A. degree in English literature and philosophy from the Hebrew University in Jerusalem (cum laude), an M.A. degree in teaching English as a second language from Teachers’ College, Columbia University, and a Ph.D. degree in linguistics from the Graduate Center of the City University of New York. Since then, she has been working as a research staff member at the T. J. Watson Research Center on a variety of natural language processing and information retrieval projects. Dr. Ravin created the IBM named-entity recognizer and led its development and transfer into a product (Intelligent Miner for Text). Other projects include word-sense disambiguation, linguistic support for IR, and relation extraction from text. Currently, she manages a question-answering project.

Roy J. Byrd IBM Research Division, Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, New York 10598 (electronic mail: [email protected]). Mr. Byrd joined IBM in 1968 and has done development and research on programming languages and compilers, database query systems, object-oriented programming languages, natural language processing, information retrieval, and text analysis. From 1969 to 1974, he participated in the development of several IBM products incorporating compilers and interpreters for the BASIC and PL/I programming languages. From 1974 to 1976, he worked in the former IBM Advanced Systems Development Division on the EQUAL natural language query system for relational databases. He joined the T. J. Watson Research Center in 1976. Until 1981, he worked on the System for Business Automation project, which produced the Query-by-Example product, and which built a message-passing object-oriented programming system based on actor theory. Since 1982, he has done computational linguistics research on the analysis and construction of lexicons for natural language processing and on techniques for text analysis in large document collections. Mr. Byrd manages the Text Analysis and Language Engineering group in the Knowledge Management Technologies department at IBM Research. The group has contributed to several IBM products, including DB2 Text Extender and Intelligent Miner for Text. His current research interests include information extraction from large text corpora, using fast natural language processing techniques. He received his B.A. degree in theory and composition of music from Yale University in 1967 and is a Ph.D. candidate in linguistics at New York University.

