Knowledge Discovery in Databases and Information Retrieval

In

Knowledge Management Systems

Anne Marie Donovan

April 22, 2003

Knowledge Management Systems, LIS 385T

The University of Texas at Austin

School of Information

KDD and IR 2

Introduction

The processes of Knowledge Discovery in Databases (KDD) and Information

Retrieval (IR) appear deceptively simple when viewed from the perspective of

terminological definition. Fayyad, Piatetsky-Shapiro, and Smyth (1996) define KDD as

"the nontrivial process of identifying valid, novel, potentially useful, and ultimately

understandable patterns in data" (p. 30). The closely related process of IR is defined by

Rocha (2001) as "the methods and processes for searching relevant information out of

information systems that contain extremely large numbers of documents" (1.1). In

execution, however, these processes are not simple at all, especially when executed to

satisfy specific personal or organizational Knowledge Management (KM) requirements

or as the core functionality of Knowledge Management Systems (KMS).

The potential validity or usefulness of an individual data element or pattern of

data elements may change dramatically from individual to individual, organization to

organization, or task to task. Relevance is a highly contextual and personal data

characteristic, changing even as the IR process is underway and information requirements

are incrementally met. Making retrieved data or a description of data patterns generally

understandable is also highly problematic. Data that may appear relevant and easily

understandable in one retrieval context may be completely unintelligible in another, even

to the same audience. KDD and IR are, in fact, highly complex processes that are

strongly affected by a wide range of factors. These factors include the needs and

information seeking characteristics of system users as well as the tools and methods used

to search and retrieve, the structure and size of the data set or database, and the nature of

the data itself.

KDD and IR: An Historical Perspective

Origins

Information professionals often describe the KDD and IR processes in the context

of specific types of Database Management Systems (DBMS). Devarakonda (2001)

divides DBMS into four types: simple data without query, simple data with query,

complex data without query, and complex data with query. An example of the first type,

simple data without query, is a filing system, including files that may exist only in paper

form. The second, third and fourth types are exemplified by Relational DBMS

(RDBMS), Object-Oriented DBMS (OODBMS), and Object-Relational DBMS

(ORDBMS), respectively (Devarakonda, 2001, ORDBMS). The type of database that is

queried significantly affects the processes of knowledge discovery (KD) and IR.

Because an RDBMS of some type forms the core of almost all KMS,

improvement of RDBMS functionality for KD and IR has been a crucial part of KMS

refinement for the past three decades. The relatively recent introduction of OODBMS to

KMS has created many new KD and IR problem sets for researchers. These challenges

have been met, thus far, primarily through the introduction of certain features of RDBMS

to OODBMS. The result has been the development of a small group of ORDBMS that

combine the best KD and IR features of RDBMS and OODBMS (ORDBMS).

Information professionals familiar with traditional filing systems are acutely

aware of the limitations imposed on KD and IR by their pre-set filing structure. Although

technically a database, this type of DBMS does not lend itself to automated searching, but

only to browsing or search by pre-designated subject categories and file descriptions

(e.g., library card catalogs). The difficulties presented for KD and IR by simple filing

structures were initially replicated in computer-supported file structures and were only

alleviated with the introduction of the Relational Database Model (RDM), by E. F. Codd

in 1970 (Devarakonda, 2002, RDBMS).

Introduction of the RDM resulted in rapid adoption of RDBMS for information

organization and control across a broad range of commercial and social organizations as

well as the development of increasingly effective data collection and storage

technologies. RDBMS permitted much more flexibility in data organization and retrieval

than traditional data filing systems, but traditional IR methods did not permit flexibility in

the characterization of user needs or the delineation of search parameters (Rocha, 2001,

1.2). The result, of course, was increasing numbers of organizations that possessed very

large and continually growing databases but only rudimentary tools for KD and IR. Two

areas of research focus in information management developed in response to this

problem: data warehousing and data mining.

Data warehousing, defined by Fayyad et al. as "collecting and 'cleaning'

transactional data to make it available for online analysis and decision support" (1996, p.

30), focuses on the methodical collection and pre-processing of data for specific

analytical uses. The data is subject-oriented, time-stamped, and integrated to permit

interactive analysis in support of decision-making processes. A data warehouse normally

integrates data from a variety of sources, "thus enriching the data and broadening the

context and value of the information" (Rauber et al., 2002, Data Warehousing and

OLAP).

Data mining, defined as "the application of specific algorithms to a data set for the

purpose of extracting data patterns" (p. 28), focuses on improving the utility of large data

sets as well as IR response. Data mining, in particular the algorithms used in data

mining, has received a lion's share of attention in the development of Decision Support

Systems (DSS) and RDBMS research because results are often immediately applicable in

high-payoff decision-making industries such as insurance, sales, and financial and

medical services.

Inspirations and Intentions for the Technology

Rocha describes the ultimate goal of IR as the production or recommendation of

relevant information to users (2001, 1.2). We can ascribe the same motivation to the

development of KDD systems and methods in general, particularly with regard to the

refinement of DBMS. Research in data collection, storage, and retrieval has focused on

issues specifically related to the improvement of KD and IR functionality. Among the

topics given special attention have been data translation, change detection, integration,

duplication, summarization, aggregation, and timeliness (Widom, 1995).

Research has also focused on the need to improve automation in KD and IR,

especially in the areas of data selection and pre-processing, data transformation, and data

interpretation and evaluation (Fayyad et al., 1996, p. 28). However, increased automation

in KD and IR requires increased attention to the methods used for data collection and

storage as well as the statistical foundations of the search and retrieval processes (p. 29).

Despite this complication, it is clear that manual analysis of billions of records

and hundreds of fields is impractical and that automated data handling will be even more

in demand as requirements for on-the-fly analysis and more flexible presentation of

search results increase (p. 28).

KDD and IR: Application to KMS

Technological Systems and Processes

Interface, interaction, and ubiquity. The relationship of KDD and IR to KMS is

intimate: all KMS rely in some form on the aggregation of data for search and retrieval.

Historically, improvements in the utility of KMS have depended in large part on

improvements in KDD and IR functionality. Fayyad et al. describe KDD as "the overall

process of knowledge discovery from data, including how the data is stored and accessed,

how algorithms can be scaled to massive data sets and still run efficiently, how results

can be interpreted and visualized, and how the overall human-machine interaction can be

modeled and supported" (Fayyad et al., 1996, p. 29). This comprehensive list of KDD

processes, which encompasses IR, also serves to describe the core functionality of most

KMS (pp. 30-31). Research issues that have arisen in the development of DBMS and the

study of KDD are also closely related to the development and deployment of KMS.

Among these are: data collection and pre-processing; continually increasing volumes of

data; increasingly complex forms of data; identifying and extracting useful knowledge

from extremely large repositories; identifying valuable knowledge about the data set as

well as within it; and extracting knowledge from data and presenting that knowledge in

usable forms (pp. 30-31).

The development of highly specialized DBMS for data warehousing and the

continual refinement of data mining methods and technologies have been motivated in

large part by the deployment of KMS throughout industry. Many KMS are simply

elaborated RDBMS integrated with IR and communication systems. More sophisticated

KMS may also add collaborative work tools. Decisions related to data mining, including

model functions, model representation, and preference criteria, are an elemental part of

KMS development and deployment (pp. 31-32). Data mining tasks (classification,

forecasting, clustering, description, deviation detection, link analysis, and visualization;

Piatetsky-Shapiro, 1998, Slide 17) and search algorithms are fundamentally affected by

the focus and purpose of an organization's KMS.
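
One of the data mining tasks named above, deviation detection, can be illustrated with a minimal sketch. The z-score rule, the threshold value, and the sample figures below are assumptions chosen for illustration; they are not drawn from the cited sources, and an operational KMS would likely use a more robust method.

```python
from statistics import mean, stdev


def find_deviations(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` sample
    standard deviations from the mean (a simple z-score rule)."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mu) / sigma > threshold]


# Hypothetical daily claim amounts; the last one is deliberately unusual.
claims = [120, 135, 128, 140, 131, 125, 138, 900]
outliers = find_deviations(claims)
```

Whether a flagged value is an error, a fraud indicator, or a legitimate exception remains a question of context, which the sketch does not address.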

System architecture. The characteristics of the underlying DBMS

determine the architecture of KD and IR systems. RDBMS are composed of many

relations in the form of two-dimensional tables of rows and columns containing related

tuples. The rows (tuples) are called records, and the columns (fields in the record) are

called attributes. Each column is accorded a specific data type. The type of data stored in

an RDBMS has traditionally been constrained to ensure that there are no ambiguous

tuples in the database (Devarakonda, 2002, RDBMS), although in the case of very

complex data types, for example scientific data, programmers have overcome the

constraints of the DBMS by employing Binary Large Objects (BLOBs) to store data in a

database. This "solution" creates its own set of problems, however. BLOBs are usually

much larger than a single block of storage in a database, a characteristic that undermines

the efficiency of the database. As well, because of their size, and because BLOBs in a

single database may contain a variety of data types and compound data, the data content

of the BLOB is not visible to the database. The opacity of data content means that a user

cannot perform a high-level search across the BLOBs in a database (Wallace, Benschop,

and Köhntopp, 1999).
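
The opacity of BLOB content can be shown in a small sketch using Python's built-in sqlite3 module. The table name, the packing format, and the sample values are hypothetical. The database can report only surface properties of the stored object, such as its length in bytes; any selection based on content has to be performed by application code after the object is retrieved and decoded.

```python
import sqlite3
import struct

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reading (reading_id INTEGER PRIMARY KEY, samples BLOB)")


def pack(values):
    """Encode a list of floats as raw little-endian 8-byte doubles."""
    return struct.pack(f"<{len(values)}d", *values)


def unpack(blob):
    """Decode raw bytes back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))


conn.execute("INSERT INTO reading VALUES (?, ?)", (1, pack([0.5, 1.5, 2.5])))
conn.execute("INSERT INTO reading VALUES (?, ?)", (2, pack([9.0, 9.5])))

# SQL sees only an uninterpreted byte string: it can measure it, not read it.
sizes = conn.execute(
    "SELECT reading_id, length(samples) FROM reading ORDER BY reading_id"
).fetchall()

# Content-based selection happens outside the database, row by row.
high = [rid for rid, blob in conn.execute(
            "SELECT reading_id, samples FROM reading ORDER BY reading_id")
        if max(unpack(blob)) > 5.0]
```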

RDBMS use Structured Query Language (SQL) for data definition, modification,

querying and constraint specification. Queries can range from simple single-table queries

to complicated multi-table queries. A commonly used RDBMS is Microsoft Access, but

the existence of a standard query language allows data to be migrated easily from one

RDBMS to another (Devarakonda, 2002, RDBMS). Although the structure of RDBMS

renders them incapable of handling complex data types such as spatial data, images, or

number arrays without the use of BLOBs, it does permit rapid data access and large

storage capacities.
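
The four uses of SQL described above (data definition, modification, querying, and constraint specification) can be sketched with Python's built-in sqlite3 module. The table names, column names, and rows are hypothetical, and SQLite stands in here for any RDBMS; it is not one of the systems discussed in the sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

# Data definition: each column (attribute) is given a type, and constraints
# restrict which rows (tuples) the relation may hold.
conn.executescript("""
    CREATE TABLE author (
        author_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE document (
        doc_id    INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        author_id INTEGER NOT NULL REFERENCES author(author_id)
    );
""")

# Data modification.
conn.executemany("INSERT INTO author VALUES (?, ?)",
                 [(1, "Rivera"), (2, "Chen")])
conn.executemany("INSERT INTO document VALUES (?, ?, ?)",
                 [(10, "Indexing notes", 1), (11, "Query logs", 2)])

# A simple single-table query.
titles = [row[0] for row in
          conn.execute("SELECT title FROM document ORDER BY doc_id")]

# A multi-table query: the relationship is recovered by joining on key values.
pairs = conn.execute("""
    SELECT a.name, d.title
    FROM document AS d
    JOIN author AS a ON a.author_id = d.author_id
    ORDER BY d.doc_id
""").fetchall()

# Constraint specification at work: a document with no matching author is refused.
try:
    conn.execute("INSERT INTO document VALUES (12, 'Orphan', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```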

The data management limitations of RDBMS led to the development of

OODBMS. In OODBMS, internal data structure is hidden so that external operations can

be performed on the data as an Abstract Data Type (ADT). RDBMS and OODBMS are

fundamentally different in the way they handle data relationships; OODBMS represent

relationships explicitly, which improves data access performance. Nonetheless,

OODBMS are plagued by poor query performance and problems of database scalability

(Devarakonda, 2002, OODBMS).
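
The difference in how relationships are handled can be suggested with a short sketch in plain Python. It is an analogy only, under assumed class and field names, and it uses no actual OODBMS product: in the relational style a relationship is implicit in matching key values and must be recomputed by comparison, whereas in the object style it is stored explicitly as a reference that can be followed directly.

```python
# Relational style: rows are flat tuples, and the author-document
# relationship exists only as a shared key value.
author_rows = [(1, "Rivera"), (2, "Chen")]
document_rows = [(10, "Indexing notes", 1),
                 (11, "Query logs", 2),
                 (12, "Schema maps", 1)]


def titles_relational(author_id):
    """Find an author's documents by comparing key values row by row."""
    return [title for _doc_id, title, owner_id in document_rows
            if owner_id == author_id]


# Object style: the relationship is represented explicitly as references.
class Author:
    def __init__(self, name):
        self.name = name
        self.documents = []


class Document:
    def __init__(self, title, author):
        self.title = title
        self.author = author
        author.documents.append(self)  # record the link in both directions


rivera = Author("Rivera")
chen = Author("Chen")
Document("Indexing notes", rivera)
Document("Query logs", chen)
Document("Schema maps", rivera)


def titles_object(author):
    """Follow stored references; no key comparison is needed."""
    return [doc.title for doc in author.documents]
```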

ORDBMS, a relatively recent innovation, are designed to incorporate the best

features of both RDBMS and OODBMS. Data is stored in tables, but some entries may

have richer data structure; as in OODBMS, these entries are called ADTs. Because the

data is stored in rows and columns, the ORDBMS maintains a relational data model,

although it must be heavily modified to support object-oriented programming. In

essence, the object-relational model adds a new object-oriented layer to support rich data

types on top of the relational database model. ORDBMS support query and handle data

objects; they can also be built on a massive scale. These features make ORDBMS

particularly useful for the development of KMS for handling complex data types.

System configuration and deployment. A primary concern for many organizations

during the configuration and deployment of KD and IR systems has been the creation of

data and query context. Some efforts to create context have been retrospective. Lee and

Hwang (2002) describe the process of extracting and visualizing semantic metadata from

databases. This process, called relational database reverse engineering (RDRE), “extracts

a conceptual model from an existing relational database by analyzing data instances as

well as metadata” (Lee and Hwang, 2002, Conclusion). RDRE has been especially useful

in creating shared "conceptual schema" for multiple databases (Introduction). A

conceptual schema describes the database in terms of data items and relationships

between data items in a form "suitable for human presentation" (Introduction) thereby

enhancing KD and IR. The ability to discover and describe data relationships within and

between databases allows organizations to profile and map information in their data

warehouses in ways that were previously unimaginable. Mapping and profiling of data

not only creates discovery and retrieval context to enhance data reuse, it can also reveal

entirely new possibilities for use. A well-defined database reengineering project enables an

organization to integrate the masses of transactional data that lie in its data warehouse

with information collected from other enterprise systems or from outside the company.
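
A very reduced version of the metadata half of this process can be sketched with Python's built-in sqlite3 module. The schema is hypothetical, and the sketch reads only catalog metadata (table, column, and foreign-key declarations); RDRE as Lee and Hwang describe it also analyzes data instances, which is not attempted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT
    );
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        amount      REAL
    );
""")


def conceptual_schema(connection):
    """Summarize each table as its attributes plus the tables it refers to,
    using only the database's own catalog metadata."""
    summary = {}
    tables = [row[0] for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # Table names come from the catalog itself, so interpolation is safe here.
        columns = connection.execute(f"PRAGMA table_info({table})").fetchall()
        fkeys = connection.execute(f"PRAGMA foreign_key_list({table})").fetchall()
        summary[table] = {
            "attributes": [col[1] for col in columns],       # column name
            "relates_to": sorted({fk[2] for fk in fkeys}),   # referenced table
        }
    return summary


schema = conceptual_schema(conn)
```

The resulting dictionary is closer to a form "suitable for human presentation" than the raw table definitions, although a real conceptual model would also name the relationships and their cardinalities.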

Another common method for creating data and query context for enterprise data

warehouses is the establishment of mechanisms for creating context during data creation

and collection or during query construction. Many personal KMS provide robust

mechanisms for data contextualization through the addition of metadata or by data

structuring. KMS such as PeopleGarden (Xiong and Donath, 1999) extract social

context for data during the processes of data collection and data exchange. Extending IR

throughout the social network of an organization, as is done by Answer Garden

(Ackerman, 1994; Ackerman & Malone, 1990; Ackerman & McDonald, 1996), is

another method for providing query context for KD and IR.

Technology transition in organizations. Institutions that have pioneered the use of

KDD and IR, especially in the form of data mining, have traditionally been those that rely

heavily on knowledge-based decisions for their success. Because their operations have

historically relied heavily on data collection, these organizations normally have a large

quantity of accessible, relevant, historical and current data. They also anticipate a high

payoff for making rapid, correct decisions based on their collected data and they actively

seek a technological advantage in knowledge management. Financial institutions such as

banking and investment firms, healthcare and insurance organizations, and businesses

that rely heavily on marketing and customer relations are emblematic of sectors that have

aggressively pursued technological innovations in KD and IR (Piatetsky-Shapiro, 1998,

Slides 28-31).

The development of Decision Support Systems (DSS) based on electronic data

processing (EDP) was an early application of database technology to KM in large

enterprises. In many cases, however, technological strides in data collection (hardware

and software) rapidly outpaced the enterprises' ability to understand and manage the data

that was being collected and stored. Information was often plentiful without being

relevant and extensive data warehouses often proved inadequate for applied decision

making (Bass, 1983, p. 189).

Another difficulty faced by organizations that relied on large databases for

decision support was the danger that decisions would be made based on data that was

poorly contextualized or poorly understood. Managers faced with a complex decision

process might misinterpret the applicability of a data set to the problem or fail to

investigate the existence of contradictory data (Calvert, 1993, p. 91). The less contextual

the data, the more easily it may be misinterpreted or misapplied.

Organizational Systems and Processes

The introduction of automated KDD and IR changed the fundamental nature of

knowledge work, organizational architectures, management practices, and

communication flows in organizations. The introduction of Web-served data collection,

query and delivery has also significantly affected these systems. In particular, the

expansive application of KDD and IR technologies and techniques to information

management for distributed or "flattened" organizations has resulted in KM becoming a

ubiquitous "industrial" product in many business sectors.

Two aspects of knowledge work profoundly affected by the pervasive use of

KDD and IR technologies have been knowledge creation and communication in the

context of collaboration. The enhancement of collaborative possibilities in knowledge

work created by distributed KDD and IR has had significant social effects in

organizations and among individuals. The problem of creating shared context for data

collection, retrieval, and delivery in distributed DBMS has already been mentioned.

Equally difficult are the incitement of collaboration and the creation of networks of trust

among the dispersed users of distributed DBMS.

The creation of massive, increasingly powerful DBMS and more effective KDD

and IR technologies and techniques has also raised many complex social issues outside

business processes. One significant social concern is the increasingly pervasive

collection of detailed individual data that is enabled by sophisticated DBMS. Many

individuals enjoy the convenience offered by the maintenance of personal information in

commercial databases, but are unaware of the privacy implications inherent in the

services these databases enable. Many individuals are faced with a daily choice:

convenience and service or security and privacy?

KDD and IR: Looking to the Future

KDD and IR research problems

The demands of commercial KM markets drive the lifecycle of KD and IR

systems. The creation of highly dimensional, massive data sets and the increasing

sophistication of users and complexity of database uses have directed KDD research in

specific directions. High priority research topics include: problems of statistical

significance and missing data; the understandability of data patterns; the management of

changing data and data integration; and the manipulation of non-standard, multi-media,

and object oriented data (Fayyad, Piatetsky-Shapiro, & Smyth, 1996, pp. 33-34).

Research and development in IR is equally market driven. In 1995, Croft

published a "top ten" list of IR research issues based on his experiences in the area of

industrial and government research priorities as a member of the National Science

Foundation (NSF) Center for Intelligent Information Retrieval (CIIR) (¶ 3). These

research priorities, derived from surveys of companies that use and sell IR systems, still

resonate today:

1. Integrated solutions (standardized architectures and common platforms; the integration of database management and IR systems with multimedia capabilities)

2. Distributed IR (retrieval systems that can work in distributed, wide-area network environments)

3. Efficient, flexible indexing and retrieval (including the ability to handle a wide variety of data formats)

4. Automatic query expansion (to overcome vocabulary mismatch between users and databases)

5. Interfaces and browsing (interfaces that support a range of functions including query formulation, presentation of retrieved information, feedback, and browsing in a conceptually simple way)

6. Routing and filtering (many companies considered data routing to be the main function required for a text-based DBMS, with IR being a secondary function)

7. Effective retrieval (companies are particularly interested in techniques that produce significant improvements in precision but still avoid occasional major retrieval mistakes)

8. Multimedia retrieval (techniques for accessing image, video and sound databases without text descriptions)

9. Information extraction (techniques to identify database entities, attributes and relationships in full text)

10. Relevance feedback (improved algorithms and models for automatic relevance feedback) (Croft, 1995)
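
The vocabulary mismatch problem behind item 4 can be illustrated with a minimal sketch. The synonym table, the document collection, and the whitespace tokenizer are all hypothetical and deliberately naive; practical systems typically derive expansion terms from a thesaurus, from co-occurrence statistics, or from relevance feedback rather than from a hand-built table.

```python
def tokenize(text):
    """Split text into a set of lowercase terms."""
    return set(text.lower().split())


# Hypothetical synonym table mapping a user's term to the terms a
# database might use for the same concept.
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "doctor": {"physician"},
}

# Hypothetical document collection.
DOCUMENTS = {
    1: "automobile repair manual",
    2: "physician directory",
    3: "car insurance rates",
}


def expand(terms):
    """Add known synonyms to the set of query terms."""
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded


def search(query, expand_query=False):
    """Return the ids of documents sharing at least one term with the query."""
    terms = tokenize(query)
    if expand_query:
        terms = expand(terms)
    return sorted(doc_id for doc_id, text in DOCUMENTS.items()
                  if terms & tokenize(text))


plain = search("car")
expanded = search("car", expand_query=True)
```

Without expansion the query "car" misses the document indexed under "automobile"; with expansion it is retrieved. Expansion can also admit irrelevant documents, which is the concern about precision and retrieval mistakes raised in item 7.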

New developments

KD and IR problems for Web resources. The rapid growth of the Web and

increasing reliance on the Web for the collection and delivery of data for KM has created

new problems in KD and IR as well as bringing some older problems to the fore. Among

the problems are: standardization of data collection and pre-processing; huge volumes

of continually changing data; complex, streaming, and multi-media data; identifying and

extracting useful knowledge from Web resources; a lack of consistent data models and

context; a lack of available descriptive information; the problem of presenting knowledge

in usable forms; and the rapid development of more time-sensitive, multi-media

applications for Web resources.

Many of these problems reflect the inadequacy of current methods for Web

resource KD and IR. Data collection is presently performed primarily by automated Web

crawlers. Pre-processing consists of link-based ranking or human indexing and

categorization. The identification and extraction of useful knowledge from Web

resources is dependent on highly inefficient keyword searches on natural language text or

on imprecise topical directories or topical Web sites. Retrieved knowledge can be

viewed only in its native format (with a plugin) or sometimes only as derived HTML.
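
Link-based ranking can be sketched as follows. The source does not say which ranking method is meant; the PageRank-style iteration, the damping value, and the four-page link graph below are assumptions chosen for illustration only.

```python
def link_rank(links, damping=0.85, iterations=100):
    """Score pages by repeatedly passing each page's score along its
    outgoing links (a PageRank-style power iteration).

    `links` maps a page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: pages a, b, and d all link to c; c links to a.
web_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = link_rank(web_links)
top_page = max(ranks, key=ranks.get)
```

The page that attracts the most links receives the highest score regardless of what it says, which illustrates why link-based ranking supplies only limited context for knowledge discovery.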

A variety of research and development projects are underway to enable more

efficient, automated KD and IR for Web resources. Among the best known efforts are

those that seek to apply semantic markup to Web resources to enable machine

understanding and processing and inference analysis. Related projects seek to develop

intelligent search engines and agents to exploit the semantic statements created by this

markup, while still others are creating ontologies to provide context for these search

engines and agents (Shah et al., 2002).

Other researchers are examining improved methods for automated data and

context collection (data pre-processing), the provision of value-added services such as

query routing, the development of integrated query and knowledge delivery systems, and

the establishment of social accounting metrics to provide context for humans (Smith,

2002, p. 52). Another major area of research focuses on leveraging historical information

about individual and group Web browsing experience and patterns to enable more

efficient KD and IR (Chakrabarti et al., 1998, Abstract). Rauber et al. (2002) provide an

evocative description of the potential for enhanced KD and IR that is as yet untapped,

"With [such] a repository of Web data, as well as the metadata associated with the

documents and domains, we have a powerful source of information that goes beyond the

content of Web pages …. in order for the most useful analyses to yield answers to project

questions and issues, a different perspective of the Web and Web archives is needed, a

perspective focusing not solely on content, but on the wealth of information

automatically associated with each object on the Web" (Introduction). Capturing an

understanding of how other individuals have discovered, retrieved, and used Web content

provides invaluable context for users who are accessing that content for the first time.

Integration with Other Technologies

Enhanced presentation for the Web. The need for better integration of KDD and

IR systems with delivery and presentation technologies has already been mentioned and it

is a need that cannot be overstated. This is particularly true in the case of information

presentation on the Web. Considerable research is underway in the area of reformatting

data for discovery and presentation through Web-enabled devices. Another area of

research focus is differentiated service for different devices that would enable variable

visualization of retrieved information depending on a user's needs and device

characteristics. Researchers in the field of adaptive graphics, "a unifying framework that

allows visual representations of information to be customized and mixed together into

new ones," have proposed content pre-viewing, interactive content, selective presentation,

and customized views of Web-served content (Boier-Martin, 2003, pp. 6-9) as areas ripe

for progressive research. Many of these researchers refer to the work of Turner Whitted

who in 1998 suggested the use of computer displays as "wallpaper" for interactive

information exchange to enable pervasive collaboration and information retrieval (1999,

p. 6).

KDD and IR for pervasive computing. Achieving what Cherniack, Franklin, and

Zdonik term “ubiquitous data access” (2001, slide 7) presents several unique challenges

in system integration. Many of these challenges reflect data management problems.

Among these are: the resolution of context-dependent data (e.g., push/data pull delivery

issues); synchronization of data from multiple, distributed sensors and collectors; the

efficient renewal of data streams; effecting profile-driven data management; dealing with

location aware, mobile devices; and the enabling of service mobility and service

discovery (slides 8-27).

The next generation

Research trends and priorities suggest a number of substantial advances in next

generation KDD and IR systems. We can expect them to enable the solving of business

problems, not data analysis problems. They will embed knowledge discovery engines

and integrate access to enterprise and external data on the back-end. Most

importantly, they will integrate the knowledge discovery process with knowledge

delivery tools (Piatetsky-Shapiro, 1998, Slide 7). We can also expect next generation

KDD and IR systems to manage information retrieval contextually, allow contextual

query/continuous query, enable KD in virtual networks of peer-to-peer databases, and

interpolate or extrapolate for missing data (Cherniack et al., 2001, slides 115-138).
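
The simplest form of the last capability, interpolation for missing data, can be sketched as below. The function name and sample readings are hypothetical, linear interpolation is only one of many possible estimators, and extrapolation beyond the known values is deliberately not attempted.

```python
def fill_missing(series):
    """Replace None values that lie between two known readings with
    linearly interpolated estimates. Leading and trailing gaps are left
    as None because filling them would require extrapolation."""
    filled = list(series)
    known = [i for i, value in enumerate(series) if value is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


# Hypothetical sensor readings with gaps.
readings = [10.0, None, None, 16.0, None, 20.0, None]
estimated = fill_missing(readings)
```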

To enable mobile and pervasive computing applications, future KDD and IR

systems will also have to be able to characterize information resources, recognize

individual users, provide variable means to exchange knowledge between users and

information sources (push and pull of information), adapt to the user community, and

enable the reuse and recombination of information as well as its exchange (Rocha, 2001,

1.2). The most fundamental and difficult of these challenges will be information

characterization.

Conclusion: On the Bleeding Edge

One might reasonably ask if the KDD and IR systems described above fall in the

realm of science or science fiction. The answer is, assuredly, in the realm of science,

although science fiction has often been influential in application development. This

answer is supported by a brief examination of the KDD and IR research being funded by

the Defense Advanced Research Projects Agency (DARPA) (the folks who brought us

the Internet) under the auspices of the federal Total Information Awareness (TIA)

Program. This research covers substantially new database technologies, architectures,

population techniques, search algorithms, and data models.

One funded project, Genisys, has the goal of producing technology to enable

ultra-large, all-source information repositories (DARPA, 2003b, Program Strategy).

Unlike RDBMS in use today, Genisys-developed DBMS will require no prior data

modeling; support automated restructuring and projection of data; store data in context of

time and space; and develop a large, distributed system architecture for managing a huge

volume of raw data input, analysis results, and feedback (DARPA, 2003a, TIA System:

Program Strategy). Programs such as Genisys are building aggressively on a foundation

of 30 years of research in KDD and IR technology and techniques. Although these

initiatives raise new social as well as technical problems, they also suggest the possibility

of substantially new applications for these technologies.

The difficulties of contextualizing and interpreting data for KM have increased

many-fold in the past decade. New technologies for data collection and storage have led

to ever-larger data warehouses containing hugely complex data types, a development

that has greatly complicated data discovery, retrieval, visualization, and sharing within

organizations. A growing need to incorporate increasingly disparate data sources from

outside the organization has transformed enterprise KM from a cluster of internal

management problems into a problem set that also encompasses an organization's

relationships with clients and competitors, as well as its ability to participate in lucrative

cooperative ventures. Enterprises now seek to use information technology to support not

just individual problem solving, but entire decision making processes.

KD and IR have become tools that not only enhance human decision-making but

that also compensate for inherent weaknesses in human decision making processes. The

result has been the development of powerful new EDP applications in knowledge

discovery, KM, and enterprise decision making, especially in the areas of collaborative

ventures, market forecasting, the management of customer relations, and fraud or crime

detection. If these technologies are to progress even further, however, researchers must

deal with the essential task of describing (characterizing) our growing wealth of

information resources (online and offline). Only when we are able to visualize

meaningfully the vast extent of our available information resources will we be able to

develop new approaches to KD and IR. The fundamental problems in KM today relate to

our inability to find and understand the information we already possess, not to an

inability to collect and manipulate new data. It is in the development of better KD and IR

tools that the future of KM and KMS lies.

References

Ackerman, M. S. (1998, July). Augmenting the organizational memory: A field study of Answer Garden. ACM Transactions on Information Systems, 16(3), 203-204. Retrieved March 28, 2003 from http://doi.acm.org/10.1145/290159.290160

Ackerman, M. S., & Malone, T. W. (1990, April). Answer Garden: A tool for growing organizational memory. ACM SIGOIS Bulletin, 11(2-3), 31-39. Retrieved March 28, 2003 from http://doi.acm.org/10.1145/91474.91485

Ackerman, M. S., & McDonald, D. W. (1996). Answer Garden 2: Merging organizational memory with collaborative help. Proceedings of the ACM Conference on Computer-Supported Cooperative Work 1996 (CSCW96 Boston, MA). Retrieved March 28, 2003 from http://doi.acm.org/10.1145/240080.240203

Bass, B. M. (1983). Organizational decision making. In L. L. Cummins, E. Kirby Warren, & J. F. Mee (Eds.), The Irwin series in management and the behavioral sciences. Homewood, IL: Richard D. Irwin.

Boier-Martin, I. M. (2003, January/February). Adaptive graphics. In T. Rhyne (Ed.), Visualization Viewpoints, IEEE Computer Graphics and Applications, 23(1), 6-10. Retrieved April 5, 2003 from http://www.research.ibm.com/people/i/imartin/papers/visviewpoints.pdf

Calvert, G. (1993). Highwire management: Risk-taking tactics for leaders, innovators, and trailblazers. San Francisco, CA: Jossey-Bass Publishers.

Chakrabarti, S., Srivastava, S., Subramanyam, M., & Tiware, M. (1998). Using Memex to archive and mine community Web browsing experience. A paper presented at the 9th International World Wide Web Conference, Amsterdam, May 15-19, 2000. Retrieved April 12, 2003 from http://www9.org/w9cdrom/98/98.html

Croft, W. B. (1995, November). What do people want from information retrieval?: The top 10 research issues for companies that use and sell IR systems. D-Lib Magazine. Retrieved April 5, 2003 from http://sunsite.anu.edu.au/mirrors/dlib/dlib/november95/11croft.html

DARPA. (2003a). Genisys. Retrieved from the DARPA Information Awareness Office Web site at: http://www.darpa.mil/iao/Genisys.htm

DARPA. (2003b). Total Information Awareness System. Retrieved from the DARPA Information Awareness Office Web site at: http://www.darpa.mil/iao/TIASystems.htm

Devarakonda, R. (2001, March). Object-relational database systems - The road ahead. ACM Crossroads Student Magazine. Retrieved April 12, 2003 from http://www.acm.org/crossroads/xrds7-3/ordbms.html

Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996, November). The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM, 39(11), 27-34. Retrieved March 03, 2003 from http://wwwhome.cs.utwente.nl/~mpoel/colleges/dwdm/ACM_artikelen/fayyad2.pdf

Lee, D., & Hwang, Y. (2002, March 1). Extracting semantic metadata and its visualization. ACM Crossroads Student Magazine. Retrieved March 27, 2003 from http://www.acm.org/crossroads/xrds7-3/smeva.html

Piatetsky-Shapiro, G. (1998, December 4). Data mining and knowledge discovery tools: The next generation. Retrieved February 27, 2003 from kdnuggets.com at http://www.kdnuggets.com/gpspubs/dama-nextgen-98/index.htm

Rauber, A., Aschenbrenner, A., Witvoet, O., Bruckner, R. M., & Kaiser, M. (2002, December). Uncovering information hidden in Web archives: A glimpse at Web analysis building on data warehouses. D-Lib Magazine, 8(12). Retrieved March 28, 2003 from http://www.dlib.org/dlib/december02/rauber/12rauber.html

Rocha, L. M. (2001). TalkMine: A soft computing approach to adaptive knowledge recommendation [Electronic version]. In V. Loia & S. Sessa (Eds.), Studies in fuzziness and soft computing: Vol. 75. Soft computing agents: New trends for designing autonomous systems (pp. 89-116). New York: Springer. Retrieved March 28, 2003 from http://www.c3.lanl.gov/~rocha/softagents.html

Shah, U., Finin, T., Joshi, A., Cost, R. S., & Mayfield, J. (2002, November). Information retrieval on the Semantic Web. Paper presented at The ACM Conference on Information and Knowledge Management, November 2002. Retrieved March 28, 2003 from http://www.csee.umbc.edu/~finin/papers/cikm02/cikm02.pdf

Smith, M. (2002). Tools for navigating large social cyberspaces. Communications of the ACM, 45(4), 51-55. Retrieved March 28, 2003 from http://delivery.acm.org/10.1145/510000/505272/p51-smith.html?key1=505272&key2=5541680501&coll=GUIDE&dl=GUIDE&CFID=9914049&CFTOKEN=12943474

Wallace, N., Benschop, O., & Köhntopp, K. (1999). What is a BLOB? php.faqts. Retrieved May 1, 2003 from http://www.faqts.com/knowledge_base/view.phtml/aid/416/fid/876

Whitted, T. (1999, July/August). Draw on the Wall. IEEE Computer Graphics and Applications, 19(4), 6-9. Retrieved April 8, 2003 from ieeexplore.ieee.org at: http://ieeexplore.ieee.org/iel5/38/16795/00773957.pdf?isNumber=16795&arnumber=773957&prod=JNL&arSt=6&ared=9&arAuthor=Whitted%2C+T.

Widom, J. (1995, November). Research problems in data warehousing. Proceedings of the 4th International Conference on Information and Knowledge Management (CIKM). Retrieved March 28, 2003 from http://www.ischool.utexas.edu/~i385tkms/readings/Widom-1995-ResearchProblems.pdf

Xiong, R., & Donath, J. (1999). PeopleGarden: Creating data portraits for users. CHI Letters, 1(1), 37-44. Retrieved April 8, 2003 from http://smg.media.mit.edu/papers/Xiong/pgarden_uist99.pdf
