Data Mining and Text-based Information Mark Wasson Senior Architect, Research Scientist LexisNexis

Data Mining and Text-based Information

Mark Wasson

Senior Architect, Research Scientist

LexisNexis

[email protected]

August 27, 2002

• Knowledge Discovery, Data Mining, Text Mining

• From Free Text to Structured Metadata

• Knowledge Discovery and Data Mining in Text

• The Forecast for Data Mining and Text

• Information Sources and Links

The Agenda

Knowledge Discovery, Data Mining, Text Mining

• Knowledge discovery in databases (KDD) is defined as “the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.”

• Stated another way, KDD is the process of applying scaled, optimized statistical processes to large quantities of structured data in order to help users discover new, potentially interesting patterns and information in that data.

What is Knowledge Discovery?

• Find trends and patterns in current data in order to support predictions or classification as new data comes in

• Explain existing data, not just describe it

• Summarize the contents in a large database to facilitate decision making

• Support “logical” (as opposed to graphical) data visualization to support end users

What Folks Do With KDD

• Business trends and financial instrument forecasting (e.g., predict the stock market)

• Fraud detection

• Merchandise handling and placement

• Finding hidden relationships between entities

• Credit worthiness evaluation and loan approvals

• Marketing and sales data analysis

• Recommender systems

• Customer Relationship Management (CRM)

• Bioinformatics (e.g., in silico drug discovery)

• Defect identification and tracking

What Folks Really Do With KDD

• Understand application domain; determine goals

• Create target dataset for analysis and discovery

• Clean data for noise, missing values, etc.

• Perform data reduction

• Choose best data mining method to meet goals

• Choose best data mining algorithm for method

• Conduct data mining, i.e., apply the algorithm

• Review results (novel? interesting?); redo steps if necessary

• Consolidate discovered knowledge

Can be fully automated, but often highly interactive

The 9-step KDD Process

• A synonym for Knowledge Discovery

• The statistical/analytical processing within the KDD process

What is Data Mining? (classic def’n)

• Online Analytical Processing (OLAP)

• Information Retrieval

• Finding and extracting proper names and other pieces of information in a text

• Document categorization and indexing

• Simple descriptive statistics (e.g., average, mean, median)

These tools do help find potentially interesting existing information, but not discover new information.
  – Not necessarily new just because it’s new to you

What Isn’t Data Mining (classic def’n)

• With the emergence of successful data mining applications in the mid to late-1990s, everyone piled on to the term “data mining”

• Today “data mining” is widely used to label tools and processes that
  – Discover new, potentially interesting information
  – Find existing, potentially interesting information

• “Knowledge discovery” still specifically emphasizes discovery

What is Data Mining? (buzzword)

• Text mining is the process of applying knowledge discovery and data mining techniques to information found in a collection of texts in order to help users discover new, potentially interesting patterns and information in that data.

• Combines information from multiple texts
  – What is in an individual text is known information
    • Authors know what they write

What is Text Mining? (classic def’n)

• Computational linguists have piled on, too!

• Today, “text mining” is widely used to label tools and processes that
  – Discover new, potentially interesting information in text collections
  – Discover new, potentially interesting information in text-based information
  – Find existing, potentially interesting information in text and text collections
    • Information Retrieval
    • Named Entity, Relationship and Information Extraction
    • Categorization and Indexing
    • Question Answering

What is Text Mining? (buzzword)

• Not enough focus on the data
  – Collection
  – Cleansing
  – Scale
  – Completeness, including non-traditional sources
  – Structure

• Too much focus on algorithms

• The problem of Interestingness
  – What is interesting?
  – What isn’t?
  – How do we tell the difference?

Today’s Key KDD Problems

• We’re dealing with text!
  – Text lacks structure that traditional data mining processes can exploit
  – Information within text generally is not labeled
  – Actual and approximate synonymy
  – Ambiguity

• Contrast with Spreadsheets, Databases, Etc.
  – Well-defined structure
  – Row, column headings identify content

KDD and Text Problems

Convert Information in Text to Metadata

How to “Fix” Text

From Free Text to Structured Metadata

• Metadata is data about data

• Content-based metadata is structured information that is somehow derived from the information content of a document rather than from the format of a document

• Key Benefit for Data Mining: Structured representation of content

• For our purposes references to “metadata” are references to content-based metadata

What is Metadata?

• Standard Generalized Markup Language (SGML)
  – Meta-language for defining markup languages
  – Markup primarily used to support presentation

• Hypertext Markup Language (HTML)
  – SGML-based markup language for the web
  – Emphasis on structural elements of documents

• Extensible Markup Language (XML)
  – Meta-language for defining markup languages
  – Markup supports both presentation and information/content identification
  – Ability to support information/content identification is severely limited by our ability to process text for content

Markup Languages and Metadata

• Publisher-provided fields
  – Publication name
  – Title
  – Author
  – Date
  – Dateline
  – Topic-indicating terms

• A list of all the words and phrases in a document
  – Simple list
  – List of unique words and phrases
  – Sets of related terms
  – Frequency information

Content-based Metadata

• Specialized terms
  – Named entities (companies, people, places, etc.)
  – Citations, judges, attorneys, plaintiffs, defendants
  – Numerical information and monetary amounts
  – Noun phrases and their head nouns
  – Sentences

• Relationships
  – Items in close proximity
  – Subject-verb-object (agent-action-patient) relationships
  – Citation-based linkages
  – Coreference-based linkages (John Smith left Microsoft. He joined IBM.)

Content-based Metadata

• Content-indicating annotations
  – Controlled vocabulary indexing
  – Statistically interesting extracted terms
  – Abstracts, summaries
  – Specialized fields
  – Domain templates

Content-based Metadata
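
Of the kinds of content-based metadata listed on these slides, the word list with frequency information is the simplest to illustrate. The sketch below is only an assumption about how such a record might look; the function name `term_metadata`, the dictionary keys and the tokenizing rule are hypothetical and not taken from the talk.

```python
import re
from collections import Counter

def term_metadata(text):
    """Return simple word-level metadata for one document."""
    # Lowercase the text and keep runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(words)
    return {
        "words": words,                  # simple list, in document order
        "unique_words": sorted(counts),  # list of unique words
        "frequencies": dict(counts),     # frequency information
    }

# Toy document built around the example sentence pair from the slides.
doc = "John Smith left Microsoft. He joined IBM. IBM welcomed Smith."
meta = term_metadata(doc)
print(meta["frequencies"])
```

Phrases, sets of related terms and named entities would need more than a regular expression; this only covers the single-word case.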

• Search support (information finding)
  – Find and retrieve documents
  – Link to related documents

• Analysis support (information understanding)
  – Overall content summarization

• This has real value to information users
  – Link metadata to documents via good document IDs
  – Provide metadata to customers who can use it for retrieval from their own search and analysis tools

Value of Content-based Metadata

• Publisher-provided fields
  – Some basic standardization helps

• Simple term listing and counting
  – Generally easy, and quite good

• Finding Specialized Terms
  – Lots of good pattern recognition tools, including SRA’s NetOwl, Inxight’s ThingFinder
  – Pattern recognition, lexicons do well for most categories (literary titles, product names are hard)

Metadata Creation Technologies

• Linguistics-based lexical tools
  – Morphological analysis, part of speech tagging
  – Inxight’s LinguistX

• Sentence boundary detection
  – Easily doable, but many need to consider more text

• Linguistics-based syntactic tools
  – Shallow parsing
  – Deep parsing
  – Coreference resolution
  – Varied text, difficult but progressing

Metadata Creation Technologies

• Finding related items
  – Proximity, within sentence easy
  – Subject-verb-object/agent-action-patient requires some degree of parsing
  – Coreference-based relationship finding requires coreference resolution
  – SRA’s NetOwl
  – ClearForest’s rule books
  – Insightful’s InFact, SVO
  – Cymfony’s Brand Dashboard
  – Attensity, SVO
  – Alias I, coreference-based

Metadata Creation Technologies
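
The "proximity, within sentence" case is easy enough to sketch in a few lines. The code below is an illustration under stated assumptions and does not show how any of the named products work: the entity list is supplied by hand, sentence splitting is a naive regular expression, and the function name `sentence_cooccurrences` is hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

def sentence_cooccurrences(text, entities):
    """Count pairs of known entity names that appear in the same sentence."""
    # Naive split after ., ! or ? followed by whitespace (an assumption;
    # abbreviations such as "Inc." would break it).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pairs = Counter()
    for sentence in sentences:
        found = sorted(e for e in entities if e in sentence)
        for a, b in combinations(found, 2):
            pairs[(a, b)] += 1
    return pairs

text = ("John Smith left Microsoft. He joined IBM. "
        "IBM and Microsoft declined to comment.")
pairs = sentence_cooccurrences(text, ["John Smith", "Microsoft", "IBM"])
print(pairs)
```

John Smith is never paired with IBM here, because the second sentence refers to him only as "He". Linking the two needs coreference resolution, which is the point the slide makes.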

• Template-driven extraction
  – Often combines many technologies into domain-specific applications
  – ClearForest’s rule books
  – WhizBang (defunct, now Inxight?) machine learning-based extraction
  – Various “web-farming” technologies, e.g., Caesius
  – University of Sheffield’s GATE tool kit

• Automatic abstracting/summarization
  – Leading text best for individual news documents
  – Columbia University’s NewsBlaster for multiple texts
  – True summary generation – a hard problem

Metadata Creation Technologies

• Document categorization and indexing
  – 80% - 90% accurate (recall and precision) common
  – Often integrated with editorial processes
  – Inxight
  – Nstein
  – Stratify
  – Verity
  – A lot of others

Metadata Creation Technologies
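
For readers less familiar with the two accuracy measures cited for categorization, here is a minimal sketch of how recall and precision are usually computed for a single category. The document IDs are invented, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(assigned, correct):
    """Precision and recall for one category, given sets of document IDs.

    assigned: documents the categorizer labeled with the category
    correct:  documents that truly belong to the category
    """
    true_positives = len(assigned & correct)
    precision = true_positives / len(assigned) if assigned else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    return precision, recall

# Invented example: 5 documents assigned, 5 truly relevant, 4 in common.
assigned = {"d1", "d2", "d3", "d4", "d5"}
correct = {"d1", "d2", "d3", "d4", "d6"}
precision, recall = precision_recall(assigned, correct)
print(precision, recall)
```

Precision is the share of assigned documents that were right; recall is the share of right documents that were found.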

• Metadata creation technologies
  – Text mining?

• Read about them
  – Natural Language Processing for Online Applications – Text Retrieval, Extraction and Categorization (John Benjamins Publishing Company, 2002)
    Peter Jackson, Vice President of R&D, and Isabelle Moulinier, Senior Research Scientist, Thomson Legal & Regulatory

Metadata Creation Technologies

Knowledge Discovery and Data Mining in Text

• What is Knowledge Discovery in Metadata? (The term is unique to us, by the way; Ronen Feldman et al. called this Knowledge Discovery in Text)

• It is KDD that incorporates document metadata into its data collection step

Combining KDD and Metadata

• Data source selection

• Metadata creation, organization

• Perhaps combine with other appropriate data
  – Align data based on common attributes
  – Align data based on date or time
  – Use knowledge sources to guide analysis of metadata (e.g., world knowledge, thesauri, etc.)

• Analyze the data
  – Language-aware processes, e.g., SVO
  – Routine processes that apply to structured content

Basic KDD Task Using Metadata
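
The "align data based on date or time" step might look roughly like the following. This is a sketch under assumptions: the record layout, the company name and the price values are all invented for illustration, and `align_by_date` is a hypothetical helper.

```python
from collections import Counter

def align_by_date(metadata_records, series_by_date):
    """Pair daily document counts from metadata with another dated series."""
    # Count how many documents carry each date in their metadata.
    docs_per_day = Counter(record["date"] for record in metadata_records)
    rows = []
    for date in sorted(docs_per_day):
        if date in series_by_date:  # keep only dates present in both sources
            rows.append((date, docs_per_day[date], series_by_date[date]))
    return rows

# Hypothetical metadata records for a made-up company; all values invented.
metadata_records = [
    {"doc_id": "a1", "date": "2002-08-26", "company": "Acme"},
    {"doc_id": "a2", "date": "2002-08-26", "company": "Acme"},
    {"doc_id": "a3", "date": "2002-08-27", "company": "Acme"},
    {"doc_id": "a4", "date": "2002-08-28", "company": "Acme"},
]
closing_price = {"2002-08-26": 10.0, "2002-08-27": 11.5}

rows = align_by_date(metadata_records, closing_price)
print(rows)
```

Once the two sources share a key, the resulting rows are ordinary structured data that routine analysis processes can use.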

• Does document metadata have value for KDD applications in addition to its value for information finding and retrieval purposes?

• If so, where?

Research Problems

• Research at LexisNexis

• Can daily “hot topics” be identified automatically by comparing today’s indexing frequency for the topic to its recent history?
  – Track controlled vocabulary indexing assignments over time to determine a historical average
  – Compare today’s frequency of assignment for a given company’s index term to its historical average
  – If it exceeds some threshold, flag it as a “hot” company in that day’s news
  – Analysts confirmed that 96.2% of 1,137 flagged companies and company pairs were in fact “hot”

See Shewhart & Wasson (1999)

Example 1 – Trend Analysis
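
The comparison described in this example can be sketched as follows. The slide does not give the actual threshold rule from Shewhart & Wasson (1999), so the `ratio` and `min_count` parameters here are assumptions chosen for illustration, and the company names and counts are invented.

```python
from statistics import mean

def hot_companies(history, today, ratio=3.0, min_count=5):
    """Flag companies whose index-term count today far exceeds their average.

    history: company -> list of daily assignment counts for recent days
    today:   company -> today's assignment count
    ratio, min_count: assumed threshold settings, not taken from the source
    """
    hot = []
    for company, count in today.items():
        past = history.get(company, [])
        baseline = mean(past) if past else 0.0
        # Require both a minimum volume and a large jump over the baseline.
        if count >= min_count and count > ratio * baseline:
            hot.append(company)
    return sorted(hot)

history = {"Acme": [2, 3, 2, 1, 2], "Globex": [10, 12, 11, 9, 8]}
today = {"Acme": 9, "Globex": 11, "Initech": 2}
flagged = hot_companies(history, today)
print(flagged)
```

Globex is mentioned more often than Acme in absolute terms but is not flagged, because its count is in line with its own history.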

• Research at IBM

• Can trends in emerging and fading technologies be identified?
  – Extract, normalize and monitor vocabulary found in documents and compare it to document categories
  – Provide users with a querying tool where they can specify the “shape” of the trend
  – Used patent data

See Lent et al. (1997)

Example 2 – Emerging Technologies

• Work at University of Massachusetts

• Can specific news stories be identified that will influence the behavior in financial markets?
  – Examine features of news articles that occurred before interesting changes in the financial markets
  – Find patterns of features that regularly occur before interesting changes
  – In future data, monitor incoming stories for those patterns for alert purposes
  – Real-time data, real-time stock prices

See Lavrenko et al. (2000)

Example 3 - Influence of News Stories

• Can citation histories be used to identify potential relationships between specific illnesses and other features, exposures, medications, etc.?
  – Collect the citations in a large collection of medical texts
  – Examine citation chains in pairs of domains that do not directly cite one another
  – Measure the amount of overlap in the citation chain
  – Verify results through clinical medical research

See Swanson & Smalheiser (1996)

Example 4 - Citation Pattern Analysis
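
One way to picture the overlap measurement in this example is a set comparison between the works cited by two domains that do not cite each other. The sketch below uses Jaccard overlap as a stand-in measure; that choice, the threshold, the domain names and the paper IDs are all assumptions for illustration, not the method of Swanson & Smalheiser (1996).

```python
from itertools import combinations

def citation_overlap(cites_a, cites_c):
    """Jaccard overlap between the sets of works cited by two domains."""
    union = cites_a | cites_c
    return len(cites_a & cites_c) / len(union) if union else 0.0

def candidate_links(cited_by_domain, direct_pairs, threshold=0.25):
    """Return domain pairs that share citations but do not cite each other."""
    results = []
    for a, c in combinations(sorted(cited_by_domain), 2):
        if frozenset((a, c)) in direct_pairs:
            continue  # skip pairs that already cite one another directly
        score = citation_overlap(cited_by_domain[a], cited_by_domain[c])
        if score >= threshold:
            results.append((a, c, score))
    return results

# Invented data: three domains and the works each one cites.
cited_by_domain = {
    "domain_A": {"p1", "p2", "p3"},
    "domain_B": {"p2", "p3", "p4"},
    "domain_C": {"p9"},
}
direct_pairs = {frozenset(("domain_A", "domain_C"))}
links = candidate_links(cited_by_domain, direct_pairs)
print(links)
```

Any pair this surfaces is only a hypothesis; as the slide notes, it still has to be verified through clinical research.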

• Work at Webmind (out of business)

• Is the tone of news stories, Usenet discussions, website stories, etc., about some company, its management or its products positive or negative?
  – Use categorization technology to determine the positive or negative tone in individual documents about a given company or its products
  – Combine results across all documents about that company or its products
  – Compute a score or summarize the results

Example 5 - Sentiment Detection
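
The "combine results" and "compute a score" steps can be sketched as a simple average, assuming some categorizer has already labeled each document as positive (+1) or negative (-1). The labels, company names and the averaging rule are assumptions for illustration; the slides do not describe Webmind's actual scoring.

```python
from collections import defaultdict

def company_tone(doc_labels):
    """Average per-document tone labels (+1 or -1) for each company."""
    by_company = defaultdict(list)
    for company, label in doc_labels:
        by_company[company].append(label)
    # A score near +1 means mostly positive coverage, near -1 mostly negative.
    return {c: sum(labels) / len(labels) for c, labels in by_company.items()}

# Invented categorizer output: (company, tone of one document).
doc_labels = [("Acme", 1), ("Acme", 1), ("Acme", -1), ("Globex", -1)]
scores = company_tone(doc_labels)
print(scores)
```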

• Work at Hewlett Packard Laboratories

• Can sets of genes be associated with given diseases by analyzing MEDLINE abstracts?
  – Identify references to genes, addressing major problems with recognition, ambiguity and synonymy in this domain
  – Identify references to targeted diseases
  – Statistically analyze co-occurrence patterns between mentions of the genes and mentions of diseases for statistically significant correlations

See Adamic et al. (2002)

Example 6 - Link Genes to Diseases
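
To make the co-occurrence step concrete, the sketch below compares how often a gene and a disease are mentioned in the same abstract with how often that would be expected if mentions were independent. This observed-to-expected ratio is a simple stand-in, not necessarily the statistical test used by Adamic et al. (2002), and it assumes gene and disease mentions have already been recognized and normalized. The term names and data are invented.

```python
def cooccurrence_lift(abstracts, gene, disease):
    """Ratio of observed to expected co-mentions of a gene and a disease.

    abstracts: list of sets, each holding the normalized terms recognized
               in one abstract (recognition itself is assumed done upstream)
    """
    n = len(abstracts)
    with_gene = sum(1 for terms in abstracts if gene in terms)
    with_disease = sum(1 for terms in abstracts if disease in terms)
    both = sum(1 for terms in abstracts if gene in terms and disease in terms)
    # Expected co-mentions if the two terms occurred independently.
    expected = with_gene * with_disease / n if n else 0.0
    return both / expected if expected else 0.0

# Invented toy collection of eight abstracts.
abstracts = [
    {"geneA", "diseaseX"}, {"geneA", "diseaseX"}, {"geneA"},
    {"geneB", "diseaseX"}, {"geneB"}, {"geneB"}, set(), set(),
]
lift_a = cooccurrence_lift(abstracts, "geneA", "diseaseX")
lift_b = cooccurrence_lift(abstracts, "geneB", "diseaseX")
print(lift_a, lift_b)
```

A ratio above 1 suggests the pair co-occurs more often than chance would predict. A real analysis would also need a significance test and far more data.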

• Analyzing the activities of a person, company or organization using its role as subject/agent or object/patient in clauses

• Predicting the spread between borrowing and lending interest rates

• Identifying technical traders in the T-bonds futures market

• Daily predictions of major stock indexes

Additional Examples

• Alias I

• Attensity

• ClearForest

• eNeuralNet

• IBM (Intelligent Miner for Text)

• Inforsense

• Insightful (InFact)

• Megaputer Intelligence

• SAS (Enterprise Miner, Inxight)

• SPSS (LexiQuest)

Data Mining and Text Vendors

The Forecast for Data Mining and Text

• Can we get information from unstructured (free) text into some structured format?

• Are there enough interesting KDD applications where access to content-based metadata from text actually produces interesting results?

• Does adding text-based information to existing data mining and knowledge discovery applications make them better?

What is the forecast for KDT?

• A handful of interesting experiments published
  – Mostly one-off experiments
  – Almost no evidence any of it was commercialized

• Holding back the research
  – Almost no one had access to large quantities of appropriate metadata for research purposes
  – Linguistics technologies still maturing, often too slow
  – Almost no one had the combination of content and tools to generate large quantities of appropriate metadata for research purposes

KDT, 1996-1999

• Movement. Early stages, but movement

• Maturing, scalable tools in classification and extraction from web content and other texts to create metadata

• Products from the Big 3 analytical tool providers (SAS, SPSS, Insightful)

• Companies created to focus on it (not always successful), such as ClearForest, Webmind

• Emerging importance of bioinformatics, availability of MEDLINE content

• But data mining hit hard by dot-com collapse

KDT, 2000+

• KDT is emerging, but slowly

• Still in early stages

• Lots of promise

The Forecast

Information Sources and Links

• KDnuggets, http://www.kdnuggets.com

• ACM Special Interest Group in Knowledge Discovery and Data Mining, http://www.acm.org/sigkdd/

• Association for Computational Linguistics, http://www.aclweb.org

• Data Mining and Knowledge Discovery (journal), Kluwer Academic Publishers, http://www.digimine.com/usama/datamine/

• Companies, http://www.kdnuggets.com/companies/

• Glossary of Terms, http://www3.shore.net/~kht/glossary.htm

Resources

• The 3rd SIAM International Conference on Data Mining, May 1-3, 2003, San Francisco, CA, http://www.siam.org/meetings/sdm03/

• 2003 North American Association for Computational Linguistics/Human Language Technology Joint Conference, approx. early June, 2003, Edmonton, AB, http://www.aclweb.org

• The 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 24-27, 2003, Washington, DC, http://www.acm.org/sigkdd/kdd2003/

Related Technical Conferences

• Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., & Uthurusamy, R. (1996). Advances in Knowledge Discovery and Data Mining. AAAI Press / The MIT Press.

• Jackson, P., & Moulinier, I. (2002). Natural Language Processing for Online Applications – Text Retrieval, Extraction and Categorization. John Benjamins Publishing Company.

Books

Attensity, http://www.attensity.com

Alias I, http://www.alias-i.com

Caesius, http://www.caesius.com

ClearForest, http://www.clearforest.com

Columbia University, http://www.cs.columbia.edu/nlp/newsblaster/

Cymfony, http://www.cymfony.com

eNeuralNet, http://www.eneuralnet.com

Hewlett Packard Labs, http://www.hpl.hp.com/org/stl/dmsd/

IBM, http://www-3.ibm.com/software/data/iminer/

Company Links

Inforsense, http://www.inforsense.com

Insightful, http://www.insightful.com

Inxight, http://www.inxight.com

John Benjamins Publishing, http://www.benjamins.com/cgi-bin/t_bookview.cgi?bookid=NLP_5

Megaputer Intelligence, http://www.megaputer.com

Nstein, http://www.nstein.com

SAS, http://www.sas.com

SPSS, http://www.spss.com

SRA International, http://www.sra.com

Company Links

Stratify, http://www.stratify.com

University of Massachusetts-Amherst, http://ciir.cs.umass.edu/

University of Sheffield, http://gate.ac.uk/

Verity, http://www.verity.com

Company Links

Adamic, L., Wilkinson, D., Huberman, B., & Adar, E. (2002). A Literature Based Method for Identifying Gene-Disease Connections. Proceedings of the 1st IEEE Computer Society Bioinformatics Conference.

Lavrenko, V., Schmill, M., Lawrie, D., Ogilvie, P., Jensen, D., & Allan, J. (2000). Language Models for Financial News Recommendation. Proceedings of the 9th International Conference on Information and Knowledge Management.

Lent, B., Agrawal, R., & Srikant, R. (1997). Discovering Trends in Text Databases. Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining.

Shewhart, M., & Wasson, M. (1999). Monitoring Newsfeeds for “Hot Topics.” Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Swanson, D., & Smalheiser, N. (1996). Undiscovered Public Knowledge: A Ten-year Update. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining.

Data Mining/Text References

Questions?

You can also contact me at

[email protected]