Almaden Research Center © 2003 IBM Corporation Structured + Unstructured: Why Bother?

Page 1:

Almaden Research Center

© 2003 IBM Corporation

Structured + Unstructured: Why Bother?

Better information finding

– Query text and relational data together

– Query metadata and unstructured data together

– Bring structure to unstructured data

– Enterprise search of web sites, email, …

Better analysis

– Leverage “semantics” from unstructured context

– Derive further dimensions from unstructured data

– Add precision to search

– Compliance, call center performance, …

NOT transactional apps

– Unstructured => uncertain

Page 2:

The Structure-Precision Plane

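[Figure: a plane with DATA on the horizontal axis (Structured to Unstructured) and QUERIES on the vertical axis (Precise to Imprecise).]

Regions labeled on the plane:

– Relational databases with SQL queries

– Information retrieval systems with free text search

– Text Analytics (uncertain annotations)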

Page 3:

The Structure-Precision Plane

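[Figure: a plane with DATA on the horizontal axis (Structured to Unstructured) and QUERIES on the vertical axis (Precise to Imprecise).]

Regions labeled on the plane:

– Relational databases with SQL queries

– Information retrieval systems with free text search

– Query Imprecision: interpret keyword queries (uncertainty in user intent)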

Page 4:

Integrated Search

Keyword Search: Paper 295 contact phone

Imprecise query with multiple possible interpretations over data from multiple sources

Traditional interpretation: Return documents that contain the keywords “paper”, “295”, “contact” and “phone”

True user intent could be: Return paper #295 contact name from pubs db and find the contact’s phone number from emails

Paper   Contact   Email
295     Beineke   [email protected]
413     Kossman   [email protected]
321     Miller    [email protected]
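
The two readings of the keyword query can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system described on the slide: the paper numbers and contact names come from the slide's table, while the email bodies, the phone number, and the function names `keyword_interpretation` and `structured_interpretation` are invented for the example.

```python
import re

# Paper numbers and contacts are taken from the slide's table.
PUBS = {295: "Beineke", 413: "Kossman", 321: "Miller"}

# Hypothetical emails: senders, bodies, and the phone number are invented.
EMAILS = [
    {"sender": "Beineke", "body": "Happy to discuss the paper. Call me at 555-0100."},
    {"sender": "Miller", "body": "Camera-ready copy attached."},
]

# Toy phone pattern (an assumption, not a general phone-number parser).
PHONE = re.compile(r"\b\d{3}-\d{4}\b")


def keyword_interpretation(query, documents):
    """Traditional reading: return documents that contain every keyword."""
    terms = query.lower().split()
    return [d for d in documents if all(t in d.lower() for t in terms)]


def structured_interpretation(query, pubs, emails):
    """Alternative reading: look up the paper's contact in the pubs table,
    then search that contact's emails for a phone number."""
    match = re.search(r"paper\s+(\d+)", query, re.IGNORECASE)
    if not match:
        return None
    contact = pubs.get(int(match.group(1)))
    if contact is None:
        return None
    for mail in emails:
        if mail["sender"] == contact:
            phone = PHONE.search(mail["body"])
            if phone:
                return {"contact": contact, "phone": phone.group(0)}
    return {"contact": contact, "phone": None}


query = "Paper 295 contact phone"
docs = [mail["body"] for mail in EMAILS]
print(keyword_interpretation(query, docs))
print(structured_interpretation(query, PUBS, EMAILS))
```

With this toy data the keyword reading finds no document, because no single email contains all four terms, while the structured reading joins the pubs table with the emails to reach the contact's phone number.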

Page 5:

Business Intelligence in CRM

SPOKE WITH MIKE IN SVC AT ACME CHEVY. HE ADVISED THAT THEY HAD ADDED SPRINGS TO REAR OF VEHICLE, NOW HAS A CALL INTO DPSM BILL HARROLD TO REVIEW WITH HIM BEEFING UP THE FRONT SUSPENSION. STATES HE CANNOT TELL IF CUST IS OVERLOADING VEH AS THEY DO NOT HAVE SCALES TO WEIGH …………………………………… ……, CUST YELLING AND SCREAMING. WHEN ADVISED THAT DPSM IS WAITING ON INFORMATION FROM DLRSHP TO MAKE DECISION ON REPAIRS. CUST STATES HE TOOK VEH INTO DLR 3 DAYS AGO AND DLR TEST DROVE VEHICLE WITH CUST AND AGREED THAT VEHICLE WAS DANGEROUS TO DRIVE. CUST ALSO ANGRY THAT HE HAS CALLED SVC MGR, Jack Green AT ACME2 DLRSHP AND NO ONE WILL RETURN HIS CALLS. CUST REQUESTED LOANER VEHICLE UNTIL HIS VEHICLE IS REPAIRED. DENIED LOANER, WHICH ALSO SEVERLY UPSET CUST, CUST STATES HE HAS BEEN COMPLAINING ABOUT THIS SINCE VEH WAS NEW AND HIS USE OF VEHICLE IS LIMITED AND CUST FEELS

Structured attribute: Model = Malibu

Text-enabling the data warehouse to answer aggregate queries such as:

What is the number of angry calls by Dealer and Model of Car?

Precise query over annotated, inherently uncertain data
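
A minimal sketch of such an aggregate query over uncertain annotations, in Python. Everything here is an assumption made for illustration: the call records, the `p_angry` confidence scores (standing in for the output of some text annotator), the 0.5 threshold, and the function name `angry_calls`. Only the dealer names and the model "Malibu" echo the slide.

```python
from collections import defaultdict

# Hypothetical call records. dealer and model are structured attributes;
# p_angry is an invented annotator confidence that the call was angry.
CALLS = [
    {"dealer": "ACME CHEVY", "model": "Malibu", "p_angry": 0.75},
    {"dealer": "ACME CHEVY", "model": "Malibu", "p_angry": 0.25},
    {"dealer": "ACME2", "model": "Malibu", "p_angry": 0.75},
]


def angry_calls(calls, threshold=0.5):
    """Group calls by (dealer, model) and count angry ones two ways:
    a hard count of calls at or above the threshold, and an expected
    count that sums the confidences."""
    hard = defaultdict(int)
    expected = defaultdict(float)
    for call in calls:
        key = (call["dealer"], call["model"])
        hard[key] += int(call["p_angry"] >= threshold)
        expected[key] += call["p_angry"]
    return dict(hard), dict(expected)


hard_counts, expected_counts = angry_calls(CALLS)
for key in hard_counts:
    print(key, hard_counts[key], expected_counts[key])
```

The query itself is precise (a plain group-by), but its answer depends on how the uncertain annotation is treated: thresholding discards the confidence, while the expected count keeps it.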

Page 6:

Information Intensive Solutions

[Figure: three layered architecture views, each with an Application on top.]

– Traditional View: Application, Database Management System, Storage Management

– Today’s View: Application, Federated System

– Emerging View: Application, “Semantic” Query System, with Federated Access, Crawl and Index, and Annotate
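
One way to picture how annotation, indexing, and querying fit together in the emerging view is the toy pipeline below. It is a sketch under heavy assumptions, not the architecture from the slide: the documents (phrases borrowed from the call note on Page 5), the keyword rules in `RULES`, and the functions `annotate`, `build_index`, and `query` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical documents standing in for crawled content.
DOCS = {
    "call-1": "cust yelling and screaming about front suspension",
    "call-2": "cust requested loaner vehicle",
}

# Toy annotator rules (invented): a label applies if any cue word appears.
RULES = {"angry": ("yelling", "screaming", "upset")}


def annotate(text):
    """Return the set of labels whose cue words appear in the text."""
    words = set(text.lower().split())
    return {label for label, cues in RULES.items() if words & set(cues)}


def build_index(docs):
    """Index raw terms and derived annotations side by side."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[("term", term)].add(doc_id)
        for label in annotate(text):
            index[("annotation", label)].add(doc_id)
    return index


def query(index, term=None, annotation=None):
    """Intersect the postings for an optional term and an optional annotation."""
    postings = []
    if term is not None:
        postings.append(index.get(("term", term.lower()), set()))
    if annotation is not None:
        postings.append(index.get(("annotation", annotation), set()))
    return set.intersection(*postings) if postings else set()


index = build_index(DOCS)
print(query(index, term="suspension", annotation="angry"))
```

Storing annotations in the same index as raw terms lets a single query combine free-text conditions with derived attributes, which is the kind of combination the “semantic” query layer suggests.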