Using New Technologies to Make Sense of Content Chaos: Text mining and visualization
Transcript of Using New Technologies to Make Sense of Content Chaos: Text mining and visualization
![Page 1: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/1.jpg)
© 2003 Dow Jones Reuters Business Interactive LLC (trading as Factiva). All rights reserved.
Using New Technologies to Make Sense of Content Chaos:
Text mining and visualization
Glenn Fannick
Product Development Manager
12 December 2005
![Page 2: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/2.jpg)
Making Sense of Content Chaos
No longer “information overload” …
… we’re awash in “content chaos”.
![Page 3: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/3.jpg)
How difficult is it to find…?
• thought leaders in an industry
• newly hired CEOs who’ve commented on Wi-Fi
• which of your products are written about most often
• most mentioned people near Oracle
• most prolific journalists in an industry
• how much of your press coverage is negative
![Page 4: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/4.jpg)
Three Causes of Chaos
• Blogs Mean Everyone’s a Publisher
• ‘Markets are Conversations’
• More dynamic news cycles
![Page 5: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/5.jpg)
Everyone’s a Publisher
• Feb: Microsoft decided to break with its long-time public support for anti-discrimination legislation.
• Apr 21: Local press coverage spurred Microsoft employee bloggers to speak out.
• May 6: Steve Ballmer reversed Microsoft’s stance.
Cause #1
![Page 6: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/6.jpg)
Markets are Conversations
• Savvy consumers do not trust corporate marketing.
• On the Web, people tell each other their opinions about products and companies.
• The most reliable information comes from peers.
• Companies must participate in the conversation or risk irrelevance.
Cause #2
![Page 7: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/7.jpg)
[Figure: charts for the term “maytag”; year axes 2000–2008 and 1997–2005; months May and October labeled]
Cause #2
![Page 8: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/8.jpg)
Shrinking News Cycle
• Newspapers continue to wane in influence.
• Radio long ago filled the role of the evening newspaper.
• The Web now fills the role of the morning newspaper.
• This pushes newspapers into the analysis role formerly filled by the newsweeklies.
• News is reported 24/7
• Web editions
• Citizen journalists
Cause #3
![Page 9: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/9.jpg)
Managing the Chaos
• People need answers, not documents
• Trends must be discovered early
• Going beyond search
![Page 10: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/10.jpg)
People need answers, not documents
• Articles 1-100 of about 2,343,000
• Spend more time analyzing, less time looking
• We must continue to push technology toward a point where it can provide us facts and answers, not headlines and links.
Now: Identify → Search/Gather → Analyze → Decide → Act
Goal: Identify → Find/Discover → Analyze → Decide → Act
![Page 11: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/11.jpg)
Trends must be discovered early
• Identify the waves before they break on shore.
Principle #3
![Page 12: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/12.jpg)
[Visualization labels: Hurricane Rita, Goldman Sachs, Florida Keys, John Roberts, Oil Prices]
Using technology to power serendipity
Facts gleaned from across an entire day’s news can visually summarize an industry.
Extracted entities, phrases and events can direct users to the top newsmakers of the day.
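As a rough illustration of how extracted entities could point to the day's top newsmakers, the sketch below ranks entities by the number of articles that mention them. It assumes the entities have already been extracted per article; the grouping of entities into articles is invented toy data, and `top_newsmakers` is a hypothetical helper name, not something from the presentation.

```python
from collections import Counter

# Hypothetical input: entities already extracted from each article of one day.
# The entity names appear on the slide; the per-article grouping is made up.
articles = [
    ["Hurricane Rita", "Oil Prices", "Florida Keys"],
    ["Hurricane Rita", "Oil Prices"],
    ["Goldman Sachs", "Oil Prices"],
    ["John Roberts"],
    ["Hurricane Rita", "Hurricane Rita", "Florida Keys"],
]


def top_newsmakers(entity_lists, n=3):
    """Rank entities by the number of articles that mention them."""
    counts = Counter()
    for entities in entity_lists:
        # set() so repeated mentions inside one article count only once
        counts.update(set(entities))
    # Sort by descending count, then alphabetically, so ties are deterministic
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


print(top_newsmakers(articles))
```

Counting articles rather than raw mentions keeps one long story from dominating the ranking; a real system might weight by source or recency instead.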
![Page 13: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/13.jpg)
How To Get There
![Page 14: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/14.jpg)
How To Get There: Text Mining
Phase 1 | Classification / Taxonomy
• Metadata tags what an article is about
Phase 2 | Entity Extraction
• Extracting the billions of facts and entities stored in millions of documents
Phase 3 | Ontological Search
• Searching for concepts
![Page 15: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/15.jpg)
text mining – n., a process of extracting information from unstructured text, drawing on practices from information retrieval, data mining, machine learning, computational linguistics and statistics.
![Page 16: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/16.jpg)
1. Document Classification
• Input: Unstructured Text
• Applied by: Technology, Editorial Experts
• Metadata (FII): Company Codes, Industries, Regions, Subjects
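A minimal sketch of document-level classification, assuming a simple keyword-rule approach. The slide credits both technology and editorial experts, so this covers only a toy version of the automated part. The `TAXONOMY` table, its codes and its trigger keywords are all invented for illustration and do not reflect Factiva's actual coding scheme.

```python
# Hypothetical taxonomy: facet -> code -> trigger keywords (all invented).
TAXONOMY = {
    "company": {"T-Mobile USA": ["t-mobile"]},
    "industry": {"Telecommunications": ["wireless", "carrier", "wi-fi"]},
    "region": {"United States": ["u.s.", "seattle", "chicago"]},
    "subject": {"Executive Appointments": ["appointed", "named ceo", "hired"]},
}


def classify(text):
    """Return document-level metadata: per facet, the codes whose keywords appear."""
    lowered = text.lower()
    metadata = {}
    for facet, codes in TAXONOMY.items():
        metadata[facet] = sorted(
            code
            for code, keywords in codes.items()
            if any(keyword in lowered for keyword in keywords)
        )
    return metadata


article = "T-Mobile USA, the wireless carrier, expanded its Wi-Fi hotspots in Chicago."
print(classify(article))
```

The result is one set of tags for the whole article, which is what distinguishes this phase from the sentence-level extraction that follows.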
![Page 17: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/17.jpg)
2. Entity Extraction
• Unstructured Text: people, products, companies, events, authors
• Document-level metadata (FII; Technology, Editorial Experts): Company Codes, Industries, Regions, Subjects
• Sentence-level metadata (Entities): Companies, People, Brands, Relationships, Events, Authors
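One simple way to illustrate sentence-level entity extraction is a gazetteer (dictionary) lookup, sketched below. This is an assumption made for illustration: the presentation does not say which extraction technique is used, and production systems typically rely on statistical or linguistic models. The `GAZETTEER` contents and the naive sentence splitter are hypothetical.

```python
import re

# Hypothetical gazetteer: entity type -> known names (illustrative only).
GAZETTEER = {
    "company": ["T-Mobile USA", "Oracle", "Microsoft"],
    "person": ["Steve Ballmer", "Sergey Brin"],
    "brand": ["Maytag"],
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def extract_entities(text):
    """Return (sentence_index, entity_type, name) for every gazetteer hit."""
    hits = []
    for index, sentence in enumerate(split_sentences(text)):
        for entity_type, names in GAZETTEER.items():
            for name in names:
                # Word boundaries avoid matching inside longer words
                if re.search(r"\b" + re.escape(name) + r"\b", sentence):
                    hits.append((index, entity_type, name))
    return hits


text = "Steve Ballmer reversed Microsoft's stance. Oracle did not comment."
for hit in extract_entities(text):
    print(hit)
```

Keeping the sentence index with each hit is what makes later questions such as "which people are mentioned near Oracle" answerable.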
![Page 18: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/18.jpg)
Extracting More Value from Documents
• Article receives company code for: T-Mobile USA
• But there are other companies involved
• And captures news subjects and industry
• And people and authors
• And brands and products
• And quotations
• And regions
![Page 19: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/19.jpg)
Today’s Search
![Page 20: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/20.jpg)
Ontological Search
• Articles containing executive appointments
• List of people and companies found in relationship to executive appointments
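The difference between returning articles and returning the people and companies tied to a concept can be sketched with a few surface patterns for one concept, "executive appointment". This is only a toy under stated assumptions: the regular expressions, the example sentences and the names in them (Acme Corp, Globex, Jane Doe, Maria Lopez) are invented, and a real ontology would cover far more phrasings.

```python
import re

# Hypothetical surface patterns for the concept "executive appointment".
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)+"
COMPANY = r"[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*"
APPOINTMENT_PATTERNS = [
    re.compile(rf"(?P<company>{COMPANY}) (?:named|appointed|hired) (?P<person>{NAME}) as"),
    re.compile(rf"(?P<person>{NAME}) (?:joins|joined) (?P<company>{COMPANY}) as"),
]


def find_appointments(sentences):
    """Return (person, company) pairs for sentences that express the concept."""
    results = []
    for sentence in sentences:
        for pattern in APPOINTMENT_PATTERNS:
            match = pattern.search(sentence)
            if match:
                results.append((match.group("person"), match.group("company")))
                break  # one pair per sentence is enough for this sketch
    return results


# Invented example sentences
sentences = [
    "Acme Corp named Jane Doe as chief executive.",
    "Maria Lopez joined Globex as finance chief.",
    "Acme Corp reported higher quarterly sales.",
]
print(find_appointments(sentences))
```

The third sentence mentions a company but not the concept, so it yields nothing; a keyword search for the company name alone would have returned it.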
![Page 21: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/21.jpg)
Re-Engineering Search Results (concept screen)
[Timeline axis: Dec, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct]
People: David Sifry [45], Robert Scoble [30], Sergey Brin [23], John Battelle [19], Mena Trott [1]
• Related companies and subjects provide filtering, navigation and discovery.
• Previous dates can be navigated.
• Publications can act as filters.
• People and phrases can be discovered.
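The facets on such a results screen can be approximated by aggregating the metadata attached to each hit. The sketch below assumes each search hit already carries a date, a publication and a list of extracted people; the records, publication names and resulting counts are toy values and do not reproduce the figures on the slide. `build_facets` and `filter_by_person` are hypothetical helper names.

```python
from collections import Counter
from datetime import date

# Hypothetical search hits; the people are named on the slide, the rest is invented.
results = [
    {"date": date(2005, 9, 12), "source": "Publication A",
     "people": ["David Sifry", "Robert Scoble"]},
    {"date": date(2005, 9, 20), "source": "Publication B",
     "people": ["David Sifry"]},
    {"date": date(2005, 10, 3), "source": "Publication A",
     "people": ["David Sifry", "Sergey Brin"]},
    {"date": date(2005, 10, 15), "source": "Publication A",
     "people": ["Robert Scoble"]},
]


def build_facets(hits):
    """Count hits per person, per month and per publication."""
    people, months, sources = Counter(), Counter(), Counter()
    for hit in hits:
        people.update(hit["people"])
        months[hit["date"].strftime("%Y-%m")] += 1
        sources[hit["source"]] += 1
    return {"people": people, "months": months, "sources": sources}


def filter_by_person(hits, name):
    """Narrow the result set to hits that mention the given person."""
    return [hit for hit in hits if name in hit["people"]]


facets = build_facets(results)
print(facets["people"].most_common())
print(len(filter_by_person(results, "Sergey Brin")))
```

The same counters serve both purposes named on the slide: displaying them supports discovery, and clicking one applies the corresponding filter.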
![Page 22: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/22.jpg)
Factiva Insight: Reputation Intelligence
![Page 23: Using New Technologies to Make Sense of Content Chaos: Text mining and visualization](https://reader033.fdocuments.in/reader033/viewer/2022060109/55530aaab4c9054e3f8b4d3b/html5/thumbnails/23.jpg)
Questions?
Glenn Fannick
Product Development Manager
fannick.blogspot.com