Open Data and News Analytics Demo

Open Data and News Analytics Demo 4th Open Data & Linked Data meetup, Sofia, Bulgaria, 29 Mar 2016

Page 2: Open Data and News Analytics Demo

Quick news-analytics case

Open Data and News Analytics

• Our Dynamic Semantic Publishing platform already offers linking of text with big open data graphs

• One can navigate from text to concepts, and get trends, related entities and news

• Try it at http://now.ontotext.com

Page 3: Open Data and News Analytics Demo

The Web of Linked Data in 2007

Mar 2016 Open Data & News Analytics #3

Diagram callouts: structured database version of Wikipedia; database of all locations on Earth; product reviews; semantic synonym dictionary

Note: Each bubble represents a dataset.

Arrows represent mappings across datasets; e.g. dbpedia:Paris owl:sameAs geo:2988507

Page 4: Open Data and News Analytics Demo

The Web of Data is Gaining Mass (2011)

Page 5: Open Data and News Analytics Demo

The Web of Linked Data is Gaining Mass

• 2013 stats: 2,289 public datasets

− http://stats.lod2.eu/

• Growing exponentially

− see the dotted trend line

• Structured markup

− Schema.org; semantic SEO

• Enables better semantic tagging!

− As there are more concepts and richer descriptions to refer to

Linked Data Datasets (chart)

Year      2007  2008  2009  2010  2011  2012  2013
Datasets    27    43    89   162   295   822  2,289
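The "growing exponentially" claim can be checked against the chart values with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names are illustrative, and it assumes the fused token "162295" in the transcript stands for 162 (2010) and 295 (2011).

```python
# Dataset counts per year, as read from the chart on this slide.
# Assumption: the fused token "162295" is 162 (2010) followed by 295 (2011).
DATASETS_BY_YEAR = {
    2007: 27, 2008: 43, 2009: 89, 2010: 162,
    2011: 295, 2012: 822, 2013: 2289,
}


def yearly_growth_factors(counts):
    """Return {year: count / previous year's count} for consecutive years."""
    years = sorted(counts)
    return {y: counts[y] / counts[p] for p, y in zip(years, years[1:])}


def average_growth_factor(counts):
    """Geometric-mean growth factor per year between the first and last year."""
    years = sorted(counts)
    span = years[-1] - years[0]
    return (counts[years[-1]] / counts[years[0]]) ** (1 / span)


if __name__ == "__main__":
    for year, factor in yearly_growth_factors(DATASETS_BY_YEAR).items():
        print(year, round(factor, 2))
    print("average factor per year:", round(average_growth_factor(DATASETS_BY_YEAR), 2))
```

Under that reading, the count roughly doubles every year, which is consistent with an exponential trend line.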

Page 6: Open Data and News Analytics Demo

The FactForge Data

• DBpedia (the English version only): 496M statements

• Geonames: 150M statements

− SameAs links between DBpedia and Geonames: 471K statements

• NOW data – metadata about news: 128M statements

• Total size: 938M statements

− 656M explicit statements + 281M inferred statements

− RDFRank and geo-spatial indices enabled to allow for ranking and efficient geo-region constraints

• Available at http://ff-news.ontotext.com
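As a quick sanity check on the totals above, the share of inferred statements can be computed directly from the explicit and inferred counts. This is only an illustrative sketch; the constant and function names are made up here. The two parts add up to 937M, so the 1M gap to the stated 938M total is presumably rounding.

```python
# Statement counts in millions, as listed on the slide.
EXPLICIT_M = 656
INFERRED_M = 281
STATED_TOTAL_M = 938


def inferred_share(explicit_m, inferred_m):
    """Fraction of all statements that were produced by inference."""
    return inferred_m / (explicit_m + inferred_m)


if __name__ == "__main__":
    print("explicit + inferred:", EXPLICIT_M + INFERRED_M, "M")
    print("inferred share: {:.1%}".format(inferred_share(EXPLICIT_M, INFERRED_M)))
```

Roughly three statements in ten are therefore materialized by the reasoner rather than loaded from the source datasets.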

Page 7: Open Data and News Analytics Demo

News Metadata

• Metadata from Ontotext’s Dynamic Semantic Publishing platform

− Automatically generated as part of the NOW.ontotext.com semantic news showcase

• News corpus from Google since Feb 2015, about 10k news articles per month

• ~70 tags (annotations) per news article

• Tags link text mentions of concepts to the knowledge graph

− Technically these are URIs for entities (people, organizations, locations, etc.) and key phrases

Page 8: Open Data and News Analytics Demo

Class Hierarchy Map (by number of instances)

Left: The big picture

Right: dbo:Agent class (2.7M organizations and persons)

Page 9: Open Data and News Analytics Demo

Query: Big Cities in Eastern Europe

# benefits from inference over transitive gn:parentFeature
# benefits from owl:sameAs mapping between DBpedia and Geonames
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX onto: <http://www.ontotext.com/>
PREFIX gn: <http://www.geonames.org/ontology#>
PREFIX dbo: <http://dbpedia.org/ontology/>
select *
from onto:disable-sameAs
where {
  ?loc gn:parentFeature dbr:Eastern_Europe ;
       gn:featureClass gn:P .
  ?loc dbo:populationTotal ?population ;
       dbo:country ?country .
  FILTER(?population > 300000)
} order by ?country
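A query like this can be submitted from a script through the standard SPARQL 1.1 Protocol (an HTTP GET with a `query` parameter). The sketch below only builds the request and flattens a JSON result; it does not contact any server. The endpoint URL is a placeholder, since the slides do not give the repository's SPARQL endpoint path, and the helper names are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint (assumption): substitute the real SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"

BIG_CITIES_QUERY = """
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX onto: <http://www.ontotext.com/>
PREFIX gn: <http://www.geonames.org/ontology#>
PREFIX dbo: <http://dbpedia.org/ontology/>
select *
from onto:disable-sameAs
where {
  ?loc gn:parentFeature dbr:Eastern_Europe ; gn:featureClass gn:P .
  ?loc dbo:populationTotal ?population ; dbo:country ?country .
  FILTER(?population > 300000)
} order by ?country
"""


def build_sparql_get(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings_to_rows(results):
    """Flatten a SPARQL JSON results document into a list of {var: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


if __name__ == "__main__":
    request = build_sparql_get(ENDPOINT, BIG_CITIES_QUERY)
    print(request.get_method(), request.full_url[:60] + "...")
```

To actually run the query, the request would be passed to `urllib.request.urlopen` and the response body parsed with `json.loads` before calling `bindings_to_rows`.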

Page 10: Open Data and News Analytics Demo

Query: People and Organizations related to Google

# benefits from inference over transitive dbo:parent
# RDFRank makes it easy to see the “top suspects” in a list of 93 entities
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
PREFIX dbr: <http://dbpedia.org/resource/>
select distinct ?related_entity ?rank
where {
  BIND (dbr:Google as ?entity)
  { ?related_entity a dbo:Person ; ?p ?entity . }
  UNION
  { ?related_entity a dbo:Organisation ; dbo:parent ?entity . }
  ?related_entity rank:hasRDFRank ?rank .
} order by desc(?rank)

Page 11: Open Data and News Analytics Demo

Query: Airports near London

# GraphDB’s geo-spatial plug-in allows efficient evaluation of nearby (proximity) constraints
# RDFRank brings the top 6 passenger airports to the top of a list of 80
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX geo-pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX gdb-geo: <http://www.ontotext.com/owlim/geo#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX gdb: <http://www.ontotext.com/owlim/>
SELECT distinct ?airport ?rrank
WHERE {
  { SELECT * { dbr:London geo-pos:lat ?lat ; geo-pos:long ?long . } LIMIT 10 }
  ?airport gdb-geo:nearby(?lat ?long "50mi") ;
           a dbo:Airport ;
           gdb:hasRDFRank ?rrank .
} ORDER BY DESC(?rrank)
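To make the "within 50 miles" constraint concrete, here is a brute-force sketch of the same test using the haversine great-circle distance. It is not how the GraphDB plug-in works internally (the slide says it uses a geo-spatial index for efficiency); the function names are illustrative and the coordinates are approximate values chosen for the example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, long) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearby(center, places, max_miles):
    """Return (name, distance) pairs within max_miles of center, nearest first."""
    hits = [
        (name, haversine_miles(center[0], center[1], lat, lon))
        for name, (lat, lon) in places.items()
    ]
    return sorted((h for h in hits if h[1] <= max_miles), key=lambda h: h[1])


# Approximate, illustrative coordinates (assumptions, not taken from the slides).
LONDON = (51.5074, -0.1278)
AIRPORTS = {
    "Heathrow": (51.4700, -0.4543),
    "Gatwick": (51.1537, -0.1821),
    "Manchester": (53.3537, -2.2750),
}

if __name__ == "__main__":
    for name, miles in nearby(LONDON, AIRPORTS, 50):
        print(name, round(miles, 1))
```

A linear scan like this visits every candidate, which is why an index matters once the candidate set is all locations in the graph.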

Page 12: Open Data and News Analytics Demo

Semantic Press-Clipping

• We can trace references to a specific company in the news

− This is fairly standard; however, we can also handle syntactic variations in names, because state-of-the-art Named Entity Recognition technology is used

− More importantly, we correctly determine whether a given mention of “Paris” refers to Paris (the capital of France), Paris in Texas, Paris Hilton, or Paris (the Greek hero)

• We can trace and consolidate references to daughter companies

• We have a comprehensive industry classification

− The one from DBpedia, but refined to accommodate identifier variations and specialization (e.g. a company classified as dbr:Bank will also be considered classified as dbr:FinancialServices)
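The specialization rule in the last bullet amounts to expanding an industry to itself plus all of its broader industries. A minimal sketch of that expansion follows; the `BROADER` map and function name are hypothetical, and only the dbr:Bank to dbr:FinancialServices link comes from the slide (the other entry is made up to show a two-step chain).

```python
# Hypothetical "is a specialization of" links between industry identifiers.
# Only dbr:Bank -> dbr:FinancialServices is taken from the slide.
BROADER = {
    "dbr:Bank": ["dbr:FinancialServices"],
    "dbr:Investment_banking": ["dbr:Bank"],  # illustrative assumption
}


def expand_industry(industry, broader):
    """Return the industry together with every broader industry reachable from it."""
    seen = set()
    stack = [industry]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # also guards against cycles in the mapping
        seen.add(current)
        stack.extend(broader.get(current, []))
    return seen


if __name__ == "__main__":
    print(sorted(expand_industry("dbr:Bank", BROADER)))
```

With such an expansion, a query for companies in dbr:FinancialServices would also match a company tagged only as dbr:Bank.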

Page 13: Open Data and News Analytics Demo

Query: News Mentioning IBM

# technical example to demonstrate how news metadata can be accessed
PREFIX pub-old: <http://ontology.ontotext.com/publishing#>
PREFIX pub: <http://ontology.ontotext.com/taxonomy/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct ?news ?title ?date ?pub_entity
where {
  ?news pub-old:containsMention / pub-old:hasInstance ?pub_entity .
  ?pub_entity pub:exactMatch dbr:IBM .
  ?news pub-old:creationDate ?date ;
        pub-old:title ?title .
  FILTER ( (?date > "2015-10-01T00:02:00Z"^^xsd:dateTime) &&
           (?date < "2015-11-01T00:02:00Z"^^xsd:dateTime) )
} limit 100

Page 14: Open Data and News Analytics Demo

Query: News Mentioning Gazprom and Its Related Entities

# benefits from inference over transitive dbo:parent relation and mappings to it
# prefix declarations repeated from the earlier queries so the query is self-contained
PREFIX pub-old: <http://ontology.ontotext.com/publishing#>
PREFIX pub: <http://ontology.ontotext.com/taxonomy/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
select distinct ?news ?title ?date ?related_entity
where {
  { select distinct ?related_entity {
      BIND (dbr:Gazprom as ?entity)
      { ?related_entity a dbo:Person ; ?p ?entity .
        FILTER NOT EXISTS { ?related_entity dbo:club ?entity } }
      UNION
      { ?related_entity a dbo:Organisation ; dbo:parent ?entity . }
      UNION
      { BIND(?entity as ?related_entity) }
  } }
  ?news pub-old:containsMention / pub-old:hasInstance ?pub_entity .
  ?pub_entity pub:exactMatch ?related_entity .
  ?news pub-old:creationDate ?date ;
        pub-old:title ?title .
} order by desc(?date) limit 1000

Page 15: Open Data and News Analytics Demo

Query: Most Popular in the News, including children

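# benefits from mapping and consolidation of industry classifications
# prefix declarations repeated from the earlier queries;
# the ff-map: prefix declaration is not shown on the slides and must be added before running
PREFIX pub-old: <http://ontology.ontotext.com/publishing#>
PREFIX pub: <http://ontology.ontotext.com/taxonomy/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
select distinct ?parent (count(?news) as ?news_count)
where {
  { select distinct ?parent ?entity {
      BIND(dbr:Automotive as ?industry)
      ?industry ff-map:industryVariant ?industryVar .
      ?parent dbo:industry ?industryVar .
      ?parent a dbo:Company .
      FILTER NOT EXISTS { ?parent dbo:parent / dbo:industry / ff-map:industryVariant ?industry }
      { ?entity dbo:parent ?parent . }
      UNION
      { BIND(?parent as ?entity) }
  } }
  ?news pub-old:containsMention / pub-old:hasInstance ?pub_entity .
  ?pub_entity pub:exactMatch ?entity .
  ?news pub-old:creationDate ?date .
} group by ?parent order by desc(?news_count)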
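The roll-up this query performs (crediting mentions of a subsidiary to its top-level parent) can be illustrated outside SPARQL. The following is a toy sketch with made-up article IDs and a hand-written parent map; it is not the FactForge data, and the function names are illustrative. It counts each (article, company) mention once, so an article mentioning both a parent and its subsidiary contributes two mentions.

```python
from collections import Counter

# Toy data (assumption): article IDs are invented for illustration.
PARENT = {"Chevrolet": "General Motors", "Audi AG": "Volkswagen Group"}
MENTIONS = [
    ("n1", "General Motors"),
    ("n1", "Chevrolet"),
    ("n2", "Chevrolet"),
    ("n3", "Audi AG"),
    ("n4", "Tesla Motors"),
]


def top_parent(company, parent):
    """Follow parent links upward until a company with no parent is reached."""
    seen = set()
    while company in parent and company not in seen:
        seen.add(company)
        company = parent[company]
    return company


def rollup_counts(mentions, parent):
    """Count mentions per top-level company, including mentions of controlled companies."""
    return Counter(top_parent(company, parent) for _, company in mentions)


def ranking(counts):
    """Sort companies by mention count, highest first (ties broken by name)."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for company, count in ranking(rollup_counts(MENTIONS, PARENT)):
        print(company, count)
```

If each article should count only once per parent, collecting a set of article IDs per top-level company and taking its size would be the natural variation.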

Page 16: Open Data and News Analytics Demo

News Popularity Ranking: Automotive

Rank  Company                    News #   Rank  Company incl. mentions of controlled  News #
1     General Motors             2722     1     General Motors                        4620
2     Tesla Motors               2346     2     Volkswagen Group                      3999
3     Volkswagen                 2299     3     Fiat Chrysler Automobiles             2658
4     Ford Motor Company         1934     4     Tesla Motors                          2370
5     Toyota                     1325     5     Ford Motor Company                    2125
6     Chevrolet                  1264     6     Toyota                                1656
7     Chrysler                   1054     7     Renault-Nissan Alliance               1332
8     Fiat Chrysler Automobiles  1011     8     Honda                                 864
9     Audi AG                    972      9     BMW                                   715
10    Honda                      717      10    Takata Corporation                    547

Page 17: Open Data and News Analytics Demo

News Popularity: Finance

Rank  Company          News #   Rank  Company incl. mentions of controlled  News #
1     Bloomberg L.P.   3203     1     China Merchants Bank                  40940
2     Goldman Sachs    1992     2     Alphabet Inc.                         24219
3     JP Morgan Chase  1712     3     Capital Group Companies               4379
4     Wells Fargo      1688     4     Bloomberg L.P.                        3893
5     Citigroup        1557     5     Exor (company)                        2775
6     HSBC Holdings    1546     6     JP Morgan Chase                       2715
7     Deutsche Bank    1414     7     Nasdaq, Inc.                          2178
8     Bank of America  1335     8     Oaktree Capital Management            1757
9     Barclays         1260     9     Goldman Sachs                         1085
10    UBS              694      10    Sentinel Capital Partners             1064

Note: Including investment funds, stock exchanges, agencies, etc.

Page 18: Open Data and News Analytics Demo

News Popularity: Banking

Rank  Company          News #   Rank  Company incl. mentions of controlled  News #
1     Goldman Sachs    996      1     China Merchants Bank *                38288
2     JP Morgan Chase  856      2     JP Morgan Chase                       1972
3     HSBC Holdings    773      3     Goldman Sachs                         1030
4     Deutsche Bank    707      4     HSBC                                  966
5     Barclays         630      5     Bank of America                       771
6     Citigroup        519      6     Deutsche Bank                         742
7     Bank of America  445      7     Barclays                              681
8     Wells Fargo      422      8     Citigroup                             630
9     UBS              347      9     Wells Fargo                           428
10    Chase            126      10    UBS                                   347

Note: including investment funds, stock exchanges, agencies, etc.

Page 19: Open Data and News Analytics Demo

Thank you!

Experience the technology with NOW: Semantic News Portal

http://now.ontotext.com

Start using GraphDB and text-mining with S4 in the cloud

http://s4.ontotext.com

Learn more at our website or simply get in touch

[email protected], @ontotext
