Open Data and News Analytics Demo
ontotext
4th Open Data & Linked Data meetup, Sofia, Bulgaria, 29 Mar 2016
Quick news-analytics case
Open Data and News Analytics
• Our Dynamic Semantic Publishing platform already links text with big open data graphs
• One can navigate from text to concepts and get trends, related entities and news
• Try it at http://now.ontotext.com
The Web of Linked Data in 2007
Mar 2016 Open Data & News Analytics #3
[Figure: the Linking Open Data cloud in 2007; labelled datasets include a structured database version of Wikipedia, a database of all locations on Earth, product reviews and a semantic synonym dictionary]
Note: Each bubble represents a dataset.
Arrows represent mappings across datasets; e.g. dbpedia:Paris owl:sameAs geo:2988507
The Web of Data is Gaining Mass (2011)
The Web of Linked Data is Gaining Mass
• 2013 stats: 2,289 public datasets − http://stats.lod2.eu/
• Growing exponentially − see the dotted trend line
• Structured markup − Schema.org; semantic SEO
• Enables better semantic tagging! − As there are more concepts and richer descriptions to refer to
[Chart: Linked Data Datasets per year. 2007: 27; 2008: 43; 2009: 89; 2010: 162; 2011: 295; 2012: 822; 2013: 2,289]
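The "growing exponentially" claim can be checked against the chart values with a log-linear least-squares fit. This is a minimal Python sketch, assuming the seven yearly counts are read correctly from the chart; it is not necessarily how the dotted trend line on the original slide was drawn.

```python
import math
from typing import List

# Dataset counts per year, as read off the "Linked Data Datasets" chart.
YEARS: List[int] = [2007, 2008, 2009, 2010, 2011, 2012, 2013]
COUNTS: List[int] = [27, 43, 89, 162, 295, 822, 2289]


def annual_growth_factor(years: List[int], counts: List[int]) -> float:
    """Fit log(count) = a + b * year by ordinary least squares and return e**b,
    i.e. the multiplicative growth per year implied by an exponential trend."""
    n = len(years)
    logs = [math.log(c) for c in counts]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    var = sum((x - mean_x) ** 2 for x in years)
    return math.exp(cov / var)


factor = annual_growth_factor(YEARS, COUNTS)
print(f"~{factor:.2f}x more datasets per year")
```

For these counts the fitted factor comes out at roughly 2, i.e. the number of published datasets about doubled every year between 2007 and 2013.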
The FactForge Data
• DBpedia (the English version only): 496M statements
• Geonames: 150M statements − SameAs links between DBpedia and Geonames: 471K statements
• NOW data – metadata about news: 128M statements
• Total size: 938M statements − 656M explicit statements + 281M inferred statements
− RDFRank and geo-spatial indices are enabled to support ranking and efficient geo-region constraints
• Available at http://ff-news.ontotext.com
News Metadata
• Metadata from Ontotext’s Dynamic Semantic Publishing platform − Automatically generated as part of the NOW.ontotext.com semantic news showcase
• News corpus from Google since Feb 2015, about 10k news articles per month
• ~70 tags (annotations) per news article
• Tags link text mentions of concepts to the knowledge graph − Technically these are URIs for entities (people, organizations, locations, etc.) and key phrases
Class Hierarchy Map (by number of instances)
[Figure: class hierarchy maps. Left: the big picture; right: the dbo:Agent class (2.7M organizations and persons)]
Query: Big Cities in Eastern Europe
# benefits from inference over the transitive gn:parentFeature property
# benefits from the owl:sameAs mapping between DBpedia and Geonames
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX onto: <http://www.ontotext.com/>
PREFIX gn: <http://www.geonames.org/ontology#>
PREFIX dbo: <http://dbpedia.org/ontology/>
select *
# GraphDB pseudo-graph: do not repeat each result for every owl:sameAs alias
from onto:disable-sameAs
where {
  # gn:P is the GeoNames feature class for populated places
  ?loc gn:parentFeature dbr:Eastern_Europe ; gn:featureClass gn:P .
  ?loc dbo:populationTotal ?population ; dbo:country ?country .
  FILTER(?population > 300000)
} order by ?country
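Why the query needs both inferences can be illustrated outside the triplestore. The following Python sketch, over a toy graph with hypothetical IRIs (geo:CityA, geo:CountryX and so on), groups owl:sameAs aliases with a union-find and then follows gn:parentFeature transitively, so a DBpedia city whose Geonames twin sits two levels below Eastern Europe still matches. GraphDB does this reasoning itself; the code only mirrors the logic.

```python
from typing import Dict, List, Set, Tuple

# Toy data (hypothetical IRIs); each gn:parentFeature link is stated one level only.
PARENT_FEATURE: Dict[str, str] = {
    "geo:CityA": "geo:CountryX",
    "geo:CityC": "geo:CountryX",
    "geo:CountryX": "geo:EasternEurope",
    "geo:CityB": "geo:CountryY",
    "geo:CountryY": "geo:WesternEurope",
}
SAME_AS: List[Tuple[str, str]] = [
    ("dbr:CityA", "geo:CityA"),
    ("dbr:CityB", "geo:CityB"),
    ("dbr:CityC", "geo:CityC"),
    ("dbr:Eastern_Europe", "geo:EasternEurope"),
]
POPULATION: Dict[str, int] = {"dbr:CityA": 1_000_000, "dbr:CityB": 2_000_000, "dbr:CityC": 150_000}


def same_as_classes(pairs: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Union-find over owl:sameAs pairs; maps every node to its whole equivalence class."""
    root: Dict[str, str] = {}

    def find(x: str) -> str:
        root.setdefault(x, x)
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for a, b in pairs:
        root[find(a)] = find(b)
    classes: Dict[str, Set[str]] = {}
    for node in list(root):
        classes.setdefault(find(node), set()).add(node)
    return {node: classes[find(node)] for node in root}


def ancestors(node: str, parent_feature: Dict[str, str], aliases: Dict[str, Set[str]]) -> Set[str]:
    """Transitive gn:parentFeature closure of node, following owl:sameAs aliases at every step."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for alias in aliases.get(current, {current}):
            parent = parent_feature.get(alias)
            if parent is not None and parent not in seen:
                seen |= aliases.get(parent, {parent})  # record the parent under all its names
                stack.append(parent)
    return seen


ALIASES = same_as_classes(SAME_AS)
big_cities = sorted(
    city
    for city, population in POPULATION.items()
    if population > 300_000 and "dbr:Eastern_Europe" in ancestors(city, PARENT_FEATURE, ALIASES)
)
print(big_cities)
```

Running it prints `['dbr:CityA']`: dbr:CityB is dropped by region and dbr:CityC by population. Without the sameAs step, no DBpedia city would reach dbr:Eastern_Europe at all, because the parent links only exist on the Geonames side.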
Query: People and Organizations related to Google
# benefits from inference over transitive dbo:parent
# RDFRank makes it easy to see the “top suspects” in a list of 93 entities
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
PREFIX dbr: <http://dbpedia.org/resource/>
select distinct ?related_entity ?rank
where {
BIND (dbr:Google as ?entity)
{ ?related_entity a dbo:Person ; ?p ?entity . } UNION
{ ?related_entity a dbo:Organisation ; dbo:parent ?entity . }
?related_entity rank:hasRDFRank ?rank
} order by desc(?rank)
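Ontotext describes RDFRank as ranking entities by their interconnectedness, in the spirit of PageRank. The sketch below is plain PageRank by power iteration over a hypothetical toy graph (ex:Hub, ex:A and so on), not GraphDB's actual RDFRank implementation, whose parameters may differ; it only shows why densely linked entities float to the top of an `order by desc(?rank)` list.

```python
from typing import Dict, List, Tuple


def page_rank(edges: List[Tuple[str, str]], damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Plain PageRank by power iteration; the scores always sum to 1."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out: Dict[str, List[str]] = {node: [] for node in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for src in nodes:
            targets = out[src] or nodes  # a node without out-links spreads its score evenly
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: three entities point to ex:Hub, which points back to ex:A.
EDGES = [("ex:A", "ex:Hub"), ("ex:B", "ex:Hub"), ("ex:C", "ex:Hub"), ("ex:Hub", "ex:A"), ("ex:D", "ex:C")]
ranks = page_rank(EDGES)
ordered = sorted(ranks, key=ranks.get, reverse=True)  # like ORDER BY DESC(?rank)
print(ordered)
```

ex:Hub ends up first because it collects score from three in-links, and ex:A second because the hub passes all of its score on to it.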
Query: Airports near London
# GraphDB’s geo-spatial plug-in allows efficient evaluation of "nearby" constraints
# RDFRank brings the top 6 passenger airports to the top of a list of 80
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX geo-pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX gdb-geo: <http://www.ontotext.com/owlim/geo#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
SELECT distinct ?airport ?rrank
WHERE {
{ SELECT * { dbr:London geo-pos:lat ?lat ; geo-pos:long ?long . } LIMIT 10 }
?airport gdb-geo:nearby(?lat ?long "50mi");
a dbo:Airport ;
rank:hasRDFRank ?rrank .
} ORDER BY DESC(?rrank)
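The `nearby(?lat ?long "50mi")` pattern boils down to a great-circle distance test. A minimal Python sketch using the haversine formula follows; the airport coordinates are approximate values typed in for illustration, and GraphDB's plug-in answers the same question with its geo-spatial index rather than this linear scan.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles

# Approximate coordinates (latitude, longitude), typed in for illustration only.
LONDON: Tuple[float, float] = (51.5074, -0.1278)
AIRPORTS: Dict[str, Tuple[float, float]] = {
    "Heathrow": (51.4700, -0.4543),
    "Gatwick": (51.1537, -0.1821),
    "Stansted": (51.8860, 0.2389),
    "Manchester": (53.3650, -2.2720),
}


def haversine_mi(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two WGS84 points (spherical Earth)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def nearby(center: Tuple[float, float], points: Dict[str, Tuple[float, float]],
           radius_mi: float) -> List[Tuple[str, float]]:
    """Points within radius_mi of center, closest first (a linear scan, no spatial index)."""
    lat, lon = center
    hits = [(name, haversine_mi(lat, lon, plat, plon)) for name, (plat, plon) in points.items()]
    return sorted((hit for hit in hits if hit[1] <= radius_mi), key=lambda hit: hit[1])


print(nearby(LONDON, AIRPORTS, 50.0))
```

Heathrow, Gatwick and Stansted fall inside the 50-mile radius, while Manchester is well outside it. The query then orders the surviving airports by RDFRank instead of by distance.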
Semantic Press-Clipping
• We can trace references to a specific company in the news − This is fairly standard; however, we can also handle syntactic variations in names, because state-of-the-art named entity recognition (NER) technology is used
− More importantly, we correctly determine which of the following a given mention of “Paris” refers to: Paris (the capital of France), Paris in Texas, Paris Hilton or Paris (the Trojan prince of Greek mythology)
• We can trace and consolidate references to daughter companies
• We have a comprehensive industry classification − The one from DBpedia, refined to accommodate identifier variations and specialization (e.g. a company classified under dbr:Bank is also considered classified under dbr:FinancialServices)
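Telling the different Parises apart is entity disambiguation. Production NER systems rely on trained models; the Python sketch below swaps in a deliberately simple context-overlap heuristic (in the spirit of the Lesk algorithm) with hand-written, hypothetical keyword profiles, only to show the idea of scoring each candidate concept against the words around a mention.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical keyword profiles for candidate concepts behind the surface form "Paris".
CANDIDATES: Dict[str, Set[str]] = {
    "dbr:Paris": {"france", "french", "capital", "seine", "louvre"},
    "dbr:Paris,_Texas": {"texas", "lamar", "county", "dallas"},
    "dbr:Paris_Hilton": {"hilton", "heiress", "celebrity", "hotel"},
    "dbr:Paris_(mythology)": {"troy", "trojan", "helen", "achilles", "prince"},
}


def disambiguate(context: str, candidates: Dict[str, Set[str]]) -> Optional[str]:
    """Return the candidate whose profile shares the most words with the context,
    or None when no profile overlaps at all (ties go to the first candidate listed)."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    scores = {iri: len(words & profile) for iri, profile in candidates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("Helen left Sparta with Paris and sailed to Troy", CANDIDATES))
```

Real systems replace the keyword sets with learned context models and entity popularity priors, but the decision has the same shape: pick the concept that best explains the surrounding text.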
Mar FF-NEWS short demo
Query: News Mentioning IBM
# technical example to demonstrate how news metadata can be accessed
PREFIX pub-old: <http://ontology.ontotext.com/publishing#>
PREFIX pub: <http://ontology.ontotext.com/taxonomy/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
select distinct ?news ?title ?date ?pub_entity
where {
?news pub-old:containsMention / pub-old:hasInstance ?pub_entity .
?pub_entity pub:exactMatch dbr:IBM .
?news pub-old:creationDate ?date; pub-old:title ?title .
FILTER ( (?date > "2015-10-01T00:02:00Z"^^xsd:dateTime) &&
(?date < "2015-11-01T00:02:00Z"^^xsd:dateTime))
} limit 100
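Stripped of SPARQL, the query follows a chain of hops (article → mention → publishing entity → DBpedia concept via pub:exactMatch) and applies a date window. The Python sketch below replays that logic over a few hypothetical in-memory records; the field names are invented for illustration and say nothing about how the NOW metadata is actually stored.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# Hypothetical records: each article lists the publishing entities its mentions resolve to
# (the pub-old:containsMention / pub-old:hasInstance path in the query).
ARTICLES: List[Dict] = [
    {"id": "news:1", "title": "IBM opens new lab", "date": "2015-10-15T09:00:00Z", "entities": ["pub:IBM", "pub:IBM"]},
    {"id": "news:2", "title": "Chip market update", "date": "2015-10-20T12:00:00Z", "entities": ["pub:Intel"]},
    {"id": "news:3", "title": "IBM quarterly results", "date": "2015-11-05T08:00:00Z", "entities": ["pub:IBM"]},
]
EXACT_MATCH: Dict[str, str] = {"pub:IBM": "dbr:IBM", "pub:Intel": "dbr:Intel"}  # pub:exactMatch


def parse_utc(timestamp: str) -> datetime:
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def news_mentioning(concept: str, start: str, end: str, limit: int = 100) -> List[Tuple[str, str, str, str]]:
    lo, hi = parse_utc(start), parse_utc(end)
    rows = []
    for article in ARTICLES:
        if not (lo < parse_utc(article["date"]) < hi):  # the FILTER on ?date (exclusive bounds)
            continue
        for entity in article["entities"]:
            if EXACT_MATCH.get(entity) == concept:
                rows.append((article["id"], article["title"], article["date"], entity))
    return list(dict.fromkeys(rows))[:limit]  # SELECT DISTINCT ... LIMIT, first-seen order


print(news_mentioning("dbr:IBM", "2015-10-01T00:02:00Z", "2015-11-01T00:02:00Z"))
```

Only news:1 is returned: its two IBM mentions collapse into one row, and news:3 falls outside the October window.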
Query: News Mentioning Gazprom and Its Related Entities
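# benefits from inference over the transitive dbo:parent relation and mappings to it
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX pub-old: <http://ontology.ontotext.com/publishing#>
PREFIX pub: <http://ontology.ontotext.com/taxonomy/>
select distinct ?news ?title ?date ?related_entity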
where {
{ select distinct ?related_entity {
BIND (dbr:Gazprom as ?entity)
{ ?related_entity a dbo:Person ; ?p ?entity .
FILTER NOT EXISTS { ?related_entity dbo:club ?entity } } UNION
{ ?related_entity a dbo:Organisation ; dbo:parent ?entity . } UNION
{ BIND(?entity as ?related_entity) }
} }
?news pub-old:containsMention / pub-old:hasInstance ?pub_entity .
?pub_entity pub:exactMatch ?related_entity .
?news pub-old:creationDate ?date; pub-old:title ?title .
} order by desc(?date) limit 1000
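The sub-select collects the company itself, its (transitively inferred) subsidiaries and the people linked to it, minus anyone linked to it via dbo:club. This Python sketch replays that selection over a hypothetical toy graph (ex:Parent, ex:SubA and so on) and then keeps the news whose resolved entities hit the set; the mention and pub:exactMatch hops are collapsed for brevity.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy graph as (subject, predicate, object) triples.
TRIPLES: List[Tuple[str, str, str]] = [
    ("ex:SubA", "dbo:parent", "ex:Parent"),
    ("ex:SubB", "dbo:parent", "ex:SubA"),      # grandchild: reachable only through transitivity
    ("ex:PersonX", "dbo:employer", "ex:Parent"),
    ("ex:PlayerY", "dbo:club", "ex:Parent"),   # excluded, like FILTER NOT EXISTS { ... dbo:club ... }
]
TYPES: Dict[str, str] = {
    "ex:SubA": "dbo:Organisation",
    "ex:SubB": "dbo:Organisation",
    "ex:PersonX": "dbo:Person",
    "ex:PlayerY": "dbo:Person",
}
# News already resolved to entities (mention / exactMatch hops collapsed).
NEWS: Dict[str, List[str]] = {"news:1": ["ex:SubB"], "news:2": ["ex:PlayerY"], "news:3": ["ex:PersonX"]}


def descendants(entity: str) -> Set[str]:
    """Everything below entity along dbo:parent, i.e. what a transitive dbo:parent would infer."""
    found: Set[str] = set()
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for s, p, o in TRIPLES:
            if p == "dbo:parent" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def related_entities(entity: str) -> Set[str]:
    orgs = {e for e in descendants(entity) if TYPES.get(e) == "dbo:Organisation"}
    people = {s for s, _, o in TRIPLES if o == entity and TYPES.get(s) == "dbo:Person"}
    club_members = {s for s, p, o in TRIPLES if o == entity and p == "dbo:club"}
    return {entity} | orgs | (people - club_members)


related = related_entities("ex:Parent")
hits = sorted(news for news, entities in NEWS.items() if related & set(entities))
print(hits)
```

news:1 matches only because ex:SubB is inferred to be a subsidiary of ex:Parent, and news:2 is dropped by the club exclusion.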
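Query: Most Popular Companies in the News, Including Subsidiaries
# benefits from mapping and consolidation of industry classifications
# ff-map: FactForge industry-mapping namespace; its IRI is not given in the slides, so it is assumed to be predefined on the endpoint
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX pub-old: <http://ontology.ontotext.com/publishing#>
PREFIX pub: <http://ontology.ontotext.com/taxonomy/>
select distinct ?parent (count(?news) as ?news_count)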
where {
{ select distinct ?parent ?entity {
BIND(dbr:Automotive as ?industry)
?industry ff-map:industryVariant ?industryVar .
?parent dbo:industry ?industryVar .
?parent a dbo:Company .
FILTER NOT EXISTS { ?parent dbo:parent / dbo:industry / ff-map:industryVariant ?industry }
{ ?entity dbo:parent ?parent . } UNION
{ BIND(?parent as ?entity) }
} }
?news pub-old:containsMention / pub-old:hasInstance ?pub_entity .
?pub_entity pub:exactMatch ?entity .
?news pub-old:creationDate ?date .
} group by ?parent order by desc(?news_count)
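The aggregation keeps only the topmost company of each corporate tree within the industry and credits it with mentions of itself and all its subsidiaries, which is what separates the two columns of the rankings that follow. A minimal Python sketch over hypothetical data (ex:CarGroup, ex:CarBrand and so on) is shown below; the variant table merely stands in for ff-map:industryVariant, whose real contents are not shown in the slides, and the dbo:Company type check is omitted.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical data.
INDUSTRY_VARIANTS: Dict[str, Set[str]] = {  # stands in for ff-map:industryVariant
    "dbr:Automotive": {"dbr:Automotive", "dbr:Automotive_industry"},
}
INDUSTRY: Dict[str, str] = {  # dbo:industry
    "ex:CarGroup": "dbr:Automotive_industry",
    "ex:CarBrand": "dbr:Automotive",
    "ex:SomeBank": "dbr:Bank",
}
PARENT: Dict[str, str] = {"ex:CarBrand": "ex:CarGroup", "ex:CarFinance": "ex:CarGroup"}  # direct dbo:parent
MENTIONS: List[Tuple[str, str]] = [  # (news, entity)
    ("news:1", "ex:CarGroup"),
    ("news:2", "ex:CarBrand"),
    ("news:3", "ex:CarFinance"),
    ("news:4", "ex:SomeBank"),
]


def ancestors(company: str) -> List[str]:
    """All transitive dbo:parent targets, nearest first (stops on cycles)."""
    chain: List[str] = []
    while company in PARENT and PARENT[company] not in chain:
        company = PARENT[company]
        chain.append(company)
    return chain


def news_counts(industry: str) -> List[Tuple[str, int]]:
    variants = INDUSTRY_VARIANTS.get(industry, {industry})

    def in_industry(company: str) -> bool:
        return INDUSTRY.get(company) in variants

    # Top-level companies: in the industry, with no ancestor in the same industry.
    tops = {c for c in INDUSTRY if in_industry(c) and not any(in_industry(a) for a in ancestors(c))}
    counts: Counter = Counter()
    for _news, entity in MENTIONS:
        for top in tops:
            if entity == top or top in ancestors(entity):
                counts[top] += 1
    return counts.most_common()


print(news_counts("dbr:Automotive"))
```

ex:CarBrand is folded into its parent rather than ranked separately, and mentions of the non-automotive subsidiary ex:CarFinance still count towards ex:CarGroup.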
News Popularity Ranking: Automotive
Rank | Company | News # | Company (incl. mentions of companies it controls) | News #
1 | General Motors | 2722 | General Motors | 4620
2 | Tesla Motors | 2346 | Volkswagen Group | 3999
3 | Volkswagen | 2299 | Fiat Chrysler Automobiles | 2658
4 | Ford Motor Company | 1934 | Tesla Motors | 2370
5 | Toyota | 1325 | Ford Motor Company | 2125
6 | Chevrolet | 1264 | Toyota | 1656
7 | Chrysler | 1054 | Renault-Nissan Alliance | 1332
8 | Fiat Chrysler Automobiles | 1011 | Honda | 864
9 | Audi AG | 972 | BMW | 715
10 | Honda | 717 | Takata Corporation | 547
News Popularity: Finance
Rank | Company | News # | Company (incl. mentions of companies it controls) | News #
1 | Bloomberg L.P. | 3203 | China Merchants Bank | 40940
2 | Goldman Sachs | 1992 | Alphabet Inc. | 24219
3 | JP Morgan Chase | 1712 | Capital Group Companies | 4379
4 | Wells Fargo | 1688 | Bloomberg L.P. | 3893
5 | Citigroup | 1557 | Exor (company) | 2775
6 | HSBC Holdings | 1546 | JP Morgan Chase | 2715
7 | Deutsche Bank | 1414 | Nasdaq, Inc. | 2178
8 | Bank of America | 1335 | Oaktree Capital Management | 1757
9 | Barclays | 1260 | Goldman Sachs | 1085
10 | UBS | 694 | Sentinel Capital Partners | 1064
Note: Including investment funds, stock exchanges, agencies, etc.
News Popularity: Banking
Rank | Company | News # | Company (incl. mentions of companies it controls) | News #
1 | Goldman Sachs | 996 | China Merchants Bank * | 38288
2 | JP Morgan Chase | 856 | JP Morgan Chase | 1972
3 | HSBC Holdings | 773 | Goldman Sachs | 1030
4 | Deutsche Bank | 707 | HSBC | 966
5 | Barclays | 630 | Bank of America | 771
6 | Citigroup | 519 | Deutsche Bank | 742
7 | Bank of America | 445 | Barclays | 681
8 | Wells Fargo | 422 | Citigroup | 630
9 | UBS | 347 | Wells Fargo | 428
10 | Chase | 126 | UBS | 347
Note: Including investment funds, stock exchanges, agencies, etc.
Thank you!
Experience the technology with NOW: Semantic News Portal
http://now.ontotext.com
Start using GraphDB and text-mining with S4 in the cloud
http://s4.ontotext.com
Learn more at our website or simply get in touch
[email protected], @ontotext