From Big Data to Valuable Knowledge
Gerard de Melo, Tsinghua University
http://gerard.demelo.org
25 Years of the World Wide Web: 1989−2014
http://geekcom.wordpress.com/2009/03/19/
Tim Berners-Lee
Big Data on the Web
Theological Hall, Strahov Monastery Library, Prague
Main Challenge So Far: Scale
Matej Kren: Idiom. Prague Municipal Library
https://www.flickr.com/photos/ill-padrino/6437837857/
Developing for Scalability
Official Hadoop WordCount v1.0, excluding imports and improvements in WordCount v2.0
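Developing for Scalability
import com.twitter.scalding._

class WordCountJob(args: Args) extends Job(args) {
  // Read the input file line by line
  TextLine(args("input"))
    // Split each line on whitespace, emitting one 'word field per token
    .flatMap('line -> 'word) { line: String => line.split("""\s+""") }
    // Count the occurrences of each distinct word
    .groupBy('word) { _.size }
    // Write (word, count) pairs as tab-separated values
    .write(Tsv(args("output")))
}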
Developing for Scalability
Apache Spark, Twitter's Scalding
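For comparison, the same flatMap / groupBy / count pipeline as in the Scalding job can be sketched on ordinary Scala collections. This is only a local illustration with no Hadoop, Scalding, or Spark involved. The object name LocalWordCount and the sample sentences are made up for this sketch.

```scala
object LocalWordCount {
  // Split each line on whitespace, group identical words, and count each group.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("""\s+"""))   // same regex as in the Scalding job
      .filter(_.nonEmpty)            // drop empty tokens caused by leading whitespace
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input
    val counts = wordCount(Seq("big data on the web", "the web of data"))
    counts.toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```

Frameworks such as Scalding and Spark keep this shape of the program and distribute the individual steps over a cluster.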
Knowledge Organization
Image: http://commons.wikimedia.org/wiki/File:Mundaneum_Tir%C3%A4ng_Karteikaarten.jpg
Universal Bibliographic Repertory (Repertoire Bibliographique Universel, RBU) by Paul Otlet and Henri La Fontaine in 1895
index cards with answers to queries
Knowledge Organization
Image: Mundaneum
Alex Wright: This was a sort of “analog search engine”
Current Challenge: Knowledge Organization
Alexandre Duret-Lutz https://www.flickr.com/photos/gadl/110845690/
25 Years of the World Wide Web: 1989−2014
HyperText (the “HT” in “HTML”)
Basic Idea: Connecting Data
http://geekcom.wordpress.com/2009/03/19/
Tim Berners-Lee
25 Years of the World Wide Web: 1989−2014
Source: Ivan Herman. Introduction to Semantic Web Technologies
Data really needs to be more connected!
The Web of Data: Linked Data
Semantic Web Journal 2014
Interdisciplinary Work, e.g. in Digital Humanities
The Web of Data: Lexvo.org
Source: Peter Mika
Entity Integration: Challenges
ACL 2010, AAAI 2013
Entity Integration: Challenges
One bad link is enough to make a connected component inconsistent
ACL 2010, AAAI 2013
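The effect of a single bad link can be illustrated with a small sketch. It treats equivalence links as undirected edges, computes connected components with a simple union-find, and flags a component as inconsistent when it contains two different entities from the same source. That consistency criterion is an assumption made for this sketch, and the names Entity, components, and isInconsistent are hypothetical, not taken from the cited papers.

```scala
object LinkConsistency {
  // An entity is identified by its source (e.g. a language edition) and a local name.
  final case class Entity(source: String, name: String)

  // Group entities into connected components of the undirected equivalence-link graph.
  def components(links: Seq[(Entity, Entity)]): List[Set[Entity]] = {
    val parent = scala.collection.mutable.Map.empty[Entity, Entity]

    // Find the representative of e's component (iterative, no path compression).
    def find(e: Entity): Entity = {
      var root = e
      while (parent.getOrElseUpdate(root, root) != root) root = parent(root)
      root
    }

    for ((a, b) <- links) {
      val ra = find(a)
      val rb = find(b)
      if (ra != rb) parent(ra) = rb   // merge the two components
    }
    parent.keys.toList.groupBy(find).values.map(_.toSet).toList
  }

  // Assumption: two different entities from the same source are distinct,
  // so a component containing both of them is inconsistent.
  def isInconsistent(component: Set[Entity]): Boolean =
    component.groupBy(_.source).values.exists(_.size > 1)
}
```

With hypothetical links en:Television = de:Fernsehen and de:Fernsehen = en:Television set, the second (bad) link pulls two distinct English entities into one component, which the check then reports as inconsistent.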
Entity Integration: Challenges
Min. cost solution: NP-hard, APX-hard
Entity Integration
ACL 2010, AAAI 2013
Our Solution: Use a Linear Program and then apply region growing techniques
→ Logarithmic Approximation Guarantee
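In general terms, a logarithmic approximation guarantee can be read as the following bound. This is a generic statement: the symbols n (a measure of input size) and c (a constant) are placeholders, since the slide does not say which quantity the logarithm is taken over.

```latex
\mathrm{cost}(S_{\mathrm{ALG}}) \;\le\; c \cdot \log n \cdot \mathrm{cost}(S_{\mathrm{OPT}})
```

Here S_ALG denotes the solution returned by the approximation algorithm and S_OPT a minimum-cost solution, which is NP-hard to compute exactly.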
Taxonomic Links
A user wants a list of “Art Schools in Europe”
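A query like this relies on taxonomic links: an entity counts as a result if one of its classes leads up to the requested class through subclass links. The following is a minimal sketch under that assumption. All class and entity names are made-up toy data, and the object TaxonomyQuery is hypothetical.

```scala
object TaxonomyQuery {
  // subclassOf: class -> direct parent classes (toy data, not from a real knowledge base)
  val subclassOf: Map[String, Set[String]] = Map(
    "Art school in France"  -> Set("Art school in Europe"),
    "Art school in Germany" -> Set("Art school in Europe"),
    "Art school in Europe"  -> Set("Art school")
  )

  // instanceOf: entity -> direct classes (toy data)
  val instanceOf: Map[String, Set[String]] = Map(
    "School A" -> Set("Art school in France"),
    "School B" -> Set("Art school in Germany"),
    "School C" -> Set("Art school")
  )

  // All classes reachable upward from `cls`, including `cls` itself.
  // The `seen` set also protects against cycles in the taxonomy.
  def ancestors(cls: String, seen: Set[String] = Set.empty): Set[String] =
    if (seen(cls)) seen
    else subclassOf.getOrElse(cls, Set.empty[String])
      .foldLeft(seen + cls)((acc, parent) => ancestors(parent, acc))

  // Entities with at least one direct class that leads up to `target`.
  def instancesOf(target: String): Set[String] =
    instanceOf.collect {
      case (entity, classes) if classes.exists(c => ancestors(c).contains(target)) => entity
    }.toSet
}
```

In this toy data, asking for "Art school in Europe" returns School A and School B but not School C, which is only known to be an art school in general.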
Taxonomic Integration: MENTA Approach
De Melo & Weikum (2010). CIKM Best Interdisciplinary Paper Award
UWN/MENTA: multilingual extension of WordNet for word senses and taxonomical information over 200 languages
Relation Extraction
Images: Denilson Barbosa, Haixun Wang, Cong Yu. Shallow Information Extraction for the Knowledge Web
Scaling Up: Tandon, de Melo & Weikum. AAAI 2011, COLING 2012
Equivalent:
MetaWeb was acquired by Google.
MetaWeb was just recently acquired by Google.
MetaWeb, surprisingly, was acquired by Google.
Relation Integration
MetaWeb was bought out by Google.
Google bought MetaWeb.
Google acquired MetaWeb.
MetaWeb was sold to Google.
Google's acquisition of MetaWeb.
Google's MetaWeb acquisition.
and so on...
Underlying frame: Commercial transfer
● Capture the “who-did-what-to-whom”
● Microsoft bought the patent from Nokia.
Nokia sold the patent to Microsoft.
The patent was acquired by Microsoft [from Nokia].
The patent was sold [by Nokia] to Microsoft.
Relation Integration
Buyer: Microsoft
Seller: Nokia
Product: The patent
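The mapping from different surface forms to one frame can be sketched with a few hand-written patterns. This only illustrates the idea of a shared Commercial transfer frame with Buyer, Seller, and Product roles. The regular expressions and the names CommercialTransfer, Frame, and parse are assumptions for this sketch, and a real system would use a parser and FrameNet-style resources rather than regexes.

```scala
object CommercialTransfer {
  // Frame elements as named on the slide; the seller may be left unstated.
  final case class Frame(buyer: String, seller: Option[String], product: String)

  // Hand-written surface patterns (illustrative only).
  private val Bought   = """(.+) bought (.+) from (.+)\.""".r
  private val Sold     = """(.+) sold (.+) to (.+)\.""".r
  private val Acquired = """(.+) was acquired by (.+?)(?: from (.+))?\.""".r
  private val WasSold  = """(.+) was sold (?:by (.+) )?to (.+)\.""".r

  // The passive patterns are tried first, because "X was sold by Y to Z."
  // would otherwise also match the active "sold" pattern with wrong roles.
  def parse(sentence: String): Option[Frame] = sentence match {
    case WasSold(p, s, b)  => Some(Frame(b, Option(s), p))  // Option(null) is None
    case Acquired(p, b, s) => Some(Frame(b, Option(s), p))
    case Bought(b, p, s)   => Some(Frame(b, Some(s), p))
    case Sold(s, p, b)     => Some(Frame(b, Some(s), p))
    case _                 => None
  }
}
```

All four example sentences above then yield the same role assignment (Buyer: Microsoft, Seller: Nokia, Product: the patent), apart from the capitalization of the product phrase.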
Relation Integration: FrameBase.org
Bringing knowledge into a standard form based on natural language (FrameNet)
Relation Integration
X isAuthorOf Y
Y writtenBy X
X wrote Y
Y writtenInYear Z
Relation Integration
YAGO: isMarriedTo predicate
Freebase: Marriage Entity
Challenge: Modelling Differences
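The modelling difference can be made concrete with a small sketch that converts between the two styles: a direct binary predicate between two people, and a separate marriage node that links to both spouses. The property names "type" and "spouse", the identifier scheme, and the object MarriageModels are assumptions for illustration and do not reproduce the actual YAGO or Freebase schemas.

```scala
object MarriageModels {
  final case class Triple(subject: String, predicate: String, obj: String)

  // Binary style -> entity style: the marriage becomes a node of its own,
  // which could later carry extra facts such as dates.
  def reify(t: Triple, id: String): Seq[Triple] = Seq(
    Triple(id, "type", "Marriage"),
    Triple(id, "spouse", t.subject),
    Triple(id, "spouse", t.obj)
  )

  // Entity style -> binary style: collapse each marriage node with exactly
  // two spouses back into a single isMarriedTo triple.
  def flatten(triples: Seq[Triple]): Seq[Triple] =
    triples.collect { case Triple(id, "type", "Marriage") => id }.flatMap { id =>
      triples.collect { case Triple(`id`, "spouse", s) => s } match {
        case Seq(a, b) => Seq(Triple(a, "isMarriedTo", b))
        case _         => Seq.empty[Triple] // skip incomplete or unusual marriage nodes
      }
    }
}
```

Integrating two knowledge bases that differ in this way requires such a structural conversion, not only a renaming of predicates.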
Search Interfaces
“Which companies were created during the last century in Silicon Valley?”
YAGO2: WWW 2011 Best Demo Award
Real Understanding?
Knowledge Bases keep growing, but much of the Web is still not truly understood
Real Understanding?
Source: CMU NELL Browser 2015-03-17
Over 4000 countries with >90% confidence
Noisy Patterns
Future Challenge: Real Understanding
Voynich Manuscript, early 15th century
From Big Data to Knowledge
Image: Brett Ryder
Machine Learning
[Diagram labels: Examples, Learning, Classifier Model, Prediction, Correct, Incorrect, “Probably Incorrect!”]
Better Machine Learning
[Diagram labels: Examples, Learning, Classifier Model, Prediction, Correct, Incorrect, “Probably Incorrect!”, Better Model!, + Better Labels for Test Data]
Conversation
Always there to answer questions
Learning Common-Sense
I'm cold.
Warm coffee and tea are available at Costa Coffee just around the corner. But don't forget your meeting with Linda in half an hour!
Learning Common-Sense: From Big Data?
WebChild
AAAI 2014, WSDM 2014, AAAI 2011
WebChild: Learning Common-Sense From Big Data
Why do you think Mary put on the ring at the end of the movie?
Yes, that was a powerful scene. The fact that she put it on after reading the letter from her mother indicates that she may have changed her mind about the value of ...
Future: Learning Advanced Common-Sense Knowledge?
Summary
Big Data is radically changing the world
Main Challenge in the Past: Scale
Main Current Challenge: Organization
1. Entity Integration
2. Taxonomic Integration
3. Relation Extraction and Integration
Main Future Challenge: Real Understanding by learning from weak signals