DeepDive: Extracting Databases from Dark Data
Hazy Research Group, Infolab, Computer Science Department, Stanford University
Jaeho Shin, Ce Zhang, Feiran Wang, Sen Wu, Alex Ratner, Theodoros Rekatsinas, Thomas Palomares, Youssef Ahres, Dan Iter, Ivan Gozali, Henry Ehrenberg, Zifei Shan, Feng Niu, Raphael Hoffman, Michael Cafarella, Christopher Ré
Debugging DeepDive apps
Running DeepDive apps
Writing DeepDive apps
- Interpretable probabilities, no arbitrary scores
- Calibration plots show if system output makes sense
- Algorithm-independent evaluation method
- Iterative error analysis process helps users achieve high quality rapidly by focusing on the top error mode
- Interactive tools for evaluating and exploring data
- Declarative, end-to-end specification (DDlog)
- Statistical inference model
- Data processing
- Standard tools for data processing
- SQL, Python UDFs, CoreNLP
- Extractors in familiar languages
- Automatic features (DDlib)
- Generic patterns in data (e.g., NLP tags, dependency)
- Data programming: DeepDive's future
- Programmatic supervision: noisy but cost-effective
- Apps built entirely with rules using patterns in data
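To make "noisy but cost-effective" programmatic supervision concrete, here is a minimal Python sketch of turning the votes of several rules into training labels. The vote encoding (1 = positive, -1 = negative, 0 = abstain) and the function name `combine_votes` are assumptions made for illustration. The unweighted vote is a deliberate simplification and is not how DeepDive weighs rules, which the poster describes as handled jointly in a statistical inference model.

```python
def combine_votes(votes_by_candidate):
    """Combine per-rule votes into one training label per candidate.

    votes_by_candidate maps a candidate id to a list of votes, where each
    vote is 1 (positive), -1 (negative), or 0 (the rule abstains).
    Returns a dict mapping candidate id to 1, -1, or 0 (no usable label).
    """
    labels = {}
    for candidate, votes in votes_by_candidate.items():
        total = sum(votes)
        # Sign of the vote total: ties and all-abstain cases yield 0.
        labels[candidate] = (total > 0) - (total < 0)
    return labels
```

Candidates labeled 0 would simply be left out of the training set in this sketch, so conflicting or silent rules cost coverage rather than introducing a wrong label.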
DeepDive creating societal impact
Input: Unstructured information, e.g., text, tables (plan to add images, diagrams)
Output: Structured data with probabilities
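Because every output fact carries a probability, one can check whether those probabilities are calibrated: among facts predicted with probability near 0.9, roughly 90% should be correct. The Python sketch below computes the numbers behind such a calibration plot. The function name, the input format of `(probability, is_correct)` pairs, and the equal-width bucketing are assumptions for illustration, not DeepDive's own tooling.

```python
def calibration_table(predictions, num_buckets=10):
    """Bucket predictions by probability and report accuracy per bucket.

    predictions: iterable of (probability, is_correct) pairs, where
    is_correct comes from held-out labeled data.
    Returns (bucket_low, bucket_high, count, accuracy) for non-empty buckets.
    """
    counts = [0] * num_buckets
    correct = [0] * num_buckets
    for prob, is_correct in predictions:
        # A probability of exactly 1.0 falls into the last bucket.
        idx = min(int(prob * num_buckets), num_buckets - 1)
        counts[idx] += 1
        if is_correct:
            correct[idx] += 1
    rows = []
    for i in range(num_buckets):
        if counts[i]:
            rows.append((i / num_buckets, (i + 1) / num_buckets,
                         counts[i], correct[i] / counts[i]))
    return rows
```

In a well-calibrated system the accuracy column stays close to the midpoint of each bucket; large gaps point at the buckets worth inspecting first.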
Factor Graph
- Leveraging database & operating systems technologies
- PostgreSQL, Greenplum
- Parallel/distributed data processing
- Incremental maintenance of inference results
- Run small changes to model up to 100x faster
- VLDB 2015, SIGMOD Research Highlight Award 2015
- DimmWitted: Fast Inference Engine
- NUMA-aware high-speed Gibbs sampler
- SIGMOD 2014 Best Paper Award
- Noise-aware learning technique
- Noisy rules reinforcing/fixing up other rules
- Handled jointly in a unified statistical inference model
- Higher quality possible despite noisy individual rules
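The list above names Gibbs sampling over a factor graph as the inference method. As a rough illustration of the idea only, the following single-threaded Python sketch estimates marginal probabilities on a toy two-variable factor graph and compares them with brute-force enumeration. The variables, factors, and weights are invented for the example; nothing here reflects DimmWitted's implementation or its NUMA-aware design.

```python
import math
import random
from itertools import product

# Toy factor graph (weights are made up): each factor is
# (weight, variable names, function that is True when the factor is satisfied).
FACTORS = [
    (1.0, ("a",), lambda a: a),               # a tends to be true
    (2.0, ("a", "b"), lambda a, b: a == b),   # a and b tend to agree
]
VARIABLES = ["a", "b"]

def log_score(world):
    """Unnormalized log-probability: sum of weights of satisfied factors."""
    return sum(w for w, names, f in FACTORS if f(*(world[n] for n in names)))

def gibbs_marginals(num_samples=20000, burn_in=1000, seed=0):
    """Estimate P(variable = True) by resampling one variable at a time."""
    rng = random.Random(seed)
    world = {v: rng.random() < 0.5 for v in VARIABLES}
    true_counts = {v: 0 for v in VARIABLES}
    for step in range(burn_in + num_samples):
        for v in VARIABLES:
            # Conditional probability of v being True given the other variables.
            world[v] = True
            s_true = log_score(world)
            world[v] = False
            s_false = log_score(world)
            p_true = 1.0 / (1.0 + math.exp(s_false - s_true))
            world[v] = rng.random() < p_true
        if step >= burn_in:
            for v in VARIABLES:
                true_counts[v] += int(world[v])
    return {v: true_counts[v] / num_samples for v in VARIABLES}

def exact_marginals():
    """Brute-force enumeration of all worlds; feasible only for tiny graphs."""
    worlds = [dict(zip(VARIABLES, vals))
              for vals in product([False, True], repeat=len(VARIABLES))]
    weights = [math.exp(log_score(w)) for w in worlds]
    z = sum(weights)
    return {v: sum(wt for w, wt in zip(worlds, weights) if w[v]) / z
            for v in VARIABLES}
```

Enumeration grows exponentially with the number of variables, which is why sampling is the practical choice at the scale of millions of candidate facts.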
1.8M documents (TAC-KBP 2014)
2.4M facts
- Powers DARPA MEMEX to fight human trafficking, actively deployed in many law enforcement agencies
- Accelerates biodiversity research, featured in Nature
- Award-winning knowledge base construction systems
DeepDive extracting and fusing knowledge
deepdive.stanford.edu
hasSpouse(p1, p2, 1, "from_dbpedia") :-
    spouseCandidate(p1, name1, p2, name2),
    spousesInDBpedia(n1, n2),
    [ lower(n1) = lower(name1), lower(n2) = lower(name2)
    ; lower(n2) = lower(name1), lower(n1) = lower(name2) ].

hasSpouse(p1, p2, 1, "wife/husband between") :-
    spouseCandidate(p1, name1, p2, name2),
    personMention(p1, begin1, end1, s),
    personMention(p2, begin2, end2, s),
    sentences(s, tokens, lemmas),
    lemmas[end1..begin2] && ["wife", "husband"].
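For readers less familiar with DDlog, the logic of the two supervision rules above can be approximated in plain Python. This is a sketch under assumptions: the function names are hypothetical, the knowledge base is passed as a list of name pairs, and `end1` is treated as an exclusive index with `begin2` pointing at the first token of the second mention, which the rules themselves do not specify.

```python
def label_from_kb(name1, name2, known_spouses):
    """Return 1 if the name pair is a known couple in either order, else None.

    Mirrors the "from_dbpedia" rule: names are compared case-insensitively.
    """
    pair = (name1.lower(), name2.lower())
    kb = {(a.lower(), b.lower()) for a, b in known_spouses}
    if pair in kb or (pair[1], pair[0]) in kb:
        return 1
    return None

def label_from_words_between(lemmas, end1, begin2, keywords=("wife", "husband")):
    """Return 1 if any keyword lemma lies between the two mentions, else None.

    Mirrors the "wife/husband" rule. Assumes end1 is exclusive and begin2 is
    the index of the second mention's first token.
    """
    between = lemmas[end1:begin2]
    return 1 if set(between) & set(keywords) else None
```

A small usage example with invented data:

```python
known = [("Barack Obama", "Michelle Obama")]
lemmas = ["Barack", "Obama", "and", "his", "wife", "Michelle", "Obama"]
label_from_kb("michelle obama", "BARACK OBAMA", known)   # matches in reverse order
label_from_words_between(lemmas, 2, 5)                    # "wife" lies between the mentions
```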
Where are the human traffickers on the dark Web?
Location    Price
NYC         $…
National Distribution
Which gene mutation causes glaucoma?
Gene        Phenotype
LTBP2       Glaucoma
What is the effect of a drug?
Gene1       Gene2
BRCA1       ARNT
Protein Interaction
Biomed Literature    Online Ads/Reviews
Gene-Pheno Interaction
Genomics Literature
Mutations altering the coding sequence of KRT16 cause Pachyonychia Congenita …
Recessive mutations in LTBP2 (…) have been identified as a cause of early-onset glaucoma …
Mutations in QARS, encoding Glutaminyl-tRNA synthetase, cause progressive microcephaly …
incremental runs < 0.5 hour [VLDB15]
full run 6 hours
Think about features, not algorithms
Think about training, not features nor algorithms
- Open source engine for building machine learning systems that extract databases from dark data
- Helps answer macroscopic questions that require high-quality structured data at massive scale
- Enables rapid development with tools & abstractions
error analysis, debug
re-run
Paleobiology
Anti-Human Trafficking
TAC-KBP 2014 Winner
Genomics Drug Repurposing