Post on 17-Dec-2014
Copyright 2007
Cyc-Gate workflow
Blaz Fortuna, Luka Bradesko
Cycorp Europe, Slovenia
Goal
• Demonstrate reasoning over non-structured input data
• Learn how to correctly annotate a new plug-in
• Learn how to add a new plug-in to the platform
External tools used
• GATE
– Information Extraction framework
– Used here for extraction of named entities from articles
• ResearchCyc
– Common-sense knowledge base
   • ~300,000 concepts, 1.3M assertions
– Reasoning engine
Pipeline diagram
[Diagram: Query → Identify → Transform → Select → Reason → Result. External resources shown: Internet, GATE, ResearchCyc.]
Example
Query
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt" .
?company cyc:isa cyc:PubliclyHeldCorporation }
Identify
• Find links to HTML documents and retrieve them using the ArticleIdentifier plug-in.
– Returns a text document:
http://shodan.ijs.si:8080/GateServer/news.txt
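The link-finding half of the Identify step can be sketched in a few lines. This is only an illustration, assuming that article links appear in the query as double-quoted http(s) literals; the class and method names are hypothetical and are not taken from the ArticleIdentifier source. Retrieval of the documents is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the real ArticleIdentifier plug-in.
class ArticleUrlFinder {
    // A double-quoted http(s) URL literal; stray spaces inside the quotes are tolerated.
    private static final Pattern URL_LITERAL =
            Pattern.compile("\"\\s*(https?://[^\"\\s]+)\\s*\"");

    /** Returns every distinct URL literal found in the query, in order of appearance. */
    static List<String> findArticleUrls(String sparql) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_LITERAL.matcher(sparql);
        while (m.find()) {
            String url = m.group(1);
            if (!urls.contains(url)) {
                urls.add(url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String query = "PREFIX cyc: <http://www.cycfoundation.org/concepts/> "
                + "SELECT ?company WHERE { ?company cyc:mentionedInArticle "
                + "\"http://shodan.ijs.si:8080/GateServer/news.txt\" . "
                + "?company cyc:isa cyc:PubliclyHeldCorporation }";
        // The PREFIX IRI is in angle brackets, not quotes, so it is not reported.
        System.out.println(findArticleUrls(query));
    }
}
```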
Transform
• Use GATE to extract organizations
– Returns a SetOfStatements of the form:
article-0 urn:hasUrl "http://shodan.ijs.si:8080/GateServer/news.txt"
company-0 urn:nameString "Microsoft"
company-0 urn:mentionedInArticle article-0
company-1 urn:nameString "Ford"
company-1 urn:mentionedInArticle article-0
…
Query:
…
?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt"
…
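To make the shape of the Transform output concrete, here is a minimal sketch that turns an article URL and a list of already-extracted organization names into statements of the style listed above. The GATE extraction itself is not shown, and the class name, method name and plain-string statement format are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the real GateTransformer returns a SetOfStatements object.
class StatementBuilder {
    /** Builds "subject predicate object" lines for one article and its organizations. */
    static List<String> toStatements(String articleUrl, List<String> organizationNames) {
        List<String> statements = new ArrayList<>();
        String article = "article-0";
        statements.add(article + " urn:hasUrl \"" + articleUrl + "\"");
        for (int i = 0; i < organizationNames.size(); i++) {
            String company = "company-" + i;
            // One statement for the name, one linking the company to the article.
            statements.add(company + " urn:nameString \"" + organizationNames.get(i) + "\"");
            statements.add(company + " urn:mentionedInArticle " + article);
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Microsoft", "Ford");
        for (String s : toStatements("http://shodan.ijs.si:8080/GateServer/news.txt", names)) {
            System.out.println(s);
        }
    }
}
```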
Select
• Select only the companies with a corresponding concept in the ResearchCyc KB
company-0 → #$MicrosoftInc
company-1 → #$FordMotors
• Replace URIs with Cyc concepts
cyc:mentionedInArticle → #$mentionedInArticle
• Output:
#$MicrosoftInc #$mentionedInArticle #$article-0
#$FordMotors #$mentionedInArticle #$article-0
…
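The selection logic can be sketched as a lookup followed by a filter. In the sketch below a small hard-coded map stands in for the ResearchCyc KB lookup; that map, the class name and the method signature are all assumptions for illustration, not the CycSelecter implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not the real CycSelecter plug-in.
class ConceptSelector {
    // Assumed stand-in for a ResearchCyc lookup: name string -> Cyc constant.
    private static final Map<String, String> NAME_TO_CONCEPT = new HashMap<>();
    static {
        NAME_TO_CONCEPT.put("Microsoft", "#$MicrosoftInc");
        NAME_TO_CONCEPT.put("Ford", "#$FordMotors");
    }

    /** Keeps only companies with a known concept and emits Cyc-style statements. */
    static List<String> select(Map<String, String> companyNames, String articleId) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : companyNames.entrySet()) {
            String concept = NAME_TO_CONCEPT.get(e.getValue());
            if (concept != null) {
                out.add(concept + " #$mentionedInArticle #$" + articleId);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> names = new LinkedHashMap<>();
        names.put("company-0", "Microsoft");
        names.put("company-1", "Ford");
        names.put("company-2", "Acme Widgets"); // no matching concept, so it is dropped
        for (String s : select(names, "article-0")) {
            System.out.println(s);
        }
    }
}
```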
Reason
• Reason
– Load the triples with Cyc concept names into the ResearchCyc KB
– Transform SPARQL query to Cyc query
– Execute and retrieve results
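As a rough illustration of the "transform SPARQL query to Cyc query" step, the sketch below rewrites individual triple patterns into CycL-style sentences in prefix notation. It assumes that cyc:-prefixed names map to #$ constants, that variables are written in upper case, and that several patterns are joined with #$and; the class and its methods are hypothetical, and the actual CycReasoner translation may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not the real CycReasoner translation code.
class CycQuerySketch {
    /** Maps one SPARQL term to its assumed CycL form. */
    static String term(String t) {
        if (t.startsWith("?")) {
            return t.toUpperCase(Locale.ROOT);            // ?company -> ?COMPANY
        }
        if (t.startsWith("cyc:")) {
            return "#$" + t.substring("cyc:".length());   // cyc:isa -> #$isa
        }
        return t;                                         // literals pass through unchanged
    }

    /** Rewrites "subject predicate object" as "(predicate subject object)". */
    static String pattern(String s, String p, String o) {
        return "(" + term(p) + " " + term(s) + " " + term(o) + ")";
    }

    /** Joins several sentences into one conjunction. */
    static String conjunction(List<String> sentences) {
        if (sentences.size() == 1) {
            return sentences.get(0);
        }
        return "(#$and " + String.join(" ", sentences) + ")";
    }

    public static void main(String[] args) {
        String a = pattern("?company", "cyc:mentionedInArticle",
                "\"http://shodan.ijs.si:8080/GateServer/news.txt\"");
        String b = pattern("?company", "cyc:isa", "cyc:PubliclyHeldCorporation");
        System.out.println(conjunction(Arrays.asList(a, b)));
    }
}
```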
Run the workflow on your computer!
Main class: eu.larkc.core.Larkc
VM arguments: -Xmx512m
Run SPARQL client
• In Windows: double-click SPARQLClient.jar
• In Linux: java -jar SPARQLClient.jar
Run example query
• Execute query in SPARQL Client
• Walk through the output of the program
• Go through the plug-ins’ .java files
Other interesting queries
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt" .
?company cyc:isa cyc:PubliclyHeldCorporation }
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt" .
?company cyc:isa cyc:SoftwareVendor }
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
?company cyc:isa cyc:SoftwareVendor }
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news.txt" .
?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
?company cyc:isa cyc:Business }
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
?company cyc:makesProductType cyc:CellularTelephone }
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
?company cyc:makesProductType cyc:CellularTelephone .
?company cyc:stockTickerSymbol ?ticker }
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
?program cyc:programAuthor ?company }
PREFIX cyc: <http://www.cycfoundation.org/concepts/>
SELECT ?company WHERE
{ ?company cyc:mentionedInArticle "http://shodan.ijs.si:8080/GateServer/news2.txt" .
?competitor cyc:competitors ?company .
?competitor cyc:makesProductType cyc:CellularTelephone }
Plug-in SAWSDL description
<wsdl:description>
  <!-- COMMON TO ALL IDENTIFIERS -->
  <wsdl:interface name="identifier"
      sawsdl:modelReference="http://larkc.eu/plugin#Identifier">
  </wsdl:interface>
  <wsdl:binding name="larkcbinding" type="http://larkc.eu/wsdl-binding" />
  <!-- SPECIFIC TO THIS IDENTIFIER -->
  <wsdl:service
      name="urn:eu.larkc.plugin.identify.article.ArticleIdentifier"
      interface="identifier"
      sawsdl:modelReference="http://larkc.eu/plugin#ArticleIdentifier">
    <wsdl:endpoint
        location="java:eu.larkc.plugin.identify.article.ArticleIdentifier" />
  </wsdl:service>
</wsdl:description>
Plug-in ontology
@prefix larkc: <http://larkc.eu/plugin#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
larkc:ArticleIdentifier
rdf:type rdfs:Class ;
rdfs:subClassOf larkc:Identifier ;
larkc:hasInputType larkc:SPARQLQuery ;
larkc:hasOutputType larkc:NaturalLanguageDocument .
Scripted decider
// Assemble the four plug-ins in execution order:
// identify -> transform -> select -> reason.
Pipeline pipeline = new Pipeline();
pipeline.addPlugIn(new URIImpl("urn:eu.larkc.plugin.identify.article.ArticleIdentifier"));
pipeline.addPlugIn(new URIImpl("urn:eu.larkc.plugin.transform.gate.GateTransformer"));
pipeline.addPlugIn(new URIImpl("urn:eu.larkc.plugin.select.cycselecter.CycSelecter"));
pipeline.addPlugIn(new URIImpl("urn:eu.larkc.plugin.reason.cycreasoner.CycReasoner"));
try {
    // Feed the SPARQL query into the first plug-in.
    pipeline.start(theQuery);
} catch (Exception e) {
    // error: the pipeline could not be started
}
// Take the result produced by the last plug-in (the reasoner).
return (VariableBinding) pipeline.take();
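The idea behind the scripted decider, a fixed chain in which each plug-in consumes the output of the previous one, can be shown with a small standalone analogue. This is not the LarKC Pipeline class: ToyPipeline, addStage and run are hypothetical names, and plain strings stand in for the real query, statement and binding types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Standalone analogue for illustration only; not the LarKC Pipeline API.
class ToyPipeline {
    private final List<Function<String, String>> stages = new ArrayList<>();

    /** Appends a stage; stages run in the order they were added. */
    ToyPipeline addStage(Function<String, String> stage) {
        stages.add(stage);
        return this;
    }

    /** Passes the input through every stage, feeding each result to the next stage. */
    String run(String input) {
        String current = input;
        for (Function<String, String> stage : stages) {
            current = stage.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        ToyPipeline pipeline = new ToyPipeline()
                .addStage(s -> s + " > identify")
                .addStage(s -> s + " > transform")
                .addStage(s -> s + " > select")
                .addStage(s -> s + " > reason");
        System.out.println(pipeline.run("query"));
    }
}
```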
Write a new plug-in
• Create new project
– New Folder
– Link bin directory
– Make source directory
– Add libraries
• Prepare code:
– Copy-paste GateTransformer.java
– Rename it to SimpleNamedEntitiyExtractor
– Insert code available in SimpleNamedEntitiyExtractor.txt
• Prepare/update meta-data files
– SimpleNamedEntitiyExtractor.wsdl
– SimpleNamedEntitiyExtractor.rdf
• Update CycGateDecider
• Clean, Build and Run!
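For orientation, here is one very naive way a simple named-entity extractor could work: treat every run of capitalized words as a candidate entity. This is an assumed heuristic, not the code from SimpleNamedEntitiyExtractor.txt, and the class name is hypothetical. It will also pick up sentence-initial words, so it is only a starting point.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical heuristic for illustration; not the tutorial's extractor code.
class CapitalizedPhraseExtractor {
    // One or more consecutive words that each start with an upper-case ASCII letter.
    private static final Pattern CAPITALIZED_RUN =
            Pattern.compile("\\b[A-Z][A-Za-z]+(?:\\s+[A-Z][A-Za-z]+)*\\b");

    /** Returns distinct capitalized phrases in order of first appearance. */
    static List<String> extract(String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = CAPITALIZED_RUN.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        String text = "shares of Microsoft rose while Ford Motor fell, and Microsoft gained";
        System.out.println(extract(text));
    }
}
```

Each extracted phrase would then be emitted as a nameString statement, in the same way as the GATE-based transformer does.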