© 2007 Adam Pease, Articulate Software - apease [at] articulatesoftware [dot] com
The Suggested Upper Merged Ontology (SUMO) at Age 7:
Progress and Promise
Adam Pease
Articulate Software
apease@articulatesoftware.com
http://www.articulatesoftware.com
http://www.ontologyportal.org/
http://home.earthlink.net/~adampease/professional/
Presented at Ontolog
6 September 2007
Overview
• SUMO is a large, open source, formal ontology stated in first-order logic
• Mapped to a large multi-lingual lexicon
• With open source tools for ontology development and application
What's New
• More content about social relationships, justice and law, military events-people-processes
• Wikipedia (DBpedia) links
• Updated mappings to WordNet 3.0
• New tests of inference and many new inference engines
• SQL and XML generation tools
• Many new academic and commercial uses
SUMO Prize - 2007
• US$3000.00
• Due December 1, 2007
• Entries must be open source SUO-KIF files that extend SUMO
• Judged on several criteria:
– Degree of formalization
– Scope and coverage
– Coherent new topic or domain
– Actual utility in an application
Pursuit of Rigor in Data Standards
Old-style (most common) standards specifications: (ISO 14258, Requirements for enterprise-reference architectures and methodologies)
“3.6.1.1 Time representation. If an individual element of the enterprise system has to be traced then properties of time need to be modeled to describe short-term changes. If the property time is introduced in terms of duration, it provides the base to do further analyses (e.g., process time). There are two kinds of behavior description relative to time: static and dynamic.”
Data-model standards (ISO 10303-41, Product Description and Support)
ENTITY product_context SUBTYPE OF (application_context_element);
  discipline_type : label;
END_ENTITY;
Semantic-model standards (IEEE P1600.1 - SUMO, ISO 18629-11, PSL Core)
(forall (?t1 ?t2 ?t3)
  (=> (and (before ?t1 ?t2) (before ?t2 ?t3))
      (before ?t1 ?t3)))
Thanks to Steve Ray, NIST
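The semantic-model example above states that `before` is transitive. A minimal sketch of what a machine can do with such an axiom, assuming a handful of made-up time points (`t1` to `t4` are illustrative and not taken from any standard), is to derive every ordering the axiom licenses:

```python
from itertools import product


def transitive_closure(pairs):
    """Return the smallest transitive relation that contains `pairs`."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(tuple(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


# Hypothetical asserted facts: (x, y) means (before x y).
before = {("t1", "t2"), ("t2", "t3"), ("t3", "t4")}
closed = transitive_closure(before)
```

Only three orderings are asserted, but the axiom lets the program conclude the remaining ones, such as `("t1", "t4")`. A prose standard or a data-model declaration gives a program nothing comparable to work with.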
Terms and Concepts
C.K. Ogden/I.A. Richards, The Meaning of Meaning: A Study in the Influence of Language upon Thought and The Science of Symbolism, London 1923, 10th edition 1969
[Figure: semiotic triangle with corners Term (“Orange”), Concept, and Referent. The Term symbolizes the Concept, the Concept refers to the Referent, and the Term stands for the Referent.]
Figure callout: Ontology work should be here, since logic is needed to substitute for human thought.
Figure callout: Lots of “ontology” work has really been here.
from the slide of [Bargmeyer, Bruce, Open Metadata Forum, Berlin, 2005]
Slide adapted from (c) Key-Sun Choi for Pan Localization 2005
Imagine...your view of the web
CV
name: Joe Smith
education: BS Case Western Reserve, 1982; MS UC Davis, 1984
work: 1985-1990 ACME Software, programmer
private: Married, 2 children
...and the Computer's View
name
CV
education
work
private
Thanks to Frank van Harmelen for the original idea of this slide and Peter Yim for the Chinese language content
But wait, we've got XML -
<job name="Joe Smith" title="Programmer">
<x83 m92="|||||||||" title="..............">
But wait, we've got Taxonomies -
[Figure: taxonomy tree with nodes Person, Mammal, JoeSmith]
[Figure: the same tree as the computer sees it, with nodes o4839, x931, i3729]
Wait, we've got semantics -
[Figure: JoeSmith is an instance of Person and Person is a subclass of Mammal, which implies that JoeSmith is an instance of Mammal]
[Figure: the same diagram as the computer sees it, with Person, Mammal, JoeSmith, instance, and subclass shown as u8475, x9834, p3489, r53, and r22; the implication still yields x9834, p3489, r53]
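The point of this slide is that the inference goes through even when the symbols are meaningless to a human reader, because the machine relies on the rules attached to `instance` and `subclass` and not on the names. A minimal sketch, assuming the opaque identifiers correspond to the readable ones in the order they appear on the slide (p3489 for JoeSmith, u8475 for Person, x9834 for Mammal):

```python
def infer_instances(instance_of, subclass_of):
    """Derive (x, C) for every superclass C of a class x is asserted to be in."""
    inferred = set(instance_of)
    frontier = list(instance_of)
    while frontier:
        individual, cls = frontier.pop()
        for sub, sup in subclass_of:
            if sub == cls and (individual, sup) not in inferred:
                inferred.add((individual, sup))
                frontier.append((individual, sup))
    return inferred


# Human-readable symbols.
readable = infer_instances({("JoeSmith", "Person")}, {("Person", "Mammal")})

# The same structure with opaque symbols, as the computer sees it.
opaque = infer_instances({("p3489", "u8475")}, {("u8475", "x9834")})
```

Both calls run the identical procedure; `readable` gains `("JoeSmith", "Mammal")` and `opaque` gains `("p3489", "x9834")`.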
Semantics Helps a Machine Appear Smart
•A “smart” machine should be able to make the same inferences we do
•(let's not debate the AI philosophy about whether it would actually be smart)
Definitions
•An ontology is a shared conceptualization of a domain
•An ontology is a set of definitions in a formal language for terms describing the world
Frames
• Object- or term-centered
• Frames, slots, values, (and attributes)
Adam: Person
height: 5'8”
occupation: consultant
cardinality: 1
Frame Restrictions
• b is between a and c
– (between1 a betweenness1)
– (between2 b betweenness1)
– (between3 c betweenness1)
– vs
– (between a b c)
• Adam is not an accountant
– (notOccupation Adam Accountant)
– vs
– (not (occupation Adam Accountant))
• Existential vs. Universal quantification
• Similar problems for many description logics
• Very efficient computation however
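The `between` example above shows the workaround a binary (slot/value) representation needs for a three-place relation: an extra node plus one slot per argument. A minimal sketch of that encoding and its inverse follows; the function names `reify` and `unreify` are illustrative only and do not come from any frame system.

```python
def reify(relation, args, node):
    """Encode an n-ary statement as one binary slot triple per argument."""
    return [(f"{relation}{i}", arg, node) for i, arg in enumerate(args, start=1)]


def unreify(relation, triples, node):
    """Rebuild the direct n-ary statement from the slot triples of `node`."""
    slots = {}
    for slot, arg, owner in triples:
        if owner == node and slot.startswith(relation):
            position = int(slot[len(relation):])
            slots[position] = arg
    return (relation, *[slots[i] for i in sorted(slots)])


frame_triples = reify("between", ["a", "b", "c"], "betweenness1")
direct = unreify("between", frame_triples, "betweenness1")
```

`frame_triples` holds the three slot statements from the slide, and `direct` is the single statement `("between", "a", "b", "c")` that a first-order language can express without the bookkeeping.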
Digression: Implementation is Different from Representation
• Why lose meaning at design time just because of runtime issues?
– We can’t reason with English definitions, but that doesn’t mean we shouldn’t document our terms
• Many different implementations may be done from the same representation
• This does not mean that run time issues should be ignored at design time
– If you represent information you know can’t be reasoned with, it better not be essential in most conceivable applications
Many Ways to Use Ontology
• As an information engineering tool
– Create a database schema
– Map the schema to an upper ontology
– Use the ontology as a set of reminders for additional information that should be included
• As more formal comments
– Define an ontology that is used to create a DB or OO system
– Use a theorem prover at design time to check for inconsistencies
• For taxonomic reasoning
– Do limited run-time inference in Prolog, a description logic, or even Java
• For first order logical inference
– Full-blown use of all the axioms at run time
Upper Ontology
•An attempt to capture the most general and reusable terms and definitions
•Provokes thought on clarifying the meaning of more specific terms
•Provides for large-scale reuse
Ontology vs Language and Knowledge
Ontology
- Expandable
- language independent
- machine understandable
Language
- understood by humans
- ambiguous
Knowledge
- changes rapidly
- may be local to an entity
Suggested Upper Merged Ontology
•1000 terms, 4000 axioms, 750 rules
•Mapped by hand to all of WordNet 1.6
• then ported to 3.0
•Development begun in 2000
– US Government small business grant
•Associated domain ontologies totalling 20,000 terms and 70,000 axioms
•Free
• SUMO is owned by IEEE but basically public domain
• Domain ontologies are released under the GNU General Public License
• www.ontologyportal.org
SUMO (continued)
•Formally defined, not dependent on a particular implementation
•Open source toolset for browsing and inference
– http://sigmakee.sourceforge.net
•Many uses of SUMO (independent of the SUMO authors and funders)
– http://www.ontologyportal.org/Pubs.html
SUMO Validation
• Mapping to all of the WordNet lexicon
– A check on coverage and completeness (at a given level of generality)
• Peer review
– Open source since its inception
• Formal validation with a theorem prover
– Free of contradictions (within a generous time bound for search)
• Application to dozens of domain ontologies
WordNet
•Lexical database
•100,000 word senses – synsets
•Created by George Miller's group at Princeton
•Free
•De facto standard in the linguistics world
WordNet to SUMO Mapping
•WordNet synset “plant, flora, plant_life” is equivalent to the formal SUMO term 'Plant'
00008864 03 n 03 plant 0 flora 0 plant_life 0 027@ . . . | a living organism lacking the power of locomotion &%Plant=
SUMO has axioms that explain formally what a plant is
(=>
  (and
    (instance ?SUBSTANCE PlantSubstance)
    (instance ?PLANT Organism)
    (part ?SUBSTANCE ?PLANT))
  (instance ?PLANT Plant))
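The mapping line above is a WordNet data-file entry with a SUMO term appended after the gloss. A minimal sketch of reading such a line follows, assuming the standard WordNet data layout (offset, lexicographer file number, part of speech, hexadecimal word count, then word and lex-id pairs) and decoding only the `=` suffix, which the slide describes as equivalence. The function name `parse_mapping_line` is hypothetical, and the ". . ." elision is kept exactly as it appears on the slide.

```python
import re

# "&%" followed by the SUMO term and an optional one-character suffix.
MAPPING = re.compile(r"&%(\w+)(\S?)")


def parse_mapping_line(line):
    """Split a WordNet-to-SUMO mapping line into its main parts."""
    head, _, tail = line.partition("|")
    tokens = head.split()
    word_count = int(tokens[3], 16)           # w_cnt is hexadecimal
    words = tokens[4:4 + 2 * word_count:2]    # skip the lex_id after each word
    match = MAPPING.search(tail)
    gloss = tail[:match.start()].strip() if match else tail.strip()
    suffix = match.group(2) if match else None
    return {
        "offset": tokens[0],
        "pos": tokens[2],
        "words": words,
        "gloss": gloss,
        "sumo_term": match.group(1) if match else None,
        # Only "=" is decoded here; other suffixes are passed through as-is.
        "relation": "equivalent" if suffix == "=" else suffix,
    }


line = ("00008864 03 n 03 plant 0 flora 0 plant_life 0 027@ . . . "
        "| a living organism lacking the power of locomotion &%Plant=")
entry = parse_mapping_line(line)
```

For this line, `entry["words"]` is the three-word synset and `entry["sumo_term"]` is `Plant`, which is the hook from a word sense into the formal axioms.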
WordNet to SUMO Mapping
•Most nouns map to classes
•Most verbs map to subclasses of &%Process
•Most adjectives map to a &%SubjectiveAssessmentAttribute
•Most adverbs map to relations of &%manner
Internationalization
• Translation of SUMO paraphrases to multiple diverse languages
– Some confidence there’s no cultural or linguistic bias
– Chinese, Hindi, Tagalog, Czech, German, Italian, Korean, Romanian, Arabic
– Estonian and Hungarian in development
• SUMO is linked to multiple very large lexicons (EuroWordNet, BalkaNet, HowNet etc)
– English, Chinese, Italian, Arabic
SUMO Structure
Structural Ontology
Base Ontology
Set/Class Theory Numeric Temporal Mereotopology
Graph Measure Processes Objects
Qualities
SUMO+Domain Ontology
[Figure: SUMO (Structural Ontology, Base Ontology, Set/Class Theory, Numeric, Temporal, Mereotopology, Graph, Measure, Processes, Objects, Qualities), the Mid-Level ontology, and the domain ontologies: Military, Geography, Elements, Terrorist Attack Types, Communications, People, Transnational Issues, Financial Ontology, Terrorist, Economy, NAICS, Terrorist Attacks, …, France, Afghanistan, United States, Distributed Computing, Biological Viruses, WMD, ECommerce Services, Government, Transportation, World Airports]
Total Terms: 20399
Total Axioms: 67108
Rules: 2500
Are SUMO Terms Directly Usable?
• Yes.
• Study – 1/3 of upper ontology terms directly appear in answers on large test
– Cohen, P., Chaudhri, V., Pease, A., and Schrag, R. (1999), Does Prior Knowledge Facilitate the Development of Knowledge Based Systems, In Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-1999). Menlo Park, Calif.: AAAI Press. http://home.earthlink.net/~adampease/professional/cohen-aaai99.ps
• before (in time), agent (of a process), etc.
High Level Distinctions
The first fundamental distinction is that between ‘Physical’ (things which have a position in space/time) and ‘Abstract’ (things which don’t)
Entity
Physical Abstract
High Level Distinctions
Partition of ‘Physical’ into ‘Objects’ and ‘Processes’
Physical
Object Process
Objects
Object
  SelfConnectedObject
    Substance
    CorpuscularObject
  Region
  Collection
Processes
DualObjectProcess
  Substituting, Transaction, Comparing, Attaching, Detaching, Combining, Separating
InternalChange
  BiologicalProcess, QuantityChange, Damaging, ChemicalProcess, SurfaceChange, Creation, StateChange
ShapeChange
IntentionalProcess
  IntentionalPsychologicalProcess, RecreationOrExercise, OrganizationalProcess, Guiding, Keeping, Maintaining, Repairing, Poking, ContentDevelopment, Making, Searching, SocialInteraction, Maneuver
Motion
  BodyMotion, DirectionChange, Transfer, Transportation, Radiating
Abstract
  SetOrClass
  Relation
  Proposition
  Quantity
    Number
    PhysicalQuantity
  Attribute
  Graph
  GraphElement
Case Roles
• Roles that entities play in a Process
– agent, patient, instrument etc.
• “Brutus stabbed Caesar with a knife on Tuesday.”
(exists (?S ?K ?T)
  (and
    (instance ?S Stabbing)
    (instance ?K Knife)
    (instance ?T Tuesday)
    (agent ?S Brutus)
    (patient ?S Caesar)
    (time ?S ?T)
    (instrument ?S ?K)))
[Figure: a Stabbing linked to Brutus by agent, to Caesar by patient, to a Knife by instrument, and to a Tuesday by time]
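The case-role representation of the Brutus sentence can be queried like a small database once the existential variables are given names. A minimal sketch, assuming made-up constants `S1`, `K1`, and `T1` stand in for ?S, ?K, and ?T (they are illustrative and not SUMO terms):

```python
# Each fact is (relation, first argument, second argument).
facts = {
    ("instance", "S1", "Stabbing"),
    ("instance", "K1", "Knife"),
    ("instance", "T1", "Tuesday"),
    ("agent", "S1", "Brutus"),
    ("patient", "S1", "Caesar"),
    ("time", "S1", "T1"),
    ("instrument", "S1", "K1"),
}


def roles_of(process, facts):
    """Map each case role of `process` to the entity that fills it."""
    return {
        relation: filler
        for relation, subject, filler in facts
        if subject == process and relation != "instance"
    }


def processes_with(role, filler, facts):
    """Find every process in which `filler` plays the given case role."""
    return {subject for relation, subject, obj in facts
            if relation == role and obj == filler}


stabbing_roles = roles_of("S1", facts)
brutus_actions = processes_with("agent", "Brutus", facts)
```

Because each participant hangs off the process by a named role, questions such as "who was the patient?" or "what did Brutus do?" become simple lookups and need no parsing of the English sentence.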
Attributes and Roles
• (attribute JohnDoe Unemployed)
• (attribute GIJane Soldier)
• (attribute MyCar Blue)
Example Rules
(=> (instance ?DRIVE Driving)
  (exists (?VEHICLE)
    (and (instance ?VEHICLE Vehicle)
      (patient ?DRIVE ?VEHICLE))))
“If there's an instance of Driving, there's a Vehicle that participates in that action.”
Not just an English definition for humans to read, but a logical definition that can be used in proofs.
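One way to see how such a rule "can be used in proofs" is to apply it forward over a set of facts: wherever a Driving has no known Vehicle as patient, the rule entitles us to assert that some vehicle exists. The sketch below is a simplification under stated assumptions. The facts (`Drive1`, `Drive2`, `MyCar`) and the placeholder naming scheme `vehicle_of_...` are invented for illustration, and subclasses of Vehicle are not considered.

```python
def apply_driving_rule(facts):
    """Add a placeholder Vehicle for each Driving that lacks a Vehicle patient."""
    result = set(facts)
    vehicles = {x for rel, x, cls in facts
                if rel == "instance" and cls == "Vehicle"}
    drives = sorted(x for rel, x, cls in facts
                    if rel == "instance" and cls == "Driving")
    for drive in drives:
        has_vehicle = any(
            rel == "patient" and subject == drive and obj in vehicles
            for rel, subject, obj in facts
        )
        if not has_vehicle:
            # Placeholder constant standing in for the existential ?VEHICLE.
            placeholder = f"vehicle_of_{drive}"
            result.add(("instance", placeholder, "Vehicle"))
            result.add(("patient", drive, placeholder))
    return result


facts = {
    ("instance", "Drive1", "Driving"),
    ("instance", "Drive2", "Driving"),
    ("instance", "MyCar", "Vehicle"),
    ("patient", "Drive2", "MyCar"),
}
extended = apply_driving_rule(facts)
```

`Drive2` already has `MyCar` as its patient, so only `Drive1` receives a placeholder vehicle. A full theorem prover does considerably more than this, but the example shows the kind of conclusion an English definition alone could never yield.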
Commercial Application
• One year project for Articulate Software
• Working with a company that creates financial transaction systems for royalty payments
• Re-engineer current ontology management business process, tools and ontology
Commercial Application
• Extensive current ontology
• Captured in spreadsheets
• Local term names and definitions for every customer
– An essential part of their process
• Ontology management system that exports XML & RDF
• One end-user database is nearly 3GB
– Ontology functions can be run as batch processes
Project Goals
• To add formality to existing model
– To support full logical inference, consistency checks
• Give customers user-friendly ontology editor
– so that they can maintain the ontology
• Create broader set of definitions
– Enable greater DB integration
– Enable expansion into new markets
• Leverage work
• Exercise SUMO and Sigma in business
Initial Tasks
• Implement UI improvements to Sigma
– Simplified tree-based editor
– Simplified frame-style browser
• XML/SQL ontology export
– Uses meta-predicates for physical DB structure
• Merge existing ontology with SUMO
DBpedia
• “People” content uses FOAF
– Lightweight, redundant, ad-hoc
– Only a tiny portion is used
• birthdate, deathdate, birthplace, deathplace, names, firstname, lastname
– http://xmlns.com/foaf/spec/
– 16MB KIF content http://www.ontologyportal.org/content/DBPediaPeople.zip
• Recent announcement of DBpedia now mapped to WordNet
– Which gets us links to SUMO
TPTP
• Research effort in automated theorem proving
• 40+ different first order logic provers
• Annual competition
• Thousands of test problems
• We will issue SUMO-based tests in TPTP format next month
• Sigma connected to TPTP prover suite
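To give a feel for what "TPTP format" involves, here is a minimal sketch that rewrites a KIF formula with explicit quantifiers as a TPTP first-order (`fof`) axiom. It is not Sigma's translator. The function names and the `s_` and `V_` prefixes are assumptions chosen only to satisfy TPTP's rule that variables start with an upper-case letter and other symbols with a lower-case one. Only `forall`, `exists`, `and`, `or`, `not`, and `=>` are handled.

```python
import re


def tokenize(text):
    """Split KIF text into parentheses and atoms."""
    return re.findall(r"[()]|[^\s()]+", text)


def parse(tokens):
    """Parse one s-expression, consuming tokens from the front of the list."""
    token = tokens.pop(0)
    if token != "(":
        return token
    items = []
    while tokens[0] != ")":
        items.append(parse(tokens))
    tokens.pop(0)  # drop the closing ")"
    return items


def symbol(atom):
    """Rename a KIF atom so it is a legal TPTP variable or functor."""
    name = re.sub(r"\W", "_", atom.lstrip("?"))
    return ("V_" if atom.startswith("?") else "s_") + name


CONNECTIVES = {"and": " & ", "or": " | "}
QUANTIFIERS = {"forall": "!", "exists": "?"}


def to_tptp(expr):
    """Translate a parsed KIF expression into a TPTP formula string."""
    if isinstance(expr, str):
        return symbol(expr)
    head, *args = expr
    if head in QUANTIFIERS:
        variables, body = args
        names = ",".join(symbol(v) for v in variables)
        return f"({QUANTIFIERS[head]}[{names}]: {to_tptp(body)})"
    if head in CONNECTIVES:
        return "(" + CONNECTIVES[head].join(to_tptp(a) for a in args) + ")"
    if head == "not":
        return f"~({to_tptp(args[0])})"
    if head == "=>":
        return f"({to_tptp(args[0])} => {to_tptp(args[1])})"
    arguments = ",".join(to_tptp(a) for a in args)
    return f"{symbol(head)}({arguments})"


def kif_to_fof(name, kif):
    """Wrap the translated formula as a named TPTP axiom."""
    return f"fof({name}, axiom, {to_tptp(parse(tokenize(kif)))})."


kif = ("(forall (?t1 ?t2 ?t3) (=> (and (before ?t1 ?t2) (before ?t2 ?t3)) "
       "(before ?t1 ?t3)))")
axiom = kif_to_fof("before_transitive", kif)
```

Applied to the transitivity axiom for `before` quoted earlier in the talk, this produces a single `fof(...)` line that first-order provers accepting TPTP syntax can read. Real SUMO axioms also use constructs beyond plain first-order logic, which a sketch like this does not attempt to cover.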
Controlled English to Logic Translation
• Automated translation from English to Logic
• Uses WordNet-SUMO mappings for 100,000 word sense vocabulary
• Domain-independent
• Development process
– Start with a highly restricted language and gradually add linguistic features