Cf intro


Introducing:

The Cyc Foundation

April 13, 2006


Motivations

Wikimedia Foundation: “Imagine a world in which every single person is given free access to the sum of all human knowledge. That's what we're doing.”

Cyc Foundation: “Imagine a world in which every single person is given free access to programs that reason with the sum of all human knowledge. That's what we're doing.”


Topic Map – Top Level

[Diagram: Cyc Reasoning System. Core: Cyc Ontology & Knowledge Base and Reasoning Modules. Interfaces: Interface to External Data Sources, Cyc API, Knowledge Entry Tools, User Interface (with Natural Language Dialog). External Data Sources: Databases, Web Pages, Text Sources, Other KBs. Also shown: Other Applications, Knowledge Authors, Knowledge Users.]

Query:“Someone happy”

Caption:“A man watching his daughter take her first step”

Help Find Information by Inference (+KB)


(∃x) (feelsEmotion x Happiness Positive)

(∃x,y) (and (father x y) (gender x Female) (sees x y) (walking

Logical Inference (deduction)

(implies
  (and (isa ?BIG-EVENT HumanLifecycleMilestone)
       (doneBy ?BIG-EVENT ?CHILD)
       (sees ?PARENT ?BIG-EVENT)
       (children ?PARENT ?CHILD))
  (holdsIn ?BIG-EVENT
    (feelsEmotionTypeAtLevel ?PARENT (PositiveAmountFn Pride))))
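To make the deduction step concrete, here is a minimal Python sketch of how a rule like the one above could be matched against facts taken from the caption. It is not Cyc's inference engine: the constants `FirstStep01`, `Daughter01` and `Man01` are hypothetical, and the conclusion is flattened to a plain triple (the `holdsIn` and `PositiveAmountFn` wrappers are dropped) to keep the example short.

```python
# Facts a caption reader might assert (all constant names are hypothetical).
FACTS = {
    ("isa", "FirstStep01", "HumanLifecycleMilestone"),
    ("doneBy", "FirstStep01", "Daughter01"),
    ("sees", "Man01", "FirstStep01"),
    ("children", "Man01", "Daughter01"),
}

# Antecedent and (simplified) consequent of the rule on the slide.
RULE_IF = [
    ("isa", "?BIG-EVENT", "HumanLifecycleMilestone"),
    ("doneBy", "?BIG-EVENT", "?CHILD"),
    ("sees", "?PARENT", "?BIG-EVENT"),
    ("children", "?PARENT", "?CHILD"),
]
RULE_THEN = ("feelsEmotionTypeAtLevel", "?PARENT", "Pride")


def unify(pattern, fact, bindings):
    """Extend bindings so that pattern equals fact, or return None."""
    if len(pattern) != len(fact):
        return None
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            # A variable must keep the value it was first bound to.
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def match_all(patterns, facts, bindings=None):
    """Yield every variable binding that satisfies all patterns."""
    if bindings is None:
        bindings = {}
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        extended = unify(first, fact, bindings)
        if extended is not None:
            yield from match_all(rest, facts, extended)


def substitute(pattern, bindings):
    """Replace variables in pattern with their bound values."""
    return tuple(bindings.get(term, term) for term in pattern)


derived = {substitute(RULE_THEN, b) for b in match_all(RULE_IF, FACTS)}
print(derived)
```

Answering the query “someone happy” would need one more piece of knowledge, namely that pride counts as a positive emotion, which is the kind of background fact the KB is meant to supply.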

ALICE
• JUDGE11 [9:26:0] I WILL ASK YOU SOME QUESTIONS
• PROGRAM [9:26:0] Do you think your plan will succeed?
• JUDGE11 [9:26:16] WHAT IS THE COLOR OF A BLUE CAR
• PROGRAM [9:26:16] WHAT COLOR IS a blue car
• JUDGE11 [9:26:35] WHERE IS SUE’S NOSE WHEN SUE IS IN HER HOUSE
• PROGRAM [9:26:35] Where it belongs. Try searching the World Wide Web.
• JUDGE11 [9:26:49] WHAT HAPPENS TO AN ICE CUBE IN A HOT DRINK
• PROGRAM [9:26:49] Try it.
• JUDGE11 [9:27:25] HOW IS THE FATHER OF ANDY S MOTHER RELATED TO ANDY
• PROGRAM [9:27:25] Fine as far as I know.

Natural Language Understanding requires having lots of knowledge

1. The pen is in the box. The box is in the pen.

2. The police watched the demonstrators…

…because they feared violence.

…because they advocated violence.

System: Ready for input.

User: what do you know about anthrax?

System: By “anthrax” do you mean (1) the heavy metal band or (2) bacteria?

User: 2

System: Here is what I know about it:

Anthrax is a species of organism.

All anthrax is a bacterium.

All anthrax can lethally infect mammals.

Anthrax is extremely toxic for people to inhale.

Iraq has biological weapons that deliver anthrax.

Anthrax is opaque, tangible, microscopic, …

Anthrax can be dried into spores stored indefinitely

System: Ready for input.

User: what do you know about TR1?

System: I know nothing about it. What sort of thing is TR1?

User: bacteria

System: TR1 lethally infects _________?

User: People

System: By “People”, I assume you mean human beings, not People Magazine.

System: Does it infect some broader category of animals (e.g., air-breathing vertebrates, chordates, placental mammals, primates, eucaryotes, etc.)?

User: mammals

System: Is TR1 similar to anthrax?

User: yes

System: Which of the following are true of TR1:

. . .

User: Al Qaida possesses some TR1.

System: Ready for input.

User: Does Osama Bin Laden control anything that can be the basis of weapons of mass destruction?

System: Yes, some TR1.

User: please explain

System: Osama Bin Laden controls Al Qaida. Al Qaida possesses some TR1. TR1 is a bacteria that lethally infects people.
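The “please explain” step in the dialog amounts to returning the chain of facts that supports an answer. A rough Python sketch of that idea follows. The predicate names (`controls`, `possesses`, `lethallyInfects`) and the single hard-coded reasoning pattern are assumptions made for illustration, not Cyc's actual vocabulary or justification mechanism.

```python
# Facts gathered during the dialog (predicate names are hypothetical).
FACTS = {
    ("controls", "OsamaBinLaden", "AlQaida"),
    ("possesses", "AlQaida", "TR1"),
    ("lethallyInfects", "TR1", "People"),
}


def explain_wmd_basis(agent, facts):
    """Find things the agent indirectly controls that lethally infect people.

    Returns a list of (thing, supporting_facts) pairs, where the supporting
    facts are the chain that would be read back as the explanation.
    """
    results = []
    for pred1, controller, group in sorted(facts):
        if pred1 != "controls" or controller != agent:
            continue
        for pred2, owner, thing in sorted(facts):
            if pred2 != "possesses" or owner != group:
                continue
            danger = ("lethallyInfects", thing, "People")
            if danger in facts:
                chain = [(pred1, controller, group), (pred2, owner, thing), danger]
                results.append((thing, chain))
    return results


answers = explain_wmd_basis("OsamaBinLaden", FACTS)
for thing, chain in answers:
    print("Yes, some", thing)
    for fact in chain:
        print("  because", fact)
```

The design point is that the answer and its justification come from the same search, so the explanation is never reconstructed after the fact.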


Efficiency vs. Expressiveness

[Chart: efficiency (vertical axis) against expressiveness (horizontal axis). Languages plotted: C++, PASCAL, LISP, English, German, first-order logic, nth-order logic, HL (heuristic level language), EL (epistemological level language).]

Use two cooperating languages (EL and HL) to escape the limitations of an age-old tradeoff.

Continuing improvements in inference performance won’t negatively affect expressiveness.

NOW: CyN in Doom3 (2005)


BURC: Bootstrapping Using ResearchCyc

• Goal: To extend Cyc’s knowledge base using “relationships implied to be possible, normal or commonplace in the world”

• Prior work with Cyc knowledge entry has been manually oriented

• How will we collect common sense without a body and manual labor…?

• Read, Parse, Mine!

• Proposal: Read text, Parse into a database, Extract relations between words, Propose hypothetical relations between concepts

BURC: Basic Analogy

• The Shotgun approach to the Human Genome
• Extract millions of fragments
• Knit them back together by finding commonalities
• Will it work for the Human Memome?
• James Burke: ‘Mr. Connections’

Lenat’s Bootstrap Hypothesis: once Cyc reaches a certain level/scale, it can help in its own development and start using NLP to augment its knowledge base

Mining Adjective Knowledge Example

• “white blouse” as factoid fragment

• Hypothesis: (plausibleValueOfType Blouse mainColorOfObject WhiteColor)
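As an illustration of the step from fragment to hypothesis, the sketch below counts adjective-noun fragments and emits a `plausibleValueOfType` sentence for each pair it can map to concepts. The lookup tables, the `min_count` threshold and the function name are assumptions invented for this example; only `Blouse`, `mainColorOfObject`, `WhiteColor` and `plausibleValueOfType` come from the slide, and BURC's real mapping from words to Cyc terms is not described here.

```python
from collections import Counter

# Hypothetical word-to-concept lookups; a real system would ask Cyc's lexicon.
NOUN_TO_COLLECTION = {"blouse": "Blouse"}
ADJECTIVE_TO_SLOT_VALUE = {"white": ("mainColorOfObject", "WhiteColor")}


def propose_hypotheses(fragments, min_count=1):
    """Turn (adjective, noun) fragments into candidate CycL-style sentences.

    Fragments seen fewer than min_count times, or whose words have no
    known concept, are skipped rather than guessed at.
    """
    hypotheses = []
    for (adjective, noun), count in Counter(fragments).most_common():
        if count < min_count:
            continue
        if noun not in NOUN_TO_COLLECTION:
            continue
        if adjective not in ADJECTIVE_TO_SLOT_VALUE:
            continue
        collection = NOUN_TO_COLLECTION[noun]
        slot, value = ADJECTIVE_TO_SLOT_VALUE[adjective]
        hypotheses.append(f"(plausibleValueOfType {collection} {slot} {value})")
    return hypotheses


fragments = [("white", "blouse"), ("white", "blouse"), ("quick", "blouse")]
result = propose_hypotheses(fragments)
print(result)
```

Counting before proposing is one simple way to favor relations that are “normal or commonplace” over one-off phrasings.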

Flow of Processing

[Diagram. Components shown: BNC Data; Parser 1 through Parser 5, each producing a Frag File; Merged Frag File; Link Fragments DB; Extractor / DB Manager; Cyc/RCyc (Upper Ontology, Core Theories, Domain-Specific Theories, Facts (Database)); Hypothesis File.]

(Very) Brief History of Cyc

• c. 1967 – AI is used on toy problems.

• c. 1977 – Expert systems reason in narrow domains.

• c. 1983 – Lenat, Minsky, Feigenbaum, Kay, and others recognize need for a substrate of shared world knowledge; and realize it would take hundreds of person-years to “prime the pump”.

• 1984 – Admiral Bob Inman convinces Lenat to leave Stanford and pursue this high-risk, high-payoff project (Cyc) within MCC.

• 1994 – Cycorp is formed.


“The driver of the power of intelligent systems is the knowledge the systems have about their universe of discourse, not the sophistication of the reasoning process the systems employ. Cyc has not only the world’s largest knowledge base, but the best represented from a technical point of view.”

Ed Feigenbaum

inventor of the first expert system

editor of the AI Handbook

“People have silly reasons why computers don’t really think. The answer is we haven’t programmed them right; they just don’t have much common sense.

There’s been only one large project to do something about that, that’s the famous Cyc project…”.

-- Marvin Minsky


How has Cycorp done?

• 20 years
• 3 million facts and rules (hand-entered)
• Compelling demos
• Some applications (constrained by business model)
• The basis for much greater growth
• “If the right way to build an A.I. involves giving Cyc away for free, that is what we will do.” – Doug Lenat (repeatedly)
– Note: Jury is out on what the “right way” is

Cycorp: True to its Promise

• OpenCyc
– The entire Cyc structural ontology: FREE
– 300,000 concept terms, ~2M facts and rules
• ResearchCyc
– Equal to Full Cyc (w/ Research-only license)
– Source code for inference engine not released
– API with 18,000 functions and macros!
– Ability to compile in your own additions
• Q: Will more be released? A: It depends.
– Cycorp must financially support its own R&D.
– Existing releases must result in major project benefits.

And it doesn’t really matter.

Time for the Next Phase

• Cycorp has gotten us to where we are
– Representational ability
– Inference ability
– …and will continue (R&D leader, commercialization)
• The rest of the world will help get us where we are going
– Breadth of content
– Broad real-world diffusion

The thinking that got us to where we are today is insufficient to solve the problems that exist today. To solve today's problems requires a new level of thinking.

-- Einstein

Building Cyc qua Engineering Task

[Chart: rate of learning (vertical axis) against amount known (horizontal axis), rising toward the frontier of human knowledge. Phases labeled: codify & enter each piece of knowledge, by hand; learning via natural language; learning by discovery. CYC so far: 750 person-years, 21 realtime years, $75 million. Year markers: 1984, 2004, 2006.]

Building Cyc qua Engineering Task

[Chart repeated from the previous slide: rate of learning against amount known; CYC at 750 person-years, 21 realtime years, $75 million; year markers 1984, 2004, 2006; phase "codify & enter each piece of knowledge, by hand". Two labels are added: 1000 years and 10 years.]

How will we get the knowledge?

Games That Matter!

Foundation as Continuation

• Are we trying to make an A.I.?
– No.

• Are we trying to make computers behave much more intelligently?
– Yes!

Mission (DRAFT)

The Cyc Foundation has been formed as an independent not-for-profit organization to hasten the arrival of intelligent tools that will help humanity.

Assumptions

• (Currently) 9 ideas that shape strategy, objectives and policy

• These may need to be validated, modified or augmented

• In some cases, assumptions are followed by related policy


Assumption #1

Long before computers are as smart as people, they will be (in some cases already have been) put to use to cure disease, address hunger problems, make important new scientific discoveries and help people work together.

Smarter computers will do a better job of this.

Assumption #2

Cycorp has developed and cared for what we believe is an important piece of the AI puzzle.

They have always wanted to release it to the public, but it had to be when people could realistically develop it further on their own without in some way endangering the project.

One fear was “forking”, or creating incompatible variants of the knowledge base.

Cycorp and The Foundation will cooperate on 1 KB.

Flow of Cyc Data

[Diagram: Cyc data flowing among Cycorp, the Cyc Foundation, RCyc users, gamers / Wikipedia users, and teams (a subject-matter expert paired with an ontologist).]

Assumption #3

The knowledge that will give computers human-like intelligence ultimately needs to be free.

That's our best hope of having it put to best use.

Portions of knowledge will always be held proprietary.

The more shared a piece of knowledge, the greater will be the force pulling all of its representations toward freedom (to avoid the burden of maintaining a non-standard representation).

Assumption #4

Proposed Semantic Web standards (such as those related to OWL) are an important step in the right direction, because they provide a foundation for working with meaning on the Web.

The Cyc ontology will be a valuable addition, because it can act as a semantic hub, allowing us to have shared meaning.

There is some concern that a top-down central ontology will dictate use of terms that may not meet a project’s needs. We will be able to show that use of the Cyc ontology can satisfy both needs and will be a useful complement to the great work that has already been done toward the Semantic Web.

Assumption #5

We all have something to learn.

We all have something to teach.

The Foundation mission will benefit from a very broad base of support, rather than the traditional rule by the technical elite.


Assumption #6

For this effort, focused work by many will be more valuable than genius work by a few.

To be most helpful, people should work together, and on tasks where they are capable of contributing successfully.

(Example: don’t go off and try to “solve the A.I. problem” by yourself.)


Assumption #7

Regular humans can be turned off by overly technical talk that is out of place – and rightly so.

We need to be inclusive in our language and in our activities in order to ensure the broadest base of support and participation.

This is especially true in the Cyclify initiative.


Assumption #8

There is no “us” and “them”

• The Foundation is managed by its volunteer board and run by its volunteer members

• The Foundation will start with no employees

• There will be no BDFL – Benevolent Dictator for Life

Assumption #9

Fun is mandatory!

• By comparison, contributing to SETI is like cleaning your oven while you sleep.

• This work will be hands-on, compelling and (hopefully) addictive.

• If you’re not having fun, find out why and fix it.


Foundation Goals

• Convert human knowledge to a form that computers can reason with

– Grow the Cyc Ontology and KB Exponentially

• Establish a standard vocabulary and language for representing concepts & knowledge

• Support the creation of intelligent tools

• Promote free and efficient knowledge transfer

Cyclify

Cyclify Knowledge Collection Activities

• Web Games
– Validate acquired knowledge
– Multiple-choice fact entry
– More?
• Wikipedia Linking
• KR Dating Service
– Wiki-based knowledge entry
– A SME paired with an ontologist
• WordNet Linking

Playflow Within Cyclify

[Diagram. Participants: Gamer; Wikipedia user; Team (Subject-matter expert, Ontologist); RCyc User; Cycorp. Systems: Game Server; Wiki Knowledge Server; RCyc. Data stores: Wikipedia Data; K. Acquisition Data.]

[Game screen]

I’m thinking of a sentence…

Fibromyalgia is caused by ticks.

Because I read about it on the web.

Status: I have 2 answers

Answer buttons: True, False, Don’t Know, Doesn’t make sense

Score: 24

[Game screen, after answering]

Status: I think this sentence is probably not right

Submitting...

Thank you!

Answers: 2

You agreed with: 100%

I now have a better understanding of: Fibromyalgia is caused by ticks.

Score: +2

Next

Score: 26
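The two game screens suggest a simple loop: collect a player's answer, compare it with earlier answers, and update the score. The Python sketch below shows one way that bookkeeping could work. The scoring rule (a flat 2 points per answer, inferred only from the “+2” on the slide), the `min_votes` threshold and all function names are assumptions, not the actual Web game logic.

```python
from collections import Counter

CHOICES = ("True", "False", "Don't Know", "Doesn't make sense")
POINTS_PER_ANSWER = 2  # assumption based on the "+2" shown on the slide


def agreement_percent(previous_answers, my_answer):
    """Percentage of earlier players who gave the same answer, or None."""
    if not previous_answers:
        return None
    same = sum(1 for answer in previous_answers if answer == my_answer)
    return round(100 * same / len(previous_answers))


def consensus(answers, min_votes=3):
    """Return the majority answer once enough votes are in, else None."""
    if len(answers) < min_votes:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / len(answers) > 0.5 else None


def submit(score, previous_answers, my_answer):
    """Record one answer and return (new_score, agreement, all_answers)."""
    if my_answer not in CHOICES:
        raise ValueError(f"unknown choice: {my_answer!r}")
    agreed = agreement_percent(previous_answers, my_answer)
    return score + POINTS_PER_ANSWER, agreed, previous_answers + [my_answer]


new_score, agreed, answers = submit(24, ["False", "False"], "False")
print(new_score, agreed, consensus(answers))
```

Requiring a minimum number of votes before declaring consensus is one way to keep a single player from validating or rejecting a hypothesized fact alone.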

Current Architecture

[Diagram. A DMZ boundary separates a computer (inside) from a computer (outside). Components shown: Cyc Image containing GAFs (web gathered, hypothesized, asserted, …), forward rules, and KAGs; a SubL form running a KAG-collecting query; XML files, with an scp transfer; Populator (java); PostGRES database; Question Server (java); Applets.]

Cyc Foundation Projects

• Nonprofit Formation (planning/budgeting/filing)
• Foundation Website
• Cyclify
• Fundraising
• Membership management
• Events
• ResearchCyc
– Recommend Cyc features / functions / design
– Help with ResearchCyc testing, documentation

Budgeting

• Must develop budget related to Year 1 plan

• Possible areas of spending
– Legal filings
– Server hosting
– W3C membership
– Conference attendance
– Fundraising

Foundation Website

• Requirements
– Content management features
– Collaboration features
– Out-of-the-box ease of use
– Free

• Currently evaluating Joomla (Mambo)

• Desired launch: May 15

Cyclify Projects

• First Web Game
– Develop game
– Viral marketing
– Add wiki linking activity
• Wiki Knowledge Collection
– Set up wikip.cyclify.org
– Add frame for ontologizing
– Feed wikip links to Web game
• Back End
– Design and implement PlayFlow
– Submit collected knowledge to Cycorp

Fundraising

• Individual Memberships
– Free membership for first 6 months for Cyclify members and ResearchCyc users?
– How much?
– What do you get?
• Corporate Donations
– Need to prepare story
– Seems feasible to get donations

What does nonprofit mean?

• Cannot have investors or disburse earnings

• Can have earnings, though

• Revenues must come from services that are within mission

• 501(c)(3)? (like Wikimedia Foundation)

• Or 501(c)(6)? (like Eclipse Foundation)


The Foundation Board of Directors

Name              Position                        Role
John De Oliveira  Founder and President           Strategy, Corp. Fundraising
Mark Baltzegar    Co-Founder and Vice President   Strategy, Game Devel., IT
OPEN              Secretary, Treasurer            Secretary, Treasurer
David James       Board Member                    Organizational Dynamics
OPEN              Board Member                    Standards
OPEN              Board Member                    Events, Operations Delegator
OPEN              Board Member                    Architecture, Playflow Design
TBD Sept. 2006    Board Member                    Oversight

The Foundation: Membership

Project Leader, Cyclify Stu Baurman Keith Wright

Project Leader, ResearchCyc Kino Coursey Pierluigi Miraglia

High Scorer (current month) Douglas Miles Gavin Matthews

High Scorer (all time) Arturo Hernandez Joe Simone

David Whitten Guyren Howe ~100 ResearchCyc Users

Brad Bouldin John Cabral YOU!

Larry Lefkowitz Ben Rode

Bill Jarrold Jason Azbahr


ResearchCyc Users

[Diagram grouping users by category. Category labels shown: Government, Government-related, Commercial, NPOs.]

• Xerox PARC
• Daxtron Labs
• Lockheed Martin ATLD
• Houston VA Medical Center
• Air Force Rome Labs
• Institute for the Study of Accelerating Change
• U of Maryland
• Language Computer Corporation
• NTT Communications Science Laboratories (Japan)
• Northwestern U
• Stanford NLP Dept.
• ANSER, Inc.
• LBJ School of Public Affairs
• Fraunhofer Institute
• U of Illinois Urbana-Champaign
• New Mexico Highlands Univ.
• Harvard U
• Linkoping U (Sweden)
• Radboud U (Netherlands)
• Tokyo Inst. of Technology
• Terra Incognita University
• Microfabrica, Inc.
• U of Stuttgart
• MIT Media Lab
• Witan International
• U of Pennsylvania
• SRI
• 21st Century Technologies
• U of Minnesota
• Stone’s Throw Technologies
• ISI
• Trimtab Consulting
• U of Hawaii
• Rensselaer AI and Reasoning Lab
• TNO-DMV (Netherlands)
• Sapio Systems (Denmark)
• U of Toronto
• Knowledge Media Institute, Open University
• Austin Info Systems

How can I help?

• Humans (a.k.a. common sense experts)

• Programmers
– Web programmers
– Cyc programmers

• Ontologists

• Subject-matter experts

• Bloggers

Human Cyclists*

• Play the Web Game
• Come up with new game ideas
• Link Wikipedia to Cyc
• Learn more about Cyc
• Befriend an ontologist
• Tell a friend about Cyclify
• Write to a blog about Cyclify
• Help with viral marketing
• Design a logo
• T-Shirts: Buy one, or Create and sell them

* From now on, we’re all “Cyclists” – people who interact with Cyc in one way or another.


Programmers

• Help design and build a web services interface
• Learn the architecture of Web Game #1
• Design an add-on for the Web game
• Learn how to use the question server
• Propose a new game
• Help develop/support technical infrastructure
• Help organize documentation
• Help write the Cyc books
– to be published by O'Reilly

Ontologists

• Identify gaps in the knowledge base

• Befriend a Subject Matter Expert
– Work together on a domain

• Befriend a Human Cyclist
– Teach one who wants to learn basic ontology skills

• Help organize documentation

• Help write the Cyc books


Bloggers

• Blog about Cyclify

• Link to each other’s blogs


Timeline (Milestones)

• May 15 – Launch Foundation Website

• Build membership up until July 15

• June 15
– File Articles of Formation w/ Sec. of State
– First Web game in beta

• July 15 – Launch Game

• October – First OpenCyc build containing game data