Doing More with LESS: Logic Embedded in SpreadSheets Andre Valente (KSVentures and USC/ISI) David...

Post on 22-Dec-2015

215 views 0 download

Tags:

Transcript of Doing More with LESS: Logic Embedded in SpreadSheets Andre Valente (KSVentures and USC/ISI) David...

Doing More with LESS: Logic Embedded in SpreadSheets

Andre Valente (KSVentures and USC/ISI)David Van Brackle (ISX)Hans Chalupsky (KSVentures and USC/ISI)Gary Edwards (ISX)

LESS: Logic Embedded in SpreadSheets

Logic Embedded in SpreadSheets (LESS) will provide the user with a system which combines the power of logic-based knowledge representation and reasoning with the familiar and easily-mastered user interface paradigm of a spreadsheet

LESS was developed under an SBIR contract from DARPA Project Manager: David Gunning

Motivation Spreadsheets are Widespread but Error Prone

[Abraham & Erwig, 2003] 55 Million users (= programmers) of spreadsheets in

2005 Spreadsheets contain alarming rates of errors [Erwig

and Burnett 2002] Spreadsheets have their strengths…

Rapid model development Interactive “What If” feedback Scalable application of easily defined computational

logic … and weaknesses

Hard to express complex logical statements Data structure hides relationships

Logic spreadsheets can

Improve quality (less errors) Increase maintainability Support the construction of more

explicit and complex models Be a practical platform for knowledge

acquisition

Key Challenges Basic Perspective:

A pragmatic, scruffy, engineering approach Start with open mind – try things out, see what works

Usefulness – How to develop a useful tool, not just an academic exercise? Answer: a real hybrid Enrich the functionality of a spreadsheet while maintaining

on its core interaction paradigm Spreadsheet should not be a GUI for knowledge bases Logic should not be just a function library in the

spreadsheet Usability – How to get the syntax and the human interaction

right to make it usable Answer: four tiers Hide as much logic as needed Allows different trade-offs

The LESS Model

Tier 1: Logic functions available programmatically in formulas

Tier 2: Tabular-oriented Knowledge Integrated into

Spreadsheet GUI

Tier 3: Full Integration of

Spreadsheet and Logic System

Tier 4: App Templates

Tier 1: Logic functions available programmatically in formulas

PowerLoom functionality available to Spreadsheet functions as function calls

Lifting Mappings – pushing info into the KB

Query Mappings – extracting info from the KB

ASSERT( “Age”, $A2, B2 )

A B C D

1 Person Age Car Happy?

2 Fred 14

3 Joe 17 Ford Yes

4 Sam 33 Saab

IF( ASK( “Happy”, $A3 ), “Yes” )

Tier 2: Tabular-oriented Knowledge Integrated into Spreadsheet GUI

Instance Tables: A collection of instances of a given class will be represented as a special spreadsheet table

Creating a new Instance Table is equivalent to creating a Class in an Ontology

Rules are entered as text, much like Spreadsheet Functions

Tier 3: Full Integration of Spreadsheet and Logic System

Increased usability Hide away details of logic formulations Develop special GUIs integrated into the

spreadsheet Wizards Special cell editors TreeTable metaphors

Requires tighter integration of logic and spreadsheet data models

Tier 4: Application Templates Templatize working LESS applications Hide even more of logic to users Better deployment, packaging mechanisms

Domain Paks Templates captured by experts in a given field

and widely distributed Production tools (meta level) Mechanisms for connecting to existing data

(e.g. in DBMSs)

LESS Architecture

LESS Components LESS integrates:

PowerLoom, a powerful logic KR&R system Microsoft Excel, the most widely used

spreadsheet in the world LESS is implemented as an extension (add-

in) to Microsoft Excel Status: prototype implementation to be

demonstrated here LESS Excel add-in has 1.6Mb (includes

PowerLoom libraries) Implemented in C++ and Visual Basic

Excel Spreadsheet

PowerLoom

Push

Know

ledg

e In

to P

ow

erLo

om

Invoke

Pow

erLo

om

Functio

ns

Pull K

now

led

ge

From

Pow

erLo

om

Invoke

Spre

ad

sheet

Functio

ns

Push

Know

ledg

e T

o S

pre

adsh

eet

Component Interactions

LESS – a mapping layer between Excel and PowerLoom

PowerLoom PowerLoom is result of 20 years of research

and practical use of KR&R systems at USC/ISI

Powerful and expressive Unique “pragmatic” stance to tackle real

world problems Usability more important than theoretical

"neatness" Allow trade-offs between

expressivity/completeness A KR system for real-world applications

Used in HPKB, CoABS, RKF, EELD, etc.

PowerLoom Highlights Representation in fully-expressive first-order predicate logic KIF syntax with many extensions to standard FOL:

Type, set & cardinality relations (e.g., subset-of, range-cardinality) Second-order definitions via holds Classical negation and negation-by-failure, defaults

Frame-style definition language as syntactic sugar Defconcept, defrelation, deffunction, definstance, defrule

Incremental monotonic and non-monotonic updates Interleave definitions, assertions, retractions with retrieval and

inference Context mechanism separates name and assertion spaces with

inheritance Provides structuring mechanism, facilitates hypothetical reasoning

Several reasoning mechanisms Reasoning specialists architecture for “plug-ins” (e.g., time reasoner)

RDBMS interface based on lifting axioms

PowerLoom Status and Distribution Written in STELLA

Available in Lisp, C++ and Java Both STELLA and PowerLoom are now Open Source

Recent development for PowerLoom GNU or Mozilla licenses

Current release: PowerLoom 3.0.2.beta Distributed as Lisp, C++ and Java source More recently in STELLA source ~600 downloads world-wide ~400 subscribers to the mailing lists

Why Have a Logic Spreadsheet at All?

A Key Insight: Relational vs. Positional Referencing (1) Spreadsheets use positional referencing to refer to

information G5 = relative cell reference $G$5 = absolute cell reference A5:D5 = range of cells {A5 B5 C5 D5}

Positional Referencing is very intuitive in the beginning…

But it is highly error-prone More complicated than it seems at a first glance Spreadsheets try hard do update references when

changes are made (e.g., a line is inserted), but errors are frequently introduced

Also, restricted to two dimensions

A Key Insight: Relational vs. Positional Referencing (2) Logic uses relational referencing to refer

to information Meaning is conveyed by relations and assertions Information is referenced by open variables as

well as role in relations Example:

Logic function (hours ?person ?month ?task) Means the number of hours a ?person spends on

a ?task in a ?month N-ary relations and more powerful

proposition specification languages add power to spreadsheets

A Key Insight: Relational vs. Positional Referencing (3)

Key insight of LESS is to add relational referencing to spreadsheets

Mixed model allies best of both worlds Cells can still be used to refer to data

Cell Variables can be bound to contents of cells Logic propositions express relationship

between several information elements

LESS Features – Highlights

LESS Features – Highlights

Explicit relationships Multi-dimensional relations Better, Explicit (Business) Rules Powerful information querying Platform for Knowledge

Acquisition/Engineering Platform for deploying Knowledge-

Based Systems

About Our Examples Three main examples:

Pilot training spreadsheet Budget spreadsheet Grading spreadsheet

All examples are real-life spreadsheets Pilot training example based on a spreadsheet used by ANG

149th Tactical Fighter Squadron Budget spreadsheet was used by a small business for

writing cost proposals for government contracts Grading spreadsheet was used to grade an actual course as

well as an exercise for a class in spreadsheets Sometimes simplified for presentation purposes

Syntax in these slides is simplified (we are still adding syntactic sugar)

LESS Features – Highlights

Explicit relationships Multi-dimensional relations Better, Explicit (Business) Rules Powerful information querying Platform for Knowledge

Acquisition/Engineering Platform for deploying Knowledge-

Based Systems

Positional Referencing Looks Easier Than It Is (1)

For example, B6 contains the formula =SUM(B3:B5)

Simple, easy…

A B C D E1 Task 12 Person Jan-05 Feb-05 Mar-05 Apr-053 Fred 12 4 12 14 4 Wilma 4 7 15 3 5 Barney 10 8 3 6 6 Total 26 19 30 23

Hours

Positional Referencing Looks Easier Than It Is (2)

Even such a simple sheet generates errors when a new employee (line) is added!

Totals stayed the same but should have been updated

A B C D E1 Task 12 Person Jan-05 Feb-05 Mar-05 Apr-053 Fred 12 4 12 14 4 Wilma 4 7 15 3 5 Barney 10 8 3 6 6 Pedrita 10 7 9 9 7 Total 26 19 30 23

Hours

Positional Referencing Looks Easier Than It Is (3)

What is the problem? In positional referencing SUM(B3:B5)

really means “sum the three cells above me” This clearly is not what is intended

Intention of the user is “sum the Jan-05 hours for all persons in Task 1”

Relational referencing allows a user to say just that!

Relational Referencing in LESS Makes Relationship Explicit (1) With LESS, we want to express the relationships explicitly

(hours ?person ?month ?task) Means the number of hours a ?person spends on a ?task in a ?month

In order to add relational information to a spreadsheet, we need to add a lifting formula

One way is to add a lifting formula on top of each column: =ASSERTMULTI(“hours”, A3:A5, B2, $A$1) This expands into and performs a series of assertions, e.g.:

(hours A3 B2 A1) = (hours “Fred” Jan-05 “Task 1”) = 12 (hours A4 B2 A1) = (hours “Wilma” Jan-05 “Task 1”) = 4

A B C D E1 Task 12 Person Jan-05 Feb-05 Mar-05 Apr-053 Fred 12 4 12 14 4 Wilma 4 7 15 3 5 Barney 10 8 3 6 6 Total 26 19 30 23

Hours

Relational Referencing in LESS Makes Relationship Explicit (2) Once the information is lifted into PowerLoom, we

can easily express a cell as “sum the Jan-05 hours in Task 1 for all persons”

B6 contains the formula:=SUM (RETRIEVEALL (“hours ?person Jan-05 “Task 1”)”))

Notice mix of logic and spreadsheet functions

A B C D E1 Task 12 Person Jan-05 Feb-05 Mar-05 Apr-053 Fred 12 4 12 14 4 Wilma 4 7 15 3 5 Barney 10 8 3 6 6 Total 26 19 30 23

Hours

LESS Features – Highlights

Explicit relationships Multi-dimensional relations Better, Explicit (Business) Rules Powerful information querying Platform for Knowledge

Acquisition/Engineering Platform for deploying Knowledge-

Based Systems

Multi-dimensional relations (1)

Problem: positional referencing only works when information has at most two dimensions (3-arity relations)

Intuitively easy to see: user cannot make a range with three dimensions

Multi-dimensional relations (2) Example: calculate totals

across tables Formula looks much less

intuitive, e.g. B24=B3+B10+B17 Can’t use ranges

Extremely hard to maintain and error prone! E.g., if a new task is added or

there are many tasks What you really want to say is

“sum the hours in Jan-05 for Fred in all tasks”

A B C D E1 Task 12 Person Jan-05 Feb-05 Mar-05 Apr-053 Fred 12 4 12 14 4 Wilma 4 7 15 3 5 Barney 10 8 3 6 6 Total 26 19 30 23 78 Task 29 Person Jan-05 Feb-05 Mar-05 Apr-05

10 Fred 4 7 4 7 11 Wilma 12 12 4 14 12 Barney 10 8 10 8 13 Total 26 27 18 29 1415 Task 316 Person Jan-05 Feb-05 Mar-05 Apr-0517 Fred 12 12 4 14 18 Wilma 4 4 7 3 19 Barney 10 10 8 6 20 Total 26 26 19 23 2122 Total23 Person Jan-05 Feb-05 Mar-05 Apr-0524 Fred 28 23 20 35 25 Wilma 20 23 26 20 26 Barney 30 26 21 20 27 Total 78 72 67 75

Hours

Hours

Hours

Hours

What happened? Task table can be seen as a model with two

dimensions – person and monthly hours BUT: Overall information model reflects

indexing in three dimensions – person, month and task

When information needs to be aggregated across more than two dimensions, the spreadsheet model suffers

Pivot tables can overcome some limitations, but only for some cases

LESS Supports Multi-Dimensional Relations Assuming the same lifting

described earlier, we can easily express “sum the hours in Jan-05 for Fred in all tasks” as:

=SUM (RETRIEVEALL ("(hours Fred Jan-05 ?task)”))

This can also easily be turned in to an Excel-like template formula by mixing cell variables:

=SUM (RETRIEVEALL ("(hours A24 B$23 ?task)”))

A B C D E1 Task 12 Person Jan-05 Feb-05 Mar-05 Apr-053 Fred 12 4 12 14 4 Wilma 4 7 15 3 5 Barney 10 8 3 6 6 Total 26 19 30 23 78 Task 29 Person Jan-05 Feb-05 Mar-05 Apr-05

10 Fred 4 7 4 7 11 Wilma 12 12 4 14 12 Barney 10 8 10 8 13 Total 26 27 18 29 1415 Task 316 Person Jan-05 Feb-05 Mar-05 Apr-0517 Fred 12 12 4 14 18 Wilma 4 4 7 3 19 Barney 10 10 8 6 20 Total 26 26 19 23 2122 Total23 Person Jan-05 Feb-05 Mar-05 Apr-0524 Fred 28 23 20 35 25 Wilma 20 23 26 20 26 Barney 30 26 21 20 27 Total 78 72 67 75

Hours

Hours

Hours

Hours

LESS Features – Highlights

Explicit relationships Multi-dimensional relations Better, Explicit (Business) Rules Powerful information querying Platform for Knowledge

Acquisition/Engineering Platform for deploying Knowledge-

Based Systems

LESS Provides Better, Explicit Rules Spreadsheets encode business rules in formulas Several problems:

Rules are hidden from view – harder to audit, opaque Rules have to be repeated everywhere they are used

– higher chance for errors, difficult to maintain Another problem is that spreadsheet formulas are

very poor in expressing rules that are not strictly mathematical formulas No good mechanism for complex conditions

LESS allows a user to model explicit (business) rules Better methodology (trend towards explicit business

rules) Easier to debug and maintain Improves visibility, communication

Better, Explicit Rules - Example (1) In the example below, the formula to calculate the grade letter

(column H) requires nested conditionals: IF($H4<70,"D", IF($H4<75,"C-", IF($H4<78,"C", IF($H4<80,"C+",

IF($H4<85,"B-", IF($H4<88,"B", IF($H4<90,"B+", IF($H4<95,"A-","A"))))))))

Opaque, hard to read, easy to make mistakes “Rule” is repeated once per line – error prone, hard to maintain

A B C D E F G H1 Exam Final Letter2 Student 1 2 1 2 Grade Grade Grade3 Parker 56 84 89 93 90 81 B-4 May 95 92 79 83 95 90 A-5 Watson 89 87 98 78 80 86 B6 Jameson 66 95 99 69 69 79 C+7 Osborn 88 51 79 63 79 72 C-

Assignments Tests

Better, Explicit Rules – Example (2) In LESS we can define rules elsewhere and simply apply

them In the same example, the formula to calculate the grade

letter (column H) is a simple query:RETRIEVE("1 (letter-grade ", A3, " ?x)" ) )

Notice how the rules are defined once, applied several times

A B C D E F G H1 Exam Final Letter2 Student 1 2 1 2 Grade Grade Grade3 Parker 56 84 89 93 90 81 B-4 May 95 92 79 83 95 90 A-5 Watson 89 87 98 78 80 86 B6 Jameson 66 95 99 69 69 79 C+7 Osborn 88 51 79 63 79 72 C-

Assignments Tests

Better, Explicit Rules – Example (3)

In LESS rules can be defined separately and explicitly, and reused everywhere:

A B C D E F G11 Classification rules - Version 112 !=assert((=> (>= ?x 95) (letter-grade ?x "A")))13 !=assert((=> (and (< ?x 95) (>= ?x 90))(letter-grade ?x "A-")))14 !=assert((=> (and (< ?x 90) (>= ?x 88))(letter-grade ?x "B+")))15 !=assert((=> (and (< ?x 88) (>= ?x 85))(letter-grade ?x "B")))16 !=assert((=> (and (< ?x 85) (>= ?x 80))(letter-grade ?x "B-")))17 !=assert((=> (and (< ?x 80) (>= ?x 78))(letter-grade ?x "C+")))18 !=assert((=> (and (< ?x 78) (>= ?x 75))(letter-grade ?x "C")))19 !=assert((=> (and (< ?x 75) (>= ?x 70))(letter-grade ?x "C-")))20 !=assert((=> (< ?x 70) (letter-grade ?x "D")))

Better, Explicit Rules – Example (4)

Rules can also be parameterized on Excel data – a nice fit with Excel modeling style:

A B C D E F G H I J K L M N O P22 Classification rules - Version 223 =POWERLOOM( "(defconcept student)" )24 =POWERLOOM( "(deffunction final ((?x student :-> (?n number))" )25 Min Max Letter =POWERLOOM( "(deffunction letter-grade ((?x student)) :-> (?s string))" )26 95 100 A =ASSERT("=>",P("and",P( "Final", "?x", "?g" ), P(">=","?g",A26), P( "=<", "?g", C38 )), P("letter-grade","?x",LIT(B26)))27 90 94 A- =ASSERT("=>",P("and",P( "Final", "?x", "?g" ), P(">=","?g",A27), P( "=<", "?g", C39 )), P("letter-grade","?x",LIT(B27)))28 88 89 B+ =ASSERT("=>",P("and",P( "Final", "?x", "?g" ), P(">=","?g",A28), P( "=<", "?g", C40 )), P("letter-grade","?x",LIT(B28)))29 85 87 B =ASSERT("=>",P("and",P( "Final", "?x", "?g" ), P(">=","?g",A29), P( "=<", "?g", C41 )), P("letter-grade","?x",LIT(B29)))30 80 84 B- =ASSERT("=>",P("and",P( "Final", "?x", "?g" ), P(">=","?g",A30), P( "=<", "?g", C42 )), P("letter-grade","?x",LIT(B30)))31 78 79 C+ =ASSERT("=>",P("and",P( "Final", "?x", "?g" ), P(">=","?g",A31), P( "=<", "?g", C43 )), P("letter-grade","?x",LIT(B31)))32 75 77 C =ASSERT("=>",P("and",P( "Final", "?x", "?g" ), P(">=","?g",A32), P( "=<", "?g", C44 )), P("letter-grade","?x",LIT(B32)))33 70 74 C- =ASSERT("=>",P("and",P( "Final", "?x", "?g" ), P(">=","?g",A33), P( "=<", "?g", C45 )), P("letter-grade","?x",LIT(B33)))34 0 69 D =ASSERT("=>",P("and",P( "Final", "?x", "?g" ), P(">=","?g",A34), P( "=<", "?g", C46 )), P("letter-grade","?x",LIT(B34)))

Better, Explicit Rules – Example (5)

For ultimate elegance, we can use templates (Phase II):

A B C D E F G H I38 Min Max Letter39 95 100 A40 90 94 A-41 88 89 B+42 85 87 B43 80 84 B-44 78 79 C+45 75 77 C46 70 74 C-47 0 69 D4849 Template: =assert((=> (and (<= ?x ??P1) (>= ?x ??P2) ((letter-grade ?x ??P3)))50 =apply-template(B49,A39:C47)

LESS Features – Highlights

Explicit relationships Multi-dimensional relations Better, Explicit (Business) Rules Powerful information querying Platform for Knowledge

Acquisition/Engineering Platform for deploying Knowledge-

Based Systems

Powerful Information Querying Spreadsheets have limitations on how to

summarize information Particularly if the information is symbolic

Basic problem is that it is hard to select information from other parts of the spreadsheet Worse if information is in multiple tables and/or

has more than three dimensions Special wizards such as Excel pivot tables

help overcome some of these deficiencies But they still have restricted expressivity

LESS Demo – Squadron Training Examples

Also – Tiers 2, 3

LESS Features – Highlights

Explicit relationships Multi-dimensional relations Better, Explicit (Business) Rules Powerful information querying Platform for Knowledge

Acquisition/Engineering Platform for deploying Knowledge-

Based Systems

New Possibilities Opened with LESS

One way to look at logic spreadsheets is for logic to add spice to spreadsheets Delivers incremental improvements (Only) user is spreadsheet user

We found that logic spreadsheets can be good for knowledge engineers as well

Two types of applications: Template-based knowledge acquisition Reasoning

Template-based knowledge acquisition (1) A common problem in knowledge acquisition is to acquire knowledge about

many similar objects KA approaches are hard to understand for SMEs

Require understanding of modeling decisions For instance, not reasonable to ask an SMEs to model (example from DARPA

ARPI and JFACC Programs):(defconcept F-15 :IS-PRIMITIVE Fighter-Aircraft :IMPLIES (:AND (:FILLED-BY NICKNAME "Eagle") (:FILLED-BY MADE-BY MCDONNELL-DOUGLAS) (:EXACTLY 2 ENGINES) (:ALL GUN M61A1) (:FILLED-BY IN-SERVICE 1972) (:FILLED-BY PRIMARY-FUEL JP-4) (:FILLED-BY ALTERNATE-FUEL JP-5 JP-8 JET-A

JET-A-1 JET-B)) :DEFAULTS (:AND (:FILLED-BY CEILING 65000ft) (:FILLED-BY COST 15000000) (:FILLED-BY CREW-SIZE 1)))

Template-based knowledge acquisition (2) An approach to deal with this problem is to have

“knowledge templates” Templates abstract the way information is used in the logic

formalism Middleware code (e.g. a Lisp macro) “routs” the information in

the template to its place in the knowledge base (e.g., slot filler, type, default value)

Example of template implemented as Lisp macro:(def-aircraft-c F15 :types fighter-attack-aircraft :nickname "Eagle" :made-by McDonnell-Douglas :engine-count 2 :primary-fuel JP-4 :alternate-fuel (JP-5 JP-8 Jet-A Jet-A-1 Jet-B))

Ideal division of labor: SME enters information easily Knowledge engineer designs templates to map into a KB

Template-based knowledge acquisition (3) Logic Spreadsheets are great for

implementing knowledge templates Table format is natural for most SMEs Spreadsheet is widely used (info may already be

in a spreadsheet!) Information also available for other uses

Example:

A C D F G H1 Aircraft Maker # Engines Primary Fuel Alternate Fuels …2 F-15 McDonnel Douglas 2 JP-4 JP-5 JP-8 Jet-A Jet-A-1 Jet-B …3 F-16 General Dynamics 1 JP-4 JP-5 JP-8 Jet-A Jet-A-1 Jet-B …4 … … … … … …

LESS Features – Highlights

Explicit relationships Multi-dimensional relations Better, Explicit (Business) Rules Powerful information querying Platform for Knowledge

Acquisition/Engineering Platform for deploying

Knowledge-Based Systems

Reasoning Spreadsheets can contain full-blown knowledge-based

systems Plays on availability of information already in spreadsheets

For instance, we can re-implement PROSPECT [Valente and Schacchi, 2002] in LESS Input was a list of (instances of) business process models KB has knowledge about critiquing process models Output is agenda of problems to evaluate Applied to critique accounting processes in large corporation

LESS implementation would: Use tables and templates to express input process models KB can be provided explicitly or as a domain pak Output agenda of problems in the process models can be

implemented as an instance table

Tier 2 Example

Create instance table

Creating instances