Summer School DSL 2013 - SpreadSheet Engineering

download Summer School DSL 2013 - SpreadSheet Engineering

If you can't read please download the document

Transcript of Summer School DSL 2013 - SpreadSheet Engineering

Jcome Cunha1,2, Joo P.Fernandes1,3, Joo Saraiva1

1 HASLab / INESC TEC & Universidade do Minho2 ESTGF, Instituto Politcnico do Porto3 (rel)ease, Univ. da Beira Interior Portugal

DSL 2013

Spreadsheet Engineering

This Tutorial

Domain Specific Languages

Visual Modeling Domain Specific Languages

Embedding Domain Specific Language

Model-Driven Engineering

Software Evolution

Bidirectional Software Evolution

Empirical Studies

This Tutorial

All defined in the context of Spreadsheets

And, implemented in the Functional Programing Language Haskell!

This Tutorial: Plan

Part I

Motivation

Spreadsheet AnalysisData Mining Techniques

Models for Spreadsheets ClassSheets Embedded ClassSheet Models

Model-driven Spreadsheets

Image taken from http://www.flickr.com/photos/cosmosfan/2414002070/

Spreadsheets are widely used

Spreadsheets are widely used

Why do Spreadsheets matter?

Probably the Biggest Programming Language!

Probably the Biggest Functional Programming Language!

Probably the Biggest Domain Specific Language!

Probably the Biggest software system!

Probably the biggest database system!

Spreadsheets are great!

Why do Spreadsheets matter?

Financial intelligence firm CODA reports that 95% of all U.S. Firms use spreadsheets for financial reporting

Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R. Panko and Nicholas Ordway, 2008

Why do Spreadsheets matter?

In 2004, RevenueRecognition.com (now Softtrax) had the International Data Corporation interview 118 business leaders.

IDC found that 85% were using spreadsheets in financial reporting and forecasting.

Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R. Panko and Nicholas Ordway, 2008

Why do Spreadsheets matter?

50% of all spreadsheets are the basis for decisions.

Supporting professional spreadsheet users by generating leveled dataflow diagrams, Felienne Hermans, Martin Pinzger and Arie van Deursen, 2011

Why do Spreadsheets matter?

They are the programming language of choice by non-professional programmers, a.k.a. end-users In the U.S. alone, the number of end-user programmers is conservatively estimated at 11 million, compared to only 2.75 million other, professional programmers

Estimating the numbers of end users and end user programmers, Christopher Scafdi, Mary Shaw, and Brad Myers, 2005

Why do Spreadsheets matter?

Why are they so popular?

Which characteristics make them so successful?

First empirical study: fill in the inquiry!

But, as a programming language...

Exercise: Write a program that sums a list of integer values. sum :: [Int] -> Int sum [] = 0 sum (h:t) = h + sum t

In fact, spreadsheets lack:

Abstraction

Encapsulation

Type system

Testing

IDE

...

And the consequences may be...

http://www.eusprig.org/stories.htm

Economy losses of $10 billion/year!

I. Spreadsheet Analysis

Spreadsheet: An Example

A flight scheduling spreadsheet.

Functional Dependencies

Informally, a functional dependency between a column A and another column B means that the values in column A determine the values in column B, that is, there are no two rows in the spreadsheet that have the same value in column A but differ in their values in column B.

Functional Dependencies

Functional Dependencies

Any more Functional Dependencies?

Functional Dependencies

Usually, too many!

Functional Dependencies

We compute the business logic from the data, by inferring Fds.

They are the building blocks inferring models for (legacy) spreadsheets.

The better are the FDs we infer, the better is the model we compute!

Functional Dependencies

We use a data mining algorithm that produces to many accidental Fds!

We introduce some spreadsheet specific heuristics to filter out accidental FDs

Functional Dependencies

Label semantics: often keys are labeled code or id

Label arrangement: we prefer FDs respecting the order of columns

Antecedent size: small keys are preferable

Ratio: small ratio between keys and non-keys

Single value columns: columns always with the same value appear in too many FDs

Functional Dependencies

Functional dependencies after prunning:

Relational Model

Having computed the FDs, we can now use the FUN algorithm to produce a relational model for the spreadsheet:

Relational Model

Discovery-based Edit Assistance for Spreadsheets, Jcome Cunha, Joo Saraiva, and Joost Visser. In proceedings of 2009 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC 2009).

From Spreadsheets to Relational Databases and Back, Jcome Cunha, Joo Saraiva, and Joost Visser. In proceedings of the 2009 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation (PEPM 2009).

Spreadsheets with embedded
Relational Model

Spreadsheets with embedded
Relational Model

Spreadsheet Analysis

Spreadsheet Querying: Rui Pereira (talk at the students workshop)

Spreadhseet Smells: Pedro Martins: talk at the students workshop (not related to spreadsheets!).

Spreadsheet Querying

Spreadsheet Querying

We need model-driven queries!

Spreadsheet Smells

We have implemented a full catalog of smells for spreadsheets:

empty cell, pattern finder, reference to empty cells, multiple operations, multiple references, conditional complexity, long calculation chain, duplicated formulas, innapropriate intimacy, etc

Still...

Around 200 people who thought their only experience of the London 2012Olympic Games would be minor heats of synchronised swimming havereceived an unexpected upgrade to the mens 100m final following anembarrassing ticketing mistake....Locog said the error occurred in the summer, between the first andsecond round of ticket sales, when a member of staff made a singlekeystroke mistake and entered 20,000 into a spreadsheet rather thanthe correct figure of 10,000 remaining tickets.

The Telegraph, 04 January 2012

II. Models inSpreadsheet

The Contribution of Models

ClassSheets - Models in Spreadsheets

ClassSheets: automatic generation of spreadsheet applications from object-oriented specifications, Gregor Engels, Martin Erwig, ASE'05

ClassSheets are a high-level, object-oriented formalism to specify the business logic of spreadsheets

ClassSheets - Models in Spreadsheets

ClassSheets - Models in Spreadsheets

ClassSheets - Models in Spreadsheets

ClassSheets - Models in Spreadsheets

I. ClassSheet Model Inference

I. ClassSheet Model Inference

Automatically Inferring ClassSheet Models from Spreadsheets, Jcome Cunha, Martin Erwig, Joo Saraiva, VL/HCC'10

Data mining techniquesDatabase normalization theory

I. ClassSheet Model Inference

Still...

Harvard University economists Carmen Reinhart and Kenneth Rogoff have acknowledged making a spreadsheet calculation mistake in a 2010research paper, Growth in a Time of Debt, which has been widely cited to justify budget-cutting.

Business Week, 18 April 2013

In a 2010 paper* Carmen Reinhart, now a professor at Harvard Kennedy School, and Kenneth Rogoff, an economist at Harvard University...argued that GDP growth slows to a snails pace once government-debt levels exceed 90\% of GDP. The 90% figure quickly became ammunition in political arguments over austerity...This week a new piece of research poured fuel on the fire by calling the 90% finding into question..

The Economist, 17 April 2013

II. Embedding ClassSheets in Spreadsheets

Erwig implemented ClassSheets as a standalone language.

A new processor (for ClassSheets) was developed from scratch:

Embedding ClassSheets in Spreadsheets

From a ClassSheet it produces an initial Spreadsheet with the model embedded to guide users intorducing correct data.

Embedding ClassSheets in Spreadsheets

Embedding DSLs in general purpose programming languages is a recurring strategysystems inherit all the power of the host language

implementation effort is much reduced

We will present the embedding of the ClassSheet (DSL) model in traditional spreadsheet systems

Embedding ClassSheets in Spreadsheets

Embedding ClassSheets in Spreadsheets

Embedding and Evolution of Spreadsheet Models in Spreadsheet Systems, Jcome Cunha, Jorge Mendes, Joo P. Fernandes, Joo Saraiva, VL/HCC '11

Powerful interactive interface Single environment for spreadsheet evolution Model-instance synchronization

Syntactic restrictions

Vertically Expandable Tables

Horizontally Expandable Tables

Relationship Tables

III.Model-driven Spreadsheets

ClassSheet driven Spreadsheet Environment

Model-driven Spreadsheets

Type-safe Evolution of Spreadsheets, Jcome Cunha, Joost Visser, Tiago Alves, Joo Saraiva, In proceedings of the Fundamental Approaches to Software Engineering - FASE 2011.

MDSheet: A Framework for Model-driven Spreadsheet Engineering, Jcome Cunha, Joo Paulo Fernandes, Jorge Mendes and Joo Saraiva.34th Internactional Conference on Software Engineering (ICSE 2012).

MDSheet Tool Demo Video

http://www.youtube.com/watch?feature=player_detailpage&v=6LNdTdCpV2U

Still...

Perante os deputados, Souto Moura relembrou o dia em que o jornal 24 Horas divulgou a existncia de uma listagem de chamadas de titulares de altos cargos de Estado, constantes de disquetes anexas ao processo Casa Pia . 'Foi um dia que nunca esquecerei, foi terrvel, dramtico', sustentou o agora juiz-conselheiro do Supremo Tribunal de Justia. Nesse dia (13 de janeiro de 2006), diz ter chamado PGR osprocuradores Joo Guerra e as procuradoras adjuntas Paula Ferraz e Cristina Faleiro. Vistas as disquetes - com o programa Excel, quecontinha um filtro informtico que ocultava parte da informao -, a concluso foi que no haveria nada naquele suporte para alm das listagens de Paulo Pedroso. 'Aqui s h nmeros do dr. Paulo Pedroso' foi a concluso do visionamento. tickets.

Dirio de Notcias, 10 February 2007

This Tutorial: Plan

Part II

Model Evolution

Bidirectional Evolution

Empirical Study

Still...

http://www.eusprig.org/horror-stories.htm

Title: Report identifies lack of spreadsheet controls, pressure to approve

Source:http://files.shareholder.com/downloads/ONE/2261602328x0x628656/4cb574a0-0bf5-4728-9582-625e4519b5ab/Task_Force_Report.pdf

Organization:JP Morgan

Region:EU

Release Date:18 January 2013

Risk:Lowering estimate of VaR in Basel II models

Tags:Financial

Spreadsheet Causes:Logic not reviewed, manual copy/paste operations

I. Model Evolution

FASE 2011. Type-safe Evolution of Spreadsheets, Jcome Cunha, Joost Visser, Tiago Alves, Joo SaraivaVL/HCC 2011. Embedding and Evolution of of Spreadsheet Models in Spreadsheet Systems, Jcome Cunha, Joo P. Fernandes, Jorge Mendes, Joo Saraiva

This generated spreadsheet guides users in introducing correct dataThe spreadsheet includes mechanisms that guarantee that the spreadsheet data always conforms to the model after an user update

Suppose now you need to add new information to the spreadsheet

For instance, the number of passengers of each flight

It would require to do several error-prone tasks

Add columns, labels, update formulas, etc.

We can do it automatically!

Why do Spreadsheets Need Evolution?

Intended to be used by trained personProfessional on Spreadsheet models/ClassSheetsNot by instance/data end users

Data Refinements

Evolution Steps

Combinators: defined as helper steps

Semantic: steps that add information to the model

Layout: steps that do not add information to the model, just change its arrangement

Combinator Steps

Pull Up All ReferencesAll references must be at the top level

For instance AB becomes (AB)

Apply After and friends

Semantic Steps

Insert a column

Insert a block

Make it expandable

Split

Layout Steps

Change orientation

Normalize blocks

Shift

Move blocks

Haskell Implementation

expandBlock :: String RuleexpandBlock str (label : clas) | compLabel label str = do let rep = Rep {to = id tolist, from = id head } return (View rep (label : (clas)))

TOOL DEMO

II. Bidirectional Evolution

ICMT 2011. Bidirectional Transformation of Model-Driven Spreadsheets, Jcome Cunha, Joo P. Fernandes, Jorge Mendes, Hugo Pacheco, and Joo SaraivaICSE 2012. MDSheet: A Framework for Model-driven Spreadsheet Engineering, Jcome Cunha, Joo P. Fernandes, Jorge Mendes, Joo Saraiva

Some evolution steps are easier to perform on the instance

For instance, to add a column to one of the repetition blocks

People felt the need to evolve the data

Why do Spreadsheets Need Evolution, Again?

This generated spreadsheet guides users in introducing correct dataThe spreadsheet includes mechanisms that guarantee that the spreadsheet data always conforms to the model after an user update

Bidirectional Transformation System

Example of a Transformation we Want:
Add a New Column

(Data) Operations on Instances

(Model) Operations on ClassSheets

Bidirectional Transformation Functions

Baseado em haskell

Integrado no OO

Clicanca-se em botoes

Espetacular

Basic

Composable
Example: Add a Column and a Class

TOOL DEMO

Available at http://ssaapp.di.uminho.pt

Built out of 7886 LOC:3179 in Haskell, for the evolution and inference

980 in Basic, for the embedding

2665 in C++, for gluing all components

340 in Perl, for compilation and setup

722, for makefiles

MDSheet

ICSE 2012. MDSheet: A Framework for Model-driven Spreadsheet Engineering, Jcome Cunha, Joo P. Fernandes, Jorge Mendes, Joo Saraiva

III. Empirical Studies

17 student from a MSc course

2 different spreadsheetsMicrosoft budget

Local company responsible for water supply of Braga, Portugal - agere


Study Settings

Hypotheses:(1) In order to perform a given set of tasks, users spend less time when using model- driven spreadsheets instead of plain ones.

(2) Spreadsheets developed in the model-driven environment hold less errors than plain ones.


Study Setting

Main Results

Number of tasks performed on the MS spreadsheet

Main Results

Error rate in the budget spreadsheet

Empirical Study with YOU

Inquiry Results!

Inquiry Results!

Inquiry Results!

Inquiry Results!

Inquiry Results!

IV. Conclusion

MOST
IMPORTANT
ANOUNCEMENT
OF
THIS
TUTORIAL
...

Acknowledgments

This work is funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT - Fundao para a Cincia e a Tecnologia (Portuguese Foundation for Science and Technology) within project FCOMP-01-0124-FEDER-010048.