Summer School DSL 2013 - SpreadSheet Engineering
-
Upload
jacome-cunha -
Category
Technology
-
view
2.315 -
download
1
Transcript of Summer School DSL 2013 - SpreadSheet Engineering
Jcome Cunha1,2, Joo P.Fernandes1,3, Joo Saraiva1
1 HASLab / INESC TEC & Universidade do Minho2 ESTGF, Instituto Politcnico do Porto3 (rel)ease, Univ. da Beira Interior Portugal
DSL 2013
Spreadsheet Engineering
This Tutorial
Domain Specific Languages
Visual Modeling Domain Specific Languages
Embedding Domain Specific Language
Model-Driven Engineering
Software Evolution
Bidirectional Software Evolution
Empirical Studies
This Tutorial
All defined in the context of Spreadsheets
And, implemented in the Functional Programing Language Haskell!
This Tutorial: Plan
Part I
Motivation
Spreadsheet AnalysisData Mining Techniques
Models for Spreadsheets ClassSheets Embedded ClassSheet Models
Model-driven Spreadsheets
Image taken from http://www.flickr.com/photos/cosmosfan/2414002070/
Spreadsheets are widely used
Spreadsheets are widely used
Why do Spreadsheets matter?
Probably the Biggest Programming Language!
Probably the Biggest Functional Programming Language!
Probably the Biggest Domain Specific Language!
Probably the Biggest software system!
Probably the biggest database system!
Spreadsheets are great!
Why do Spreadsheets matter?
Financial intelligence firm CODA reports that 95% of all U.S. Firms use spreadsheets for financial reporting
Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R. Panko and Nicholas Ordway, 2008
Why do Spreadsheets matter?
In 2004, RevenueRecognition.com (now Softtrax) had the International Data Corporation interview 118 business leaders.
IDC found that 85% were using spreadsheets in financial reporting and forecasting.
Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R. Panko and Nicholas Ordway, 2008
Why do Spreadsheets matter?
50% of all spreadsheets are the basis for decisions.
Supporting professional spreadsheet users by generating leveled dataflow diagrams, Felienne Hermans, Martin Pinzger and Arie van Deursen, 2011
Why do Spreadsheets matter?
They are the programming language of choice by non-professional programmers, a.k.a. end-users In the U.S. alone, the number of end-user programmers is conservatively estimated at 11 million, compared to only 2.75 million other, professional programmers
Estimating the numbers of end users and end user programmers, Christopher Scafdi, Mary Shaw, and Brad Myers, 2005
Why do Spreadsheets matter?
Why are they so popular?
Which characteristics make them so successful?
First empirical study: fill in the inquiry!
But, as a programming language...
Exercise: Write a program that sums a list of integer values. sum :: [Int] -> Int sum [] = 0 sum (h:t) = h + sum t
In fact, spreadsheets lack:
Abstraction
Encapsulation
Type system
Testing
IDE
...
And the consequences may be...
http://www.eusprig.org/stories.htm
Economy losses of $10 billion/year!
I. Spreadsheet Analysis
Spreadsheet: An Example
A flight scheduling spreadsheet.
Functional Dependencies
Informally, a functional dependency between a column A and another column B means that the values in column A determine the values in column B, that is, there are no two rows in the spreadsheet that have the same value in column A but differ in their values in column B.
Functional Dependencies
Functional Dependencies
Any more Functional Dependencies?
Functional Dependencies
Usually, too many!
Functional Dependencies
We compute the business logic from the data, by inferring Fds.
They are the building blocks inferring models for (legacy) spreadsheets.
The better are the FDs we infer, the better is the model we compute!
Functional Dependencies
We use a data mining algorithm that produces to many accidental Fds!
We introduce some spreadsheet specific heuristics to filter out accidental FDs
Functional Dependencies
Label semantics: often keys are labeled code or id
Label arrangement: we prefer FDs respecting the order of columns
Antecedent size: small keys are preferable
Ratio: small ratio between keys and non-keys
Single value columns: columns always with the same value appear in too many FDs
Functional Dependencies
Functional dependencies after prunning:
Relational Model
Having computed the FDs, we can now use the FUN algorithm to produce a relational model for the spreadsheet:
Relational Model
Discovery-based Edit Assistance for Spreadsheets, Jcome Cunha, Joo Saraiva, and Joost Visser. In proceedings of 2009 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC 2009).
From Spreadsheets to Relational Databases and Back, Jcome Cunha, Joo Saraiva, and Joost Visser. In proceedings of the 2009 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation (PEPM 2009).
Spreadsheets with embedded
Relational Model
Spreadsheets with embedded
Relational Model
Spreadsheet Analysis
Spreadsheet Querying: Rui Pereira (talk at the students workshop)
Spreadhseet Smells: Pedro Martins: talk at the students workshop (not related to spreadsheets!).
Spreadsheet Querying
Spreadsheet Querying
We need model-driven queries!
Spreadsheet Smells
We have implemented a full catalog of smells for spreadsheets:
empty cell, pattern finder, reference to empty cells, multiple operations, multiple references, conditional complexity, long calculation chain, duplicated formulas, innapropriate intimacy, etc
Still...
Around 200 people who thought their only experience of the London 2012Olympic Games would be minor heats of synchronised swimming havereceived an unexpected upgrade to the mens 100m final following anembarrassing ticketing mistake....Locog said the error occurred in the summer, between the first andsecond round of ticket sales, when a member of staff made a singlekeystroke mistake and entered 20,000 into a spreadsheet rather thanthe correct figure of 10,000 remaining tickets.
The Telegraph, 04 January 2012
II. Models inSpreadsheet
The Contribution of Models
ClassSheets - Models in Spreadsheets
ClassSheets: automatic generation of spreadsheet applications from object-oriented specifications, Gregor Engels, Martin Erwig, ASE'05
ClassSheets are a high-level, object-oriented formalism to specify the business logic of spreadsheets
ClassSheets - Models in Spreadsheets
ClassSheets - Models in Spreadsheets
ClassSheets - Models in Spreadsheets
ClassSheets - Models in Spreadsheets
I. ClassSheet Model Inference
I. ClassSheet Model Inference
Automatically Inferring ClassSheet Models from Spreadsheets, Jcome Cunha, Martin Erwig, Joo Saraiva, VL/HCC'10
Data mining techniquesDatabase normalization theory
I. ClassSheet Model Inference
Still...
Harvard University economists Carmen Reinhart and Kenneth Rogoff have acknowledged making a spreadsheet calculation mistake in a 2010research paper, Growth in a Time of Debt, which has been widely cited to justify budget-cutting.
Business Week, 18 April 2013
In a 2010 paper* Carmen Reinhart, now a professor at Harvard Kennedy School, and Kenneth Rogoff, an economist at Harvard University...argued that GDP growth slows to a snails pace once government-debt levels exceed 90\% of GDP. The 90% figure quickly became ammunition in political arguments over austerity...This week a new piece of research poured fuel on the fire by calling the 90% finding into question..
The Economist, 17 April 2013
II. Embedding ClassSheets in Spreadsheets
Erwig implemented ClassSheets as a standalone language.
A new processor (for ClassSheets) was developed from scratch:
Embedding ClassSheets in Spreadsheets
From a ClassSheet it produces an initial Spreadsheet with the model embedded to guide users intorducing correct data.
Embedding ClassSheets in Spreadsheets
Embedding DSLs in general purpose programming languages is a recurring strategysystems inherit all the power of the host language
implementation effort is much reduced
We will present the embedding of the ClassSheet (DSL) model in traditional spreadsheet systems
Embedding ClassSheets in Spreadsheets
Embedding ClassSheets in Spreadsheets
Embedding and Evolution of Spreadsheet Models in Spreadsheet Systems, Jcome Cunha, Jorge Mendes, Joo P. Fernandes, Joo Saraiva, VL/HCC '11
Powerful interactive interface Single environment for spreadsheet evolution Model-instance synchronization
Syntactic restrictions
Vertically Expandable Tables
Horizontally Expandable Tables
Relationship Tables
III.Model-driven Spreadsheets
ClassSheet driven Spreadsheet Environment
Model-driven Spreadsheets
Type-safe Evolution of Spreadsheets, Jcome Cunha, Joost Visser, Tiago Alves, Joo Saraiva, In proceedings of the Fundamental Approaches to Software Engineering - FASE 2011.
MDSheet: A Framework for Model-driven Spreadsheet Engineering, Jcome Cunha, Joo Paulo Fernandes, Jorge Mendes and Joo Saraiva.34th Internactional Conference on Software Engineering (ICSE 2012).
MDSheet Tool Demo Video
http://www.youtube.com/watch?feature=player_detailpage&v=6LNdTdCpV2U
Still...
Perante os deputados, Souto Moura relembrou o dia em que o jornal 24 Horas divulgou a existncia de uma listagem de chamadas de titulares de altos cargos de Estado, constantes de disquetes anexas ao processo Casa Pia . 'Foi um dia que nunca esquecerei, foi terrvel, dramtico', sustentou o agora juiz-conselheiro do Supremo Tribunal de Justia. Nesse dia (13 de janeiro de 2006), diz ter chamado PGR osprocuradores Joo Guerra e as procuradoras adjuntas Paula Ferraz e Cristina Faleiro. Vistas as disquetes - com o programa Excel, quecontinha um filtro informtico que ocultava parte da informao -, a concluso foi que no haveria nada naquele suporte para alm das listagens de Paulo Pedroso. 'Aqui s h nmeros do dr. Paulo Pedroso' foi a concluso do visionamento. tickets.
Dirio de Notcias, 10 February 2007
This Tutorial: Plan
Part II
Model Evolution
Bidirectional Evolution
Empirical Study
Still...
http://www.eusprig.org/horror-stories.htm
Title: Report identifies lack of spreadsheet controls, pressure to approve
Source:http://files.shareholder.com/downloads/ONE/2261602328x0x628656/4cb574a0-0bf5-4728-9582-625e4519b5ab/Task_Force_Report.pdf
Organization:JP Morgan
Region:EU
Release Date:18 January 2013
Risk:Lowering estimate of VaR in Basel II models
Tags:Financial
Spreadsheet Causes:Logic not reviewed, manual copy/paste operations
I. Model Evolution
FASE 2011. Type-safe Evolution of Spreadsheets, Jcome Cunha, Joost Visser, Tiago Alves, Joo SaraivaVL/HCC 2011. Embedding and Evolution of of Spreadsheet Models in Spreadsheet Systems, Jcome Cunha, Joo P. Fernandes, Jorge Mendes, Joo Saraiva
This generated spreadsheet guides users in introducing correct dataThe spreadsheet includes mechanisms that guarantee that the spreadsheet data always conforms to the model after an user update
Suppose now you need to add new information to the spreadsheet
For instance, the number of passengers of each flight
It would require to do several error-prone tasks
Add columns, labels, update formulas, etc.
We can do it automatically!
Why do Spreadsheets Need Evolution?
Intended to be used by trained personProfessional on Spreadsheet models/ClassSheetsNot by instance/data end users
Data Refinements
Evolution Steps
Combinators: defined as helper steps
Semantic: steps that add information to the model
Layout: steps that do not add information to the model, just change its arrangement
Combinator Steps
Pull Up All ReferencesAll references must be at the top level
For instance AB becomes (AB)
Apply After and friends
Semantic Steps
Insert a column
Insert a block
Make it expandable
Split
Layout Steps
Change orientation
Normalize blocks
Shift
Move blocks
Haskell Implementation
expandBlock :: String RuleexpandBlock str (label : clas) | compLabel label str = do let rep = Rep {to = id tolist, from = id head } return (View rep (label : (clas)))
TOOL DEMO
II. Bidirectional Evolution
ICMT 2011. Bidirectional Transformation of Model-Driven Spreadsheets, Jcome Cunha, Joo P. Fernandes, Jorge Mendes, Hugo Pacheco, and Joo SaraivaICSE 2012. MDSheet: A Framework for Model-driven Spreadsheet Engineering, Jcome Cunha, Joo P. Fernandes, Jorge Mendes, Joo Saraiva
Some evolution steps are easier to perform on the instance
For instance, to add a column to one of the repetition blocks
People felt the need to evolve the data
Why do Spreadsheets Need Evolution, Again?
This generated spreadsheet guides users in introducing correct dataThe spreadsheet includes mechanisms that guarantee that the spreadsheet data always conforms to the model after an user update
Bidirectional Transformation System
Example of a Transformation we Want:
Add a New Column
(Data) Operations on Instances
(Model) Operations on ClassSheets
Bidirectional Transformation Functions
Baseado em haskell
Integrado no OO
Clicanca-se em botoes
Espetacular
Basic
Composable
Example: Add a Column and a Class
TOOL DEMO
Available at http://ssaapp.di.uminho.pt
Built out of 7886 LOC:3179 in Haskell, for the evolution and inference
980 in Basic, for the embedding
2665 in C++, for gluing all components
340 in Perl, for compilation and setup
722, for makefiles
MDSheet
ICSE 2012. MDSheet: A Framework for Model-driven Spreadsheet Engineering, Jcome Cunha, Joo P. Fernandes, Jorge Mendes, Joo Saraiva
III. Empirical Studies
17 student from a MSc course
2 different spreadsheetsMicrosoft budget
Local company responsible for water supply of Braga, Portugal - agere
Study Settings
Hypotheses:(1) In order to perform a given set of tasks, users spend less time when using model- driven spreadsheets instead of plain ones.
(2) Spreadsheets developed in the model-driven environment hold less errors than plain ones.
Study Setting
Main Results
Number of tasks performed on the MS spreadsheet
Main Results
Error rate in the budget spreadsheet
Empirical Study with YOU
Inquiry Results!
Inquiry Results!
Inquiry Results!
Inquiry Results!
Inquiry Results!
IV. Conclusion
MOST
IMPORTANT
ANOUNCEMENT
OF
THIS
TUTORIAL
...
Acknowledgments
This work is funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT - Fundao para a Cincia e a Tecnologia (Portuguese Foundation for Science and Technology) within project FCOMP-01-0124-FEDER-010048.