1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of...

50
1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working Conference on Reverse Engineering

Transcript of 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of...

Page 1: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

1

Grokking Software Architecture

Richard C. HoltSoftware Architecture Group (SWAG)

School of Computer Science, University of Waterloo, Canada

2008 Working Conference on Reverse Engineering

Page 2: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

2

Retrospective1998 2008

Ten years ago. WCRE most

influential paper. “Structural

Manipulations of Software

Architecture using Tarski Relational

Algebra”

Today. Retrospective.

“Grokking Software

Architecture”

17 papers in WCRE

Page 3: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

3

Grokking Software Architecture

Grokking

Software architecture

Page 4: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

4

Overview of Talk: 4 Parts

• Part 1. 1998 paper: Hopes & claims

• Part 2. Software Architecture

• Part 3. Formalizing Boxology

• Part 4. ROP: Relation-Oriented Programming & Grok-Like Languages

Page 5: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

5

Part 1. 1998 paper: Hopes & claims

• Represent software architecture as a typed graph– Graphs with “colors” of edges & nodes

• Manipulate & visualize these architectural graphs• Manipulations can be specified algebraically

--- and automatically executed

In brief: Formalize architectural diagrams and reap the benefits arising from the corresponding mathematics.

Page 6: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

6

Top View of As-Built Software Architecture (250KLOC System)

Page 7: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

7

View of One Subsystem of the 250 KLOC System

ds

dsinit

mrgs

dslvbb

mdlv

include

dslvrg dselim

lvlist

memuse dbg

Optimiz

PL_ GEN VN SUPPORT FLOW

DS.ss

Page 8: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

8

CS 746G Topics in Software Architecture

University of Waterloo

1) CS746 in Winter 1998 Linux (Operating System) 2) CS746 in Winter 1999 Apache (Web Server) 3) CS746 in Winter 2000 Mozilla (Web Browser) 4) CS746 in Winter 2001 Eazel Nautilus (File Manager) 5) CS798 in Winter 2002 Postgres et al (Data Base) 6) CS746 in Winter 2003 EMACS et al (Editor) 7) CS746 in Winter 2004 Gnumeric (Spreadsheet) 8) CS746 in Fall 2004 Mozilla (Web Browser -- again) 9) CS746 in Fall 2005 Open Office (Open Source Office Suite)

10)CS746 in Fall 2006 Asterisk (Open Phone Switch) 11)CS746 in Fall 2008 MySQL

Page 9: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

9

Process of View Creation

Parser

Grok:Fact manipulator

Layouter Browser

Clustering

Source code

Facts extractedfrom code

Hierarchicdecomposition

Architecturaldiagram

Page 10: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

10

Transformations to do Hiding

a

b

cd

ef

g h

T

VS

b

aT

V

Graph G

Graph H = hide(hide(G,T),V)

d

ef

Graph I = hideExt(G, S)

Page 11: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

11

Lifting Calls Up to File Level

call is a procedure callfileCall is a file level call

fileCall := funcDef o call o inv funcDcl

main.c

startup

start.h

main call

funcDef funcDcl

Procedure body Procedure header

File FilefileCall

Page 12: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

12

Part 2. Software Architecture: Boxology Approach

• Software architecture: – What is it?– State of practice– How is it represented– Keep It simple– Models & tools– Views of architecture

• Extracting As-Built architecture

Page 13: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

13

Software Architecture:What is it?

• Confusion. I have a sneaking suspicion that ‘architecture’ is one of the most overused and least understood terms in professional software development circles. Gorton

• Consensus. Architecture captures system structure in terms of components [parts] and how they interact. Gorton

Page 14: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

14

Software Architecture: State of the Practice

• “It’s common for there to be little or no documentation covering the architecture in many projects.” Gorton

• “I'm hopeless when it comes to documentation.” Torvalds

• “The architecture that actually predominates in practice is the ‘big ball of mud’ ” Foote et al

Page 15: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

15

Software as Spaghetti

Foote et al

Page 16: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

16

Software Architecture: How is it Represented in Practice?

• …predominant tools used for architecture documentation are Microsoft Word, Visio and Power Point Gorton

• What’s needed: Concepts, notations and tools that are – easy to use and– help us produce useful, understandable

documentation

Page 17: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

17

KISS: Keep it Simple Stupid

“Any fool can make things bigger, more complex, and more violent. It takes a touch of genius - and a lot of courage - to move in the opposite direction.” Einstein

“Make everything as simple as possible, but not simpler.” Einstein

Page 18: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

18

Models and Tools for Software Architecture

• “UML has, for better or (many would say) worse, become the industry standard ADL [Architecture Design Language]” Shaw

• UML “lacks, however, a robust suite of tools for analysis, consistency checking” Shaw

Page 19: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

19

UML Component Diagram: Box and Arrow Diagram

id Component View

OrderProcessing

MailQueue

SendEmail

MailServer

OrderSystem

CustomerSystem OrderQueue

«table»

NewOrders

1validate

1

readQ

1writeQ

1

read

1send

1

1readQ

1

1

writeQ

1

Gorton

Page 20: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

20

As-Built View

Views of Software Architecture Kruchten

Users’ View

DeploymentView

ConcurrencyView

End user

System EngineerIntegrator

Programmers& software managers

Scenarios

Page 21: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

21

Extracting the As-Built Architecture from the Code

• “Reverse engineering is the process of analyzing a subject system to create representations of the system at a higher level of abstraction.” Chikofsky

• Relational approach. – Parse the code to produce relations, e.g

• (call, P, Q) means proc P calls Q

– Manipulate edges into as-built architecture

Page 22: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

22

Boxology as a Central ADL (Architectural Design Language)

• “The most widely used design notation [for software architecture] is informal ‘block and arrow’ diagrams.” Gorton

Page 23: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

23

Cross Fertilization!! Rev Eng, S/W Arch, Relational Approach

• Reverse engineering – Architecture extraction– As-Built view: Code is king– Traceability

• Software architecture– Need for representation & tools – Simplicity & utility

• Relational approach– Boxology– Formalization --- Tarski algebra

Page 24: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

24

Part 3. Formalizing Boxology

• Boxology is the “Representation of an organized structure as a graph of labeled nodes (‘boxes’) and connections between them (as lines or arrows).” Wikipedia

• “Toward boxology: preliminary classification of architectural styles” Shaw

Page 25: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

25

Example Typed Graphr

a b

CC

v w x y z

C C C E C C

I

U U

v

w

x y

za b

r

UU

I

E

C = { (r,a), (r,b), (a,v), (a,w) (a,x), (b,y), (b,z) }I = { (a,b) }E = { (b,y) }U = { (v,w), (x,y) }

Page 26: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

26

Boxology is Just Scribbling?

• Box & arrow diagrams – Are just scribbles? No– Formalized by typed graphs– Visualized as (nested) boxes & arrows– Manipulated by Tarski algebra etc.– Exchanged as

• Triples (RSF), extended to TA, or GXL or …

Page 27: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

27

Boxology has Semantics? Yes

• Compare to BNF– Semantics by informal attachment to productions

• Compare to Codd’s relational approach– Semantics by interpretation of tables.

• Semantics by attributes & descriptions– Separation of concerns – Structure then semantics

• Use box/arrow diagrams as underlying formalism for software architecture (Mini-MOF?)

Page 28: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

28

Adding Algebra to Boxology

• Tables then Codd relational algebra– N-ary relations

• Boxes/arrows then Tarski relational algebra– Binary relations

Page 29: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

29

Example Typed Graphr

a b

CC

v w x y z

C C C E C C

I

U U

v

w

x y

za b

r

UU

I

E

C = { (r,a), (r,b), (a,v), (a,w) (a,x), (b,y), (b,z) }I = { (a,b) }E = { (b,y) }U = { (v,w), (x,y) }

Page 30: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

30

Tarski Algebraic Operators

Union I + E = {(a,b), (b,y)}Intersection E ^ C = {(b,y)}Difference C - E = {(r,a), (r,b), (a,v), (a,w), (a,x), (b,z)}Inverse inv E = {(y,b)}Composition I o E = {(a,y)}Identity id = {(r,r), (a, a), (b,b), (w,w) … }Transitive Cl. C+ = {(r,a), (r, b), (r,v), (r,w), (r,x), (r,y),

(r,z), (a,v), (a,w), (a,x), (b,y), (b,z)}Reflex. T.C. C* = ID + C+

Page 31: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

31

• A Schema in TA– Determines

• Types of boxes

• Types of edges

• Allowed connectivity between edges

• Supports inheritance in schemas

– Also attributes (strings) on boxes & on edges

call

TA Schemas for Box and Arrow Diagrams

instance

proc var

p q x y

call

instanceinstance

instance

ref

ref

Malton WCRE 2005

Page 32: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

32

Why Formalize Boxology??Cause it Makes Our Life Better

• Clear understanding & clear specification– What does RSF meaning?– Meaning is independent of implementation– Clarifies deeper concepts, e.g., expressiveness

• Generality• Progress in reverse engineering• Progress in software architecture• Not just scribbling

Page 33: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

33

Part 4. ROP: Relation-Oriented Programming &

Grok-Like Languages

• A paradigm shift

Page 34: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

34

Example: Mickey Eats Swiss Cheese• Mickey . eat

– Swiss– Roquefort

• eat . Mickey– Garfield– Fluffy

• eat o eat– (Garfield Swiss)– (Garfield Roquefort)– (Fluffy Swiss)– (Fluffy Roquefort)

• eat+– ,,,

Garfield Fluffy

NancyMickey

RoquefortSwiss

The “eat” relation

Page 35: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

35

Example ROP/Grok Program:Is relation R a tree?

How you would program this test …

Page 36: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

36

Grok Program: Is R a Tree?

if R has no loops &

R has one root &

R has only single parents then

put “R is a tree”

Pseudo code

Assume each node is a source or target of the contain C relation

Page 37: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

37

Grok Program: Is R a Tree?

if R has no loops

Pseudo code Grok code

if # ( R+ ^ ID ) = 0

a b c dR

R

R R

Does transitive closure of R have any self-loops? Yes

Page 38: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

38

Grok Program: Is R a Tree?

if R has no loops &

R has one root

Pseudo code Grok code

if # ( R+ ^ ID ) = 0 &

# (dom R - rng R) = 1

a

b c

d ge f

dom

rng

Does R have exactly one source? Yes

Page 39: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

39

Grok Program: Is R a Tree?

if R has no loops &

R has one root &

R has only single parents

Pseudo code Grok code

if # ( R+ ^ ID ) = 0 &

# (dom R - rng R) = 1 &

# ((R o inv R) - ID) != 0

b

c

d

a

Rinv R

R o inv R

Does my child have another parent? Yes

Page 40: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

40

Grok Program: Is R a Tree?

if R has no loops &

R has one root &

R has only single parents then

put “R is a tree”

Pseudo code Grok code

if # ( R+ ^ ID ) = 0 &

# (dom R - rng R) = 1 &

# ((R o inv R) - ID) != 0

then

put “R is a tree”

Moral: Relational progamming is not like low level (Java level) programming. Loops typically disappear.

Page 41: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

41

Notation: Does it Matter?

By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and, in effect, increases the mental power of the race. Alfred North Whitehead

Page 42: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

42

Wins & Losses Using Tarski Algebra

• Wins– Good for computing new edges, for finding

properties of edges, eg, nodes in loops, leaves, etc.

• Losses– Not good for locating patterns involving several

nodes, e.g., find complete connected sub-graphs

Page 43: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

43

Notation: Grok (Tarski) vs. Crocopat

S := P o C S(x,z) := EX(y, P(x,y) & C(y,z))

y

zx

My parent’s (P) children (C) are my (reflexive) siblings (S)

Grok Crocopat

P PC C

S S

Should Crocopat add Tarski operators??

Page 44: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

44

Characterizing Grok-Like Languages

• Relational• Useful for software analysis• Expressiveness

– How powerful can a query be?• Codd algebra and Crocopat are more powerful.

– How well can a query meet our needs? How writeable? How readable?

• Performance of implementation– Can hold large graphs?– Fast enough to manipulate large graphs?

Page 45: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

45

Performance of Grok-Like Languages

• Size & speed: OK for --- Grok & Crocopat

– All memory resident, no disk access

– Hundreds of thousands of edges

– Modeling million-line systems

– Most operations not more than a few seconds

– Crocopat scales up a bit more for transitive closure

– House keeping, e.g., time to read files, is critical

– Need to test on 64-bit implementations

Page 46: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

46

Data Structures for Binary Relations

• Tables: One for each type of relation DBMS

• Single table of triples Grok

• Linked lists– Pointers and nodes Lsedit, JGrok (caches sorted lists)

• BDD: Binary Decision Diagram Relview, Crocopat

– Memory efficient storage of binary relations– Works well with dense graphs– Proven useful RelView, Crocopat

– Surprising (to me): BDD efficient for transitive closure

Page 47: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

47

Grok-Like LanguagesLanguage Author Date

Prolog Colmerauer et al.

1972

SQL Chamberlin & Boyce

1974

GraphLog Consens et al.

1989

Relview Berghammer et al.

1993

Grok Holt 1996

RPA Feijs et al. 1998

GReQL Kullbach & Winter

1999

JGrok Wu 2001

CrocoPat Beyer 2003

PS: Paul Klint’s relational language ...

Discussion of

Grok-Like Languages

Page 48: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

48

Progress: Using Grok-Like Languages

1. Enforce architecture rules. Holt 96, Feijs 98, Knodel 08 2. Lift dependency edges. Holt 98, Feijs 1998 3. Find design pattern instances. Consens 98, Beyer 02 4. Find violations of patterns. Guo 99 5. Find anti-patterns. vanEmden 02, Feijs 98 6. Change impact analysis. Feijs 98 7. Specify extraction from syntax. Lin 08 8. Find source of dependency. Fahmy 01, Feijs 98 9. Locate uses of protocols. Wu 01 10. Type inference using transitive closure. vanDeursen 99

Page 49: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

49

Grokking Software Architecture

Conclusions

Page 50: 1 Grokking Software Architecture Richard C. Holt Software Architecture Group (SWAG) School of Computer Science, University of Waterloo, Canada 2008 Working.

50

Conclusions

• Typed graphs nicely formalize various software structures• Software architecture can benefit from a ROP approach • Tarski algebra, added to boxology, is elegant

– Does not handle multi-node patterns

• Grok-like (ROP) languages are elegant and sufficiently efficient– ROP is high level, is faster, more reliable, more flexible

• Lots of – Work done so far– Room for more work