reading & understanding code


Page 1: reading & understanding code

1

reading & understanding code

• experts are better at code comprehension because they focus on higher-level patterns
  – patterns can be considered “discourse rules”
  – naming conventions, design patterns, schemas

• experts work significantly better when reading & writing code according to these patterns

Page 2: reading & understanding code

2

reading & understanding code

• program comprehension
• expertise effects
• mental models
• tools

Page 3: reading & understanding code

3

outline

• mental models
  – types
  – models
• conventions & “discourse rules”
• expertise effects
• tool implications
• interesting tools

Page 5: reading & understanding code

5

mental model

• explanation of someone’s thought process when carrying out a task
  – our someone: programmers
  – our task: program comprehension

• several models exist

Page 6: reading & understanding code

6

mental model classes

• bottom-up
  – read code statement by statement, then ascend for a higher-level picture
• top-down
  – start with a high-level picture of what the code is doing, then descend into code
• mixed
  – incorporate elements from both, based on the situation

Page 8: reading & understanding code

8

bottom-up mental models

• 1st: read code statements
• 2nd: chunking: group statements into abstractions
• 3rd: repeat
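As a hedged illustration of bottom-up reading, consider the hypothetical function below (it is not taken from any of the cited studies). A reader first reads the individual statements, then groups them into chunks such as “accumulate a running total” and “divide by the count”, and finally ascends once more to a single abstraction, “average of a list”. The chunk labels in the comments are assumptions about what a reader might form.

```javascript
// Hypothetical function used only to illustrate bottom-up reading.
// Reading statement by statement, a reader might form two chunks,
// then ascend again and see the whole thing as "average of a list".
function average(values) {
  // chunk 1: "accumulate a running total"
  let total = 0;
  for (const v of values) {
    total += v;
  }

  // chunk 2: "divide by the count, guarding against an empty list"
  if (values.length === 0) {
    return 0;
  }
  return total / values.length;
}

console.log(average([2, 4, 6])); // 4
```

Returning 0 for an empty list is an arbitrary choice made for this sketch.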

Page 9: reading & understanding code

9

chunking

figure: a sequence of elements (element 1, element 2, …, element k) grouped into chunks (chunk 1, chunk 2, …, chunk n)

modified from Wikipedia

Page 10: reading & understanding code

10

chunking

• program model
  – reasoning about the order of computation, i.e. how control moves through a program
  – “control flow”
• situation model
  – reasoning about how data moves through atomic models
  – “data flow”

N. Pennington, Stimulus Structures and Mental Representations in Expert Comprehension of Computer Programs, Cognitive Psychology, 1987
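To make the two kinds of reasoning concrete, here is a small hypothetical function (not from Pennington's study materials) with one control-flow question and one data-flow question about it, written as comments. The function name and its pricing rules are invented for illustration.

```javascript
// Hypothetical snippet used to contrast the two kinds of questions.
function shippingCost(weightKg, express) {
  let cost = 5; // base fee
  if (weightKg > 10) {
    cost += 2 * (weightKg - 10); // surcharge for heavy parcels
  }
  if (express) {
    cost *= 2; // express doubles the cost
  }
  return cost;
}

// Control-flow question (program model):
//   "for a 12 kg express parcel, which branches execute, and in what order?"
//   -> the surcharge branch runs first, then the express branch.
// Data-flow question (situation model):
//   "which inputs can influence the returned cost?"
//   -> both weightKg and express flow into cost.
console.log(shippingCost(12, true)); // (5 + 2 * 2) * 2 = 18
```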

Page 11: reading & understanding code

11

program & situation model studies

• participants were first primed for either control flow or data flow
  – shown a piece of code, then asked to recall another piece of code related to it through either control flow or data flow
• participants were then asked a question about either control flow or data flow
• participants primed to think about control flow answered other control-flow questions faster; likewise for data flow

N. Pennington, Stimulus Structures and Mental Representations in Expert Comprehension of Computer Programs, Cognitive Psychology, 1987

Page 12: reading & understanding code

12

types of programmer knowledge

• semantic: general programming concepts
  – low-level knowledge, e.g. what a=1 means
  – high-level knowledge, e.g. sorting algorithms
• syntactic: language details
  – overlaps between languages
• stylistic: programming conventions
  – “discourse rules”

B. Shneiderman and R. Mayer, Syntactic/Semantic Interactions in Programmer Behavior: A Model and Experimental Results, Journal of Computer & Information Sciences, 1979

E. Soloway and K. Ehrlich, Empirical Studies of Programming Knowledge, IEEE Transactions on Software Engineering, 1984

Page 13: reading & understanding code

13

figure: Shneiderman & Mayer’s model: problem statement / program; short-term memory; internal semantics (working memory); knowledge (long-term memory), comprising syntactic knowledge (COBOL, FORTRAN, PL/I, LISP) and semantic knowledge (high-level and low-level concepts)

B. Shneiderman and R. Mayer, Syntactic/Semantic Interactions in Programmer Behavior: A Model and Experimental Results, Journal of Computer & Information Sciences, 1979

Page 14: reading & understanding code

14

evidence for semantic & syntactic knowledge

• lab studies using FORTRAN
  – participants: programmers and non-programmers
  – asked to perform tasks that each used one type of knowledge
  – six studies (two are described here)

B. Shneiderman and R. Mayer, Syntactic/Semantic Interactions in Programmer Behavior: A Model and Experimental Results, Journal of Computer & Information Sciences, 1979

Page 15: reading & understanding code

program memorization

• study
  – two subject types: non-programmers & programmers
  – two program versions: normal & shuffled
  – participants asked to memorize a program
• results
  – non-programmers performed equally poorly with normal & shuffled programs
  – programmers performed poorly with the shuffled program, well with the normal one
  – programmers were able to remember semantic details with syntactic variations
• conclusion
  – programmers were not memorizing the program’s text, but building internal semantics to represent its function

B. Shneiderman and R. Mayer, Syntactic/Semantic Interactions in Programmer Behavior: A Model and Experimental Results, Journal of Computer & Information Sciences, 1979

Page 16: reading & understanding code

16

commenting

• study
  – two program versions
    • a 5-line high-level block comment at the top
    • numerous interspersed low-level comments
  – participants asked to make modifications to the program & to memorize it
• result
  – participants given the high-level comment performed better
  – strong correlation between ability to make modifications and ability to memorize
• conclusion
  – memorization is a strong correlate of comprehension
  – hierarchical chunking, which organizes statements into units, facilitates the comprehension process

B. Shneiderman and R. Mayer, Syntactic/Semantic Interactions in Programmer Behavior: A Model and Experimental Results, Journal of Computer & Information Sciences, 1979
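A hedged sketch of the two commenting styles compared in the study. The function is hypothetical (the study used its own FORTRAN programs): the first version carries a short high-level block comment at the top, the second has low-level comments interspersed with the statements. Both compute the same result.

```javascript
/*
 * High-level style: one block comment up front.
 * Returns how many characters of `text` are vowels (a, e, i, o, u),
 * ignoring case.
 */
function countVowelsHighLevel(text) {
  let count = 0;
  for (const ch of text.toLowerCase()) {
    if ("aeiou".includes(ch)) count++;
  }
  return count;
}

// Low-level style: comments restate individual statements.
function countVowelsLowLevel(text) {
  let count = 0; // set count to zero
  for (const ch of text.toLowerCase()) { // loop over each lowercase character
    if ("aeiou".includes(ch)) { // check whether the character is a vowel
      count++; // add one to count
    }
  }
  return count; // return count
}

console.log(countVowelsHighLevel("Program"), countVowelsLowLevel("Program")); // 2 2
```

The high-level comment names the chunk the whole loop forms, which is the kind of organizing unit the study's conclusion points to; the low-level comments add no abstraction beyond the code itself.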

Page 19: reading & understanding code

19

top-down models

• 1st: develop hypotheses about the program
• 2nd: evaluate and refine hypotheses
  – with the help of beacons
• 3rd: repeat
• a process of “reconstructing knowledge”

Page 20: reading & understanding code

beacons

• “indexes into existing knowledge”
• recognizable features in code that are cues to the presence of certain structures
  – e.g., looking for a listener pattern

Margaret-Anne Storey, Theories, Methods, and Tools in Program Comprehension: Past, Present, and Future, Int. Workshop on Program Comprehension, 2005

R. Brooks, Towards a theory of the comprehension of computer programs, International Journal of Man-Machine Studies, 1981
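A hedged sketch of the listener-pattern beacon mentioned above. The class and method names (Notifier, addListener, notify) are illustrative assumptions, not from the cited papers; the point is that names like these, plus a loop that calls every stored callback, let an experienced reader recognize the pattern without tracing each statement.

```javascript
// Hypothetical notifier; `listeners`, `addListener`, and `notify`
// act as beacons that cue a reader to the listener (observer) pattern.
class Notifier {
  constructor() {
    this.listeners = [];
  }

  addListener(listener) {
    this.listeners.push(listener);
  }

  notify(event) {
    // Beacon: iterate over registered callbacks and invoke each one.
    for (const listener of this.listeners) {
      listener(event);
    }
  }
}

const notifier = new Notifier();
const received = [];
notifier.addListener((e) => received.push(`a:${e}`));
notifier.addListener((e) => received.push(`b:${e}`));
notifier.notify("saved");
console.log(received); // [ 'a:saved', 'b:saved' ]
```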

Page 21: reading & understanding code

21

beacon types

• semantic knowledge “plans”
  – reusable generic program fragments
  – high-level or low-level
• programming discourse conventions
  – “rules” that make program comprehension easier
  – found across programmers

E. Soloway and K. Ehrlich, Empirical Studies of Programming Knowledge, IEEE Transactions on Software Engineering, 1984

Page 22: reading & understanding code

22

brooks’ model

R. Brooks, Towards a theory of the comprehension of computer programs, International Journal of Man-Machine Studies, 1981

modified from Jonathan I. Maletic’s slides: An Overview of Mental Models for Program Understanding

figure: Brooks’ model: problem; external representation (requirement documentation, design document, program code), each carrying beacons; internal representation (hypotheses and subgoals); syntactic and semantic knowledge; a “match” step that verifies the internal schema against the external representation

Page 25: reading & understanding code

25

opportunistic & systematic strategies

• programmers enhancing an existing program
• two strategies:
  – systematically read code in detail, tracing through control and data flow manually
    • developed both control-flow and data-flow knowledge
  – focus only on code relevant to a task
    • developed only control-flow knowledge, resulting in a weaker understanding

Margaret-Anne Storey, Theories, Methods, and Tools in Program Comprehension: Past, Present, and Future, Int. Workshop on Program Comprehension, 2005

Page 26: reading & understanding code

integrated model

• maintainers switch between top-down and bottom-up comprehension
  – top-down if the code or type of code is familiar
  – program model (control flow) when the code is completely unfamiliar
  – situation model (data flow) after a partial data-flow understanding has been developed through top-down or program-model methods
  – knowledge base: information from the previous three models

A. von Mayrhauser and A.M. Vans, From Program Comprehension to Tool Requirements for an Industrial Environment, IEEE Workshop on Program Comprehension, 1993

Margaret-Anne Storey, Theories, Methods, and Tools in Program Comprehension: Past, Present, and Future, Int. Workshop on Program Comprehension, 2005

Page 28: reading & understanding code

28

validating the integrated model

• taped professional maintenance programmers
  – worked with a large code base
  – classified as domain and language experts
• tape transcriptions classified into model types
• one of the few studies with real-world tasks

Page 31: reading & understanding code

31

programming discourse rules

• specify the conventions of programming
  – e.g., a variable’s name should reflect its function
  – e.g., don’t include code that won’t be used
• similar to writing discourse rules, as outlined in books like The Elements of Style
  – e.g., you expect to find the description of fig. 7 between those of fig. 6 and fig. 8

E. Soloway and K. Ehrlich, Empirical Studies of Programming Knowledge, IEEE Transactions on Software Engineering, 1984

Page 32: reading & understanding code

32

rules of programming discourse

1. variable names should reflect function
2. don’t include code that won’t be used
   a. if there is a test for a condition, then the condition must have the potential of being true
3. a variable that is initialized via an assignment statement should be updated via an assignment statement
4. don’t do double duty with code in a non-obvious way
5. an if should be used when a statement body is guaranteed to be executed only once, and a while used when a statement body may need to be repeatedly executed

E. Soloway and K. Ehrlich, Empirical Studies of Programming Knowledge, IEEE Transactions on Software Engineering, 1984
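A hedged JavaScript illustration of rules 1, 2a, and 3 (both functions are hypothetical, not from Soloway and Ehrlich's materials): they compute the same sum, but only the second follows the discourse rules.

```javascript
// Un-plan-like: violates rule 1 (the variable `max` holds a *sum*)
// and rule 2a (`count < 0` can never be true for an array length).
function compute(xs) {
  let max = 0;
  const count = xs.length;
  if (count < 0) {
    return -1; // dead code: an array length is never negative
  }
  for (let i = 0; i < count; i++) {
    max = max + xs[i];
  }
  return max;
}

// Plan-like: the name reflects the function, and there is no impossible test.
function sum(xs) {
  let total = 0; // initialized by assignment...
  for (const x of xs) {
    total = total + x; // ...and updated by assignment (rule 3)
  }
  return total;
}

console.log(compute([1, 2, 3]), sum([1, 2, 3])); // 6 6
```

A reader relying on rule 1 would likely misread `compute` as finding a maximum, even though it behaves identically to `sum`.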

Page 33: reading & understanding code

33

testing discourse rules

• lab study with expert & novice programmers
• two program types
  – α (plan-like): obeyed discourse rules
  – β (un-plan-like): disobeyed discourse rules
• participants given either α or β code, with one blank
• task: fill the blank with what seems “natural”
  – participants were not told about α or β code
• conclusion: experts fared best with α code

Page 34: reading & understanding code

34

why have un-plan-like (β) code?

• machine limitations
  – limited memory, processing, bandwidth, etc.
• language limitations
  – less common: bugs, efficiency issues, etc.
• programmer limitations
  – does not have full mastery of discourse
• historical traces
  – resistance to changing legacy code, permanent “temporary” code

source: The Psychology of Computer Programming

Page 35: reading & understanding code

35

XXX: PROCEDURE OPTIONS(MAIN);
   DECLARE B(1000) FIXED(7,2),
           C FIXED(11,2),
           (I, J) FIXED BINARY;
   C = 0;
   DO I = 1 TO 10;
      GET LIST((B(J) DO J = 1 TO 1000));
      DO J = 1 TO 1000;
         C = C + B(J);
      END;
   END;
   PUT LIST('RESULT IS ', C);
END XXX;

modified from The Psychology of Computer Programming
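For readers unfamiliar with PL/I, a hedged JavaScript sketch of the idea behind the program above: summing 10,000 values in ten batches of 1,000, as a program limited to a 1,000-element buffer might, versus summing them in one pass. The readBatch helper is hypothetical and merely stands in for GET LIST.

```javascript
// Hypothetical input source standing in for PL/I's GET LIST:
// returns the next `n` values from `data`, starting at `offset`.
function readBatch(data, offset, n) {
  return data.slice(offset, offset + n);
}

// Batched version: only a batch-sized buffer is held at once,
// mirroring the memory-limited structure of the PL/I program.
function sumInBatches(data, batchSize = 1000) {
  let total = 0;
  for (let offset = 0; offset < data.length; offset += batchSize) {
    const buffer = readBatch(data, offset, batchSize);
    for (const x of buffer) total += x;
  }
  return total;
}

// Single-pass version: what the code might look like without the
// memory limitation.
function sumAll(data) {
  let total = 0;
  for (const x of data) total += x;
  return total;
}

const values = Array.from({ length: 10000 }, (_, i) => i + 1);
console.log(sumInBatches(values), sumAll(values)); // 50005000 50005000
```

Both give the same answer; the nested loop in the batched version is a trace of the machine limitation rather than of the problem itself, which is what makes it harder to read without that history.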

Page 36: reading & understanding code

36

XXX: PROCEDURE OPTIONS(MAIN);
   DECLARE A(10000) FIXED(7,2),
           C FIXED(11,2),
           I FIXED BINARY;
   C = 0;
   GET LIST((A(I) DO I = 1 TO 10000));
   DO I = 1 TO 10000;
      C = C + A(I);
   END;
   PUT LIST('RESULT IS ', C);
END XXX;

modified from The Psychology of Computer Programming

Page 39: reading & understanding code

39

naming conventions

• meaningful names
  – variable naming reflects cognitive structure
• grammatical sensibility
  – names interact with the language specification to form expressions
• containers & paths
  – objects & pointers
• polysemy, homonymy, & overloading
  – operators, name sharing

B. Liblit, A. Begel, and E. Sweetser, Cognitive Perspectives on the Role of Naming in Computer Programs, Psychology of Programming Interest Group, 2006

Page 41: reading & understanding code

41

meaningful names

• metaphors for domain tasks
  – e.g. pushing objects onto a stack
• keywords for grouping
  – e.g. common prefixes & suffixes
• informative names
  – balanced with name length

A. Blackwell, Metaphor or analogy: how should we see programming abstractions?, Psychology of Programming Interest Group, 1996

B. Liblit, A. Begel, and E. Sweetser, Cognitive Perspectives on the Role of Naming in Computer Programs, Psychology of Programming Interest Group, 2006

Page 42: reading & understanding code

42

name length

• length harms readability and recall ability
• idioms and memory ties improve readability and recall ability
• takeaway: variable names with a consistent and abbreviated vocabulary are optimal
  – (variable names that concisely express a metaphor)

D. Binkley, D. Lawrie, S. Maex, and C. Morrell, Identifier length and limited programmer memory, Science of Computer Programming, 2009

Page 43: reading & understanding code

43

grammatical sensibility

• names as phrase fragments
  – methods as actions (change the state of the program)
    • e.g. addElement, setSize, removeAll
  – methods as mathematical functions (compute a result, don’t alter state)
    • e.g. true/false: contains, equals, isEmpty
    • e.g. data: capacity, indexOf, size
• valence cues (phrase fragments with an open slot)
  – e.g. roster.contains(player)
  – Smalltalk makes use of this extensively:
    • roster insert: player at: position

B. Liblit, A. Begel, and E. Sweetser, Cognitive Perspectives on the Role of Naming in Computer Programs, Psychology of Programming Interest Group, 2006
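The naming patterns above can be sketched in JavaScript. The Roster class and its methods are hypothetical, echoing the slide's examples: verb-phrase methods change state, while predicate-named and noun-named methods compute a result without side effects.

```javascript
// Hypothetical Roster class following the naming patterns above.
class Roster {
  constructor() {
    this.players = [];
  }

  // Actions: verb phrases; they change the object's state.
  addPlayer(player) {
    this.players.push(player);
  }

  removeAll() {
    this.players = [];
  }

  // Functions: compute a result without altering state.
  contains(player) { // true/false; reads as "roster contains player"
    return this.players.includes(player);
  }

  isEmpty() { // true/false
    return this.players.length === 0;
  }

  size() { // data-valued
    return this.players.length;
  }

  indexOf(player) { // data-valued
    return this.players.indexOf(player);
  }
}

const roster = new Roster();
roster.addPlayer("Ada");
roster.addPlayer("Grace");
console.log(roster.contains("Ada"), roster.size(), roster.indexOf("Grace")); // true 2 1
roster.removeAll();
console.log(roster.isEmpty()); // true
```

JavaScript has no Smalltalk-style keyword messages, so a call like `roster insert: player at: position` has no direct equivalent; the valence cue has to be carried by the method name and argument order instead.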

Page 46: reading & understanding code

46

20:1 programmer performance

• Sackman et al.: the best programmers were 20x better than the worst at bug fixing
  – the study was originally meant to evaluate the effectiveness of time-shared systems

H. Sackman, W. J. Erikson, and E. E. Grant, Exploratory experimental studies comparing online and offline programming performance, Communications of the ACM, 1968

Page 47: reading & understanding code

47

10:1 programmer performance

• there are substantial programmer efficiency differences, but not as dramatic as initially reported

• what makes experts so much better at understanding code?

Page 48: reading & understanding code

48

testing discourse rules

• lab study with expert & novice programmers
• two program types
  – α (plan-like): obeyed discourse rules
  – β (un-plan-like): disobeyed discourse rules
• participants given either α or β code, with one blank
• task: fill the blank with what seems “natural”
  – participants were not told about α or β code

Page 49: reading & understanding code

49

α problem

PROGRAM Magenta(input, output);
VAR Max, I, Num : INTEGER;
BEGIN
  Max := 0;
  FOR I := 1 TO 10 DO
    BEGIN
      READLN(Num);
      IF Num ___ Max THEN Max := Num
    END;
  WRITELN(Max)
END.

E. Soloway and K. Ehrlich, Empirical Studies of Programming Knowledge, IEEE Transactions on Software Engineering, 1984

Page 50: reading & understanding code

50

α solution

PROGRAM Magenta(input, output);
VAR Max, I, Num : INTEGER;
BEGIN
  Max := 0;
  FOR I := 1 TO 10 DO
    BEGIN
      READLN(Num);
      IF Num > Max THEN Max := Num
    END;
  WRITELN(Max)
END.

E. Soloway and K. Ehrlich, Empirical Studies of Programming Knowledge, IEEE Transactions on Software Engineering, 1984

Page 51: reading & understanding code

51

β problem

PROGRAM Magenta(input, output);
VAR Max, I, Num : INTEGER;
BEGIN
  Max := 999999;
  FOR I := 1 TO 10 DO
    BEGIN
      READLN(Num);
      IF Num ___ Max THEN Max := Num
    END;
  WRITELN(Max)
END.

E. Soloway and K. Ehrlich, Empirical Studies of Programming Knowledge, IEEE Transactions on Software Engineering, 1984

Page 52: reading & understanding code

52

β solution

PROGRAM Magenta(input, output);
VAR Max, I, Num : INTEGER;
BEGIN
  Max := 999999;
  FOR I := 1 TO 10 DO
    BEGIN
      READLN(Num);
      IF Num < Max THEN Max := Num
    END;
  WRITELN(Max)
END.

E. Soloway and K. Ehrlich, Empirical Studies of Programming Knowledge, IEEE Transactions on Software Engineering, 1984
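For readers less familiar with Pascal, a hedged JavaScript rendering of the two Magenta variants (the function names are assumptions). The α version keeps a running maximum, starting from 0 and using `>`. The β version starts from 999999 and uses `<`, so it actually keeps a running minimum while still naming the variable `max`, which breaks rule 1 and explains why it is harder to complete.

```javascript
// α (plan-like): a running maximum; the name `max` matches what it holds.
// Starting at 0 assumes the inputs are non-negative.
function magentaAlpha(nums) {
  let max = 0;
  for (const num of nums) {
    if (num > max) max = num;
  }
  return max;
}

// β (un-plan-like): the same shape, but it starts high and keeps the
// smaller value, so it computes a minimum while still calling it `max`.
// Starting at 999999 assumes every input is below that sentinel.
function magentaBeta(nums) {
  let max = 999999;
  for (const num of nums) {
    if (num < max) max = num;
  }
  return max;
}

const input = [7, 3, 9, 4, 1, 8, 2, 6, 5, 10]; // ten values, as in FOR I := 1 TO 10
console.log(magentaAlpha(input), magentaBeta(input)); // 10 1
```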

Page 53: reading & understanding code

53

figure: bar chart of the percentage of correct responses (0% to 100%) for α and β programs, advanced vs. novice programmers

E. Soloway and K. Ehrlich, Empirical Studies of Programming Knowledge, IEEE Transactions on Software Engineering, 1984

Page 54: reading & understanding code

54

debugging differences between novices and experts

• experts: situation-dependent problem solvers

• novices: situation-independent problem solvers

I. Vessey, Expertise in Debugging Computer Programs: An analysis of the Content of Verbal Protocols, IEEE Transactions on Systems, Man, and Cybernetics, 1986

Page 57: reading & understanding code

57

tool implications

• browsing support
  – browse from high to low level and from low to high level
• searching
  – looking for snippets by analogy
• multiple views
  – show orthogonal object relationships
• context-driven views
  – determine the best view based on context
• additional cognitive support
  – external devices to support the cognitive tasks needed

Margaret-Anne Storey, Theories, Methods, and Tools in Program Comprehension: Past, Present, and Future, Int. Workshop on Program Comprehension, 2005

Page 59: reading & understanding code

59

browsing support

• traverse control-flow and data-flow paths
• switch between top-down and bottom-up models
• breadth-first and depth-first browsing
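A hedged sketch of breadth-first versus depth-first browsing over a call graph. The graph and its function names are hypothetical; a real tool would extract them from the code, and could traverse data-flow edges the same way.

```javascript
// Hypothetical call graph: each function maps to the functions it calls.
const callGraph = {
  main: ["parseArgs", "run"],
  parseArgs: ["readFlag"],
  run: ["load", "report"],
  load: ["readFile"],
  report: [],
  readFlag: [],
  readFile: [],
};

// Breadth-first: all direct callees first, then the next level down
// (a level-by-level overview).
function browseBreadthFirst(graph, start) {
  const order = [];
  const seen = new Set([start]);
  const queue = [start];
  while (queue.length > 0) {
    const fn = queue.shift();
    order.push(fn);
    for (const callee of graph[fn] ?? []) {
      if (!seen.has(callee)) {
        seen.add(callee);
        queue.push(callee);
      }
    }
  }
  return order;
}

// Depth-first: follow one call chain all the way down before backtracking.
function browseDepthFirst(graph, start, seen = new Set()) {
  if (seen.has(start)) return [];
  seen.add(start);
  const order = [start];
  for (const callee of graph[start] ?? []) {
    order.push(...browseDepthFirst(graph, callee, seen));
  }
  return order;
}

console.log(browseBreadthFirst(callGraph, "main").join(" "));
// main parseArgs run readFlag load report readFile
console.log(browseDepthFirst(callGraph, "main").join(" "));
// main parseArgs readFlag run load readFile report
```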

Page 62: reading & understanding code

62

searching

• search for code snippets
  – not just by text
• example: query the role of a variable, or where a function is called
• useful for top-down hypothesis testing
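A hedged sketch of one such query: given source text and a function name, report the lines where that function is called. The helper name findCallSites is hypothetical, and it uses a regular expression purely for illustration; it will miss aliased calls and can match inside strings or comments, which is why real tools tend to work on a parsed syntax tree rather than raw text.

```javascript
// Return 1-based line numbers where `name(` appears as a call,
// skipping the function's own declaration ("function name(").
function findCallSites(source, name) {
  // Escape regex metacharacters in the name (`$` is legal in JS identifiers).
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Require that the name is not the tail of a longer identifier.
  const callPattern = new RegExp(`(^|[^\\w$])${escaped}\\s*\\(`);
  const declPattern = new RegExp(`function\\s+${escaped}\\s*\\(`);
  const hits = [];
  source.split("\n").forEach((line, i) => {
    if (callPattern.test(line) && !declPattern.test(line)) {
      hits.push(i + 1);
    }
  });
  return hits;
}

const source = [
  "function total(xs) {",
  "  return xs.reduce((a, b) => a + b, 0);",
  "}",
  "const subtotal = total(prices);",
  "console.log(total([1, 2]));",
].join("\n");

console.log(findCallSites(source, "total")); // [ 4, 5 ]
```

A query like this supports top-down hypothesis testing: a reader who guesses that `total` is only used for reporting can check every call site directly instead of reading the whole file.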

Page 65: reading & understanding code

65

multiple views

• multiple ways of viewing programs
  – call graph
  – object hierarchy
  – etc.

• different views are applicable for different tasks
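A hedged sketch of two views over the same data: a callee view (what does this function call?) and an inverted caller view (who calls this function?). The call graph is hypothetical; which view helps depends on the task, for example tracing behavior forward versus assessing the impact of changing a function.

```javascript
// Hypothetical call graph: caller -> list of callees.
const calls = {
  main: ["load", "render"],
  load: ["parse"],
  render: ["format"],
  test: ["parse", "format"],
};

// View 1: callees of a function (read directly from the map).
function calleesOf(graph, fn) {
  return graph[fn] ?? [];
}

// View 2: callers of a function (the same edges, inverted).
function callersOf(graph, fn) {
  return Object.keys(graph)
    .filter((caller) => graph[caller].includes(fn))
    .sort();
}

console.log(calleesOf(calls, "main")); // [ 'load', 'render' ]
console.log(callersOf(calls, "parse")); // [ 'load', 'test' ]
```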

Page 68: reading & understanding code

68

context-driven views

• alter views based on program metrics
  – size of program
  – interdependence of modules
  – flatness of hierarchy
  – etc.
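A hedged sketch of choosing a view from program metrics. The slide names the kinds of metrics but no specific rule, so chooseView, its metric fields, and every threshold below are invented for illustration only.

```javascript
// Hypothetical metrics record for a program:
//   size            - lines of code
//   avgDependencies - mean number of modules each module depends on
//   hierarchyDepth  - depth of the class/package hierarchy
function chooseView({ size, avgDependencies, hierarchyDepth }) {
  // Small programs: the source itself is a manageable view.
  if (size < 500) return "source";
  // Tightly interdependent modules: a dependency graph exposes the coupling.
  if (avgDependencies > 5) return "dependency graph";
  // Deep (non-flat) hierarchies: a tree view keeps the nesting navigable.
  if (hierarchyDepth > 3) return "hierarchy tree";
  // Otherwise a flat module list is enough.
  return "module list";
}

console.log(chooseView({ size: 200, avgDependencies: 8, hierarchyDepth: 1 })); // source
console.log(chooseView({ size: 20000, avgDependencies: 8, hierarchyDepth: 1 })); // dependency graph
console.log(chooseView({ size: 20000, avgDependencies: 2, hierarchyDepth: 6 })); // hierarchy tree
```

The order of the checks is itself a design choice: here size wins over the other metrics, on the assumption that a small program is easiest to understand from its source regardless of structure.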

Page 71: reading & understanding code

71

additional cognitive support

• experts
  – tools to support cognitive tasks
    • external devices
    • scratchpads
• novices
  – pedagogical support
    • programming language
    • task domain

Page 74: reading & understanding code

structured editors

• reduce the burden of memorizing syntax
  – focus on semantics

A. Ko and B. Myers, Citrus: A Language and Toolkit for Simplifying the Creation of Structured Editors for Code and Data, UIST, 2005

Page 77: reading & understanding code

77

literate programming

• source code interwoven with exposition of logic, like an essay

• allows programmers to work top-down or bottom-up

D. Knuth, Literate Programming, The Computer Journal, 1984

Page 78: reading & understanding code

78

The purpose of wc is to count lines, words, and/or characters in a list of files. The number of lines in a file is ......

/more explanations/

Here, then, is an overview of the file wc.c that is defined by the noweb program wc.nw:

<<*>>=
<<Header files to include>>
<<Definitions>>
<<Global variables>>
<<Functions>>
<<The main program>>
@ We must include the standard I/O definitions, since we want to send formatted output to stdout and stderr.

<<Header files to include>>=
#include <stdio.h>
@

D. Knuth, Literate Programming, The Computer Journal, 1984
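The literate excerpt above describes wc in C. As a hedged sketch of what its counting logic does (ported to JavaScript to match the other examples here, and not the code from the excerpt), lines are counted as newline characters and words as maximal runs of non-whitespace, which is the usual Unix convention. Characters are counted as string length, which matches byte counts only for ASCII input.

```javascript
// Count lines, words, and characters the way Unix wc does by default:
// lines = number of newline characters, words = maximal runs of
// non-whitespace, chars = string length (equals bytes only for ASCII).
function wc(text) {
  const lines = (text.match(/\n/g) ?? []).length;
  const words = (text.match(/\S+/g) ?? []).length;
  const chars = text.length;
  return { lines, words, chars };
}

console.log(wc("hello world\nliterate programming\n"));
// { lines: 2, words: 4, chars: 33 }
```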

Page 79: reading & understanding code

79

conclusion

• beginners start off with an incomplete mental model for how code works

• experts are better at code comprehension because they focus on higher-level patterns
  – patterns can be considered “discourse rules”
  – naming conventions, design patterns, schemas

• experts work significantly better when reading & writing code according to these patterns

Page 80: reading & understanding code

80

discussion

• what other discourse rules can you think of?
• do these mental models resonate with your style of understanding code?
• what are some other tool implications of these models?

Page 81: reading & understanding code

81

references - 1

H. Sackman, W. J. Erikson, and E. E. Grant, Exploratory experimental studies comparing online and offline programming performance, Communications of the ACM, 1968

B. Liblit, A. Begel, and E. Sweetser, Cognitive Perspectives on the Role of Naming in Computer Programs, Psychology of Programming Interest Group, 2006

A. Blackwell, Metaphor or analogy: how should we see programming abstractions?, Psychology of Programming Interest Group, 1996

E. Soloway and K. Ehrlich, Empirical Studies of Programming Knowledge, IEEE Transactions on Software Engineering, 1984

B. Shneiderman and R. Mayer, Syntactic/Semantic Interactions in Programmer Behavior: A Model and Experimental Results, Journal of Computer & Information Sciences, 1979

R. Brooks, Towards a theory of the comprehension of computer programs, International Journal of Man-Machine Studies, 1981

N. Pennington, Stimulus Structures and Mental Representations in Expert Comprehension of Computer Programs, Cognitive Psychology, 1987

D. Knuth, Literate Programming, The Computer Journal, 1984

Page 82: reading & understanding code

82

references - 2

A. von Mayrhauser and A.M. Vans, From Program Comprehension to Tool Requirements for an Industrial Environment, IEEE Workshop on Program Comprehension, 1993

Margaret-Anne Storey, Theories, Methods, and Tools in Program Comprehension: Past, Present, and Future, Int. Workshop on Program Comprehension, 2005

I. Vessey, Expertise in Debugging Computer Programs: An analysis of the Content of Verbal Protocols, IEEE Transactions on Systems, Man, and Cybernetics, 1986

A. Ko and B. Myers, Citrus: A Language and Toolkit for Simplifying the Creation of Structured Editors for Code and Data, UIST, 2005

Page 83: reading & understanding code

83

does visual programming help?

• non-significant result: 42%
• significant result: 46%
• significant result, but contribution of AV uncertain: 8%
• significant result in the wrong direction: 4%

C. Hundhausen, S. Douglas, and J. Stasko, A meta-study of algorithm visualization effectiveness, Journal of Visual Languages & Computing, 2002

Page 84: reading & understanding code

84

underlying questions

• how do programmers read and come to understand unfamiliar code?

• what kinds of mental models do programmers create to think about code?

• why are experts significantly better than novices when looking at unfamiliar code?
  – hint: experts aren’t as good as you might expect!

Page 85: reading & understanding code

85

why does it matter?

• reading code is done when:
  – searching for relevant code
  – re-acquainting oneself with a project
  – reading someone else’s code
  – refactoring
  – …

Page 86: reading & understanding code

86

the gist of the talk

• beginners start off with an incomplete mental model for how code works

• experts are better at code comprehension because they focus on higher-level patterns
  – patterns can be considered “discourse rules”
  – naming conventions, design patterns, schemas

• experts work significantly better when reading & writing code according to these patterns

Page 87: reading & understanding code

87

var Dict = function() {
  this.keys = [];
  this.values = [];
};

Dict.prototype.set = function(key, value) {
  var keyIndex = this.keys.indexOf(key);
  if (keyIndex < 0) {
    this.keys.push(key);
    this.values.push(value);
  } else {
    this.values[keyIndex] = value;
  }
};

Dict.prototype.get = function(key) {
  var keyIndex = this.keys.indexOf(key);
  if (keyIndex >= 0) return this.values[keyIndex];
  return undefined;
};
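One way to act on a top-down hypothesis such as “Dict is a key-value map” is to check it against a known map. A hedged sketch: the Dict definition above is repeated so the block runs on its own, and a hypothetical helper, behavesLikeMap, applies the same operations to Dict and to JavaScript's built-in Map and compares the answers.

```javascript
// The Dict from the slide, repeated so this block is self-contained.
var Dict = function () {
  this.keys = [];
  this.values = [];
};

Dict.prototype.set = function (key, value) {
  var keyIndex = this.keys.indexOf(key);
  if (keyIndex < 0) {
    this.keys.push(key);
    this.values.push(value);
  } else {
    this.values[keyIndex] = value;
  }
};

Dict.prototype.get = function (key) {
  var keyIndex = this.keys.indexOf(key);
  if (keyIndex >= 0) return this.values[keyIndex];
  return undefined;
};

// Hypothesis: "Dict behaves like a key-value map". Test it by running the
// same operations against the built-in Map and comparing the answers.
function behavesLikeMap(ops, probeKeys) {
  const d = new Dict();
  const m = new Map();
  for (const [key, value] of ops) {
    d.set(key, value);
    m.set(key, value);
  }
  return probeKeys.every((k) => d.get(k) === m.get(k));
}

const ops = [["a", 1], ["b", 2], ["a", 3]]; // includes an overwrite of "a"
console.log(behavesLikeMap(ops, ["a", "b", "missing"])); // true
```

A passing check supports the hypothesis but does not prove it. For instance, Dict looks keys up with `indexOf` (strict equality), so a `NaN` key is never found again, whereas Map uses SameValueZero and does find it; lookups in Dict are also linear in the number of keys.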

Page 88: reading & understanding code

88

mental models

top-down models
• 1st: hypothesize about code
• 2nd: check hypotheses
• start on a high level, dig in

bottom-up models
• 1st: read code statements
• 2nd: mental chunking
• start on a low level, ascend

hybrid models
• incorporate elements from both, based on the situation

Page 89: reading & understanding code

89

shneiderman & mayer’s model

• semantic knowledge: general programming concepts
  – low-level knowledge, e.g. what assignments do
  – high-level knowledge, e.g. algorithms
• syntactic knowledge: programming language details
  – sometimes overlaps across programming languages

B. Shneiderman and R. Mayer, Syntactic/Semantic Interactions in Programmer Behavior: A Model and Experimental Results, Journal of Computer & Information Sciences, 1979

Page 90: reading & understanding code

90

brooks’ model

• “top-down”
  – analyze code on a high level, then look at specifics
• argues that programmers form a series of hypotheses
• beacons help verify or reject these hypotheses

R. Brooks, Towards a theory of the comprehension of computer programs, International Journal of Man-Machine Studies, 1981

Page 91: reading & understanding code

91

containers & paths

B. Liblit, A. Begel, and E. Sweetser, Cognitive Perspectives on the Role of Naming in Computer Programs, Psychology of Programming Interest Group, 2006

Page 92: reading & understanding code

92

polysemy, homonymy, & overloading

B. Liblit, A. Begel, and E. Sweetser, Cognitive Perspectives on the Role of Naming in Computer Programs, Psychology of Programming Interest Group, 2006