![Page 1: A Human-Centric Approach to Program Understandingweb.eecs.umich.edu/~weimerw/students/rayproposal_talk.pdf · A Human-Centric Approach to Program Understanding Ray Buse - PhD Proposal](https://reader030.fdocuments.in/reader030/viewer/2022041123/5d29bac488c993f3778dda88/html5/thumbnails/1.jpg)
A Human-Centric Approach to Program Understanding
Ray Buse - PhD Proposal
University of Virginia, Department of Computer Science
Documentation, Runtime Behavior, Readability
1.20.2010
“The real question is not whether machines think, but whether men do.” -- B. F. Skinner
Requirements
Design
Implementation
Verification
Maintenance
Maintenance accounts for about 70-90% of the total lifecycle budget of a software project. [1, 2]
1. T. M. Pigoski. Practical Software Maintenance: Best Practices for Managing Your Software Investment. John Wiley & Sons, Inc., 1996.
2. R. C. Seacord, D. Plakosh, and G. A. Lewis. Modernizing Legacy Systems: Software Technologies, Engineering Process and Business Practices. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2003.
Reading code is the most time-consuming part of all maintenance activities. [3, 4, 5]
3. L. E. Deimel Jr. The uses of program reading. SIGCSE Bull., 17(2):5-14, 1985.
4. R. Glass. Facts and Fallacies of Software Engineering. Addison-Wesley, 2003.
5. S. Rugaber. The use of domain knowledge in program understanding. Ann. Softw. Eng.,(1-4):143-192, 2000.
Reading Code
“Understanding code is by far the activity at which professional developers spend most of their time.” [6]
[Figure: chart with segments Writing New Code, Modifying Existing Code, Understanding Code]
6. Peter Hallam. What Do Programmers Really Do Anyway? Microsoft Developer Network (MSDN) – C# Compiler. Jan 2006.
Reading code is the most poorly understood software engineering activity. [7, 8]
7. D. Parnas. Software aging. In Software Fundamentals. Addison-Wesley, 2001.
8. D. Zokaities. Writing understandable code. In Software Development, pages 48-49, jan 2002.
Reading code is the most poorly understood software engineering activity.
[Figure: paper counts at ICSE and PLDI, all papers versus program-understanding papers. Values shown: 4,387; 780; 16; 1.]
Understanding is difficult to…
Model
• Based on a complex combination of factors
Evaluate
• Lack of established metrics/baselines
• User studies are unattractive
Two Key Insights
• Machine Learning allows us to combine many semantically shallow features of code to gain new deep insights.
• PL techniques can be adapted to generate documentation artifacts that are directly comparable to human-created ones.
Thesis
We can combine insights from Machine Learning and Programming Languages to
• Model aspects of code understanding accurately and
• Generate output that compares favorably with human documentation.
Proposal: Three Dimensions of Understanding
Documentation, Runtime Behavior, Readability
• Readability: Textual characteristics that make code understandable.
• Runtime Behavior: Structural characteristics that help developers understand what a program is expected to do.
• Documentation: Non-code text that helps developers understand a program.
Research Projects
Metrics for:
• Code Readability
• Path Execution Frequency
Algorithms for Documentation of:
• Exceptions
• Code Changes
• APIs
Broader Impact
New algorithms and metrics to support:
• Software Development and Composition
– Metrics for Software Quality Assurance
– Automatic Documentation
• Software Analysis
– Runtime Behavior model for optimizing compilers
– Metrics for targeting analyses, prioritizing output, and evaluating research
The rest of this proposal
• A review of each proposed contribution
– Technical Merit
– Evaluation Strategy
– Related Work
• Research timeline and other bookkeeping
• Concluding Remarks
Metrics for:
• Code Readability (ISSTA ’08, TSE ’10)
• Path Execution Frequency (ICSE ’09)
Algorithms for Documentation of:
• Exceptions (ISSTA ’08)
• Code Changes
• APIs
Status legend: Published, In Progress
/**
 * Extend this Execution path by one level.
 *
 * @throws IllegalStateException If the move path invalid..
 */
private List<ExecutionPath> extend (ExecutionPath ep)
{
    paths = new LinkedList<ExecutionPath>();
    Unit last = ep.getLast();
    List<Unit> succs = graph.getSuccsOf(last);
    //this is the end of the path
    if (succs.isEmpty())
    {
        ep.setComplete(true);
        paths.add(ep);
        return paths;
    }
    if (succs.size() == 1)
    {
        Unit s = succs.get(0);
        if (ep.contains(s))
        {
            //do nothing
        }
        else
        {
            ep.addLast(s);
            if (graph.getTails().contains(s))
            {
                ep.setComplete(true);
            }
            ...
Readability
Model human judgments about code readability
Create a readability metric
Key: Use textual features to approximate human judgments
Hypothesis
With a simple set of textual features, we can derive from a set of human judgments an accurate model of readability for code.
Success depends on
• Gathering human judgments
• Choosing predictive textual features
Data Gathering
• We asked 120 students at UVa to rate the readability of a set of snippets…
Data Set
Vertical bands indicate snippets where agreement was high
Choosing predictive textual features
We choose local code features
• Line length
• Length of identifier names
• Comment density
• Blank lines
• Presence of numbers
• [and 20 others]
Modeled with a Bayesian Classifier
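As an illustration of what "local code features" could look like in practice, the following is a minimal sketch that computes a handful of the listed features for a snippet. The class name `ReadabilityFeatures`, the feature names, and the regular expressions are assumptions made for this example; they are not the feature definitions used in the talk, and the classifier itself is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not the model from the talk): computes a few local features of a snippet. */
final class ReadabilityFeatures {
    // Identifier-like tokens; keywords such as "if" are counted too, to keep the sketch short.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    // Standalone integer literals such as 10 or 12345.
    private static final Pattern NUMBER = Pattern.compile("\\b\\d+\\b");

    static Map<String, Double> extract(String snippet) {
        String[] lines = snippet.split("\n", -1);
        double totalLength = 0, maxLength = 0, blankLines = 0, commentLines = 0;
        double identifierChars = 0, identifiers = 0, numbers = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            totalLength += line.length();
            maxLength = Math.max(maxLength, line.length());
            if (trimmed.isEmpty()) {
                blankLines++;
                continue;
            }
            if (trimmed.startsWith("//") || trimmed.startsWith("/*") || trimmed.startsWith("*")) {
                commentLines++;
                continue; // comment text is not scanned for identifiers or numbers
            }
            Matcher id = IDENTIFIER.matcher(line);
            while (id.find()) {
                identifierChars += id.group().length();
                identifiers++;
            }
            Matcher number = NUMBER.matcher(line);
            while (number.find()) {
                numbers++;
            }
        }
        Map<String, Double> features = new LinkedHashMap<>();
        features.put("avgLineLength", totalLength / lines.length);
        features.put("maxLineLength", maxLength);
        features.put("avgIdentifierLength", identifiers == 0 ? 0.0 : identifierChars / identifiers);
        features.put("commentDensity", commentLines / lines.length);
        features.put("blankLineRatio", blankLines / lines.length);
        features.put("numbersPerLine", numbers / lines.length);
        return features;
    }

    public static void main(String[] args) {
        String snippet = "int count = 10;\n\n// add one\ncount = count + 1;";
        extract(snippet).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

A feature vector of this kind would then be the input to whatever classifier is trained on the human ratings.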
Model Performance
[Figure: Spearman correlation between annotator scores and average scores (y-axis, 0 to 0.9) for human annotators, sorted (x-axis, 0 to 120). Series: average human, our metric.]
On average, the model agrees with humans as much as they agree with each other.
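The agreement measure named on the chart axis is Spearman correlation, which is the Pearson correlation of the ranks of two score lists. The sketch below shows one way to compute it; the class name `Spearman` and the choice to give tied values their average rank are assumptions of this example, not details taken from the talk.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper: Spearman rank correlation between two lists of scores. */
final class Spearman {
    /** Assigns 1-based ranks; tied values share the average of the ranks they span. */
    static double[] ranks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int k = 0; k < n; k++) {
            order[k] = k;
        }
        Arrays.sort(order, Comparator.comparingDouble(idx -> values[idx]));
        double[] ranks = new double[n];
        int start = 0;
        while (start < n) {
            int end = start;
            while (end + 1 < n && values[order[end + 1]] == values[order[start]]) {
                end++;
            }
            double average = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) {
                ranks[order[k]] = average;
            }
            start = end + 1;
        }
        return ranks;
    }

    /** Pearson correlation of the two rank vectors; NaN if either input is constant. */
    static double correlation(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equally long inputs with at least 2 values");
        }
        double[] rx = ranks(x);
        double[] ry = ranks(y);
        double meanX = Arrays.stream(rx).average().orElse(0.0);
        double meanY = Arrays.stream(ry).average().orElse(0.0);
        double covariance = 0, varianceX = 0, varianceY = 0;
        for (int k = 0; k < rx.length; k++) {
            double dx = rx[k] - meanX;
            double dy = ry[k] - meanY;
            covariance += dx * dy;
            varianceX += dx * dx;
            varianceY += dy * dy;
        }
        return covariance / Math.sqrt(varianceX * varianceY);
    }

    public static void main(String[] args) {
        double[] annotator = {1, 3, 2, 5, 4};      // made-up scores from one rater
        double[] average = {1.2, 2.8, 2.5, 4.6, 4.1}; // made-up average scores
        System.out.println(correlation(annotator, average));
    }
}
```

Comparing each annotator's scores with the average scores, and then the metric's scores with the same averages, gives the two series that the chart contrasts.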
Related Work
• Readability metrics for natural languages
– Very popular (DoD standards, etc.)
• In the software domain
– Complexity metrics (often used, but utility is questionable)
Conclusions
• We can automatically judge readability about as well as the average human can
• This notion of readability shows significant correlation with:
– Code churn
– A bug finder
– Program maturity
Runtime Behavior
Model path execution frequency statically
Key: Use path surface features to uncover developer expectations
Key Idea
• Developers often have expectations about common and uncommon cases in programs
• The structure of code they write can sometimes reveal these expectations
Intuition
public V put(K key, V value)
{
    if ( value == null )
        throw new Exception();
    if ( count >= threshold )
        rehash();
    index = key.hashCode() % length;
    table[index] = new Entry(key, value);
    count++;
    return value;
}
*simplified from java.util.Hashtable, JDK 6.0
Annotations on the code:
• throw new Exception(): Exception
• rehash(): Invocation that changes a lot of the object state
• Remaining statements: Some computation
Hypothesis
We can accurately predict the runtime frequency of program paths by analyzing their static surface features
Goal:
• Know what programs are likely to do without having to run them (produce a static profile)
Applications for Static Profiles
Indicative (dynamic) profiles are often unavailable
Profile information can improve many analyses
• Profile guided optimization
• Complexity/Runtime estimation
• Anomaly detection
• Significance of difference between program versions
• Prioritizing output from other static analyses
Approach
• Model each path with a set of features that may correlate with runtime path frequency
• Learn from programs for which we have indicative workloads (we used logistic regression)
• Predict which paths are most or least likely in other programs
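The learn-then-predict loop above can be sketched with a very small logistic regression. Everything specific here is an assumption for illustration: the class name `PathFrequencyModel`, the two path features (whether the path throws an exception, and how many calls it makes), the training data, and the plain batch gradient ascent used for fitting. The features and training procedure of the actual tool may differ.

```java
/** Hypothetical sketch: logistic regression over hand-picked path features. */
final class PathFrequencyModel {
    private final double[] weights; // weights[0] is the bias term

    PathFrequencyModel(int featureCount) {
        weights = new double[featureCount + 1];
    }

    /** Estimated probability that a path with these features is frequently executed. */
    double predict(double[] features) {
        double z = weights[0];
        for (int i = 0; i < features.length; i++) {
            z += weights[i + 1] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Batch gradient ascent on the log-likelihood; frequent[p] is 1 for hot paths, 0 otherwise. */
    void train(double[][] paths, int[] frequent, int iterations, double learningRate) {
        for (int it = 0; it < iterations; it++) {
            double[] gradient = new double[weights.length];
            for (int p = 0; p < paths.length; p++) {
                double error = frequent[p] - predict(paths[p]);
                gradient[0] += error;
                for (int i = 0; i < paths[p].length; i++) {
                    gradient[i + 1] += error * paths[p][i];
                }
            }
            for (int i = 0; i < weights.length; i++) {
                weights[i] += learningRate * gradient[i];
            }
        }
    }

    public static void main(String[] args) {
        // Made-up training data: {throwsException (0 or 1), number of method calls on the path}.
        double[][] paths = {{0, 1}, {0, 2}, {1, 1}, {1, 3}};
        int[] frequent = {1, 1, 0, 0};
        PathFrequencyModel model = new PathFrequencyModel(2);
        model.train(paths, frequent, 2000, 0.1);
        System.out.println("non-throwing path: " + model.predict(new double[]{0, 1}));
        System.out.println("throwing path:     " + model.predict(new double[]{1, 3}));
    }
}
```

Ranking all paths of a method by the predicted probability gives the kind of ordering that can then be compared against a random baseline.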
Evaluation
Choose 5% of all paths and get 50% of runtime behavior
[Figure legend: ranking by our metric; baseline: random ranking]
Evaluation
Choose 1 path per method and get 94% of runtime behavior
[Figure legend: ranking by our metric; baseline: random ranking]
Related Work
• Static Branch Prediction [Ball & Larus ’92]
– For each branch, which direction is most likely
– In a direct comparison, our tool is better
Conclusion
• A formal model that statically predicts relative dynamic path execution frequencies
• The promise of helping other program analyses and transformations
Documentation
Generate for:
• Exceptions
• APIs
• Version Changes
Key: Use symbolic execution and summarization heuristics to generate human-readable results.
Use
• For Internal Developers
– Easier to keep track of what’s going on
• For Maintenance and Testing
– Easier to read old code.
• For External Developers
– Easier to integrate off-the-shelf software libraries
Three Types of Documentation
• Exceptions
• Code Changes
• APIs
Documenting Exceptions
/**
 * @throws Exception If the value is null
 */
public V put(K key, V value)
{
    if ( value == null )
        throw new Exception();
    if ( count >= threshold )
        rehash();
    index = key.hashCode() % length;
    ...
*simplified from java.util.Hashtable, JDK 6.0
Best practice dictates that exceptions should be documented
Does this method throw an exception?
Importance
Mishandling or not handling exceptions can lead to:
• Security vulnerabilities
• Disclosure of sensitive implementation details
• Breaches of API encapsulation
• Any number of minor to serious system failures
Hypothesis
Mechanically generated documentation of exceptions can be at least as good as human-written documentation on average.
• More complete
• More accurate
We extract paths to throw statements and use symbolic execution to generate path predicates
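To make the last step concrete, here is a minimal sketch of turning path predicates into `@throws` documentation. It assumes the predicates have already been computed: the branch conditions are supplied by hand as strings, whereas the approach described above derives them by symbolic execution. The class `ThrowsDocGenerator` and its nested `ThrowPath` type are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: renders precomputed path predicates as Javadoc @throws lines. */
final class ThrowsDocGenerator {
    /** One path to a throw statement: the exception type and the conditions that must hold. */
    static final class ThrowPath {
        final String exceptionType;
        final List<String> conditions;

        ThrowPath(String exceptionType, List<String> conditions) {
            this.exceptionType = exceptionType;
            this.conditions = conditions;
        }
    }

    /** Produces a Javadoc comment with one @throws line per path. */
    static String document(List<ThrowPath> paths) {
        List<String> lines = new ArrayList<>();
        lines.add("/**");
        for (ThrowPath path : paths) {
            String predicate = path.conditions.isEmpty()
                    ? "always"
                    : "if " + String.join(" and ", path.conditions);
            lines.add(" * @throws " + path.exceptionType + " " + predicate);
        }
        lines.add(" */");
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        // The single throwing path of the simplified put(...) example: value == null.
        List<ThrowPath> paths = Arrays.asList(
                new ThrowPath("Exception", Arrays.asList("value == null")));
        System.out.println(document(paths));
    }
}
```

Joining the conditions along a path with "and" mirrors the idea of a path predicate as a conjunction of branch conditions; a real generator would also simplify and summarize those conditions before printing them.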
Examples
• Sometimes we do better:
• Sometimes we do about the same:
• Sometimes we do worse:
Key Results
Our documentation is as good as human over 80% of the time
Code Change Examples
jfreechart rev 3405
(start): Changed from Date to long,
(end): Likewise,
(getStartMillis): New method,
(getEndMillis): Likewise,
(getStart): Returns new date instance,
(getEnd): Likewise.
Jabref rev 2917
Fixed NullPointerException when downloading external file and file directory is undefined.
Phex 3542
Minor change
Subject: An appeal for more descriptive commit messages
I know there is a lot going on but please can we be a bit more descriptive when commiting changes. Recent log messages have included:
"some cleanup"
"more external service work"
"Fixed a bug in wiring"
which are a lot less informative than others...
http://osdir.com/ml/apache.webservices.tuscany.devel/2006-02/msg00227.html
Toby,
Going forward, could you I ask you to be more descriptive in your commit messages? Ideally you should state what you've changed and also why (unless it's obvious)... I know you're busy and this takes more time, but it will help anyone who looks through the log ...
http://lists.macosforge.org/pipermail/macports-dev/2009-June/008881.html
Sorry to be a pain in the neck about this, but could we please use more descriptive commit messages? I do try to read the commit emails, but since the vast majority of comments are "CAY-XYZ", I can't really tell what's going on unless I then look it up.
http://osdir.com/ml/java.cayenne.devel/2006-10/msg00044.html
Key Idea
• Generate Documentation that describes the effect of a change on the runtime behavior of a program
– What conditions are necessary to activate the change
– What the new behavior is
Algorithm
• Generate predicates for each statement
• Compare predicates across versions
• Summarize change and distill structured output
When X, do Y instead of Z
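The comparison step and the "When X, do Y instead of Z" template can be illustrated with a small sketch. It assumes each program version has already been reduced to a map from a path predicate to the behavior guarded by it; in the approach described above those predicates come from analysis of the code, while here they are written by hand. The class name `ChangeDescriber` and the example predicates are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: compares predicate-to-behavior maps of two versions. */
final class ChangeDescriber {
    static List<String> describe(Map<String, String> before, Map<String, String> after) {
        List<String> summary = new ArrayList<>();
        for (Map.Entry<String, String> entry : after.entrySet()) {
            String predicate = entry.getKey();
            String newBehavior = entry.getValue();
            String oldBehavior = before.get(predicate);
            if (oldBehavior == null) {
                // The predicate guards behavior that did not exist before.
                summary.add("When " + predicate + ", do " + newBehavior);
            } else if (!oldBehavior.equals(newBehavior)) {
                summary.add("When " + predicate + ", do " + newBehavior + " instead of " + oldBehavior);
            }
        }
        for (Map.Entry<String, String> entry : before.entrySet()) {
            if (!after.containsKey(entry.getKey())) {
                summary.add("When " + entry.getKey() + ", no longer do " + entry.getValue());
            }
        }
        return summary;
    }

    public static void main(String[] args) {
        // Made-up example in the spirit of a null-pointer fix.
        Map<String, String> before = new LinkedHashMap<>();
        before.put("directory == null", "throw NullPointerException");
        before.put("directory != null", "download file");
        Map<String, String> after = new LinkedHashMap<>();
        after.put("directory == null", "report an error");
        after.put("directory != null", "download file");
        describe(before, after).forEach(System.out::println);
    }
}
```

Unchanged predicate and behavior pairs produce no output, which is one simple way to keep the summary focused on what the change actually affects.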
Evaluation
Our documentation is as good as human over 80% of the time
API Usage Documentation
“The greatest obstacle to learning an API … is insufficient or inadequate examples” [9]
9. M. P. Robillard. What Makes APIs Hard to Learn? Answers from Developers. IEEE Softw., 26(6):27-34, 2009.
API Usage Documentation
java.io.ObjectOutputStream
FileOutputStream fos = new FileOutputStream("t.tmp");
ObjectOutputStream oos = new ObjectOutputStream(fos);
oos.writeInt(12345);
oos.writeObject("Today");
oos.writeObject(new Date());
oos.close();
weka.core.Instance
// Create the instance
Instance iExample = new Instance(4);
iExample.setValue((Attribute)fvWekaAttributes.elementAt(0), 1.0);
iExample.setValue((Attribute)fvWekaAttributes.elementAt(1), 0.5);
iExample.setValue((Attribute)fvWekaAttributes.elementAt(2), "gray");
iExample.setValue((Attribute)fvWekaAttributes.elementAt(3), "positive");
isTrainingSet.add(iExample);
java.io.BufferedReader
BufferedReader in = new BufferedReader(new FileReader("foo.in"));
Key Idea
• Combine insights from specification mining, automatic documentation, and code summarization
• Specification mining false positives – usage patterns that are common but aren't required – are exactly what we want to find.
Algorithm
Given a target class to document and a set of code files that use the class (e.g., mined from the web):
– Model usages of the class as a finite state machine or regular expression
– Combine machines that are similar
– Output most common machines as usage examples
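A heavily simplified sketch of these steps follows. It assumes each usage of the target class has already been extracted as a sequence of method names, treats two usages as "similar" only when their sequences are identical (the merging described above is more general), and reports the most frequent sequences. The class name `UsageMiner` and the sample call sequences are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: counts identical call sequences and returns the most common ones. */
final class UsageMiner {
    static List<String> mostCommon(List<List<String>> usages, int limit) {
        // Identical sequences collapse into one entry; insertion order is kept for stable ties.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> usage : usages) {
            counts.merge(String.join(" -> ", usage), 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so equally frequent patterns stay in first-seen order.
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < limit; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up usages of ObjectOutputStream, one list of calls per client file.
        List<List<String>> usages = Arrays.asList(
                Arrays.asList("new", "writeObject", "close"),
                Arrays.asList("new", "writeInt", "close"),
                Arrays.asList("new", "writeObject", "close"));
        mostCommon(usages, 2).forEach(System.out::println);
    }
}
```

Each returned pattern is a linear sequence, which corresponds to a state machine without branches or loops; supporting those would require a richer representation than a joined string.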
Evaluation
Manual comparison to JavaDoc examples
• Are we able to come up with the same examples?
– Precision / Recall / F-measure
• User Study
Conclusion
To create algorithms for three types of documentation:
– Exceptions
– Code Changes
– API Usage
Evaluate by comparing to human-generated documentation and/or with a user study
Research Timeline
A 2005 NASA survey found that the most significant barrier to code reuse is that software is “too difficult to understand” or is “poorly documented.” [10]
10. NASA Software Reuse Working Group. Software reuse survey. http://www.esdswg.com/softwarereuse/Resources/library/working_group_documents/survey2005, 2005.
Conclusion: Understanding programs at many levels
• How easy is it to understand and maintain this software? (Readability)
• Where are the corner cases, and where are the common paths? (Runtime Behavior)
• How can this code go wrong? (Documenting Exceptions)
• How do I use this code? (Documenting APIs)
• What does a proposed fix really do? (Documenting Changes)
All Questions Encouraged
A Human-Centric Approach to Program Understanding
Documentation, Runtime Behavior, Readability
These slides, the proposal document, and much more information are available at:
http://arrestedcomputing.com/proposal
Thanks for Coming!