The Synergy of Human and Artificial Intelligence in Software Engineering
Tao Xie
North Carolina State University, Raleigh, NC, USA
RAISE 2013
Turing Test: Tell Machine and Human Apart
Human vs. Machine: Machine Better Than Human?
IBM's Deep Blue defeated chess champion Garry Kasparov in 1997
IBM Watson defeated top human Jeopardy! players in 2011
Global Trend: Artificial Intelligence Replacing Human Intelligence
Google’s driverless car
Microsoft's instant voice translation tool
IBM Watson as Jeopardy! player
CAPTCHA: Human Intelligence is Better
"Completely Automated Public Turing test to tell Computers and Humans Apart"
Human-Computer Interaction
Movie: Minority Report
CNN News
iPad
Human-Centric Software Engineering
…
Task Allocation of Artificial and Human Intelligence
Machine is better at task set A
Mechanical, tedious, repetitive tasks, …
Ex. solving constraints along a long path
Human is better at task set B
Intelligence, human intent, abstraction, domain knowledge, …
Ex. local reasoning after a loop, recognizing naming semantics
= A U B
Mutually Enhanced Demands on Artificial and Human Intelligence
Malaysia Airlines Flight 124 @2005
Lisanne Bainbridge, “Ironies of Automation”, Automatica 1983.
Ironies of Automation
“Even highly automated systems, such as electric power networks, need human beings... one can draw the paradoxical conclusion that automated systems still are man-machine systems, for which both technical and human factors are important.”
“As the plane passed 39 000 feet, the stall and overspeed warning indicators came on simultaneously—something that’s supposed to be impossible, and a situation the crew is not trained to handle.” IEEE Spectrum 2009
“The increased interest in human factors among engineers reflects the irony that the more advanced a control system is, so the more crucial may be the contribution of the human operator.”
Takeaway Messages
Don’t forget human intelligence
Using your tools as end-to-end solutions
Helping your tools
Don’t forget cooperation of human and tool intelligence, and of human and human intelligence
Humans can help your tools too
Humans can work together to help your tools, e.g., crowdsourcing
Google Scholar: “Pointer Analysis”
“Pointer Analysis: Haven’t We Solved This Problem Yet?” [Hind PASTE 2001]
“During the past 21 years, over 75 papers and 9 Ph.D. theses have been published on pointer analysis. Given the tomes of work on this topic one may wonder, ‘Haven’t we solved this problem yet?’ With input from many researchers in the field, this paper describes issues related to pointer analysis and remaining open problems.”
Michael Hind. Pointer analysis: haven’t we solved this problem yet? In Proc. ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE 2001)
Section 4.3 Designing an Analysis for a Client’s Needs
“Barbara Ryder expands on this topic: ‘… We can all write an unbounded number of papers that compare different pointer analysis approximations in the abstract. However, this does not accomplish the key goal, which is to design and engineer pointer analyses that are useful for solving real software problems for realistic programs.’”
Google Scholar: “Clone Detection”
Some Success Stories of Applying Clone Detection
Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: a tool for finding copy-paste and related bugs in operating system code. In Proc. OSDI 2004.
MSRA XIAO
Yingnong Dang, Dongmei Zhang, Song Ge, Chengyun Chu, Yingjun Qiu, and Tao Xie. XIAO: Tuning Code Clones at Hands of Engineers in Practice. In Proc. ACSAC 2012.
MSR 2011 Keynote by YY Zhou: Connecting Technology with Real-world Problems – From Copy-paste Detection to Detecting Known Bugs
Human Intelligence to Determine What are Serious Bugs
XIAO: Clone Detection@MSRA
Available in Visual Studio 2012
Searching similar snippets for fixing bug once
Finding refactoring opportunity
Yingnong Dang, Dongmei Zhang, Song Ge, Chengyun Chu, Yingjun Qiu, and Tao Xie. XIAO: Tuning Code Clones at Hands of Engineers in Practice. In Proc. Annual Computer Security Applications Conference (ACSAC 2012)
XIAO Code Clone Search service integrated into workflow of Microsoft Security Response Center (MSRC)
Microsoft Technet Blog about XIAO:
“We wanted to be sure to address the vulnerable code wherever it appeared across the Microsoft code base. To that end, we have been working with Microsoft Research to develop a “Cloned Code Detection” system that we can run for every MSRC case to find any instance of the vulnerable code in any shipping product. This system is the one that found several of the copies of CVE-2011-3402 that we are now addressing with MS12-034.”
XIAO: Enabling Human Intelligence
XIAO enables code clone analysis with
High scalability
High compatibility
High tunability: what you tune is what you get
High explorability:
How to navigate through the large number of detected clones?
1. Clone navigation based on source tree hierarchy
2. Pivoting of folder level statistics
3. Folder level statistics
4. Clone function list in selected folder
5. Clone function filters
6. Sorting by bug or refactoring potential
7. Tagging
How to quickly review a pair of clones?
1. Block correspondence
2. Block types
3. Block navigation
4. Copying
5. Bug filing
6. Tagging
"Are Automated Debugging [Research] Techniques Actually Helping Programmers?"
50 years of automated debugging research
N papers, only 5 evaluated with actual programmers
Chris Parnin and Alessandro Orso. Are automated debugging techniques actually helping programmers? In Proc. ISSTA 2011
Human Factors in Real World
Academia
Tend to leave human out of loop (involving human makes evaluations difficult to conduct or write)
Tend not to spend effort on improving tool usability
▪ tool usability would be valued more in HCI than in SE
▪ too much to include both the approach/tool itself and usability/its evaluation in a single paper
Real-world
Often has human in the loop (familiar IDE integration, social effect, lack of expertise/willingness to write specs, …)
Examples
Agitar [ISSTA 2006] vs. Daikon [TSE 2001]
Test generation in Pex based on constraint solving
NSF Workshop on Formal methods
Goal: to identify the future directions in research in formal methods and its transition to industrial practice.
The workshop will bring together researchers and identify primary challenges in the field, both foundational, infrastructural, and in transitioning ideas from research labs to developer tools.
http://goto.ucsd.edu/~rjhala/NSFWorkshop/
Example Barriers Related to Human Factors
“Lack of education amongst practitioners”
“Education of students in logic and design for verification”
“Expertise required to create and use a verification tool. E.g., both Astrée for Airbus and SDV for Windows drivers were closely shepherded by verification experts.”
“Tools require lots of up-front effort (e.g., to write specifications)”
“User effort required to guide verification tools, such as assertions or specifications”
Example Barriers Related to Human Factors
“Not integrated with standard development flows (testing)”
“Too many false positives and no ranking of errors”
“General usability of tools, in terms of false alarms and error messages. The Coverity CACM paper pointed out that they had developed features that they do not deploy because they baffle users. Many tools choose unsoundness over soundness to avoid false alarms.”
Example Barriers Related to Human Factors
“The necessity of detailed specifications and complex interaction with tools, which is very costly and discouraging for industrial, who lack high-level specialists.”
“Feedback to users. It’s difficult to explain to users why automated verification tools are failing. Counterexamples to properties can be very difficult for users to understand, especially when they are abstract, or based on incomplete environment models or constraints.”
Automation in Software Testing
2010 Dagstuhl Seminar 10111
Practical Software Testing: Tool Automation and Human Factors
http://www.dagstuhl.de/programm/kalender/semhp/?semnr=1011
Human-Centric SE Example: Whyline
Andy Ko and Brad Myers. Debugging Reinvented: Asking and Answering Why and Why Not Questions about Program Behavior. In Proc. ICSE 2008
Takeaway Messages
Don’t forget human intelligence
Using your tools as end-to-end solutions
Helping your tools
Don’t forget cooperation of human and tool intelligence, and of human and human intelligence
Humans can help your tools too
Humans can work together to help your tools, e.g., crowdsourcing
Reflexion Models
Motivation
Architecture recovery is challenging (abstraction gap)
Human typically has high-level view in mind
Repeat
  Human: define/update high-level model of interest
  Tool: extract a source model
  Human: define/update declarative mapping between high-level model and source model
  Tool: compute a software reflexion model
  Human: interpret the software reflexion model
Until happy
Gail C. Murphy, David Notkin. Reengineering with Reflexion Models: A Case Study. IEEE Computer 1997
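The "Tool: compute a software reflexion model" step can be sketched as follows. This is a minimal Python illustration, not the tool from the cited paper: it assumes the source model is a set of (caller, callee) pairs, the human-written mapping is a prefix table, and the result sorts high-level edges into convergences, divergences, and absences. All names and sample data (`map_entity`, `reflexion_model`, the `ui.`/`core.`/`storage.` prefixes) are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Edge = Tuple[str, str]


def map_entity(name: str, mapping: Dict[str, str]) -> Optional[str]:
    """Map a source entity to a high-level component by longest matching prefix."""
    best = None
    for prefix in mapping:
        if name.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return mapping[best] if best is not None else None


def reflexion_model(high_level: Set[Edge], source: Set[Edge],
                    mapping: Dict[str, str]) -> Dict[str, Set[Edge]]:
    """Lift source edges to the high-level model and compare with expectations."""
    lifted = set()
    for caller, callee in source:
        a, b = map_entity(caller, mapping), map_entity(callee, mapping)
        # Ignore unmapped entities and calls that stay inside one component.
        if a is not None and b is not None and a != b:
            lifted.add((a, b))
    return {
        "convergences": high_level & lifted,  # expected and found in source
        "divergences": lifted - high_level,   # found in source, not expected
        "absences": high_level - lifted,      # expected, not found in source
    }


# Hypothetical inputs: the human supplies high_level and mapping,
# the tool supplies the extracted source model.
high_level = {("UI", "Core"), ("Core", "Storage")}
source = {
    ("ui.window.draw", "core.model.get"),
    ("ui.dialog.save", "storage.file.write"),
    ("core.model.get", "core.cache.lookup"),
}
mapping = {"ui.": "UI", "core.": "Core", "storage.": "Storage"}

model = reflexion_model(high_level, source, mapping)
```

In this sample the human would see one divergence (UI calling Storage directly) and one absence (no Core to Storage call), and could then refine the high-level model or the mapping and repeat.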
State-of-the-Art/Practice Test Generation Tools
Running Symbolic PathFinder ...
…
====================================================== results
no errors detected
====================================================== statistics
elapsed time: 0:00:02
states: new=4, visited=0, backtracked=4, end=2
search: maxDepth=3, constraints=0
choice generators: thread=1, data=2
heap: gc=3, new=271, free=22
instructions: 2875
max memory: 81MB
loaded code: classes=71, methods=884
…
Challenges Faced by Test Generation Tools
Ex: Dynamic Symbolic Execution (DSE) / Concolic Testing [Godefroid et al. 05][Sen et al. 05][Tillmann et al. 08]
Instrument code to explore feasible paths
Challenge: path explosion
object-creation problems (OCP): 65%
external-method call problems (EMCP): 27%
Total block coverage achieved is 50%, lowest coverage 16%.
Example Object-Creation Problem [OOPSLA 11]
When desirable receiver or argument objects are not generated
A graph example from QuickGraph library
Includes two classes: Graph, DFSAlgorithm
Graph: AddVertex, AddEdge (AddEdge requires both vertices to be in graph)
00: class Graph { …
03:   public void AddVertex (Vertex v) {
04:     vertices.Add(v); // B1
      }
06:   public Edge AddEdge (Vertex v1, Vertex v2) {
07:     if (!vertices.Contains(v1))
08:       throw new VNotFoundException("");
09:     // B2
10:     if (!vertices.Contains(v2))
11:       throw new VNotFoundException("");
12:     // B3
14:     Edge e = new Edge(v1, v2);
15:     edges.Add(e);
      }
    }
//DFS:DepthFirstSearch
18: class DFSAlgorithm { …
23:   public void Compute (Vertex s) { ...
24:     if (graph.GetEdges().Size() > 0) { // B4
25:       isComputed = true;
26:       foreach (Edge e in graph.GetEdges()) {
27:         ... // B5
28:       }
29:     }
      }
    }
Test target: Cover true branch (B4) of Line 24
Desired object state: graph should include at least one edge
Target sequence:
Graph ag = new Graph();
Vertex v1 = new Vertex(0);
Vertex v2 = new Vertex(1);
ag.AddVertex(v1);
ag.AddVertex(v2);
ag.AddEdge(v1, v2);
DFSAlgorithm algo = new DFSAlgorithm(ag);
algo.Compute(v1);
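To make the object-creation problem concrete, here is a small Python port of the Graph/DFSAlgorithm example. It is only an illustrative sketch: vertices are plain integers, and the `covered` set is an assumption added to record which blocks (B4, B5) execute. It contrasts a method sequence that skips AddVertex with the target sequence above.

```python
class VNotFoundException(Exception):
    """Raised when an edge refers to a vertex that is not in the graph."""


class Graph:
    def __init__(self):
        self.vertices = []
        self.edges = []

    def add_vertex(self, v):
        self.vertices.append(v)  # B1

    def add_edge(self, v1, v2):
        if v1 not in self.vertices:
            raise VNotFoundException()
        # B2: v1 is known to be in the graph
        if v2 not in self.vertices:
            raise VNotFoundException()
        # B3: both vertices are in the graph
        edge = (v1, v2)
        self.edges.append(edge)
        return edge


class DFSAlgorithm:
    def __init__(self, graph):
        self.graph = graph
        self.is_computed = False
        self.covered = set()  # hypothetical coverage recorder

    def compute(self, start):
        if len(self.graph.edges) > 0:
            self.covered.add("B4")
            self.is_computed = True
            for _edge in self.graph.edges:
                self.covered.add("B5")


def naive_attempt():
    """Sequence without AddVertex: AddEdge throws, so the graph stays empty."""
    graph = Graph()
    try:
        graph.add_edge(0, 1)
    except VNotFoundException:
        pass
    algo = DFSAlgorithm(graph)
    algo.compute(0)
    return algo.covered


def target_sequence():
    """The target sequence: add both vertices first, then the edge."""
    graph = Graph()
    v1, v2 = 0, 1
    graph.add_vertex(v1)
    graph.add_vertex(v2)
    graph.add_edge(v1, v2)
    algo = DFSAlgorithm(graph)
    algo.compute(v1)
    return algo.covered
```

Covering B4 depends on the order and arguments of several calls, not on any single primitive input, which is why a tool that only solves constraints over `Compute`'s argument cannot reach it.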
Example External-Method Call Problems (EMCP)
Typically DSE instruments or explores only methods @ project under test
Third-party API external methods (network, I/O, ..):
• too many paths
• uninstrumentable
Xusheng Xiao, Tao Xie, Nikolai Tillmann, and Jonathan de Halleux. Precise Identification of Problems for Structural Test Generation. In Proc. ICSE 2011
What to Do Next?
2010 Dagstuhl Seminar 10111
Practical Software Testing: Tool Automation and Human Factors
Conventional Wisdom: Improve Automation Capability
Tackling object-creation problems
Seeker [OOPSLA 11], MSeqGen [ESEC/FSE 09], Covana [ICSE 2011], OCAT [ISSTA 10], Evacon [ASE 08], Symclat [ASE 06]
Still not good enough (at least for now)!
▪ Seeker (52%) > Pex/DSE (41%) > Randoop/random (26%)
Tackling external-method call problems
DBApp Testing [ESEC/FSE 11], [ASE 11]
CloudApp Testing [IEEE Soft 12]
Deal with only common environment APIs
@NCSU ASE
Unconventional Wisdom: Human Can Help!
Object Creation Problems (OCP)
Tackle object-creation problems with Factory Methods
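A factory method lets a human encode how to build a valid object from primitive values that a tool can generate easily. The Python sketch below is an assumption-laden illustration, not Pex's factory-method mechanism: `create_graph` is a hypothetical human-written factory, and `find_inputs_covering_b4` is a brute-force stand-in for the test generation tool.

```python
from itertools import product


class Graph:
    def __init__(self):
        self.vertices = []
        self.edges = []

    def add_vertex(self, v):
        self.vertices.append(v)

    def add_edge(self, v1, v2):
        if v1 not in self.vertices or v2 not in self.vertices:
            raise ValueError("vertex not in graph")
        self.edges.append((v1, v2))


def create_graph(num_vertices, edge_pairs):
    """Human-written factory: turns primitive inputs into a valid Graph."""
    graph = Graph()
    for v in range(num_vertices):
        graph.add_vertex(v)
    for a, b in edge_pairs:
        if num_vertices > 0:
            # Wrap indices so every generated pair names existing vertices.
            graph.add_edge(a % num_vertices, b % num_vertices)
    return graph


def find_inputs_covering_b4(max_vertices=3):
    """Stand-in for the tool: enumerate small primitive inputs via the factory."""
    for n, a, b in product(range(max_vertices + 1), range(2), range(2)):
        graph = create_graph(n, [(a, b)])
        if len(graph.edges) > 0:  # the condition guarding block B4
            return n, a, b
    return None
```

The human knowledge (vertices must be added before edges) lives in the factory, so the tool only has to search over integers.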
Unconventional Wisdom: Human Can Help!
External-Method Call Problems (EMCP)
Tackle external-method call problems with Mock Methods or Method Instrumentation
Mocking System.IO.File.ReadAllText
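The same idea in a Python sketch: the external file read is replaced with a mock so that the test controls what the code under test sees. This uses the standard `unittest.mock` module as an analogue of mocking `System.IO.File.ReadAllText`; `read_all_text`, `classify_config`, and `run_with_mock` are hypothetical names for illustration.

```python
from unittest import mock


def read_all_text(path):
    """External-method call: depends on the file system."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def classify_config(path):
    """Code under test: its branches depend on the external file content."""
    text = read_all_text(path)
    if text.startswith("debug"):
        return "debug mode"
    return "normal mode"


def run_with_mock(file_content):
    """Run the code under test with the file read replaced by a mock."""
    with mock.patch("builtins.open", mock.mock_open(read_data=file_content)):
        return classify_config("ignored.cfg")
```

With the mock in place, the file content becomes an ordinary input, so each branch of `classify_config` can be reached by choosing `file_content`.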
Cooperation Between Human and Machine
Human-Assisted Computing
Driver: tool
Helper: human
Ex. Covana [Xiao et al. ICSE 2011]
Human-Centric Computing
Driver: human
Helper: tool
Ex. Coding duels @Pex for Fun
Interfaces are important. Contents are important too!
Human-Assisted Computing
Motivation
Tools are often not powerful enough
Human is good at some aspects that tools are not
What difficulties does the tool face?
How to communicate info to the user to get help?
How does the user help the tool based on the info?
Iterations to form Feedback Loop
Microsoft Research Pex for Fun
Teaching/Learning CS via Interactive Gaming
1,230,309 clicked 'Ask Pex!'
www.pexforfun.com
Nikolai Tillmann, Jonathan De Halleux, Tao Xie, Sumit Gulwani and Judith Bishop. Teaching and Learning Programming and Software Engineering via Interactive Gaming. In Proc. ICSE 2013 SEE.
Behind the Scene of Pex for Fun
Secret Implementation
class Secret {
  public static int Puzzle(int x) {
    if (x <= 0) return 1;
    return x * Puzzle(x-1);
  }
}
Player Implementation
class Player {
  public static int Puzzle(int x) {
    return x;
  }
}
class Test {
  public static void Driver(int x) {
    if (Secret.Puzzle(x) != Player.Puzzle(x))
      throw new Exception("Mismatch");
  }
}
behavior: Secret Impl == Player Impl
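The driver idea can be sketched in Python as below. Pex searches for mismatching inputs with dynamic symbolic execution; this sketch swaps in brute-force enumeration over a small candidate range, which is an assumption made only to keep the example short. The function names are hypothetical.

```python
def secret_puzzle(x):
    """Port of the secret implementation from the slide."""
    if x <= 0:
        return 1
    return x * secret_puzzle(x - 1)


def player_puzzle(x):
    """Port of the player's current attempt from the slide."""
    return x


def find_mismatches(secret, player, candidates):
    """Return (input, secret output, player output) wherever the two differ."""
    return [(x, secret(x), player(x))
            for x in candidates
            if secret(x) != player(x)]


mismatches = find_mismatches(secret_puzzle, player_puzzle, range(-2, 5))
```

Each mismatch is the kind of feedback a player receives: an input, the expected result, and the player's result. The duel is won when no mismatching input can be found.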
Human-Centric Computing
Coding duels at http://www.pexforfun.com/
Brain exercising/learning while having fun
Fun: iterative, adaptive/personalized, w/ win criterion
Abstraction/generalization, debugging, problem solving
Coding Duel Competition @ICSE 2011
Coding Duels for Course Assignments
@Grad Software Engineering Course
http://pexforfun.com/gradsofteng
Observed Benefits
• Automatic Grading
• Real-time Feedback (for Both Students and Teachers)
• Fun Learning Experiences
Example User Feedback
“It really got me *excited*. The part that got me most is about spreading interest in teaching CS: I do think that it’s REALLY great for teaching | learning!”
“I used to love the first person shooters and the satisfaction of blowing away a whole team of Noobies playing Rainbow Six, but this is far more fun.”
“I’m afraid I’ll have to constrain myself to spend just an hour or so a day on this really exciting stuff, as I’m really stuffed with work.”
Human-Human Cooperation: Pex for Fun (Crowdsourcing)
Internet
class Secret {
  public static int Puzzle(int x) {
    if (x <= 0) return 1;
    return x * Puzzle(x-1);
  }
}
Everyone can contribute
Coding duels
Duel solutions
Human-Human Cooperation: Puzzle Games (Crowdsourcing)
Internet
Puzzle Games Made from Difficult Constraints or Object-Creation Problems
Supported by MSR SEIF Award
Ning Chen and Sunghun Kim. Puzzle-based Automatic Testing: bringing humans into the loop by solving puzzles. In Proc. ASE 2012
http://www.cs.washington.edu/verigames/
Human-Human/Tool Cooperation: Performance Debugging in the Large
[Figure: StackMine [Han et al. ICSE 12] workflow. Labels: Internet, trace collection, trace storage, trace analysis, pattern matching, problematic pattern repository, bug filing, bug update, bug database]
Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proc. ICSE 2012
StackMine: Industry Impact
“We believe that the MSRA tool is highly valuable and much more efficient for mass trace (100+ traces) analysis. For 1000 traces, we believe the tool saves us 4-6 weeks of time to create new signatures, which is quite a significant productivity boost.”
- from Development Manager in Windows
Highly effective new issue discovery on Windows mini-hang
Continuous impact on future Windows versions
Takeaway Messages
Don’t forget human intelligence
Using your tools as end-to-end solutions
Helping your tools
Don’t forget cooperation of human and tool intelligence, and of human and human intelligence
Humans can help your tools too
Humans can work together to help your tools, e.g., crowdsourcing
Summary: Cooperative Testing and Analysis
Human-Assisted Computing
Human-Centric Computing
Human-Human Cooperation
Acknowledgment
Wonderful current/former students @NCSU ASE
Collaborators, especially those from Microsoft Research Redmond/Asia, Peking University
Colleagues who gave feedback and inspired me
NSF grants CCF-0845272, CCF-0915400, CNS-0958235, ARO grant W911NF-08-1-0443, an NSA Science of Security Lablet grant, a NIST grant, a 2011 Microsoft Research SEIF Award
Thank you!
Questions?
https://sites.google.com/site/asergrp/