Summarization Techniques for Code, Changes, and Testing


Transcript of Summarization Techniques for Code, Changes, and Testing

Summarization Techniques for Code, Changes, and Testing

Sebastiano Panichella Institut für Informatik

Universität Zürich [email protected]

http://www.ifi.uzh.ch/seal/people/panichella.html

Outline

I. Source Code Summaries

- Why? Reduce Maintenance Cost

- How? Using Term-based Text Retrieval (TR) techniques

II. Code Change Summarization

- Generating Commit Messages via Summarization of Source Code Changes

- Automatic Generation of Release Notes

III. Test Cases Summarization

- Generating Human Readable Test Cases via Source Code Summarization Techniques

- Evaluation involving 30 developers

I. Source Code Summaries: Why? How?

Source Code Summaries: Why?

Reduce Maintenance Cost…


Activities in Software Maintenance

- Change documentation: 5%
- Change implementation: 10%
- Change planning: 10%
- Change testing: 25%
- Source code comprehension: 50%

Source: Principles of Software Engineering and Design, Zelkowitz, Shaw, Gannon 1979

Source Code Summaries: Why?

Reduce Maintenance Cost…


Understanding Code…

Not So Happy Developers

Happy Developers

Absence of Comments in the Code… again!!

Comments in the Code… again!!

SOLUTION???


Source Code Summaries: How?

Generating Summaries of Source Code:


“Automatically generated, short, yet accurate descriptions of source code entities”.

When Navigating Java Classes…

https://github.com/larsb/atunesplus/blob/master/aTunes/src/main/java/net/sourceforge/atunes/kernel/modules/repository/audio/AudioFile.java

we look at:
- Name of the Class
- Attributes
- Methods
- Dependencies between Classes

Questions when Generating Summaries of Java Classes


■ 1) What information to include in the summaries?

■ 2) How much information to include in the summaries?

■ 3) How to generate and present the summaries?

What information to include in the summaries?

■ Methods and attributes relevant for the class

■ Class stereotypes [Dragan et al., ICSM’10]

■ Method stereotypes [Dragan et al., ICSM’06]

■ Access-level heuristics

■ Private, protected, package-protected, public


[ L. Moreno et al. - ASE 2012 - “JStereoCode: automatically identifying method and class stereotypes in Java code” ]
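For illustration, a small hand-written Java class (names invented, not taken from the cited papers) with the kind of method stereotypes such detectors typically assign noted in comments:

    // Hypothetical example: members annotated with typical method stereotypes
    // (constructor, accessor, mutator, predicate) that a JStereoCode-like detector assigns.
    public class Playlist {

        private final String name;   // attribute relevant for the class
        private int trackCount;      // attribute relevant for the class

        public Playlist(String name) {        // constructor stereotype
            this.name = name;
            this.trackCount = 0;
        }

        public String getName() {             // accessor (get) stereotype
            return name;
        }

        public void addTrack() {              // mutator (command) stereotype
            trackCount++;
        }

        public boolean isEmpty() {            // predicate stereotype
            return trackCount == 0;
        }
    }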

Example of Important Attributes/Methods of an Entity Java Class


we look at:
- Attributes
- Methods
- Dependencies between Classes

An approach for Summarizing a Java Class (JSummarizer)


http://www.cs.wayne.edu/~severe/jsummarizer/

How to present and generate the summaries?

Other Code Artefacts can be Summarised as well:
- Packages
- Classes
- Methods
- etc.
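One common way to present such a summary is as a comment placed on top of the summarized entity. The sketch below mimics the style of a JSummarizer-like class description; the wording is invented for this example and is not real tool output:

    // Illustrative presentation of a generated class summary as a leading comment;
    // the summary text is invented, not produced by JSummarizer.

    /**
     * GENERATED SUMMARY:
     * Playlist is an entity class that represents a named collection of tracks.
     * It declares the attributes name and trackCount, provides the accessor getName
     * and the mutator addTrack, and exposes the predicate isEmpty to query its state.
     */
    public class Playlist {
        // ... class body as in the previous example ...
    }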

II. Code Change Summarization

Task-Driven Summaries

[ Binkley et al. - ICSM 2013 ]

1) Generating Commit Messages via Summarization of Source Code Changes

2) Automatic Generation of Release Notes

To Improve Commits Quality

To Improve Release Notes Quality


Commit Message Should Describe…

The what: changes implemented during the incremental change

The why: motivation and context behind the changes



>20% of the messages were removed:
- they were empty
- had very short strings or lacked any semantic sense

[ Maalej and Happel - MSR 2010 ]

[Figure: the approach takes a Java project at version i-1 and version i, extracts the differences between the two versions, and feeds them through the three components listed below.]

1. Changes Extractor

2. Stereotypes Detector

3. Message Generator

Generating Commit Messages via Summarization of Source Code Changes

https://github.com/SEMERU-WM/ChangeScribe
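A minimal sketch of how these three steps could fit together, assuming invented stand-in types and method names (CodeChange, ChangesExtractor, and so on); this is not the actual ChangeScribe implementation, which is available at the URL above:

    import java.util.List;

    // Hypothetical stand-in types; the real tool works on the ASTs of the two project versions.
    record CodeChange(String file, String description) {}
    enum CommitStereotype { DEGENERATE_MODIFIER, SMALL_MODIFIER, LARGE_MODIFIER }

    class ChangesExtractor {
        List<CodeChange> extract(String versionBefore, String versionAfter) {
            // 1. Changes Extractor: diff version i-1 against version i.
            return List.of(new CodeChange("ConnectController.java",
                    "Add catch clause at oauth1Callback(String,NativeWebRequest) method"));
        }
    }

    class StereotypeDetector {
        CommitStereotype detect(List<CodeChange> changes) {
            // 2. Stereotypes Detector: classify the change set from method/class stereotypes.
            return CommitStereotype.DEGENERATE_MODIFIER;
        }
    }

    class MessageGenerator {
        String describe(CommitStereotype stereotype, List<CodeChange> changes) {
            // 3. Message Generator: render a natural-language commit message.
            StringBuilder message = new StringBuilder("This is a " + stereotype + " commit. It includes:\n");
            for (CodeChange change : changes) {
                message.append(" - ").append(change.description()).append(" in ").append(change.file()).append('\n');
            }
            return message.toString();
        }
    }

    public class CommitMessagePipeline {
        public static void main(String[] args) {
            List<CodeChange> changes = new ChangesExtractor().extract("project-v1", "project-v2");
            CommitStereotype stereotype = new StereotypeDetector().detect(changes);
            System.out.println(new MessageGenerator().describe(stereotype, changes));
        }
    }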

Example: This is a degenerate modifier commit: this change set is composed of empty, incidental, and abstract methods. These methods indicate that a new feature is planned. This change set is mainly composed of:

1. Changes to package org.springframework.social.connect.web:

1.1. Modifications to ConnectController.java:

1.1.1. Add try statement at oauth1Callback(String,NativeWebRequest) method
1.1.2. Add catch clause at oauth1Callback(String,NativeWebRequest) method
1.1.3. Add method invocation to method warn of logger object at oauth1Callback(String,NativeWebRequest) method

1.2. Modifications to ConnectControllerTest.java:

1.2.1. Modify method invocation mockMvc at oauth1Callback() method
1.2.2. Add a functionality to oauth 1 callback exception while fetching access token

2. Changes to package org.springframework.social.connect.web.test:

2.1. Add a ConnectionRepository implementation for stub connection repository. It allows to:

Find all connections; Find connections; Find connections to users; Get connection; Get primary connection; Find primary connection; Add connection; Update connection; Remove connections; Remove connection

[..............]


Impact = relative number of methods impacted by a class in the commit
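One plausible reading of this measure (an interpretation for illustration, not a formula taken from the ChangeScribe papers) is:

\[ \mathit{Impact}(C) \;=\; \frac{|\text{methods impacted by the changes to class } C|}{|\text{methods impacted in the whole commit}|} \times 100\% \]

For example, with hypothetical numbers: if a commit impacts 12 methods overall and 2 of them are attributed to ConnectController, the impact of ConnectController would be 2/12, roughly 17%.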

Generating Commit Messages via Summarization of Source Code Changes



Example: impact >= 17%

Original Message

This is a large modifier commit: this is a commit with many methods and combines multiple roles. This commit includes changes to internationalization, properties or configuration files (pom.xml). This change set is mainly composed of:

1. Changes to package retrofit.converter:
1.1. Add a Converter implementation for simple XML converter. It allows to:
Instantiate simple XML converter with serializer; Process simple XML converter simple XML converter from body; Convert simple XML converter to body

Referenced by:

SimpleXMLConverterTest class

Message Automatically Generated


III. Test Cases Summarization

Manual Testing vs. Automatic Testing


Manual Testing is still Dominant in Industry… Why?

“Automatically generated tests do not improve the ability of developers to detect faults when compared to manual testing.” Fraser et al.

[Slide: first page of “Modeling Readability to Improve Unit Tests” by Ermira Daka, José Campos, Gordon Fraser (University of Sheffield) and Jonathan Dorn, Westley Weimer (University of Virginia). The paper proposes a domain-specific model of unit test readability based on human judgements and uses it to augment automated unit test generation; in human studies, users preferred the improved tests and answered maintenance questions about them 14% more quickly at the same level of accuracy.]

[Slide: first page of “Does Automated White-Box Test Generation Really Help Software Testers?” by Gordon Fraser, Matt Staats, Phil McMinn, Andrea Arcuri, and Frank Padberg (ISSTA 2013). In a controlled experiment with 49 subjects, tool support (EVOSUITE) clearly improved code coverage (up to a 300% increase), but there was no measurable improvement in the number of bugs actually found by developers.]

[Slide: first page of “Does Automated Unit Test Generation Really Help Software Testers? A Controlled Empirical Study” by Gordon Fraser, Matt Staats, Phil McMinn, Andrea Arcuri, and Frank Padberg (ACM TOSEM). Two controlled experiments with a total of 97 subjects confirm the same finding: clear coverage improvements, but no measurable improvement in the number of bugs found.]

Developers spend up to 50% of their time in understanding and analyzing the output of automatic tools. Fraser et al.

“Professional developers perceive generated test cases as hard to understand.” Daka et al.


Example of Test Case Generated by Evosuite

Test Case Automatically Generated by Evosuite

(for the class apache.commons.Option.Java)

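The test case shown on this slide is an image that did not survive extraction. As a stand-in, the snippet below is a hand-written sketch in the style of an EvoSuite-generated JUnit test for the Commons CLI Option class (class name and values are invented; this is not actual tool output):

    import static org.junit.Assert.assertEquals;
    import org.apache.commons.cli.Option;
    import org.junit.Test;

    public class Option_ESTest {

        // Auto-generated tests typically have opaque names (test0, option0, ...) and
        // assert whatever the current implementation happens to return.
        @Test(timeout = 4000)
        public void test0() throws Throwable {
            Option option0 = new Option("o", "output", false, "write report to FILE");
            option0.setArgs(2);
            assertEquals("o", option0.getOpt());
            assertEquals("output", option0.getLongOpt());
            assertEquals(2, option0.getArgs());
        }

        @Test(timeout = 4000)
        public void test1() throws Throwable {
            Option option1 = new Option("v", (String) null);
            assertEquals("v", option1.getOpt());
        }
    }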


Not Meaningful Names for Test Methods

It is difficult to tell, without reading the contents of the target class, what the behavior under test is.





Our Solution: Automatically Generate Summaries of Test Cases

Sebastiano Panichella, Annibale Panichella, Moritz Beller, Andy Zaidman, and Harald Gall: “The impact of test case summaries on bug fixing performance: An empirical investigation” - ICSE 2016.
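To make this concrete, the sketch below shows what the generated test from the earlier example might look like once a summary comment has been attached to it. The summary wording is invented for illustration in the spirit of the approach from the ICSE 2016 paper; it is not actual output of the tool:

    import static org.junit.Assert.assertEquals;
    import org.apache.commons.cli.Option;
    import org.junit.Test;

    public class Option_ESTest_WithSummaries {

        /**
         * GENERATED TEST SUMMARY (illustrative wording):
         * The test case instantiates an Option with short name "o", long name "output"
         * and description "write report to FILE", sets its number of arguments to 2,
         * and then checks that the short name, the long name and the number of
         * arguments are stored correctly.
         */
        @Test(timeout = 4000)
        public void test0() throws Throwable {
            Option option0 = new Option("o", "output", false, "write report to FILE");
            option0.setArgs(2);
            assertEquals("o", option0.getOpt());
            assertEquals("output", option0.getLongOpt());
            assertEquals(2, option0.getArgs());
        }
    }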


Empirical Study: Evaluating the Usefulness of the Generated Summaries

Bug Fixing tasks, performed on test cases WITH Comments and WITHOUT Comments.

30 Developers (two groups of 15):
- 22 Researchers
- 8 Professional Developers

Results

30 Developers:
- 22 Researchers
- 8 Professional Developers

Future work…

Automatically (Re-)Documenting Test Cases…

Automatically Optimizing Test Case Readability by Minimizing (the Generated) Code Smells

Automatically Assigning/Generating Meaningful Names for Test Cases