Graph-based, Pattern-oriented, Context-sensitive Code Completion


Page 1: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Graph-based, Pattern-oriented, Context-sensitive Code Completion

Anh Nguyen, Tung Nguyen, Hoan Nguyen, Ahmed Tamrawi, Hung Nguyen, Jafar Al-Kofahi, and Tien N. Nguyen

Electrical and Computer Engineering Department

Iowa State University


Page 3: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Eclipse’s Built-in Code Completion

[Screenshot: Eclipse's code completion, annotated with the code completion invocation point, the list of recommended methods, and the documentation on a proposed method]

Page 4: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Eclipse’s Built-in Code Completion

[Screenshot: the filled-in code]

Page 5: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Source Code Completion

• Plays an important role in modern IDEs
• Supports developers by:
    • Recommending relevant code
    • Automatically filling in code

Page 6: Graph-based, Pattern-oriented, Context-sensitive Code Completion

State-of-the-Art Code Completion

Single method/field recommendation:
• Non-ranked list (sorted in alphabetical order)
• Ranked list:
    • By return type (Ye et al., ICSE ’02)
    • Via co-occurring methods (Bruch et al., FSE ’09)
    • Via editing history (Robbes and Lanza, ASE ’08)

Template-based recommendation (e.g., Eclipse)

Page 7: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Template-based Code Completion

Recommends a template without considering the context

Page 8: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Our Goal

A code completion approach and tool that:
• Auto-completes a high volume of code
• Takes into consideration the context of the currently edited code

Page 9: Graph-based, Pattern-oriented, Context-sensitive Code Completion

GraPacc Approach

• Developing a pattern-oriented, context-sensitive code completion approach
• Evaluating the usefulness of our code completion method and tool

Page 10: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Programming Pattern

• A correct and frequent usage of API elements
• Used to perform a specific programming task

Example, annotated with its three parts (declaring, reading characters, closing):

// Reading a text file char-by-char using FileReader and BufferedReader
// Declaring
String fileName = "myfile.txt";
FileReader fReader = new FileReader(fileName);
BufferedReader bReader = new BufferedReader(fReader);
// Reading characters
while (bReader.ready()) {
    bReader.read();
}
// Closing
bReader.close();
fReader.close();
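For readers who want to run the pattern, here is a minimal sketch that wraps the same declare/read/close steps in a method. The class name CharByCharReader, the character counter, and the demo() helper that writes a temporary file are assumptions added for illustration; they are not part of the slides.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical wrapper around the FileReader/BufferedReader usage pattern.
class CharByCharReader {
    // Reads the file char-by-char and returns how many characters were read.
    static int countChars(String fileName) throws IOException {
        // Declaring
        FileReader fReader = new FileReader(fileName);
        BufferedReader bReader = new BufferedReader(fReader);
        int count = 0;
        try {
            // Reading characters
            while (bReader.ready()) {
                bReader.read();
                count++;
            }
        } finally {
            // Closing
            bReader.close();
            fReader.close();
        }
        return count;
    }

    // Illustration only: writes "hello" to a temporary file and counts its characters.
    static int demo() {
        try {
            Path tmp = Files.createTempFile("grapacc-demo", ".txt");
            try {
                Files.write(tmp, "hello".getBytes());
                return countChars(tmp.toString());
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("characters read: " + demo());
    }
}
```

The try/finally block is an addition so that both readers are closed even if reading fails; the three commented steps otherwise mirror the slide.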

Page 11: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Pattern-oriented Completion

A recommendation contains multiple method invocations on multiple variables of different types, together with control structures (if, for, …), adapted to the code currently being edited

Page 12: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Context-sensitive Recommendation

Query:
• A code fragment under editing
• A sequence of textual tokens
• Often incomplete and possibly unparseable

Page 13: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Context-sensitive Recommendation

Different cursor positions lead to potentially different recommendation lists

[Figure: two panels, a) and b)]

Page 14: Graph-based, Pattern-oriented, Context-sensitive Code Completion

GraPacc Overview

[Architecture diagram. Components: Pattern Database, Query Processing, Searching & Ranking, Code Completion. Data labels: query Q, query features {fq}, pattern features {fp}, patterns {P}, ranked list of patterns {P}, filled-in code.]

Page 15: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Pattern Management

[GraPacc overview diagram, repeated]

Page 16: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Pattern Representation

Graph-based Object Usage Model (Groum) [Nguyen et al., FSE ’09]

• A directed acyclic graph
• Represents control and data dependencies

Example code:

FileReader fReader = new FileReader("c:/aTextFile.txt");
BufferedReader bReader = new BufferedReader(fReader);
while (bReader.ready()) {}

[Figure: Groum of the example code. Action nodes: FileReader.new, BufferedReader.new, BufferedReader.ready. Data nodes: FileReader, BufferedReader. Control node: WHILE. Edges are data dependencies and control dependencies.]
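The Groum described here can be sketched as a small labeled graph. This is a hypothetical, minimal data structure rather than GraPacc's actual classes: the names Groum, Node, and Kind are illustrative, and the edge set in example() is an assumption, since the figure's edges did not survive in the transcript.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal Groum: labeled nodes of three kinds plus directed edges.
class Groum {
    enum Kind { ACTION, DATA, CONTROL }

    static final class Node {
        final String label;
        final Kind kind;
        final List<Node> successors = new ArrayList<>(); // outgoing dependency edges

        Node(String label, Kind kind) {
            this.label = label;
            this.kind = kind;
        }
    }

    final List<Node> nodes = new ArrayList<>();

    Node add(String label, Kind kind) {
        Node node = new Node(label, kind);
        nodes.add(node);
        return node;
    }

    // Adds a directed dependency edge; the caller is responsible for keeping the graph acyclic.
    static void edge(Node from, Node to) {
        from.successors.add(to);
    }

    long count(Kind kind) {
        return nodes.stream().filter(n -> n.kind == kind).count();
    }

    // Groum for the FileReader/BufferedReader example; the edges are assumed.
    static Groum example() {
        Groum g = new Groum();
        Node frNew = g.add("FileReader.new", Kind.ACTION);
        Node fr = g.add("FileReader", Kind.DATA);
        Node brNew = g.add("BufferedReader.new", Kind.ACTION);
        Node br = g.add("BufferedReader", Kind.DATA);
        Node ready = g.add("BufferedReader.ready", Kind.ACTION);
        Node loop = g.add("WHILE", Kind.CONTROL);
        edge(frNew, fr);
        edge(frNew, brNew);
        edge(fr, brNew);
        edge(brNew, br);
        edge(brNew, ready);
        edge(br, ready);
        edge(ready, loop);
        return g;
    }

    public static void main(String[] args) {
        for (Node n : example().nodes) {
            System.out.println(n.kind + " " + n.label + " has " + n.successors.size() + " successor(s)");
        }
    }
}
```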

Page 17: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Features

• Graph-based feature: a sequence of the textual labels of the nodes along a path of a Groum
• Token-based feature: a lexical token extracted from a query

Page 18: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Features

A feature’s size: number of nodes of the path

Size-1 features:
FileReader.new
FileReader
BufferedReader.new
BufferedReader
BufferedReader.ready
WHILE

Size-2 features:
FileReader.new -> FileReader
FileReader -> BufferedReader.new
FileReader.new -> BufferedReader.new
BufferedReader.new -> BufferedReader
BufferedReader -> BufferedReader.ready
BufferedReader.ready -> WHILE

Size-3 features:
FileReader.new -> FileReader -> BufferedReader.new
FileReader.new -> BufferedReader.new -> BufferedReader
FileReader.new -> BufferedReader.new -> BufferedReader.ready
FileReader -> BufferedReader.new -> BufferedReader
FileReader -> BufferedReader.new -> BufferedReader.ready
BufferedReader.new -> BufferedReader -> BufferedReader.ready
BufferedReader.new -> BufferedReader.ready -> WHILE
BufferedReader -> BufferedReader.ready -> WHILE
…

[Figure: Groum with nodes FileReader.new, FileReader, BufferedReader.new, BufferedReader, BufferedReader.ready, and WHILE]
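Extracting graph-based features amounts to enumerating paths up to a maximum size. The sketch below does this over an adjacency map; the class name PathFeatures, the use of unique labels as node identifiers, and the example edge set are assumptions made for illustration, not details given in the slides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical path-feature extractor over a DAG given as label -> successor labels.
class PathFeatures {
    // Returns one feature per path with 1..maxSize nodes, written as "a -> b -> c".
    static List<String> extract(Map<String, List<String>> successors, int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be at least 1");
        }
        List<String> features = new ArrayList<>();
        for (String start : successors.keySet()) {
            List<String> path = new ArrayList<>();
            path.add(start);
            extend(successors, path, maxSize, features);
        }
        return features;
    }

    // Depth-first extension of the current path; terminates because the graph is acyclic.
    private static void extend(Map<String, List<String>> successors, List<String> path,
                               int maxSize, List<String> out) {
        out.add(String.join(" -> ", path));
        if (path.size() == maxSize) {
            return;
        }
        String last = path.get(path.size() - 1);
        for (String next : successors.getOrDefault(last, List.of())) {
            path.add(next);
            extend(successors, path, maxSize, out);
            path.remove(path.size() - 1);
        }
    }

    // Assumed edges for the FileReader/BufferedReader Groum.
    static Map<String, List<String>> exampleGroum() {
        Map<String, List<String>> g = new LinkedHashMap<>();
        g.put("FileReader.new", List.of("FileReader", "BufferedReader.new"));
        g.put("FileReader", List.of("BufferedReader.new"));
        g.put("BufferedReader.new", List.of("BufferedReader", "BufferedReader.ready"));
        g.put("BufferedReader", List.of("BufferedReader.ready"));
        g.put("BufferedReader.ready", List.of("WHILE"));
        g.put("WHILE", List.of());
        return g;
    }

    public static void main(String[] args) {
        extract(exampleGroum(), 3).forEach(System.out::println);
    }
}
```

Every node must appear as a key in the map, including nodes without successors, so that size-1 features are produced for them.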

Page 19: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Patterns’ Feature Weighting

Significance of feature f in pattern P (tf-idf):

$$s(f, P) = \frac{N_{f,P}}{N_P} \cdot \log \frac{N}{N_f}$$

• N_{f,P}: number of occurrences of f in P
• N_P: number of features in P
• N_f: number of patterns containing f
• N: number of patterns in the pattern database

Popularity of pattern P: Pr(P)
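As a worked sketch of the significance formula, the method below computes s(f, P) from the four counts. The class and parameter names are illustrative, and the natural logarithm is an assumption, since the slide does not state the base.

```java
// Hypothetical helper for the tf-idf style significance s(f, P).
class FeatureWeight {
    // nfp: occurrences of f in P; np: features in P;
    // nf: patterns containing f; n: patterns in the database.
    static double significance(int nfp, int np, int nf, int n) {
        double termFrequency = (double) nfp / np;
        double inverseFrequency = Math.log((double) n / nf); // natural log (assumed base)
        return termFrequency * inverseFrequency;
    }

    public static void main(String[] args) {
        // Made-up counts: f occurs 2 times among 10 features of P,
        // and 5 of 197 patterns contain f.
        System.out.println(significance(2, 10, 5, 197));
    }
}
```

A feature that occurs in every pattern gets significance 0, because log(N / N_f) = log 1 = 0; such a feature cannot help tell patterns apart.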

Page 20: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Storing Patterns

GraPacc stores each pattern with:
• Features and their weights
• Code templates

Inverted indexing is applied to patterns via their features.
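Inverted indexing over features can be sketched as a map from each feature to the patterns that contain it. The class PatternIndex, its method names, and the string pattern ids are hypothetical; the slides do not describe GraPacc's storage layout.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical inverted index: feature -> ids of the patterns containing it.
class PatternIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Registers a pattern under each of its features.
    void add(String patternId, Collection<String> features) {
        for (String feature : features) {
            postings.computeIfAbsent(feature, key -> new TreeSet<>()).add(patternId);
        }
    }

    // Returns the ids of all patterns that share at least one feature with the query.
    Set<String> candidates(Collection<String> queryFeatures) {
        Set<String> result = new TreeSet<>();
        for (String feature : queryFeatures) {
            result.addAll(postings.getOrDefault(feature, Collections.emptySet()));
        }
        return result;
    }

    public static void main(String[] args) {
        PatternIndex index = new PatternIndex();
        index.add("P1", List.of("FileReader.new", "BufferedReader.new"));
        index.add("P2", List.of("String.new"));
        System.out.println(index.candidates(List.of("BufferedReader.new")));
    }
}
```

With such an index, only patterns sharing a feature with the query need to be scored, rather than the whole database.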

Page 21: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Query Processing

[GraPacc overview diagram, repeated]

Page 22: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Partial Program Analysis [Dagenais et al., OOPSLA ’08]

Tokenizing and parsing the code under editing into an Abstract Syntax Tree (AST)

public void readText() {
    FileReader fReader;
    BufferedReader bReader = new BufferedReader(fReader);
}

[Figure: AST of the code above. Node labels: Method declaration, Method body, Declaration, fReader, Assignment, Declaration, Initialization, bReader, fReader.]

Page 23: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Building Groum

The query’s AST is used to build the query’s Groum

[Figure: the query's AST (Method declaration, Method body, Declaration, Assignment, Initialization, fReader, bReader) and the resulting Groum with nodes FileReader, BufferedReader.new, and BufferedReader]

Page 24: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Feature Extraction

Graph-based features extracted from the query's Groum (nodes FileReader, BufferedReader.new, BufferedReader):
FileReader
BufferedReader.new
BufferedReader
FileReader -> BufferedReader.new
BufferedReader.new -> BufferedReader
FileReader.new -> BufferedReader.new -> BufferedReader

Token-based features: the remaining textual tokens

Page 25: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Weighting Query’s Features

• Context-sensitive: takes into account the current editing point and the surrounding code
• Features are weighted to represent their significance in a query

$$w(q) = \left(w_s(q) + w_c(q)\right) \cdot w_f(q)$$

• w_s(q): size-based factor
• w_c(q): centrality-based factor
• w_f(q): location-based factor (distance to the focus point)

Page 26: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Searching and Ranking

[GraPacc overview diagram, repeated]

Page 27: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Searching and Ranking

[Figure: query Q is related to each of the patterns P1, P2, …, Pn by the relevancy rel(Pi, Q)]

Page 28: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Pattern Relevancy

Relevancy between pattern P and query Q is defined as:

$$rel(P, Q) = Pr(P) \cdot \sum_{p \in P,\; q = M(p)} rel(p, q)$$

• rel(P, Q): combined relevancy
• Pr(P): popularity of P
• M: weighted maximum matching, which pairs each feature p with a query feature q = M(p)
• rel(p, q): relevancy between a pair of features p in P and q in Q

Page 29: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Feature Relevancy

The relevancy between two features p in pattern P and q in query Q:

$$rel(p, q) = s(p, P) \cdot w(q) \cdot sim(p, q)$$

• s(p, P): significance of p in P

$$s(p, P) = \frac{N_{p,P}}{N_P} \cdot \log \frac{N}{N_p}$$

• w(q): weight of q in Q

$$w(q) = \left(w_s(q) + w_c(q)\right) \cdot w_f(q)$$

• sim(p, q): similarity between p and q

Page 30: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Feature Similarity

The similarity between two size-k features:
• Feature p of P: p1 p2 .. pk
• Feature q of Q: q1 q2 .. qk

$$sim(p, q) = \prod_{i=1}^{k} nsim(p_i, q_i)$$

nsim(p_i, q_i): name-based similarity between the pair of elements p_i and q_i

Example:
p: FileReader -> BufferedReader.new
q: FileRead -> BufferedReader.new

$$sim(p, q) = \frac{1}{2} \cdot 1 = \frac{1}{2}$$
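The product formula can be sketched in a few lines once nsim is fixed. The slides do not define the name-based similarity, so the version below is an assumption: labels are split into words at camel-case boundaries and dots, and nsim is the number of shared words divided by the larger word count. That choice happens to reproduce the slide's example value of 1/2; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical feature similarity: product of per-element name similarities.
class FeatureSimilarity {
    // Splits a label such as "BufferedReader.new" into [Buffered, Reader, new].
    static List<String> words(String label) {
        List<String> out = new ArrayList<>();
        for (String word : label.split("(?<=[a-z])(?=[A-Z])|\\.")) {
            if (!word.isEmpty()) {
                out.add(word);
            }
        }
        return out;
    }

    // Assumed nsim: shared words divided by the larger word count.
    static double nsim(String a, String b) {
        List<String> wordsA = words(a);
        List<String> remaining = new ArrayList<>(words(b));
        int larger = Math.max(wordsA.size(), remaining.size());
        if (larger == 0) {
            return 1.0; // two empty labels are treated as identical
        }
        int shared = 0;
        for (String word : wordsA) {
            if (remaining.remove(word)) { // remove(Object): consumes one matching word
                shared++;
            }
        }
        return (double) shared / larger;
    }

    // sim(p, q) = product over i of nsim(p_i, q_i), for two features of equal size.
    static double sim(List<String> p, List<String> q) {
        if (p.size() != q.size()) {
            throw new IllegalArgumentException("features must have the same size");
        }
        double product = 1.0;
        for (int i = 0; i < p.size(); i++) {
            product *= nsim(p.get(i), q.get(i));
        }
        return product;
    }

    public static void main(String[] args) {
        System.out.println(sim(List.of("FileReader", "BufferedReader.new"),
                               List.of("FileRead", "BufferedReader.new")));
    }
}
```

Because the similarity is a product, a single completely mismatched element (nsim = 0) drives the whole feature similarity to 0.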

Page 31: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Code Completion

[GraPacc overview diagram, repeated]

Page 32: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Aligning Nodes

Maximal alignment between query nodes and pattern nodes, computed by maximum weighted bipartite matching

[Figure: the query Groum (nodes FileReader.new, BufferedReader.new, FileReader, BufferedReader) aligned with the pattern Groum (nodes FileReader.new, BufferedReader.new, FileReader, BufferedReader, String, String.new, BufferedReader.ready, FOR, BufferedReader.close, FileReader.close)]
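To make the alignment step concrete, here is a small sketch of maximum weighted bipartite matching by exhaustive search. The slides do not say which matching algorithm GraPacc uses; exhaustive search is only practical for tiny graphs, and a real tool would more likely use something like the Hungarian algorithm. The class name, the weight matrix layout, and the exact-label weights in labelWeights are all assumptions.

```java
import java.util.Arrays;

// Hypothetical node aligner: exhaustive maximum weighted bipartite matching.
class NodeAlignment {
    // weight[i][j] scores aligning query node i with pattern node j (0 means "cannot align").
    // Returns match[i] = index of the aligned pattern node, or -1 if query node i stays unaligned.
    static int[] align(double[][] weight) {
        int queryCount = weight.length;
        int patternCount = queryCount == 0 ? 0 : weight[0].length;
        int[] best = new int[queryCount];
        Arrays.fill(best, -1);
        double[] bestScore = {0.0};
        search(weight, 0, new boolean[patternCount], new int[queryCount], 0.0, best, bestScore);
        return best;
    }

    private static void search(double[][] w, int i, boolean[] used, int[] current,
                               double score, int[] best, double[] bestScore) {
        if (i == w.length) {
            if (score > bestScore[0]) {
                bestScore[0] = score;
                System.arraycopy(current, 0, best, 0, current.length);
            }
            return;
        }
        current[i] = -1; // option 1: leave query node i unaligned
        search(w, i + 1, used, current, score, best, bestScore);
        for (int j = 0; j < used.length; j++) { // option 2: align with a free pattern node
            if (!used[j] && w[i][j] > 0) {
                used[j] = true;
                current[i] = j;
                search(w, i + 1, used, current, score + w[i][j], best, bestScore);
                used[j] = false;
            }
        }
    }

    // Assumed weights: 1.0 when the labels are equal, 0.0 otherwise.
    static double[][] labelWeights(String[] query, String[] pattern) {
        double[][] w = new double[query.length][pattern.length];
        for (int i = 0; i < query.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                w[i][j] = query[i].equals(pattern[j]) ? 1.0 : 0.0;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        String[] query = {"FileReader.new", "FileReader", "BufferedReader.new", "BufferedReader"};
        String[] pattern = {"String.new", "String", "FileReader.new", "FileReader",
                            "BufferedReader.new", "BufferedReader", "BufferedReader.ready",
                            "FOR", "BufferedReader.close", "FileReader.close"};
        System.out.println(Arrays.toString(align(labelWeights(query, pattern))));
    }
}
```

Pattern nodes that no query node maps to are the unaligned nodes; in this example those would be String.new, String, BufferedReader.ready, FOR, BufferedReader.close, and FileReader.close.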

Page 33: Graph-based, Pattern-oriented, Context-sensitive Code Completion

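Inserting Nodes & Edges

Inserting unaligned nodes: aligned nodes are kept, and the unaligned nodes of the pattern are inserted into the query

[Figure: the query Groum (nodes FileReader.new, BufferedReader.new, FileReader, BufferedReader) and the pattern Groum (nodes FileReader.new, BufferedReader.new, FileReader, BufferedReader, String, String.new, BufferedReader.ready, FOR, BufferedReader.close, FileReader.close), with the aligned nodes marked]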

Page 34: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Empirical Evaluation

Goal: measure how accurately GraPacc recommends and fills in the current code

28 subject systems:
• 4 systems for mining patterns
• 24 systems for testing

Using 197 patterns from java.util and java.io

Page 35: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Simulation

• Divide the code and take the first half
• Obtain GraPacc’s recommendation
• Compare the recommended code with the real code and calculate accuracy from:
    • # shared nodes between recommended and real code
    • # nodes that appear in real code
    • # nodes that appear in recommended code
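The three node counts suggest precision and recall style measures. The sketch below is an interpretation, not a definition from the slides: it assumes precision = shared / recommended and recall = shared / real, combines them with the usual harmonic-mean F-score, and represents each piece of code as a list of node labels. All names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical accuracy measures over node labels of recommended and real code.
class CompletionAccuracy {
    // Number of labels present in both lists, counting duplicates (multiset intersection).
    static int shared(List<String> recommended, List<String> real) {
        List<String> remaining = new ArrayList<>(real);
        int count = 0;
        for (String label : recommended) {
            if (remaining.remove(label)) { // remove(Object): consumes one matching label
                count++;
            }
        }
        return count;
    }

    // Assumed: fraction of recommended nodes that also appear in the real code.
    static double precision(List<String> recommended, List<String> real) {
        return recommended.isEmpty() ? 0.0 : (double) shared(recommended, real) / recommended.size();
    }

    // Assumed: fraction of real nodes that were recommended.
    static double recall(List<String> recommended, List<String> real) {
        return real.isEmpty() ? 0.0 : (double) shared(recommended, real) / real.size();
    }

    // Harmonic mean of precision and recall.
    static double fScore(double precision, double recall) {
        return precision + recall == 0.0 ? 0.0 : 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Made-up node lists for illustration.
        List<String> recommended = List.of("BufferedReader.ready", "WHILE",
                                           "BufferedReader.read", "BufferedReader.close");
        List<String> real = List.of("BufferedReader.ready", "WHILE", "BufferedReader.readLine");
        double p = precision(recommended, real);
        double r = recall(recommended, real);
        System.out.println("precision=" + p + " recall=" + r + " f=" + fScore(p, r));
    }
}
```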

Page 36: Graph-based, Pattern-oriented, Context-sensitive Code Completion


Accuracy results

71% of API usages are covered by API usage patterns.

Approximately 1.2 patterns are recommended for 1 test method.

Page 37: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Conclusions

Code completion using graph-based patterns and context information

Future work includes a user study and adaptive code completion

Demo on Friday 10:45-12:45

GraPacc

http://home.engineering.iastate.edu/~anhnt/Research/GraPacc


Page 39: Graph-based, Pattern-oriented, Context-sensitive Code Completion


Return-type-based Ranking

Page 40: Graph-based, Pattern-oriented, Context-sensitive Code Completion

Sources of inaccuracies

Customization of a pattern (e.g., two consecutive readLine method calls):
• GraPacc does not aim to replace developers
• Developers can easily customize the filled-in code

Lack of patterns in the database

API usage that spans 2 methods