Using Compilation/Decompilation to Enhance Clone Detection

IWSC ‘17

CREST, University College London, UK

Chaiyong Ragkhitwetsagul, Jens Krinke

Page 2: Using Compilation/Decompilation to Enhance Clone Detection

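[Figure: bar chart of F1 scores (axis 0.5 to 1) on original (Orig.) and decompiled (Dec.) code for four tool groups: clone detectors (Clone det.), plagiarism detectors (Plag det.), compression-based tools (Comp.), and others. Tools shown: ccfx, deckard, iclones, nicad, simian, jplag-java, jplag-text, plaggie, sherlock, simjava, simtext, 7zncd-BZip2, 7zncd-LZMA, 7zncd-LZMA2, 7zncd-Deflate, 7zncd-Deflate64, 7zncd-PPMd, bzip2ncd, gzipncd, icd, ncd-bzlib, ncd-zlib, xz-ncd, bsdiff, diff, py-difflib, py-fuzzywuzzy, py-jellyfish, py-ngram, py-sklearn. Source: Ragkhitwetsagul et al., 2016]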

Page 4: Using Compilation/Decompilation to Enhance Clone Detection

Compiling Clones: What Happens?

O. Kononenko, C. Zhang, and M. W. Godfrey, ICSME ‘14

Page 8: Using Compilation/Decompilation to Enhance Clone Detection

Compiling and Decompiling Clones: What Happens?

Source Code

Existing tools

Missing Source

Page 9: Using Compilation/Decompilation to Enhance Clone Detection

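Experimental Framework

[Figure: workflow diagram with the following steps]

1. software → clone detector → original clones
2. software → compiler → decompiler → decompiled software → clone detector → decomp. clones
3. decomp. clones → clone mapper → decomp. & mapped clones
4. original clones + decomp. & mapped clones → common clones and disjoint clones
5. disjoint clones → manual investigation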

Page 10: Using Compilation/Decompilation to Enhance Clone Detection

Software Systems

System          Ver.    Original             Decompiled
                        Files   SLOC         Files   SLOC
JUnit           4.1.3     203     9,777        311    11,233
JFreeChart      1.5.0     644    96,711        669    85,251
Apache Tomcat®  9.0     1,688   241,924      2,603   256,974

Page 11: Using Compilation/Decompilation to Enhance Clone Detection

Tools

Compiler: javac
Decompiler: Procyon
Clone detector: NiCad

Tool    Config.   Parameters
NiCad   Type-1    UPI=0.0, renaming=none
NiCad   Type-2    UPI=0.0, renaming=consistent
NiCad   Type-3    UPI=0.3, renaming=consistent

Page 12: Using Compilation/Decompilation to Enhance Clone Detection

Clone Mapper

[Figure: the clone mapper links each decompiled clone pair to the corresponding pair of original methods.]

software: set of methods (M) = {m1, m2, m3, m4, …, mn, mo}

decompiled clone report (decompiled clone pairs):
DCP1(dm1, dm2)
DCP2(dm1, dm3)
DCP3(dm2, dm4)
…
DCPn(dmm, dmo)

decompiled-and-mapped clone report (decompiled-and-mapped clone pairs):
DCP*1((dm1,dm2),(m1,m2))
DCP*2((dm1,dm3),(m1,m3))
DCP*3((dm2,dm4),(m2,m4))
…
DCP*n((dmm,dmo),(mm,mo))
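The mapping step on this slide can be sketched in Java as follows. This is only an illustration, not the authors' implementation: the class name `CloneMapperSketch`, the method `mapPairs`, and the lookup table from decompiled method ids to original method ids are all hypothetical, and the slides do not say how that lookup is built (matching on a qualified method signature would be one plausible option).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative clone mapper; every name in this class is hypothetical. */
final class CloneMapperSketch {

    /**
     * Turns each decompiled pair (dm_i, dm_j) into a row [dm_i, dm_j, m_i, m_j].
     * The lookup from decompiled to original method ids is assumed to be given.
     */
    static List<List<String>> mapPairs(List<List<String>> decompiledPairs,
                                       Map<String, String> decompiledToOriginal) {
        List<List<String>> mapped = new ArrayList<>();
        for (List<String> pair : decompiledPairs) {
            String first = decompiledToOriginal.get(pair.get(0));
            String second = decompiledToOriginal.get(pair.get(1));
            if (first == null || second == null) {
                continue; // assumption: skip pairs without a known original method
            }
            mapped.add(Arrays.asList(pair.get(0), pair.get(1), first, second));
        }
        return mapped;
    }

    public static void main(String[] args) {
        // Hypothetical lookup: decompiled method id -> original method id.
        Map<String, String> lookup = new HashMap<>();
        lookup.put("dm1", "m1");
        lookup.put("dm2", "m2");
        lookup.put("dm3", "m3");
        lookup.put("dm4", "m4");

        // The three example pairs DCP1..DCP3 from the slide.
        List<List<String>> decompiledReport = Arrays.asList(
                Arrays.asList("dm1", "dm2"),
                Arrays.asList("dm1", "dm3"),
                Arrays.asList("dm2", "dm4"));

        for (List<String> row : mapPairs(decompiledReport, lookup)) {
            System.out.println(row);
        }
    }
}
```

Each output row corresponds to one DCP* entry: the decompiled pair followed by the original pair it maps to.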

Page 13: Using Compilation/Decompilation to Enhance Clone Detection

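Common & Disjoint Clone Pairs

[Figure: Venn diagram of the clone pairs found in the original code and in the decompiled code. The intersection is Ccommon; pairs found only in the original code are Corig-only; pairs found only in the decompiled code are Cdecomp-only.]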
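The split into common and disjoint clone pairs amounts to a set intersection and two set differences. The sketch below assumes that a clone pair is unordered, so (m1, m2) and (m2, m1) count as the same pair, and that the decompiled pairs have already been mapped back to original method ids. The class `CloneSetSplitSketch` and its methods are hypothetical names chosen for illustration.

```java
import java.util.Set;
import java.util.TreeSet;

/** Illustrative split of clone pairs into common and disjoint sets; names are hypothetical. */
final class CloneSetSplitSketch {

    /** Order-independent key for a clone pair, so (m1, m2) equals (m2, m1). */
    static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    /** Pairs reported for both versions (Ccommon). */
    static Set<String> common(Set<String> original, Set<String> mapped) {
        Set<String> result = new TreeSet<>(original);
        result.retainAll(mapped);
        return result;
    }

    /** Pairs reported in the first set but not in the second (Corig-only or Cdecomp-only). */
    static Set<String> onlyIn(Set<String> first, Set<String> second) {
        Set<String> result = new TreeSet<>(first);
        result.removeAll(second);
        return result;
    }

    public static void main(String[] args) {
        Set<String> original = new TreeSet<>();
        original.add(key("m1", "m2"));
        original.add(key("m1", "m3"));

        Set<String> decompiledMapped = new TreeSet<>();
        decompiledMapped.add(key("m2", "m1")); // same pair as (m1, m2)
        decompiledMapped.add(key("m2", "m4"));

        System.out.println("common:      " + common(original, decompiledMapped));
        System.out.println("orig-only:   " + onlyIn(original, decompiledMapped));
        System.out.println("decomp-only: " + onlyIn(decompiledMapped, original));
    }
}
```

Normalising the pair key before comparing matters because the two clone reports may list the same pair in different orders.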

Page 14: Using Compilation/Decompilation to Enhance Clone Detection

Results

Page 15: Using Compilation/Decompilation to Enhance Clone Detection

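JUnit

[Figure: Venn diagrams of clone pairs found in the original and decompiled code (Original vs. Decompiled) for Type-1, Type-2, and Type-3 clones. Values as extracted (region assignment lost): 6, 3.]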

Page 16: Using Compilation/Decompilation to Enhance Clone Detection

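JFreeChart

[Figure: Venn diagrams of clone pairs found in the original and decompiled code (Original vs. Decompiled) for Type-1, Type-2, and Type-3 clones. Values as extracted (region assignment lost): 159, 155, 33, 15, 48, 1, 17, 27, 3.]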

Page 17: Using Compilation/Decompilation to Enhance Clone Detection

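Tomcat

[Figure: Venn diagrams of clone pairs found in the original and decompiled code (Original vs. Decompiled) for Type-1, Type-2, and Type-3 clones. Values as extracted (region assignment lost): 217, 608, 20, 25, 141, 22, 3, 23, 1.]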

Page 18: Using Compilation/Decompilation to Enhance Clone Detection

Manual Investigation

Page 19: Using Compilation/Decompilation to Enhance Clone Detection

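JFreeChart

[Figure: two bar charts of the number of clone pairs (y-axis "No. of clone pairs") per clone type (Type-1, Type-2, Type-3), comparing Candidates with true positives (TP). Cforig-only chart: axis 0 to 50; bar values as extracted 47, 15, 1 and 48, 15, 1. Cfdecomp-only chart: axis 0 to 30; bar values as extracted 27, 17, 3 and 27, 17, 3.]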

Page 20: Using Compilation/Decompilation to Enhance Clone Detection

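Tomcat

[Figure: two bar charts of the number of clone pairs (y-axis "No. of clone pairs") per clone type (Type-1, Type-2, Type-3), comparing Candidates with true positives (TP). Cforig-only chart: axis 0 to 160; bar values as extracted 141, 25, 22 and 141, 25, 22. Cfdecomp-only chart: axis 0 to 30; bar values as extracted 23, 3, 1 and 23, 3, 1.]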

Page 21: Using Compilation/Decompilation to Enhance Clone Detection

Characteristics of Disjoint Clones

Clone set        Reasons
Cforig-only      Too small after decomp.
                 Too diff. after decomp.
                 Smaller after decomp. higher dissimilarity
                 Unknown
Cfdecomp-only    Having deleted/added stmt., type cast, package name.
                 Different if-else statements
                 Different loop statements
                 Inner class methods
                 Unknown

Page 22: Using Compilation/Decompilation to Enhance Clone Detection

JFreeChart

[Figure: the table of reasons from page 21, with a bar chart per clone set showing the number of disjoint clone pairs per reason, split by clone type (T1, T2, T3). Cforig-only chart: axis 0 to 50; values as extracted 5, 11, 32, 6, 9. Cfdecomp-only chart: axis 0 to 16; values as extracted 12, 4, 3, 8, 12, 5, 3.]

Page 23: Using Compilation/Decompilation to Enhance Clone Detection

Tomcat

[Figure: the table of reasons from page 21, with a bar chart per clone set showing the number of disjoint clone pairs per reason, split by clone type (T1, T2, T3). Cforig-only chart: axis 0 to 140; values as extracted 16, 5, 120, 19, 6. Cfdecomp-only chart: axis 0 to 30; values as extracted 20, 2, 1, 3, 2.]

Page 24: Using Compilation/Decompilation to Enhance Clone Detection

ORIGINAL

@Override
public Range findRangeBounds(XYDataset dataset) {
    if (dataset != null) {
        Range r = DatasetUtilities.findRangeBounds(dataset, false);
        if (r == null) {
            return null;
        } else {
            return new Range(r.getLowerBound() + this.yOffset,
                    r.getUpperBound() + this.blockHeight + this.yOffset);
        }
    } else {
        return null;
    }
}

@Override
public Range findDomainBounds(XYDataset dataset) {
    if (dataset == null) {
        return null;
    }
    Range r = DatasetUtilities.findDomainBounds(dataset, false);
    if (r == null) {
        return null;
    }
    return new Range(r.getLowerBound() + this.xOffset,
            r.getUpperBound() + this.blockWidth + this.xOffset);
}

Page 26: Using Compilation/Decompilation to Enhance Clone Detection

DECOMPILED

@Override
public Range findDomainBounds(final XYDataset dataset) {
    if (dataset == null) {
        return null;
    }
    final Range r = DatasetUtilities.findDomainBounds(dataset, false);
    if (r == null) {
        return null;
    }
    return new Range(r.getLowerBound() + this.xOffset,
            r.getUpperBound() + this.blockWidth + this.xOffset);
}

@Override
public Range findRangeBounds(final XYDataset dataset) {
    if (dataset == null) {
        return null;
    }
    final Range r = DatasetUtilities.findRangeBounds(dataset, false);
    if (r == null) {
        return null;
    }
    return new Range(r.getLowerBound() + this.yOffset,
            r.getUpperBound() + this.blockHeight + this.yOffset);
}

Page 27: Using Compilation/Decompilation to Enhance Clone Detection

ORIGINAL

public void clearRangeMarkers() {
    if (this.backgroundRangeMarkers != null) {
        Set<Integer> keys = this.backgroundRangeMarkers.keySet();
        for (Integer key : keys) {
            clearRangeMarkers(key);
        }
        this.backgroundRangeMarkers.clear();
    }
    if (this.foregroundRangeMarkers != null) {
        Set<Integer> keys = this.foregroundRangeMarkers.keySet();
        for (Integer key : keys) {
            clearRangeMarkers(key);
        }
        this.foregroundRangeMarkers.clear();
    }
    fireChangeEvent();
}

public void clearRangeMarkers() {
    if (this.backgroundRangeMarkers != null) {
        Set keys = this.backgroundRangeMarkers.keySet();
        Iterator iterator = keys.iterator();
        while (iterator.hasNext()) {
            Integer key = (Integer) iterator.next();
            clearRangeMarkers(key.intValue());
        }
        this.backgroundRangeMarkers.clear();
    }
    if (this.foregroundRangeMarkers != null) {
        Set keys = this.foregroundRangeMarkers.keySet();
        Iterator iterator = keys.iterator();
        while (iterator.hasNext()) {
            Integer key = (Integer) iterator.next();
            clearRangeMarkers(key.intValue());
        }
        this.foregroundRangeMarkers.clear();
    }
    fireChangeEvent();
}

Page 29: Using Compilation/Decompilation to Enhance Clone Detection

DECOMPILED

public void clearRangeMarkers() {
    if (this.backgroundDomainMarkers != null) {
        final Set<Integer> keys = this.backgroundDomainMarkers.keySet();
        for (final Integer key : keys) {
            this.clearDomainMarkers(key);
        }
        this.backgroundDomainMarkers.clear();
    }
    if (this.foregroundDomainMarkers != null) {
        final Set<Integer> keys = this.foregroundDomainMarkers.keySet();
        for (final Integer key : keys) {
            this.clearDomainMarkers(key);
        }
        this.foregroundDomainMarkers.clear();
    }
    this.fireChangeEvent();
}

public void clearRangeMarkers() {
    if (this.backgroundRangeMarkers != null) {
        final Set keys = this.backgroundRangeMarkers.keySet();
        for (final Integer key : keys) {
            this.clearRangeMarkers(key);
        }
        this.backgroundRangeMarkers.clear();
    }
    if (this.foregroundRangeMarkers != null) {
        final Set keys = this.foregroundRangeMarkers.keySet();
        for (final Integer key : keys) {
            this.clearRangeMarkers(key);
        }
        this.foregroundRangeMarkers.clear();
    }
    this.fireChangeEvent();
}

Page 30: Using Compilation/Decompilation to Enhance Clone Detection

Study on 3 real-world systems: JUnit, JFreeChart, Tomcat

Findings:

1. Clone pairs before and after decompilation are mostly similar for all three clone types.

2. One can complement the original clone results by incorporating clones after decompilation.

3. Characteristics of disjoint clones

C. Ragkhitwetsagul, J. Krinke

cragkhit.github.io/crjk-iwsc17