Code-tagging and similarity-based retrieval with myCBR

Thomas Roth-Berghofer & Daniel Bahls
Senior researcher, [email protected]
German Research Centre for Artificial Intelligence DFKI GmbH
CAMBRIDGE, UK, 10 DEC 2008
Saturday, 18 July 2009

description

This paper describes coTag, a code-tagging plug-in for annotating code snippets in the Eclipse integrated development environment. coTag offers an easy-to-use interface for tagging and searching. Because it uses the similarity-based search engine of the open-source tool myCBR, users can search not only for exactly matching tags, as other code-tagging extensions allow, but also for similar tags and thus for similar code snippets. coTag also lets users add new similarity links between tags and change existing ones in context, supported by myCBR’s explanation component.

Transcript of Code-tagging and similarity-based retrieval with myCBR

Page 4: Code-tagging and similarity-based retrieval with myCBR

Programmer’s dilemma

• Where is the code fragment I used to solve a similar problem in the past?

• Is this piece of code still available?

• Is it worth the effort to search for it?

• If so, what would be the right search term?

Page 10: Code-tagging and similarity-based retrieval with myCBR

Personalised approach

• Personal vocabulary: tags

• Linking tags

• Case-based retrieval

• Work context

• Social dimension: tag exchange

Page 11: Code-tagging and similarity-based retrieval with myCBR

CBR cycle

Agnar Aamodt and Enric Plaza. Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Communications, 7(1):39–59, 1994.

Page 14: Code-tagging and similarity-based retrieval with myCBR

Code snippet & context

Work context

• java.net.URL

• java.net.URLConnection

• java.io.InputStream

• java.lang.StringBuffer

• java.io.BufferedReader

• java.lang.String

• java.lang.Exception

Java code snippet

Page 16: Code-tagging and similarity-based retrieval with myCBR

Case structure

Attribute       Value type          Category
Tags            String (multiple)   Problem description
Context items   String (multiple)   Problem description
Code snippet    String              Solution
Document type   String              Provenance
Project name    String              Provenance
File path       String              Provenance
Author ID       String              Provenance
Creation date   Long                Provenance
Rating          Float               Maintenance
Rating count    Integer             Maintenance

Slide legend: Set by user / Set by coTag
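
The case structure above can be written down as a small record type. The following is a minimal sketch in Python, not coTag's actual implementation (coTag is an Eclipse plug-in). The class name `Case`, the field names, the default values, and the example snippet text are all assumptions made for illustration; only the attributes, value types, and categories come from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Case:
    """One case as listed on the slide; names are illustrative, not coTag identifiers."""
    # Problem description
    tags: List[str] = field(default_factory=list)
    context_items: List[str] = field(default_factory=list)
    # Solution
    code_snippet: str = ""
    # Provenance
    document_type: str = ""
    project_name: str = ""
    file_path: str = ""
    author_id: str = ""
    creation_date: int = 0  # "Long" on the slide; the unit (e.g. epoch millis) is an assumption
    # Maintenance
    rating: float = 0.0
    rating_count: int = 0


# Hypothetical example case; the snippet text is made up for illustration.
example = Case(
    tags=["init", "Logger"],
    context_items=["java.net.URL", "java.io.InputStream"],
    code_snippet="URL url = new URL(address);",
    document_type="java",
)
```

Grouping the fields by category keeps the split visible between what is matched during retrieval (problem description), what is returned (solution), and what is only bookkeeping (provenance, maintenance).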

Page 17: Code-tagging and similarity-based retrieval with myCBR

Acquiring case

Page 19: Code-tagging and similarity-based retrieval with myCBR

Query view

• Search for tags: init, logging, config

• Include context: take the currently selected code into account
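
One way to take the work context into account is to compare the context items of the current selection with those stored in a case. The slides do not say how coTag compares contexts, so the Jaccard overlap below is an assumption chosen for simplicity, and `context_similarity` is a hypothetical name. The case context reuses the items shown on the "Code snippet & context" slide.

```python
def context_similarity(query_context, case_context):
    """Jaccard overlap of two collections of context items (assumed measure)."""
    q, c = set(query_context), set(case_context)
    if not q and not c:
        return 0.0
    return len(q & c) / len(q | c)


# Context items taken from the "Code snippet & context" slide.
case_context = [
    "java.net.URL", "java.net.URLConnection", "java.io.InputStream",
    "java.lang.StringBuffer", "java.io.BufferedReader",
    "java.lang.String", "java.lang.Exception",
]
# Hypothetical context of the code currently selected in the editor.
query_context = ["java.net.URL", "java.io.InputStream", "java.lang.String"]

score = context_similarity(query_context, case_context)  # 3 shared items out of 7
```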

Page 20: Code-tagging and similarity-based retrieval with myCBR

Retrieval

• Result for: init, logging, config

• Ranked list of code snippets
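
A ranked result list of this kind can be sketched as follows. This is not myCBR's retrieval code: the aggregation (average of the best match per query tag), the function names, and the three-entry case base are assumptions for illustration. `exact_match` is a placeholder that any tag similarity function could replace.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def exact_match(a: str, b: str) -> float:
    """Placeholder tag similarity: 1.0 for equal tags (ignoring case), else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def tag_set_similarity(query: Sequence[str], case_tags: Sequence[str],
                       sim: Callable[[str, str], float] = exact_match) -> float:
    """Average, over the query tags, of the best similarity to any case tag."""
    if not query or not case_tags:
        return 0.0
    return sum(max(sim(q, t) for t in case_tags) for q in query) / len(query)


def retrieve(query: Sequence[str], case_base: Dict[str, Sequence[str]],
             sim: Callable[[str, str], float] = exact_match) -> List[Tuple[str, float]]:
    """Score every case against the query and return them best first."""
    scored = [(cid, tag_set_similarity(query, tags, sim))
              for cid, tags in case_base.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical case base: case id -> tags.
case_base = {
    "snippet-1": ["init", "Logger"],
    "snippet-2": ["init", "logging", "config"],
    "snippet-3": ["http", "download"],
}
ranking = retrieve(["init", "logging", "config"], case_base)
```

With exact matching, `snippet-1` scores only for `init`; a similarity-aware `sim` function would also give it credit for `Logger` against `logging`.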

Page 21: Code-tagging and similarity-based retrieval with myCBR

Presentation of cases

Page 22: Code-tagging and similarity-based retrieval with myCBR

Situations in which explanations play a role

• Instructing explanations:
  • Novice users want to know how tagging and (similarity-based) retrieval work.

• Convincing explanations:
  • Regular users want to check the retrieval when it does not meet their expectations.

• Improving explanations:
  • Regular users want to correct coTag’s behaviour.

Page 23: Code-tagging and similarity-based retrieval with myCBR

Explanation of matching

• Search terms: init, logging, config

• Case tags: init, Logger

Page 24: Code-tagging and similarity-based retrieval with myCBR

Graphical explanation of trigram matching

• Syntactical similarity
  • Typos
  • Stemming
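
Trigram matching compares two strings by the three-character sequences they share, which is why it tolerates typos and word variants. The sketch below uses the Dice coefficient over character trigrams; the exact formula, padding, and case handling used by myCBR are not given in the slides, so these choices and the function names are assumptions.

```python
def trigrams(text: str) -> set:
    """Set of lower-cased character trigrams; strings shorter than 3 count as one gram."""
    text = text.lower()
    if len(text) < 3:
        return {text} if text else set()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def trigram_similarity(a: str, b: str) -> float:
    """Dice coefficient of the two trigram sets, in the range 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))


# "logging" -> log, ogg, ggi, gin, ing; "logger" -> log, ogg, gge, ger; 2 shared.
variant = trigram_similarity("logging", "Logger")
# "config" -> con, onf, nfi, fig; "confg" -> con, onf, nfg; 2 shared.
typo = trigram_similarity("config", "confg")
```

In the "Explanation of matching" example, this is the kind of measure that lets the search term `logging` partially match the case tag `Logger`.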

Page 25: Code-tagging and similarity-based retrieval with myCBR

Similarity customisation

• Tag similarities:

    unsimilar          0%
    partly similar    25%
    similar           50%
    very similar      75%
    identical        100%

• Updates personal and community similarity measure
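
The five labels map naturally onto numeric similarity values. The sketch below stores a user's choice for a pair of tags in a personal table. The table layout, the symmetric and case-insensitive treatment of tag pairs, and the names `SIMILARITY_LEVELS`, `tag_pair`, and `set_tag_similarity` are assumptions; the slide only states the labels and percentages. A community table could be updated in the same way.

```python
# Labels and values as shown on the slide, expressed as fractions.
SIMILARITY_LEVELS = {
    "unsimilar": 0.00,
    "partly similar": 0.25,
    "similar": 0.50,
    "very similar": 0.75,
    "identical": 1.00,
}


def tag_pair(a: str, b: str) -> frozenset:
    """Order-independent, case-insensitive key for a pair of tags (assumed symmetric)."""
    return frozenset((a.lower(), b.lower()))


def set_tag_similarity(table: dict, tag_a: str, tag_b: str, level: str) -> None:
    """Record the chosen level for the tag pair in the given similarity table."""
    table[tag_pair(tag_a, tag_b)] = SIMILARITY_LEVELS[level]


personal = {}
set_tag_similarity(personal, "logging", "Logger", "very similar")
```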

Page 27: Code-tagging and similarity-based retrieval with myCBR

Three levels of similarity calculation

Personal

Imported

Trigram
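
The three levels can be read as a fallback chain: use a personal similarity if one exists, otherwise an imported one, otherwise fall back to trigram matching. That precedence is an assumption based on the order on the slide, and the sketch below is not myCBR code; the function names, the table format, and the Dice-based trigram measure are illustrative.

```python
def _grams(text: str) -> set:
    """Lower-cased character trigrams; shorter strings count as a single gram."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)} or {text}


def trigram_similarity(a: str, b: str) -> float:
    """Dice coefficient over trigram sets (assumed stand-in for the trigram level)."""
    ga, gb = _grams(a), _grams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def tag_similarity(a: str, b: str, personal: dict, imported: dict):
    """Return (similarity, level used), checking personal, then imported, then trigram."""
    key = frozenset((a.lower(), b.lower()))
    if key in personal:
        return personal[key], "personal"
    if key in imported:
        return imported[key], "imported"
    return trigram_similarity(a, b), "trigram"


# Hypothetical similarity tables keyed by unordered, lower-cased tag pairs.
personal = {frozenset(("logging", "logger")): 0.75}
imported = {
    frozenset(("logging", "logger")): 0.50,
    frozenset(("init", "setup")): 0.75,
}

first = tag_similarity("logging", "Logger", personal, imported)
second = tag_similarity("init", "setup", personal, imported)
third = tag_similarity("config", "confg", personal, imported)
```

Returning the level alongside the value keeps the result explainable: the user can see whether a match came from their own links, from imported links, or from syntactic similarity.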

Page 32: Code-tagging and similarity-based retrieval with myCBR

Customised (personal) and imported similarity

Page 33: Code-tagging and similarity-based retrieval with myCBR

Client-side architecture

Page 36: Code-tagging and similarity-based retrieval with myCBR

Tag and exchange code snippets

Page 43: Code-tagging and similarity-based retrieval with myCBR

Take home messages

• Re-finding information is a typical task in knowledge work.

• Tagging is a helpful and well-known technique.

• Similarity-based retrieval can improve searches.

• Explanation-aware development of applications helps you deal with the increased complexity of similarity-based retrieval.
