Assieme: Finding and Leveraging Implicit References in a Web Search Interface for Programmers
Raphael Hoffmann, James Fogarty, Daniel S. Weld
University of Washington, Seattle
UIST 2007

Programmers Use Search
• To identify an API
• To seek information about an API
• To find examples of how to use an API
Example task: “Programmatically output an Acrobat PDF file in Java.”

Example: General Web Search Interface

Example: Code-Specific Web Search Interface

Problems
• Information is dispersed: tutorials, API itself, documentation, pages with samples
• Difficult and time-consuming to …
  – locate required pieces,
  – get an overview of alternatives,
  – judge relevance and quality of results,
  – understand dependencies.
• Many page visits required

With Assieme we …
• Designed a new Web search interface
• Developed needed inference

Outline
• Motivation
• What Programmers Search For
• The Assieme Search Engine
  – Inferring Implicit References
  – Using Implicit References for Scoring
• Evaluation of Inference & User Study
• Discussion & Conclusion

Six Learning Barriers Faced by Programmers (Ko et al. 04)
• Design barriers — What to do?
• Selection barriers — What to use?
• Coordination barriers — How to combine?
• Use barriers — How to use?
• Understanding barriers — What is wrong?
• Information barriers — How to check?

Examining Programmer Web Queries
Objective
• See what programmers search for
Dataset
• 15 million queries and click-through data
• Random sample of MSN queries in 05/06
Procedure
• Extract query sessions containing ‘java’ (2,529 sessions)
• Manually inspect queries and define regex filters
• Build an informal taxonomy of query sessions
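The filtering procedure above can be sketched roughly as follows. This is only an illustration, not the authors' actual filters: the regular expressions, the label names, and the helpers `is_java_session` and `classify_query` are assumptions. The three labels mirror the talk's informal taxonomy (descriptive queries, queries containing a package, type or member name, and queries containing terms like “example”).

```python
import re

# Hypothetical filters; the talk does not list the actual regular expressions.
JAVA_SESSION = re.compile(r"\bjava\b", re.IGNORECASE)
EXAMPLE_TERMS = re.compile(r"\b(example|using|sample code)\b", re.IGNORECASE)
# Crude stand-in for "contains a package, type or member name":
# a dotted lowercase path (java.util) or a CamelCase identifier (SimpleDateFormat).
API_NAME = re.compile(r"\b[a-z]+(?:\.[a-z]+)+\b|\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b")


def is_java_session(queries):
    """Keep a session if any of its queries mentions the word 'java'."""
    return any(JAVA_SESSION.search(q) for q in queries)


def classify_query(query):
    """Return the set of taxonomy labels whose filter matches the query."""
    labels = set()
    if API_NAME.search(query):
        labels.add("api_name")
    else:
        labels.add("descriptive")
    if EXAMPLE_TERMS.search(query):
        labels.add("example_terms")  # may overlap with either label above
    return labels
```

Applied to the queries quoted in the talk, `classify_query("java SimpleDateFormat")` is labeled as containing an API name, while “using currentdate in jsp” is both descriptive and an example-seeking query.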

Examining Programmer Web Queries
• Descriptive (64.1 %): “java JSP current date”; selection barrier
• Contain package, type or member name (35.9 %): “java SimpleDateFormat”; use barrier
• Contain terms like “example”, “using”, “sample code” (17.9 %): “using currentdate in jsp”; coordination barrier

Assieme
Interface elements:
• example code
• documentation
• required libraries
• relevance indicated by # uses
• summaries show referenced types
• links to related info

Challenges
How to put the right information on the interface?
• Get all programming-related data
• Interpret data and infer relationships

Assieme’s Data
… is crawled using existing search engines
• Pages with code examples (~2,360,000): queried Google on “java ±import ±class …”
• JAR files (~79,000): downloaded library files for all projects on Sun.com, Apache.org, Java.net, SourceForge.net
• JavaDoc pages (~480,000): queried Google on “overview-tree.html …”

The Assieme Search Engine
… infers 2 kinds of implicit references among JAR files, JavaDoc pages, and pages with code examples:
• Uses of packages, types and members
• Matches of packages, types and members

Extracting Code Samples
Difficulties on real pages:
• unclear segmentation
• code in a different language (C++)
• distracting terms ‘…’ in code
• line numbers

Extracting Code Samples
• Remove HTML commands, but preserve line breaks
• Remove some distracters by heuristics
• Launch an (error-tolerant) Java parser at every line break (separately parse for types, methods, and sequences of statements)

Input page:
<html><head><title></title></head><body>A simple example:<br><br> 1: import java.util.*; <br>2: class c {<br>3: HashMap m = new HashMap();<br>4: void f() { m.clear(); }<br>5: }<br><br><a href="index.html">back</a></body></html>

After removing HTML commands:
A simple example:

1: import java.util.*;
2: class c {
3: HashMap m = new HashMap();
4: void f() { m.clear(); }
5: }

back

After removing distracters (line numbers):
A simple example:

import java.util.*;
class c {
HashMap m = new HashMap();
void f() { m.clear(); }
}

back
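The first two cleaning steps on this slide can be approximated in a few lines. This is a minimal sketch under stated assumptions, not Assieme's extractor: the helper names `html_to_text` and `strip_line_numbers` are hypothetical, the only "distracter" handled is a leading `N:` line number, and the error-tolerant Java parsing step is not covered.

```python
import html
import re

BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")
LINE_NUMBER = re.compile(r"^\s*\d+:\s?")  # assumed line-number format, e.g. "3: "


def html_to_text(page):
    """Remove HTML tags but keep the line breaks implied by <br>."""
    text = BR_TAG.sub("\n", page)    # <br> becomes a newline
    text = ANY_TAG.sub("", text)     # drop all remaining tags
    return html.unescape(text)       # decode entities such as &lt;


def strip_line_numbers(text):
    """Heuristic: drop a leading 'N:' prefix and trailing spaces on every line."""
    lines = text.split("\n")
    return "\n".join(LINE_NUMBER.sub("", line).rstrip() for line in lines)
```

Running `strip_line_numbers(html_to_text(page))` on the slide's example page gives the plain `import java.util.*; class c { … }` listing, one statement per line, ready to be handed to a parser at each line break.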

Resolving External Code References
Naïve approach of finding term matches does not work:
1 import java.util.*;
2 class c {
3 HashMap m = new HashMap();
4 void f() { m.clear(); }
5 }
The reference to java.util.HashMap.clear() on line 4 is only detectable by considering several lines.
Use the compiler to identify unresolved names.

Resolving External Code References
• Index packages/types/members in JAR files
• Compile & lookup
  – compile the code sample to obtain unresolved names (java.util.HashMap.clear(), java.util.HashMap, …)
  – look up the unresolved names in the index to find JAR files
  – greedily pick the best JARs (utility function: # covered references and JAR popularity)
  – put the picked JARs on the classpath
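The greedy selection step can be illustrated with a small set-cover style loop. This is a sketch under assumptions, not Assieme's implementation: the function `pick_jars`, its arguments, and the use of popularity purely as a tie-breaker are hypothetical, since the slide does not say how coverage and popularity are combined.

```python
def pick_jars(unresolved, jar_index, popularity):
    """Greedily choose JARs until no remaining name can be covered.

    unresolved: set of unresolved names reported by the compiler
    jar_index:  dict mapping a JAR name to the set of names it defines
    popularity: dict mapping a JAR name to a popularity score
    Returns (chosen JARs in pick order, names still unresolved).
    """
    remaining = set(unresolved)
    chosen = []
    while remaining:
        # Utility: number of still-unresolved references covered;
        # popularity only breaks ties (an assumption of this sketch).
        best = max(
            jar_index,
            key=lambda jar: (len(jar_index[jar] & remaining), popularity.get(jar, 0)),
            default=None,
        )
        if best is None or not (jar_index[best] & remaining):
            break  # no indexed JAR resolves any of the remaining names
        chosen.append(best)
        remaining -= jar_index[best]
    return chosen, remaining
```

The chosen JARs would then be put on the classpath for compilation; names that no indexed JAR defines are simply returned as still unresolved.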

Scoring
• Existing techniques …
  – Docs modeled as weighted term frequencies
  – Hypertext link analysis (PageRank)
• … do not work well for code, because:
  – JAR files (binary code) provide no context
  – Source code contains few relevant keywords
  – Structure in code is important for relevance

Using Implicit References to Improve Scoring
• Assieme exploits structure on Web pages (HTML hyperlinks) and structure in code (code references)

Scoring
APIs (packages/types/members)
Web pages

Scoring
APIs
• Use text on doc pages and on pages with code samples that reference API (~ anchor text)
• Weight APIs by #incoming refs (~ PageRank)
Web Pages
• Use fully qualified references (java.util.HashMap) and adjust term weights
• Filter pages by references
• Favor pages with accompanying text
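The two API-scoring ideas (text borrowed from referencing pages, and weighting by incoming references) can be made concrete with a toy example. Everything here is an assumption made for illustration: the functions `build_api_profiles` and `score_api`, the whitespace tokenization, and the log-scaled weighting are not taken from the talk, which does not give the actual scoring formula.

```python
import math
from collections import Counter, defaultdict


def build_api_profiles(pages):
    """pages: list of (text, referenced_apis) pairs.

    Returns (terms, incoming): per-API term counts gathered from the pages
    that reference the API (anchor-text-like), and the number of pages
    referencing each API (incoming references).
    """
    terms = defaultdict(Counter)
    incoming = Counter()
    for text, apis in pages:
        words = text.lower().split()
        for api in set(apis):
            terms[api].update(words)
            incoming[api] += 1
    return terms, incoming


def score_api(query, api, terms, incoming):
    """Toy score: matched term counts scaled by log(1 + incoming references)."""
    matched = sum(terms[api][word] for word in query.lower().split())
    return matched * math.log(1 + incoming[api])
```

With such profiles, a descriptive query like “hash map” can retrieve `java.util.HashMap` even though the JAR file itself carries no descriptive text, and heavily referenced APIs rank above rarely used ones.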

Evaluating Code Extraction and Reference Resolution
… on 350 hand-labeled pages from Assieme’s data
Code Extraction
• Recall 96.9%, Precision 50.1% (76.7% after filtering pages without refs)
• False positives: C, C#, JavaScript, PHP, FishEye/diff
Reference Resolution
• Recall 89.6%, Precision 86.5%
• False positives: FishEye and diff pages
• False negatives: incomplete code samples

User Study
Assieme vs. Google vs. Google Code Search
Design
• 40 search tasks based on queries in logs, e.g. query “socket java” became the task “Write a basic server that communicates using Sockets”
• Find code samples (and required libraries)
• 4 blocks of 10 tasks: 1 for training + 1 per interface
Participants
• 9 (under-)graduate students in Computer Science

User Study – Task Time
[Bar chart: task time in seconds (SEM) for Assieme, Google, and Google Code Search (GCS)]
• F(1,258)=5.74, p ≈ .017 * (* = significant)
• F(1,258)=1.91, p ≈ .17

User Study – Solution Quality
Rating scale: 0 = seriously flawed; .5 = generally good but fell short in critical regard; 1 = fairly complete
[Bar chart: solution quality (SEM) for Assieme, Google, and Google Code Search (GCS)]
• F(1,258)=55.5, p < .0001 * (* = significant)
• F(1,258)=6.29, p ≈ .013 *

User Study – # Queries Issued
[Bar chart: number of queries (SEM) for Assieme, Google, and Google Code Search (GCS)]
• F(1,259)=9.77, p ≈ .002 * (* = significant)
• F(1,259)=6.85, p ≈ .001 *

Discussion & Conclusion
• Assieme – a novel web search interface
• Programmers obtain better solutions, using fewer queries, in the same amount of time
• Using Google, subjects visited 3.3 pages/task; using Assieme, only 0.27 pages but 4.3 previews
• Ability to quickly view code samples changed participants’ strategies

Thank You
Raphael Hoffmann, Computer Science & Engineering, University of Washington
James Fogarty, Computer Science & Engineering, University of Washington
Daniel S. Weld, Computer Science & Engineering, University of Washington
This material is based upon work supported by the National Science Foundation under grant IIS-0307906, by the Office of Naval Research under grant N00014-06-1-0147, SRI International under CALO grant 03-000225 and the Washington Research Foundation / TJ Cable Professorship.