Lori Pollock Professor, CIS Program Analysis, Software Development & Maintenance Tools, Optimizing...
Transcript of Lori Pollock Professor, CIS Program Analysis, Software Development & Maintenance Tools, Optimizing...
Lori PollockProfessor, CIS
Program Analysis, Software Development & Maintenance Tools, Optimizing
Compilers ‘81 B.S. CS and Econ, Allegheny ’81-’86 PhD in CS, U of Pittsburgh
Married Mark’86-’90 Assistant Prof, Rice U
Lauren ‘88; Lindsay ‘90’91- Assistant, Associate, Full Prof UD CIS
Matt ’95Today: Lauren & Lindsay here at UD, Matt 15, 3 PhD
students, 3 masters students, and a few undergraduate researchers
What I do here at UD• Research
– Software Engineering and Compilation Lab (Hiperspace)• 213 Smith Hall
– Collaborations• Vijay Shanker (UD CIS), Terry Harvey (UD CIS), Lisa Marvel (Army
Research Lab), Martin Swany (UD CIS), Guang Gao (UD ECE)
– Funding• Primarily NSF grants; previously some Army funding
• Graduate Teaching – CISC 672 Compilers– CISC 673 Program Analysis and Transformations– CISC 879 Software Testing and Maintenance– CISC 879 Software Tools and Environments
• Undergraduates– Programming XO laptops for local middle school teachers– Study abroad programs
What I do outside UD• Associate Editor, Transactions on Software
Engineering and Methodology (TOSEM)
• Computing Research Association (CRA)’s Committee on the Status of Women in Computer Research (CRA-W)
• Mentoring – speaker at mentoring
workshops for undergrads, grads, assistant and associate profs, and industry lab researchers
• Program committees, conf org,NSF panels, paper reviews,… (typical of university
researchers)
Current Students in Training
Xiaoran Wang PhD
Antony Danalis PhD
Giri Sridhara PhD
Masters: Divya, Suparna, PoonamCurrent Undergraduates: Sana Malik, Michelle Allen
Recently Completed PhD 2007-10
Mike JochenAssistant ProfEast Stroudsburg U
Sara SprenkleAssistant ProfWashington & Lee U
David ShepherdPostdoc, Startup
Ben BreechArmy ResearchLab
Emily Gibson HillAssist Prof, Montclair State U
Overview: Research Projects
Program Analysis
Compiler Technology
Natural LanguageAnalysis of Programs Testing Web
Applications
Optimizing Cluster Parallel Programs
IdentifyingCode vulnerabilities To Malware
Software Tools…………..Testing…..........Compilers….……Parallel Computing
Giri, XiaoranShanker, Hill
Antony,Swany
Sprenkle
ClauseWinbladh Green Software
Engineering
Optimizing Cluster Parallel Programs
Research Problem - How can scientific codes be scaled to a cluster of many CPUs?
Major Challenge – Communication Costs
Approach and Contributions:
An integrated system to hide communication latency
-Surveyor: Collect “knowledge” of cluster
-Compiler: analyze dependencies and transform to create maximal communication/computation overlap
-Communication Library: Use a companion library to MPI
ASPhALT: Automatic System for Parallel AppLication
Transformations
Contribution: FIRST to cluster-optimize MPI codes
Testing Web Applications
• Combination of– Stand-alone applications– GUIs and Database applications– Distributed applications
• Numerous technologies and components
Web Application Structure
Front End Back End
Client(HTML)
Server(Java)
Database(MySql)
Browser
Traditional Software Testing Process
Oracle Pass/ Fail
Replay Tool
TestCases
TestCases
ActualResults
Hard to obtain when testing web applications!!
ExpectedResults
ApplicationSpecification
Test Case Generator
ApplicationRepresentation
ApplicationImplementation
User-session-based Testing
User-session-based Testing Process
Oracle Pass/ Fail
ActualResults
ExpectedResults
Log UserRequests
User 1:register.jsp?name=ss&pass=tstlogin.jsp?name=ss&pass=tstlogout.jsp
Beta Web Application (v.0.9) Deployment
Users
TestCases
Web Application Implementation (v.1.0)
UserSessions
Replay Tool
Test Cases
Create testcases
Create/ReduceTest Cases
User-session-based
Maintenance Testing for Web Applications
Research Problem: How can we exploit user session logging for testing of web applications after initial deployment, with minimal tester effort?
Contributions: Scalable, practical, automated structural testing framework for web applications
* Test case generation * Test suite reduction * Test oracles * Test coverage criteria in terms of URLs, parameters, values
Analyzing the Names in SoftwareResearch Problem
- 60-90% software costs are in reading and navigating large software systems to fix bugs and add new features. Can we help with automation of search, navigation, location of relevant code?
- Key: Programmers leave clues of their intent as they choose names.
Proposed Approach – Develop, extend, and apply natural language-based analysis to
the identifier names and comments
Contribution - Aid understanding, debugging, maintenance, development
Focus on actions -Correspond to verbs-Verbs need Direct Object- Phrases more useful
NLPA
Our Research Focus and Impact
Software Maintenance ToolsSoftware Maintenance Tools
SearchSearch UnderstandingUnderstanding ……
Natural Language Analysis
Word relations(synonyms, antonyms, …
Word relations(synonyms, antonyms, …
Part of speech tagging
Part of speech tagging Abbreviations…Abbreviations…Word splittingWord splitting
ExplorationExploration
Search MethodSearch MethodSearch MethodSearch Method
1. User formulates query
2. Query executed by search method
3. User views search results
4. Repeat as necessary
How do SE tool users search code now?
User faces 2 challenges:• Decide what query words to search for• Determine whether the results are relevant
Our focus: Natural language (NL) queries
SourceCode
SearchSearchResultsResultsSearchSearchResultsResults
Query
Determine Relevance of Results
(Re)formulate Query
User
“compile report” vs. “compil*report”
Our Contextual Search Process
Search MethodSearch MethodSearch MethodSearch Method
SourceCode
SearchSearchResultsResultsSearchSearchResultsResults
Query
Determine Relevance of Results(Re)formulate
Query
User
Search MethodSearch Method Search MethodSearch Method
SourceCode
Information Extraction Process
Information Extraction Process
Partial Phrase Matching
Partial Phrase Matching
NL Phrase Mapping
NL Phrase Mapping
Hierarchical
Information Extraction Process
Information Extraction Process
Another Tool - UD-Summarize: An Example
/* Update linear edge view. If width <= 1, draw line to given graphics2d,
else draw polyline to graphics2d */
Challenges in Summary Comment Generation for Methods
• Method name does not always suffice as summary Ex: main(), returnPressed()
• There is no notion of what constitutes an important statement for a method summary
• A selected statement may have too little or too much detail for a succinct summary
• Generated phrases should be understandable without reading the associated code and its context within the method body
UD-Summarize: The Process
UD-GenPhrase: Generate Phrases for Selected Statements and Combine
Phrases
Method M signature and body
Summary comment for M
Build required structural and linguistic program representations
NLPA
Developing Basic NL Analyses
Software Maintenance ToolsSoftware Maintenance Tools
SearchSearch UnderstandingUnderstanding ……
Natural Language Analysis
Word relations(synonyms, antonyms, …
Word relations(synonyms, antonyms, …
Part of speech tagging
Part of speech tagging Abbreviations…Abbreviations…Word splittingWord splitting
ExplorationExploration
public UserList getUserListFromFile( String path ) throws IOException {
try {
File tmpFile = new File( path );
return parseFile(tmpFile);
} catch( java.io.IOException e ) {
throw new IOrException( ”UserList format issue" + path + " file " + e );
}
}
Illustrating some issues: Extracting Clues from
Signatures1. Split Name into Words2. Part-of-speech tag method name3. Chunk method name4. Identify Verb and Direct-Object (DO)
get<verb> User<adj> List<noun> From <prep> File <noun>
get<verb phrase> User List<noun phrase> From File <prep phrase>
POS Tag
Chunk
Automatic Abbreviation Expansion
1.Split Identifiers:2.Identify non-
dictionary words3.Determine long
form
non-dictionary wordnon-dictionary word
no boundaryno boundary
• Don’t want to miss relevant code with abbreviations
• Given a code segment, identify character sequences that are short forms and determine long form
• Approach: Mine expansions from code [MSR 08]
Issues with a Simple Dictionary
Approach• Manually create a lookup table of common
abbreviations in code- Vocabulary evolves over time, must maintain
table- Same abbreviation can have different expansions
depending on domain AND context:
cfg?Control Flow Graph
Context-Free Grammar
configuration
configure
Overview: Research Projects
Program Analysis
Compiler Technology
Natural LanguageAnalysis of Programs Testing Web
Applications
Optimizing Cluster Parallel Programs
IdentifyingCode vulnerabilities To Malware
Software Tools…………..Testing…..........Compilers….……Parallel Computing
Giri, XiaoranShanker, Hill
Antony,Swany
Sprenkle
ClauseWinbladh Green Software
Engineering
What are we doing now?• Emily
– Developing a general software word usage model– Implementing SWUM– Evaluating for search
• Giri– Developing techniques for automatic comment generation
• Which statements to include in summary content?• How to generate phrases for that content?
– Providing feedback on SWUM refinement
• Divya– Analyzing specific Java statement structures – Developing templates for comment phrase generation
What have we learned overall?
• Evaluation studies indicateNatural language analysis has far more potential to improve software maintenance tools than we initially believed
• Existing technology falls shortSynonyms, collocations, morphology, word frequencies, part-of-speech tagging, AOIG
• Keys to further successImprove recall
Extract additional NL clues
Another of our Tools: Dora the Program
Explorer*
* Dora comes from exploradora, the Spanish word for a female explorer.
DoraDora
Natural Language Query• Maintenance request• Expert knowledge• Query expansion
Natural Language Query• Maintenance request• Expert knowledge• Query expansion
Relevant Neighborhood
Program Structure• Representation
• Current: call graph• Seed starting point
Relevant Neighborhood• Subgraph relevant to query
Query