Inferring the type systems under data processing programs


Motivation

- Data processing programs
  - Retrieve runtime system status, recorded information, ...
  - Rely on specific APIs
- Type systems (structures?) of data sources are necessary for inspecting and developing programs
  - What kinds of data, what relations, how to invoke the API
- The type systems are not easy to establish
  - Generic APIs do not reflect the data types
  - Sufficient and accurate documents are not always available
  - Reading source code is not always practical

This work

- Systematically inferring the type systems of data sources through static analysis of data processing programs
  - For inspection: detecting problems related to data usage
  - For programming: sample code snippets for retrieving a specific type of data
- Basic idea
  - Recover the entire data flow of the program
  - Clarify the different ways of invoking the API to retrieve data
- Challenges
  - Large scale and complex structure of source code
  - Complex data retrieving logic

Data processing programs

- A simple example
  - Retrieving memory information of a JEE server
  - Through the JMX API
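The slide's own example program is not preserved in the transcript. As a rough illustration of the kind of program meant here, the sketch below reads memory information through the standard JMX API. It is an assumption-laden stand-in: it queries the local JVM's platform MBean server rather than a remote JEE server, and the class and method names (MemoryInfoSketch, readVerbose, readHeapUsed) are hypothetical. Note how getAttribute returns a plain Object, so the real data types (a Boolean, a composite value with a "used" field) are visible only in the casts, which is the gap the inferred type system is meant to fill.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Hypothetical illustration; not the program shown on the original slide.
class MemoryInfoSketch {
    // Reads the boolean "Verbose" attribute of the standard Memory MBean.
    static boolean readVerbose(MBeanServerConnection conn) throws Exception {
        ObjectName memory = new ObjectName("java.lang:type=Memory");
        // getAttribute is declared to return Object: the generic API does
        // not reveal that this attribute is a Boolean.
        return (Boolean) conn.getAttribute(memory, "Verbose");
    }

    // Reads the "used" field nested in the composite "HeapMemoryUsage" attribute.
    static long readHeapUsed(MBeanServerConnection conn) throws Exception {
        ObjectName memory = new ObjectName("java.lang:type=Memory");
        CompositeData usage =
            (CompositeData) conn.getAttribute(memory, "HeapMemoryUsage");
        return (Long) usage.get("used");
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the local platform MBean server stands in for the
        // remote JEE server connection used in the talk.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        System.out.println("Verbose = " + readVerbose(conn));
        System.out.println("Heap used (bytes) = " + readHeapUsed(conn));
    }
}
```

Both methods call the same getAttribute instruction yet retrieve different kinds of data, depending only on the constant strings passed in.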

Type system under this program

Inferring the type system

[Figure; recoverable labels: Memory, Verbose, getAttribute_Verbose]

- See what data the program gets, and what other data it uses when getting them.

Challenging for practical programs

- Complex data flow
- One instruction retrieves different kinds of data

Approach Overview

Source Code Analysis

- Purpose: recover the data flow from source code
- Source code abstraction
  - Object- and call-site-sensitive points-to analysis
- About points-to analysis
  - A heap H storing all objects allocated in the source code
  - A points-to mapping Pt showing what objects a variable may point to
- Extension to typical points-to analysis
  - Tracing the API invocation results: newly obtained objects
  - Depends on constant values: pre-calculation on constants
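To make the heap H and the mapping Pt concrete, here is a minimal sketch of a points-to analysis over a tiny, hypothetical three-statement language. It is deliberately much simpler than what the slides describe: it is flow- and context-insensitive (not object- and call-site-sensitive), and its treatment of API calls is only one possible reading of the extension, in which each call result becomes a new abstract object named after the receiver object and the constant argument. All names (PointsToSketch, Stmt, analyze, the "/" naming scheme) are assumptions made for illustration.

```java
import java.util.*;

// Hypothetical sketch; the talk's actual analysis extends WALA and is richer.
class PointsToSketch {
    enum Kind { NEW, COPY, API_CALL }

    static final class Stmt {
        final Kind kind; final String target; final String source; final String label;
        private Stmt(Kind kind, String target, String source, String label) {
            this.kind = kind; this.target = target; this.source = source; this.label = label;
        }
        // target = new ...   (label names the allocation site; must not contain '/')
        static Stmt alloc(String target, String site) { return new Stmt(Kind.NEW, target, null, site); }
        // target = source
        static Stmt copy(String target, String source) { return new Stmt(Kind.COPY, target, source, null); }
        // target = source.get(constArg), with constArg a pre-computed constant
        static Stmt apiCall(String target, String source, String constArg) {
            return new Stmt(Kind.API_CALL, target, source, constArg);
        }
    }

    static final int MAX_DEPTH = 4; // bounds chains such as x = x.get("a")

    // Returns Pt (variable -> abstract objects); every object is also added to heap H.
    static Map<String, Set<String>> analyze(List<Stmt> program, Set<String> heap) {
        Map<String, Set<String>> pt = new TreeMap<>();
        boolean changed = true;
        while (changed) { // iterate to a fixpoint; statement order does not matter
            changed = false;
            for (Stmt s : program) {
                Set<String> incoming = new TreeSet<>();
                Set<String> src = pt.getOrDefault(s.source, Collections.emptySet());
                switch (s.kind) {
                    case NEW: incoming.add(s.label); break;
                    case COPY: incoming.addAll(src); break;
                    case API_CALL:
                        for (String recv : src) {
                            long depth = recv.chars().filter(ch -> ch == '/').count();
                            // The call result is a newly obtained abstract object.
                            if (depth < MAX_DEPTH) incoming.add(recv + "/" + s.label);
                        }
                        break;
                }
                heap.addAll(incoming);
                if (pt.computeIfAbsent(s.target, k -> new TreeSet<>()).addAll(incoming)) changed = true;
            }
        }
        return pt;
    }

    public static void main(String[] args) {
        List<Stmt> program = Arrays.asList(
            Stmt.alloc("conn", "conn0"),
            Stmt.copy("c2", "conn"),
            Stmt.apiCall("mem", "c2", "Memory"),
            Stmt.apiCall("verbose", "mem", "Verbose"));
        Set<String> heap = new TreeSet<>();
        System.out.println("Pt = " + analyze(program, heap));
        System.out.println("H  = " + heap);
    }
}
```

Because the abstract object for a call result embeds the constant argument, two calls to the same API method with different constants yield distinct objects, which is why constants need to be resolved before the analysis runs.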

Data type inference

- Raw inference
  - A new calculus to clarify API invocations
  - Construct classes and associations accordingly
- Code snippet slicing
  - Backward slicing along data flow
- Meta-model refinement
  - Remove redundant duplicated elements
- Meta-model decoration
  - Names, multiplicity

Raw meta-modeling

Code snippet slicing
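The slide for this step is a figure that did not survive in the transcript. As a rough sketch of backward slicing along data flow, the code below walks a straight-line program from the end and keeps only the statements that the target variable transitively depends on. It assumes straight-line code with explicit def/use sets and no branches, loops, or aliasing; the names (BackwardSliceSketch, Stmt, slice, demoProgram) and the sample statements are hypothetical.

```java
import java.util.*;

// Hypothetical sketch of backward slicing on straight-line code.
class BackwardSliceSketch {
    static final class Stmt {
        final String code;       // source text of the statement
        final String def;        // variable the statement defines
        final List<String> uses; // variables the statement reads
        Stmt(String code, String def, String... uses) {
            this.code = code; this.def = def; this.uses = Arrays.asList(uses);
        }
    }

    // Returns, in program order, the statements needed to compute targetVar.
    static List<String> slice(List<Stmt> program, String targetVar) {
        Set<String> needed = new HashSet<>();
        needed.add(targetVar);
        Deque<String> result = new ArrayDeque<>();
        for (int i = program.size() - 1; i >= 0; i--) {
            Stmt s = program.get(i);
            // A statement belongs to the slice if it defines a needed variable;
            // that definition is then satisfied and its inputs become needed.
            if (needed.remove(s.def)) {
                needed.addAll(s.uses);
                result.addFirst(s.code);
            }
        }
        return new ArrayList<>(result);
    }

    // A small sample program mixing two unrelated retrievals.
    static List<Stmt> demoProgram() {
        return Arrays.asList(
            new Stmt("conn = connect()", "conn"),
            new Stmt("memName = new ObjectName(\"java.lang:type=Memory\")", "memName"),
            new Stmt("osName = new ObjectName(\"java.lang:type=OperatingSystem\")", "osName"),
            new Stmt("verbose = conn.getAttribute(memName, \"Verbose\")", "verbose", "conn", "memName"),
            new Stmt("arch = conn.getAttribute(osName, \"Arch\")", "arch", "conn", "osName"));
    }

    public static void main(String[] args) {
        for (String line : slice(demoProgram(), "verbose")) {
            System.out.println(line);
        }
    }
}
```

Slicing on verbose drops the two statements that only serve arch, leaving a self-contained snippet for retrieving that one piece of data.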

Refinement and decoration

- Refinement
  - Rewriting rules
- Decoration
  - Empirical naming principles

Implementation

- Points-to analysis: extend WALA
- Inference: implement the algorithms on WALA and EMF

Experiments

- To evaluate the following three aspects
  - Applies to practical data sources and programs
  - Useful for inspecting existing programs
  - Useful for writing new programs
- Three experiments
  - Inference test on typical data sources and open source programs
  - Result investigation, finding problems in the programs
  - User study, comparing programming efficiency with and without the inferred type system

Inference test

Inspection with type systems

- Informal but interesting findings for the selected programs
- Version incompatibility
  - Two programs on JOnAS, CarteBlanche and jonasAdmin (4.7)
  - A DeploymentPlan type appears in CarteBlanche but not in jonasAdmin
  - Conjecture: DeploymentPlan is a feature of a later version, and CarteBlanche is not compliant with JOnAS 4.7
  - Confirmed by their documents
- Incomplete support
  - JabRef sub-function to import from the MS Bib reference source
  - 76 out of 77 XML elements supported, all except RefOrder
  - Indicates a potential improvement
- Conclusion: assists developers in detecting wrong or insufficient use of a data source

User study

- Four data sources (Exists, JOnAS, Flickr, GeoRss)
- 12 problems about retrieving data
  - Q1: get the ID of a query under processing
  - Q2: get the ID of a running job
- 6 volunteers: 3 graduate students, 1 undergraduate, 2 engineers
- Experiment result
  - Programming efficiency (time spent)
  - Programming processes

User study result

Findings

- Process
  - Without type systems
    - Most chose to search the sample clients
    - Hard to find the proper keyword
    - Some chose to use the XML schema, but were blocked for a while when writing code
    - Sometimes missed the relation between problems
  - With type systems
    - Read the meta-model intuitively, choose the element, go on
- Result
  - Real improvement
  - Significant for related problems
  - Significant for non-expert developers

Related work

- API programming assistants
  - Restraint: summarize and detect bad smells
  - Guidance: not formal or precise, but shows potential ways
  - This work is a guidance approach, but for the data rather than the API itself
- Data type inference
  - Inferring data types from text and XML
  - This work infers not from the data themselves but from the programs using them, so no large sample set is needed
- Points-to analysis: a new usage and a corresponding extension
- Def-use analysis: not just uses, but the compositions of uses that form sufficient and independent invocations

Conclusion

- A novel approach to inferring type systems of data sources under data processing programs
  - Uses and extends points-to analysis
  - A new calculus to clarify different API usages
- Experiments show that this approach
  - Applies to practical data sources and programs
  - Assists program inspection
  - Assists writing new data processing programs
- Future work
  - Accuracy improvement
  - More experiments on different APIs