Software Networks
![Page 1: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/1.jpg)
Software Networks
Christian Bird
Computer Science Dept.
UC Davis
![Page 2: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/2.jpg)
A network like any other
• A software network is made up of
– Nodes: software artifacts
– Edges: relationships between those artifacts (may be directed or undirected)
Example nodes: function, module, class, file
Example edges: imports, co-committed, includes, requires
![Page 3: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/3.jpg)
Nodes
• The nodes in a software network usually represent software artifacts at various levels of granularity
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries
![Page 4: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/4.jpg)
Nodes
• The nodes in a software network usually represent software artifacts at various levels of granularity
– Functions (3000 in Apache)
– Classes
– Files
– Modules/Packages
– Directories
– Libraries

    int add(int a, int b) {
        printf("%i + %i = ", a, b);
        int c = a + b;
        printf("%i\n", c);
        return c;
    }
![Page 5: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/5.jpg)
Nodes
• The nodes in a software network usually represent software artifacts at various levels of granularity
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries

    class Logger {
        int logItem(Object item, int level) { stuff… }
        int logError(String msg) { more stuff… }
        more functions…
    }
![Page 6: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/6.jpg)
Nodes
• The nodes in a software network usually represent software artifacts at various levels of granularity
– Functions
– Classes
– Files (300 in Apache)
– Modules/Packages
– Directories
– Libraries

math.c

    float absoluteValue(float a) {
        return a > 0 ? a : -a;
    }

    void printName(char *name) {
        printf("Hello %s\n", name);
    }

    more functions…
![Page 7: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/7.jpg)
Nodes
• The nodes in a software network usually represent software artifacts at various levels of granularity
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries
class Logger { stuff…}
class LogMessage { stuff…}
class LogError { stuff…}
more classes…
![Page 8: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/8.jpg)
Nodes
• The nodes in a software network usually represent software artifacts at various levels of granularity
– Functions
– Classes
– Files
– Modules/Packages
– Directories (65 in Apache)
– Libraries

    /apache/http-2.0/server/core/handle.c
    /apache/http-2.0/server/core/serve.c
    /apache/http-2.0/server/core/cgi.c
    /apache/http-2.0/server/core/locking.c
![Page 9: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/9.jpg)
Nodes
• The nodes in a software network usually represent software artifacts at various levels of granularity
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries (25 in Apache)

    libkdeinit_konqueror.so
    libkonq.so.4
    libkutils.so.1
    libkio.so.4
    libkdeui.so.4
    libkdesu.so.4
    libkdecore.so.4
    libDCOP.so.4
    libdl.so.2
    libresolv.so.2
    libutil.so.1
    libart_lgpl_2.so.2
    libidn.so.11
    libqt-mt.so.3
    libpng12.so.0
    libXext.so.6
    libX11.so.6
    libSM.so.6
    libICE.so.6
    libXrender.so.1
![Page 10: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/10.jpg)
Edges
• Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc.
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries
![Page 11: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/11.jpg)
Edges
• Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc.
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries

    int add(int a, int b) {
        printf("%i + %i = ", a, b);   /* call edge: add -> printf */
        int c = a + b;
        printf("%i\n", c);
        return c;
    }
![Page 12: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/12.jpg)
Edges
• Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc.
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries

    class Logger extends Writer {                     // inheritance edge: Logger -> Writer
        int logItem(LogMessage item, int level) { stuff… }   // parameter-type edge: Logger -> LogMessage
        int logError(String msg) { more stuff… }
        more functions…
        FileWriter w;                                 // instance-member edge: Logger -> FileWriter
    }
![Page 13: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/13.jpg)
Edges
• Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc.
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries

math.c

    float absoluteValue(float a) {
        return max(a, -a);
    }

    void printName(char *name) {
        printf("Hello %s\n", name);
    }

    more functions…
![Page 14: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/14.jpg)
Edges
• Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc.
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries

    import java.lang.util;
    import edu.ucdavis.senses;

    class WirelessSensor { … }
![Page 15: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/15.jpg)
Edges
• Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc.
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries
A function in /apache/http-2.0/server/core/handle.c
may call a function in /apache/http-2.0/apr-util/hash.c
![Page 16: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/16.jpg)
Edges
• Edges in a software network represent a relationship such as a function call, instance member, library dependence, etc.
– Functions
– Classes
– Files
– Modules/Packages
– Directories
– Libraries
Library libkdecore.so may need to load libqt3-mt.so, which in turn may need to load libX11.so and libm.so, which all need libc.so
Dependency chain: libkdecore.so → libqt3-mt.so → {libX11.so, libm.so} → libc.so
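The chain of library dependencies on this slide can be followed mechanically. The sketch below is only an illustration: the `NEEDS` table encodes the edges named on the slide, and the helper name `transitive_needs` is an assumption introduced here, not part of any real loader API. It walks the "needs" edges breadth-first to collect everything one library would pull in.

```python
from collections import deque

# Direct "needs" edges, transcribed from the slide's example.
NEEDS = {
    "libkdecore.so": ["libqt3-mt.so"],
    "libqt3-mt.so": ["libX11.so", "libm.so"],
    "libX11.so": ["libc.so"],
    "libm.so": ["libc.so"],
    "libc.so": [],
}


def transitive_needs(graph, start):
    """Return every library reachable from `start`, excluding `start` itself."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        lib = queue.popleft()
        if lib in seen:
            continue  # already visited via another path (e.g. libc.so)
        seen.add(lib)
        queue.extend(graph.get(lib, []))
    return seen


closure = transitive_needs(NEEDS, "libkdecore.so")
print(sorted(closure))
```

Note that libc.so is reached along two paths but reported once, which is why the walk keeps a `seen` set.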
![Page 17: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/17.jpg)
Example Callgraph

    void printInt(int a) {
        printf("the number is %i\n", a);
    }

    int add(int a, int b) {
        return a + b;
    }

    int multiply(int a, int b) {
        return a * b;
    }

    int factorial(int a) {
        if (a <= 1) return 1;   /* base case also covers a == 0 */
        return multiply(a, factorial(a - 1));
    }

    int main(void) {
        printf("calculating 6!\n");
        printInt(factorial(6));
        return 0;
    }

Callgraph edges: main → printf, main → printInt, main → factorial, printInt → printf, factorial → multiply, factorial → factorial (recursive)
Never called: add
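As a rough sketch, the example callgraph can be written down as an adjacency list and queried for functions that nobody calls. The edges are read off the C code on this slide; the names `CALLS`, `in_degrees`, and `never_called` are illustrative choices made here.

```python
# Caller -> list of callees, read off the example program on this slide.
CALLS = {
    "main": ["printf", "printInt", "factorial"],
    "printInt": ["printf"],
    "factorial": ["multiply", "factorial"],  # includes the recursive call
    "multiply": [],
    "add": [],
    "printf": [],
}


def in_degrees(graph):
    """Count how many call edges point at each function."""
    degrees = {node: 0 for node in graph}
    for callees in graph.values():
        for callee in callees:
            degrees[callee] = degrees.get(callee, 0) + 1
    return degrees


def never_called(graph, root="main"):
    """Functions with no incoming edge, ignoring the entry point."""
    degrees = in_degrees(graph)
    return sorted(n for n, d in degrees.items() if d == 0 and n != root)


print(never_called(CALLS))
```

The entry point is excluded explicitly because it legitimately has no callers inside the program.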
![Page 18: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/18.jpg)
Static versus Runtime Callgraphs
• Static callgraphs are constructed by a syntactic analysis of the source code
• Pros
– Don’t have to build or run the program
– Works in the presence of syntactic or semantic errors
– Catches calls made only in exceptional situations
– Fairly fast
• Cons
– No valued (weighted) information: does not record how many times each function is called
– Includes calls in dead code. Example: if (0 == 3) logError(…)
– Misses calls made through function pointers
– Misses calls to functions in dynamically loaded libraries
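To make the trade-offs concrete, here is a minimal sketch of a purely syntactic callgraph extractor. It is an illustration under stated assumptions: it analyses a small Python snippet rather than C, simply because Python's standard `ast` module provides a ready-made parser, and the `SOURCE` program and the `static_callgraph` helper are invented for this example.

```python
import ast

# Hypothetical program to analyse; it is parsed, never executed.
SOURCE = '''
def log_error(msg):
    print(msg)

def helper():
    return 1

def main():
    if 0 == 3:            # dead code: this branch can never run
        log_error("unreachable")
    callback = helper     # indirect call, analogous to a function pointer
    callback()
'''


def static_callgraph(source):
    """Map each top-level function to the names it calls syntactically."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for child in ast.walk(node):
                # Only direct calls of a plain name are recorded.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    callees.add(child.func.id)
            graph[node.name] = callees
    return graph


graph = static_callgraph(SOURCE)
print(graph["main"])
```

Both cons show up in the result: `main` gets an edge to `log_error` even though that call sits in dead code, and the call through `callback` is recorded under the variable's name, so the real edge from `main` to `helper` is missed.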
![Page 19: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/19.jpg)
Static versus Runtime Callgraphs
• Runtime callgraphs are constructed by running a piece of software one or more times and logging the number of function calls
• Pros
– Includes the number of times each function call occurs
– Includes calls through function pointers and dynamically loaded libraries
– Will not include calls in dead code
• Cons
– Requires building the software
– Hard to get complete code coverage
– Can take a long time
– May require a test harness of some kind (especially for interactive applications) along with test data
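A runtime callgraph can be sketched in a few lines by logging calls while the program runs. The example below is an assumption-laden illustration: it profiles a Python program with the standard `sys.setprofile` hook rather than instrumenting C, and `trace_calls` plus the small program under test are invented here.

```python
import sys
from collections import Counter


def trace_calls(func, *args):
    """Run func(*args) and count caller -> callee edges between Python functions."""
    counts = Counter()

    def profiler(frame, event, arg):
        # "call" fires when a Python function is entered.
        if event == "call" and frame.f_back is not None:
            caller = frame.f_back.f_code.co_name
            callee = frame.f_code.co_name
            counts[(caller, callee)] += 1

    sys.setprofile(profiler)
    try:
        func(*args)
    finally:
        sys.setprofile(None)  # always remove the hook
    return counts


def multiply(a, b):
    return a * b


def factorial(n):
    if n <= 1:
        return 1
    return multiply(n, factorial(n - 1))


def log_error():
    pass


def main():
    if 0 == 3:  # dead code: never reached at runtime
        log_error()
    return factorial(6)


counts = trace_calls(main)
for edge, n in sorted(counts.items()):
    print(edge, n)
```

The edges carry weights (how often each call happened), and `log_error` does not appear because the dead branch never executes. The result also contains an edge from `trace_calls` to `main`, an artifact of the harness itself, and it only reflects the one input that was run.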
![Page 20: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/20.jpg)
Differences between callgraphs and other graphs we’ve seen
• A callgraph has a root (the program’s entry point) and commonly forms a tree-like structure
• Few if any cycles in callgraphs (direct or indirect recursion is rare)
• Reciprocity is not common due to levels of abstraction
• Preferential attachment?
– If a function is called by many functions, is it more likely to be called by other functions in the future? Maybe.
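Two of the properties above, cycles and reciprocity, are easy to measure on a callgraph stored as an adjacency list. The sketch below is illustrative only: `reciprocity` uses one common definition (the share of edges whose reverse edge also exists), `has_cycle` is a standard depth-first search for a back edge, and the sample graph is the factorial example from earlier in the deck.

```python
def reciprocity(graph):
    """Fraction of edges u -> v (u != v) whose reverse edge v -> u also exists."""
    edges = {(u, v) for u, callees in graph.items() for v in callees if u != v}
    if not edges:
        return 0.0
    mutual = sum(1 for (u, v) in edges if (v, u) in edges)
    return mutual / len(edges)


def has_cycle(graph):
    """Depth-first search for a back edge; a self-loop counts as a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GREY  # node is on the current DFS path
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                return True  # back edge found
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK  # fully explored
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in graph)


CALLS = {
    "main": ["printf", "printInt", "factorial"],
    "printInt": ["printf"],
    "factorial": ["multiply", "factorial"],
    "multiply": [],
    "add": [],
    "printf": [],
}

print(has_cycle(CALLS), reciprocity(CALLS))
```

On this graph the only cycle comes from the direct recursion in `factorial`, and no pair of functions call each other.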
![Page 21: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/21.jpg)
Software Repositories
• Used in development of virtually any software project (commercial, personal, OSS, etc.)
• Examples include RCS, CVS, Subversion, Perforce, BitKeeper, and SourceSafe
• Keeps track of every change to the software, who made the change, time of change, comments associated with a change, etc.
• Allows us to view the evolution of a piece of software
• A developer makes changes to software code and then commits the changes to the software repository with a description of the changes
![Page 22: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/22.jpg)
Software Networks from Repositories
• The software history allows us to relate different artifacts in the software
• Create an edge between artifacts (functions, files, classes) if they were modified in the same commit
• Create an edge between artifacts if they were modified by the same developer
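Both constructions can be sketched from a list of commits. Everything in the data below is hypothetical: the authors and the commit contents are invented, and only the file names are borrowed from the Apache paths shown earlier. The helper names are likewise illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical commit log: (author, files touched in that commit).
COMMITS = [
    ("alice", ["handle.c", "serve.c"]),
    ("bob", ["serve.c", "cgi.c", "locking.c"]),
    ("alice", ["handle.c", "serve.c", "cgi.c"]),
]


def co_commit_network(commits):
    """Undirected weighted edges: number of commits that touched both files."""
    weights = Counter()
    for _author, files in commits:
        # Sorting gives each unordered pair one canonical (a, b) key.
        for a, b in combinations(sorted(set(files)), 2):
            weights[(a, b)] += 1
    return weights


def same_developer_network(commits):
    """Undirected edges between files modified by the same developer."""
    files_by_author = {}
    for author, files in commits:
        files_by_author.setdefault(author, set()).update(files)
    edges = set()
    for files in files_by_author.values():
        edges.update(combinations(sorted(files), 2))
    return edges


weights = co_commit_network(COMMITS)
dev_edges = same_developer_network(COMMITS)
print(weights[("handle.c", "serve.c")], len(dev_edges))
```

The co-commit edges are weighted by how often two files change together, which is one reasonable choice; a plain unweighted edge set would match the slide's wording equally well.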
![Page 23: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/23.jpg)
Modularity: one use of a callgraph
• Modularity: the characteristic of a system that has been divided into smaller subsystems which interact with each other
• Software that is modular has distinct subsystems (modules) with high levels of interaction within the subsystems and low levels of interaction between the subsystems
• Software that is modular is easier to understand and maintain
[Figure: Modular OS, showing Kernel, Filesystem, Scheduler, I/O devices, Memory Management, and Networking]
![Page 24: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/24.jpg)
Modularity Case Study using Callgraphs
• “Exploring the Structure of Complex Software Designs: An Empirical Study of Open Source and Proprietary Code” by Alan MacCormack, John Rusnak, and Carliss Baldwin
• Created a “Design Structure Matrix” (DSM) at the file level using function calls as ties (i.e. if a function in foo.c calls a function in bar.c, then there is a directed, non-symmetric tie from foo.c to bar.c)
• Used static analysis to extract the file-level callgraph
• Clustered the DSM using standard clustering techniques
• Metrics used:
– Clustering cost: measure of how many function calls are not within a cluster
– Propagation cost: measure of how many functions will be affected if a particular function is modified
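The steps above can be sketched on a toy system. This is a simplified illustration, not the authors' implementation: the three files, their functions, and the cluster assignment are hypothetical. Propagation cost is computed here as the density of the transitive closure with the diagonal included, which is how the metric is usually described. `cross_cluster_fraction` is only a stand-in that follows the slide's one-line description of clustering cost and does not reproduce the paper's cost function.

```python
def build_dsm(files, file_of, calls):
    """File-level DSM: dsm[i][j] = 1 if a function in file i calls one in file j."""
    index = {name: i for i, name in enumerate(files)}
    n = len(files)
    dsm = [[0] * n for _ in range(n)]
    for caller, callee in calls:
        i, j = index[file_of[caller]], index[file_of[callee]]
        if i != j:  # calls within one file create no tie
            dsm[i][j] = 1
    return dsm


def visibility(dsm):
    """Transitive closure with the diagonal set (Warshall's algorithm)."""
    n = len(dsm)
    vis = [[1 if i == j or dsm[i][j] else 0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if vis[i][k]:
                for j in range(n):
                    if vis[k][j]:
                        vis[i][j] = 1
    return vis


def propagation_cost(dsm):
    """Density of the visibility matrix."""
    n = len(dsm)
    return sum(map(sum, visibility(dsm))) / (n * n)


def cross_cluster_fraction(dsm, cluster_ids):
    """Share of ties whose endpoints lie in different clusters (simplified)."""
    ties = crossing = 0
    for i, row in enumerate(dsm):
        for j, tie in enumerate(row):
            if tie:
                ties += 1
                crossing += cluster_ids[i] != cluster_ids[j]
    return crossing / ties if ties else 0.0


# Hypothetical system: f() in foo.c calls g() in bar.c, which calls h() in baz.c.
files = ["foo.c", "bar.c", "baz.c"]
file_of = {"f": "foo.c", "g": "bar.c", "h": "baz.c"}
dsm = build_dsm(files, file_of, [("f", "g"), ("g", "h")])

print(dsm)
print(propagation_cost(dsm))
print(cross_cluster_fraction(dsm, [0, 0, 1]))  # foo.c and bar.c share a cluster
```

With every file in a single cluster the crossing fraction drops to 0, matching the idea that a perfectly modular layout keeps all calls inside clusters.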
![Page 25: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/25.jpg)
DSM examples
• Figure: Example System in Graphical and Dependency Matrix Form. A change to F propagates to E, C, and A, while a change to B only propagates to A
• Figure: A DSM with dependencies in an “Idealized Modular Form”. All calls are within clusters, so the clustering cost is 0
![Page 26: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/26.jpg)
Mozilla Project
• Netscape open-sourced Navigator in March 1998
• The project was named Mozilla and eventually led to what Firefox is today
• Initially the code was complex and tightly coupled, a common phenomenon in industry code
• This formed a high barrier to entry for volunteers to contribute code
• Architecture was re-designed in late 1998 due to increasing complexity
![Page 27: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/27.jpg)
DSMs for Mozilla
![Page 28: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/28.jpg)
Results of Mozilla Re-design
![Page 29: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/29.jpg)
More Results
• After the re-design, volunteerism went up dramatically (critical for an OSS project to succeed)
• Both functionality and performance increased
• Both code size and number of files decreased (initially)
![Page 30: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/30.jpg)
What are we doing with software nets?
• Thanks to the CVS history, we can create a callgraph for a piece of software at any point during its evolution
• Do certain parts of the callgraph stabilize before others? Why?
• Are certain portions of the callgraph more bug-prone than others?
• What does code ownership in the callgraph look like?
• What is the relationship between callgraph network, co-commit network, and ownership network?
![Page 31: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/31.jpg)
More Questions
• Does the software network bear any resemblance to the social network of the developers who work on it? (Conway’s Law)
• Are callgraphs small-world networks? What is the distribution of in- and out-degrees? What would the answers mean (if anything)?
• What partitioning techniques allow us to extract module structure from source code?
• Is there a relationship between the co-committer social network and the email social network for developers?
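As a starting point for the degree question, the in- and out-degree distributions of a callgraph can be tallied directly from an adjacency list. The sketch below reuses the factorial example from earlier in the deck; `degree_distributions` is an illustrative helper, and a six-node graph is far too small to say anything about small-world structure.

```python
from collections import Counter


def degree_distributions(graph):
    """Return (in_hist, out_hist): how many nodes have each in/out degree."""
    out_deg = {node: len(set(callees)) for node, callees in graph.items()}
    in_deg = Counter()
    for callees in graph.values():
        for callee in set(callees):  # count each distinct edge once
            in_deg[callee] += 1
    nodes = set(graph) | set(in_deg)
    in_hist = Counter(in_deg.get(n, 0) for n in nodes)
    out_hist = Counter(out_deg.get(n, 0) for n in nodes)
    return in_hist, out_hist


CALLS = {
    "main": ["printf", "printInt", "factorial"],
    "printInt": ["printf"],
    "factorial": ["multiply", "factorial"],
    "multiply": [],
    "add": [],
    "printf": [],
}

in_hist, out_hist = degree_distributions(CALLS)
print(dict(in_hist))
print(dict(out_hist))
```

On a real callgraph these histograms would typically be plotted on log-log axes to check for a heavy tail.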
![Page 32: Software Networks](https://reader035.fdocuments.in/reader035/viewer/2022062409/56814cd7550346895db9db05/html5/thumbnails/32.jpg)
On with the show…