Survival analysis of database technologies in open source Java projects
Mathieu Goeminne, Tom Mens, Software Engineering Lab, University of Mons, Belgium
http://informatique.umons.ac.be/genlog/projects/disse
ICSME 2015 Early Research Achievements — Bremen, Germany, September 2015
September 2015, International Conference on Software Maintenance and Evolution (ICSME 2015), Bremen, Germany
Context
• FNRS research project “Data-Intensive Software System Evolution”
– Interuniversity collaboration with Université de Namur
• Expand empirical MSR research to include database-related activities
• Overall goal
– Empirically analyse and support co-evolution between program code and database schema in data-intensive software systems
• This paper:
– Study co-evolution of Java ORM database technologies
Focus
• Open source Java projects
– Extracted from GitHub Java Corpus [Allamanis & Sutton, MSR 2013]
– We considered 13,307 Java projects still having a Git repository in March 2015
Focus
• Many Java relational database technologies
Focus
• Java relational database technologies
• 19 different Java technologies considered (each at least 3 years old)
• Detected by looking at import statements and configuration files
• This left us with 3,819 Java projects
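The import-based detection mentioned above can be sketched as follows. This is a minimal illustration, not the authors' tooling: the prefix table and the function name `detect_technologies` are assumptions, and configuration files (which the slides also mention) are not covered.

```python
import re

# Assumed mapping from Java package prefixes to technology names.
# The slides do not list the exact prefixes that were used.
IMPORT_PREFIXES = {
    "java.sql": "JDBC",
    "javax.persistence": "JPA",
    "org.hibernate": "Hibernate",
    "org.springframework": "Spring",
    "com.vaadin": "Vaadin",
}

# Matches "import x.y.Z;", "import x.y.*;" and "import static x.y.Z.m;".
IMPORT_RE = re.compile(r"^\s*import\s+(?:static\s+)?([\w.]+)", re.MULTILINE)


def detect_technologies(java_source):
    """Return the set of technology names whose packages are imported."""
    found = set()
    for imported in IMPORT_RE.findall(java_source):
        for prefix, name in IMPORT_PREFIXES.items():
            # Require a package boundary so "java.sqlx" does not match "java.sql".
            if imported == prefix or imported.startswith(prefix + "."):
                found.add(name)
    return found


sample = """package demo;
import java.sql.Connection;
import javax.persistence.*;
import java.util.List;
"""
print(sorted(detect_technologies(sample)))
```

Running such a function over every Java file of every commit would give, per commit, the set of technologies present.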
Focus
• Top 5 technologies occurred in over 200 projects each
• 3,707 Java projects used (at least) one of these 5 technologies
Research Questions
RQ1 Which combinations of database technologies co-occur in the projects in which they are used?
RQ2 How long do database technologies survive in the projects in which they occur?
RQ3 Does the introduction of a technology influence the survivability of another one?
RQ4 How long does it take to introduce a second technology after a previous one was introduced?
RQ1 Which combinations of technologies co-occur in the projects in which they are used?
• Different technologies used within the same Java project (not necessarily at the same time)
[Figure for this slide; recoverable labels: 61% and 34.5%]
RQ1 Which combinations of technologies co-occur in the projects in which they are used?
• How frequently do these technologies actually co-occur?
• Answer: most of the time!
JPA annotations as an indicator of the use of any framework but JPA itself.
Vaadin is a framework for developing web applications. It introduces the notion of domain layer, which abstracts the database structure through Java classes hosting the business logic of the application.
RQ1: Which combinations of database frameworks “co-occur” in the projects in which they are used?
We identified which of the 5 considered database frameworks occurred throughout the lifetime of each considered project, and we computed all possible intersections of framework occurrence in Fig. 2.
JDBC occurs as the only database framework in 56.3% of all projects. At the other side of the spectrum, Hibernate occurs as the only framework in only 2.9% of all projects. If we look at their intersection, the large majority (82.8%) of all projects that have used Hibernate have also used JDBC during their lifetime.
Something similar can be observed for JDBC and JPA. JPA occurs in isolation in 29.5% of all projects, while almost half of all projects that have used JPA (49.3% to be precise) have also used JDBC during their lifetime.
Similarly, when comparing Hibernate and JPA, we observe that 49.6% of all projects that have used Hibernate have also used JPA, while 44.1% of all projects that have used Hibernate have also used JPA and JDBC.
Fig. 2. Number of Java projects using a given number of database frameworks (over the entire project’s lifetime).
These high numbers could be due to the fact that some database frameworks are used as supporting technologies for others (e.g., Spring typically uses JDBC for database access), while some frameworks are complementing each other (e.g., Vaadin has an optional module called JPAContainer for supporting JPA annotations).
To determine for which frameworks this is the case, we studied the “co-occurrence” of different frameworks within the same project. This happens when files relating to both frameworks are present in at least one of the project’s commits (but typically in many more commits).
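The distinction between frameworks used over a project's lifetime and frameworks that co-occur in a single commit can be illustrated with a small sketch. It assumes each commit has already been reduced to the set of frameworks whose files are present in it; the function name `usage_summary` and the input format are hypothetical.

```python
def usage_summary(commit_frameworks):
    """Summarise one project's history.

    commit_frameworks: iterable of sets, one per commit, each holding the
    frameworks with files present in that commit (assumed input format).
    Returns (number of distinct frameworks ever used,
             maximum number of frameworks present in the same commit).
    """
    ever_used = set()
    max_cooccurring = 0
    for present in commit_frameworks:
        ever_used |= set(present)
        max_cooccurring = max(max_cooccurring, len(present))
    return len(ever_used), max_cooccurring


# Three frameworks over the lifetime, but at most two at the same time.
history = [{"JDBC"}, {"JDBC", "Hibernate"}, {"JPA"}]
print(usage_summary(history))
```

A project whose two numbers are equal used all of its frameworks together in at least one commit; a project where the second number is smaller switched between some of them over time.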
Table II shows vertically the number of projects having used a given number of distinct frameworks over their entire history, and horizontally the maximum number of distinct “co-occurring” frameworks. Almost all values reside on the diagonal, implying that in the large majority of all cases (95.3%, i.e., 1213/1273), different database frameworks used in a project tend to co-occur.
TABLE II. NUMBER OF PROJECTS INVOLVING A GIVEN NUMBER OF FRAMEWORKS, OVER THEIR ENTIRE LIFETIME AND IN CO-OCCURRENCE.
# total frameworks used    # co-occurring frameworks
                               1      2      3      4      5
1                          2,443
2                             22    776
3                              2     16    328
4                              0      0     18    104
5                              0      0      1      1      5
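The share of multi-framework projects lying on the diagonal of Table II can be recomputed directly from the table. The sketch below only transcribes the table values; the variable names are illustrative. With these values the ratio 1213/1273 evaluates to about 95.3%.

```python
# Rows: total number of frameworks used; entries: projects by maximum
# number of co-occurring frameworks (1, 2, ...), transcribed from Table II.
TABLE_II = {
    1: [2443],
    2: [22, 776],
    3: [2, 16, 328],
    4: [0, 0, 18, 104],
    5: [0, 0, 1, 1, 5],
}

# Only projects that used at least two frameworks can show co-occurrence.
multi = {k: row for k, row in TABLE_II.items() if k >= 2}

# Diagonal cell of row k: all k frameworks co-occurred at some point.
on_diagonal = sum(row[k - 1] for k, row in multi.items())
total = sum(sum(row) for row in multi.values())
share = on_diagonal / total

print(on_diagonal, total, round(100 * share, 1))
```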
Focusing on specific combinations of frameworks, Table III reports the number of projects in which two database frameworks co-occurred at least once during the project’s lifetime. Not surprisingly, we observe that JDBC frequently co-occurs with other frameworks. That lets us suppose that JDBC is used as a supporting technology that provides services not offered by the other frameworks. 80.1% of all projects that used Hibernate have also used JDBC in co-occurrence; 48.4% of all projects that used JPA have used JDBC in co-occurrence; 41.3% of all projects that used Spring have used JDBC in co-occurrence; and 39.6% of all projects that used Vaadin have used JDBC in co-occurrence.
TABLE III. NUMBER OF PROJECTS IN WHICH PAIRS OF DATABASE FRAMEWORKS CO-OCCUR.
          Spring    JPA    Vaadin    Hibernate
JDBC         645    565       143          192
Spring              558        76          156
JPA                            98          105
Vaadin                                      22
Some database frameworks seem to complement one another. For example, 47.8% of all projects using Spring also use JPA. Other database frameworks appear to be in competition. For example, Vaadin co-occurs with Hibernate in only 22 projects, which makes up 9.2% of all projects using Hibernate, and only 6.1% of all projects using Vaadin. To a lesser extent, Vaadin also co-occurs infrequently together with JPA or Spring.
RQ2: How long do database frameworks “survive” in the projects in which they occur?
Fig. 3 shows the Kaplan-Meier survival curves of the selected frameworks. After their introduction, all database frameworks remain present in more than 45% of the projects. Nevertheless, we observe different trends in framework survivability. For example, in 11.7% of all cases Hibernate is removed 30 days after its introduction. In the same time interval, Spring is only removed from 3.7% of the projects. Three years after its
RQ1 Which combinations of technologies co-occur in the projects in which they are used?
• JDBC co-occurs with most other technologies
• Spring and JPA co-occur very frequently
• Vaadin occurs infrequently with Hibernate, JPA, Spring
RQ2 How long do database technologies survive in the projects in which they occur?
• Use the statistical technique of survival analysis
• The Kaplan-Meier estimator estimates the probability of surviving beyond a given time, i.e., the probability that a specific event (here, removal of the technology) has not yet occurred
• Takes into account right-censored data (e.g., the event has not occurred yet)
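To make the estimator concrete: at each time t_i with d_i events among n_i subjects still at risk, the Kaplan-Meier estimate multiplies the running survival probability by (1 - d_i / n_i). The following is a minimal sketch in plain Python, not the tooling used for the study; the function name and the sample data are made up for illustration.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate as a list of (time, survival_probability).

    durations: time from introduction until removal, or until the end of
               the observation period for right-censored cases
    observed:  True if the removal was observed, False if right-censored
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = censored = 0
        # Group all subjects sharing the same time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Censored subjects leave the risk set without counting as events.
        at_risk -= events + censored
    return curve


# Hypothetical durations in days; False marks a technology still present
# when observation stopped.
curve = kaplan_meier([30, 90, 90, 400, 1000], [True, False, True, False, True])
for t, s in curve:
    print(t, round(s, 3))
```

Censored cases never lower the curve themselves; they only shrink the risk set used at later event times, which is how the estimator accounts for technologies that were still present at the end of the observed history.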
RQ2 How long do database technologies survive in the projects in which they occur?
• Most technologies tend to survive (>50% probability) over the project’s lifetime
• A bit less for Vaadin and Hibernate
RQ3 Does the introduction of a technology influence the survivability of another one?
• Visually, we observe slightly improved survival for some combinations, but the difference is not statistically significant
[Figure: two panels of survival curves labelled C1 and C2; x-axis ticks from 0 to 6000, y-axis ticks from 0.0 to 1.0. First panel: A = spring, B = jdbc. Second panel: A = jdbc, B = spring.]
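The slides do not say which statistical test was used to compare the two survival curves. One common choice for this kind of comparison is the log-rank test, sketched below purely as an assumption. The function name, the input format and the sample data are illustrative; the p-value uses the chi-square distribution with one degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; returns (chi_square_statistic, p_value).

    times_*:  durations per subject; events_*: True if the event was
    observed, False if the subject is right-censored.
    """
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    statistic = (observed_a - expected_a) ** 2 / variance
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical data: group C1 loses the technology early, C2 much later.
stat, p = logrank_test([1, 2, 3, 4, 5], [True] * 5,
                       [10, 11, 12, 13, 14], [True] * 5)
print(round(stat, 2), p < 0.05)
```

A large p-value, as reported for the curve pairs on this slide, means the visible gap between the curves could plausibly be due to chance.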
Effect of project size?
• Project size and age follow a log-normal distribution
• How does this affect the results?
– Split projects into two equal bins according to project size, and compare results
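The split into two equal bins can be sketched as a rank-based split at the halfway point. The size metric is not specified on the slide, so the numbers below are placeholders, and `split_in_two_bins` is a hypothetical helper.

```python
def split_in_two_bins(project_sizes):
    """Split project names into (smaller half, larger half) by size.

    project_sizes: dict mapping project name to a size measure
    (which measure is used is an assumption left open here).
    With an odd number of projects, the larger bin gets the extra one.
    """
    ranked = sorted(project_sizes, key=project_sizes.get)
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]


sizes = {"a": 10, "b": 500, "c": 40, "d": 2000}
small, large = split_in_two_bins(sizes)
print(small, large)
```

Ranking rather than thresholding keeps the bins equal in size even when the underlying distribution is heavily skewed, as a log-normal one is.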
Time until technology introduction
• Technologies tend to be introduced at the start of the project
• Some small projects also introduce technologies near the end of their observed lifetime
RQ4 How long does it take to introduce a second technology after a previous one was introduced?
• Small projects are less likely to introduce a second technology, and do so later
• JDBC is rarely complemented by another technology
• Hibernate is often quickly complemented by (or replaced by) another technology
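The quantity studied in RQ4 can be framed as a duration with right-censoring: the time from the first technology's introduction to the second one's, or to the end of the observed history if no second technology appears. The sketch below assumes introduction dates per technology are already known; the function name and input format are hypothetical.

```python
from datetime import date


def delay_to_second_technology(introductions, last_observed):
    """Return (days, observed) for one project.

    introductions: dict mapping technology name to its introduction date
    last_observed: date of the last commit in the observed history
    observed is False when no second technology was introduced, which
    makes the duration right-censored.
    """
    dates = sorted(introductions.values())
    if not dates:
        raise ValueError("project uses no database technology")
    first = dates[0]
    if len(dates) >= 2:
        return (dates[1] - first).days, True
    return (last_observed - first).days, False


print(delay_to_second_technology(
    {"JDBC": date(2012, 1, 1), "Spring": date(2012, 3, 1)},
    last_observed=date(2014, 6, 30),
))
```

Pairs of this form are the input a survival analysis expects, so the time to a second technology can be studied with the same kind of curves as technology removal.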
Conclusions
• Some technologies are much more popular than others
– JDBC, JPA, Hibernate, Spring
• Different Java database technologies tend to co-occur
– Especially in the larger Java projects
• Technologies, once introduced, tend to remain
• Introducing new technologies does not “replace” existing ones but rather “complements” them
– Survival of existing technologies is not negatively affected
• Big projects tend to behave differently from small ones
• Many technologies are used in combination with JDBC, but JDBC is also often used in isolation
Future Work
• Analyse co-occurrence of technologies at a finer level of granularity
• Look at NoSQL database technologies
• Study co-evolution between changes in the source code and changes in the database schema
• Study social aspects
References
• M. Goeminne, T. Mens. Towards a survival analysis of database framework usage in Java projects. ICSME 2015 ERA track
• M. Goeminne, A. Decan, T. Mens. Co-evolving code-related and database-related changes in a data-intensive software system. CSMR-WCRE 2014 ERA track
• L. Meurice, A. Cleve. DAHLIA: A visual analyzer of database schema evolution. CSMR-WCRE 2014 Tool Demo
• A. Cleve, T. Mens, J.-L. Hainaut. Data-intensive system evolution. IEEE Computer 43(8): 110-112 (2010)
• A. Cleve, M. Gobert, L. Meurice, J. Maes, J. Weber. Understanding database schema evolution: A case study. Science of Computer Programming (2013)
• T. Mens, A. Serebrenik, A. Cleve (Eds.). Evolving Software Systems. Springer, 2014, XXIII, 404 p. ISBN 978-3-642-45398-4