Improving the Performance of Database-Centric Applications Through Program Analysis
Tse-Hsun (Peter) Chen
Supervisor: Dr. Ahmed E. Hassan
2
Databases are the backbone of large-scale software applications
Database
3
The key to improving the performance of database-centric applications is not only improving the backend database management system, but also improving the database access code, which is rarely considered in prior studies.
Thesis statement
4
Focus of the thesis
[Diagram: Application source code (database access code) → database abstraction framework → SQL queries → Database]
This thesis focuses on the database access code at the application level
Most prior work in the database community focuses on the DB and SQL queries
5
Related publications
• Chapter 1: Improving the Quality of Large-Scale Database-Centric Software Systems by Analyzing Database Access Code, International Conference on Data Engineering, PhD Symposium (ICDE-PhD), 2015
• Chapter 4: An Empirical Study on the Practice of Maintaining Object-Relational Mapping Code in Java Systems, International Conference on Mining Software Repositories (MSR), 2016
• Chapter 5: Detecting Performance Anti-patterns for Applications Developed Using Object-Relational Mapping, International Conference on Software Engineering (ICSE), 2014
• Chapter 6: Detecting Problems in Database Access Code of Large Scale Systems – An Industrial Experience Report, International Conference on Software Engineering, Software Engineering in Practice Track (ICSE-SEIP), 2016
• Chapter 7: Finding and Evaluating the Performance Impact of Redundant Data Access for Applications that are Developed Using Object-Relational Mapping Frameworks, IEEE Transactions on Software Engineering (TSE), 2016
• Chapter 8: CacheOptimizer: Helping Developers Configure Caching Frameworks for Hibernate-based Database-centric Web Applications, International Symposium on the Foundations of Software Engineering (FSE), 2016
6
Goals of the literature review
What is missing in prior studies that use program analysis to improve database access code?
Can we further improve the performance of database-centric applications from the point of view of improving application source code?
7
Literature review of using program analysis to improve database access code
• SQL synthesis and transformation
• Domain-specific languages and APIs
• Anti-pattern detection
8
Literature review of using program analysis to improve database access code
Anti-pattern detection
F1: Limited tooling support for detecting anti-patterns in database access code
F2: Limited research on detecting performance problems or pinpointing the root cause for database access code
• Static anti-pattern detection tools:
  • Static analysis framework for database access code: [Dasgupta et al.]
  • State-of-the-art static anti-pattern detection frameworks: [FindBugs, Coverity, PMD]
• Detecting anti-patterns in database access code:
  • Anti-pattern in database schema: [Nijjar and Bultan]
  • Static SQL validation by string analysis: [Gould et al.]
  • Linking code and SQL: [Tamayo et al.]
  • Deadlock detection: [Grechanik et al.]
9
Literature review of using program analysis to improve database access code
SQL synthesis and transformation
F3: Prior studies focus on optimizing SQL queries but do not consider how the queried data would be used in the application
• Batching SQL queries: [Cheung et al.]
• Pre-fetching SQL queries: [Ramachandra et al.]
• Asynchronous query execution: [Chavan et al.]
• Translating application logic to stored procedure calls: [Cheung et al.]
• Synthesizing SQL queries from Java code: [Cheung et al.]
10
Literature review of using program analysis to improve database access code
Domain-specific languages and APIs
F4: Most prior approaches only consider providing APIs to improve SQL queries, but not the code for interacting with database abstraction frameworks
• Domain-specific languages for parallel query execution: [Ackermann et al.]
• Domain-specific language that compiles list iterations into SQL queries: [Grust et al.]
• Java APIs that can be translated to SQL queries: [Iu et al.]
11
Object-Relational Mapping (ORM) eliminates the gap between objects and SQL
[Diagram: Java classes ↔ ORM ↔ Database]
Benefits over raw SQL:
• Less boilerplate code
• Object-DB translations are done automatically
• More than 67% of developers use ORM
Much less code and shorter development time
13
Part I: Empirical Study
• How is ORM code maintained?
Part II: Approaches for Improving the Performance of Database Access Code
• Automatically tuning cache configurations
• Statically detecting ORM anti-patterns
• Dynamically detecting redundant data anti-patterns
An example class with Java ORM code
14
User.java

    import java.util.List;
    import javax.persistence.*;

    @Entity
    @Table(name = "user")
    @Cacheable
    public class User {
        @Column(name = "id")
        private int id;

        @Column(name = "name")
        String userName;

        @OneToMany(fetch = FetchType.EAGER)
        List<Team> teams;

        public void setName(String n) {
            userName = n;
        }
        … other getter and setter methods
    }

• The User class is mapped to the "user" table in the DB
• id is mapped to the column "id" in the user table
• A user can belong to multiple teams
• Eagerly retrieve associated teams when retrieving a user object
• Performance-related configs: @Cacheable and fetch = FetchType.EAGER
Accessing the database using ORM
15
[Diagram: Objects → ORM → SQLs → database]
• User u = findUserByID(1);
  select * from user u where u.id = 1;
• u.setName("Peter");
  update user set name = 'Peter' where user.id = 1;
16
ORM is not a silver bullet
Using ORM comes with some hidden costs…
We have been working with developers to understand how to improve ORM code maintenance
[MSR 2016]
17
Study ORM code changes over time
[MSR 2016]
We automatically identify and classify ORM code into different types, and study their changes over time
Files with ORM code: DB mapping code, regular code, …, perf config code
Maintenance activities of each type of ORM code
18
• Database mapping code: @Table(name = "user")
• Performance configuration: query.cache()
• ORM query code: User u = q.getSingleResult();
35% of ORM code changes
55% of ORM code changes
< 10% of ORM code changes
ORM configurations are rarely tuned!
[MSR 2016]
19
Part I: Empirical Study
• How is ORM code maintained?
Part II: Approaches for Improving the Performance of Database Access Code
• Automatically tuning cache configurations
• Statically detecting ORM anti-patterns
• Dynamically detecting redundant data anti-patterns
Caching helps performance
20
[Diagram: User u = findUserByID(1); → ORM → database]
[Diagram: … User u = findUserByID(1); → ORM cache in the application server]
[FSE 2016]
Finding optimal ORM cache configuration is difficult
21
• Optimal cache configurations may change when the workload changes significantly
• There can be hundreds of potential cache locations in the code
[FSE 2016]
Approach: Link REST calls to database accesses
22
Log files (Tomcat HTTP server):
10.10.10.1 - - [11/Apr/2015:12:19:30] 200 "GET /app/user/1 " …
Linked database access: User (User table)
[FSE 2016]
Approach: Apply static analysis to extract database-access information
23
    @GET
    @Path("/user/{id}")
    public User getUser(@PathParam("id") int id) {
        … select u from User u where u.id = id …
    }

• Map annotations to corresponding REST calls
• Apply inter-procedural data flow analysis to see if REST inputs are used as criteria for database queries
10.10.10.1 - - [11/Apr/2015:12:19:30] 200 "GET /app/user/1 " …
[FSE 2016]
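The linking step can be illustrated with a minimal sketch. It assumes the path template (for example /user/{id}) has already been read from the annotations and that the application is deployed under the /app context seen in the log line. The class and method names are hypothetical and are not taken from CacheOptimizer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: matches a logged request path against a REST path template. */
final class RestLogLinker {

    // Captures the HTTP method and the path from a quoted request such as "GET /app/user/1 ".
    private static final Pattern REQUEST =
            Pattern.compile("\"(GET|POST|PUT|DELETE)\\s+(\\S+)");

    /** Returns the request path found in an access-log line, or null if there is none. */
    static String requestPath(String logLine) {
        Matcher m = REQUEST.matcher(logLine);
        return m.find() ? m.group(2) : null;
    }

    /**
     * Matches a path against a template such as "/app/user/{id}" and returns the
     * bound parameters, or null if the path does not fit the template.
     */
    static Map<String, String> bind(String template, String path) {
        String[] t = template.split("/");
        String[] p = path.split("/");
        if (t.length != p.length) {
            return null;
        }
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 0; i < t.length; i++) {
            if (t[i].startsWith("{") && t[i].endsWith("}")) {
                // Template variable: record the concrete value from the log.
                params.put(t[i].substring(1, t[i].length() - 1), p[i]);
            } else if (!t[i].equals(p[i])) {
                return null; // literal segment differs
            }
        }
        return params;
    }

    public static void main(String[] args) {
        String line = "10.10.10.1 - - [11/Apr/2015:12:19:30] 200 \"GET /app/user/1 \"";
        // "/app" is an assumed context path prepended to the @Path template.
        System.out.println(bind("/app/user/{id}", requestPath(line)));
    }
}
```

Once a log line is bound to a handler, the data flow analysis result for that handler tells which bound parameter, if any, is used as a query criterion.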
Running performance tests for evaluation
24
[Diagram: Application server → Database]
• We use JMeter tests to generate the load
• Database is populated with hundreds of MB of data
[FSE 2016]
Performance improved significantly after using optimized cache
25
[Figure: bar chart. x-axis groups: NoCache, DeveloperCache; series: PetClinic, CloudStore, OpenMRS; y-axis: % of improvement in throughput (0% to 100%)]
[FSE 2016]
26
Part I: Empirical Study
• How is ORM code maintained?
Part II: Approaches for Improving the Performance of Database Access Code
• Automatically tuning cache configurations
• Statically detecting ORM anti-patterns
• Dynamically detecting redundant data anti-patterns
Developers are often not aware of database access
27
"Wow! I don't need to worry about DB code!"
ORM code with performance anti-patterns → bad application performance
The performance impact can be SIGNIFICANT!
[ICSE 2014 & 2016]
Existing tools do not look for performance problems
28
Coverity, PMD, Google Error Prone, Facebook Infer, FindBugs
These tools only support detecting language-related and functional problems.
[ICSE 2014 & 2016]
Statically detecting one-by-one processing
29
• First find all the methods that read/write from/to the DBMS

    class User {
        getUserById() …
        getUserAddress() …
    }

• Identify the positions of all loops

    for each userId {
        foo(userId)
    }

• Check whether any method in the loop's call graph is a database-accessing method

    foo(userId) {
        getUserById(userId)
    }

[ICSE 2014 & 2016]
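The last step above can be sketched as a reachability check. This is a minimal illustration, assuming the call graph and the set of database-accessing methods have already been extracted. The class name, the method names as strings, and the map-based call graph are hypothetical simplifications, not the detection framework from the papers.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: does a method called inside a loop reach the database? */
final class OneByOneDetector {

    /** True if `start` is, or transitively calls, a database-accessing method. */
    static boolean reachesDatabase(String start,
                                   Map<String, List<String>> callGraph,
                                   Set<String> dbMethods) {
        Deque<String> work = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        work.push(start);
        while (!work.isEmpty()) {
            String method = work.pop();
            if (!seen.add(method)) {
                continue; // already visited, also guards against recursion cycles
            }
            if (dbMethods.contains(method)) {
                return true;
            }
            for (String callee : callGraph.getOrDefault(method, Collections.<String>emptyList())) {
                work.push(callee);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Step 1: methods known to read from or write to the DBMS.
        Set<String> dbMethods = new HashSet<>(
                Arrays.asList("User.getUserById", "User.getUserAddress"));
        // Step 3 input: caller -> callees, as a static analysis might produce.
        Map<String, List<String>> callGraph = new HashMap<>();
        callGraph.put("foo", Arrays.asList("User.getUserById"));
        // Step 2 found a loop whose body calls foo(userId).
        System.out.println(reachesDatabase("foo", callGraph, dbMethods));
    }
}
```

A loop whose body reaches a database-accessing method is only a candidate for one-by-one processing; whether it is worth fixing depends on how many iterations the loop runs in practice.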
Assessing anti-pattern impact by fixing the anti-patterns
30
• Code with anti-patterns (for u in users: u.getName) → execute test suite 30 times → response time
• Code without anti-patterns (get users in a batch; for u in users: u.getName) → execute test suite 30 times → response time after fixing the anti-patterns
• Compare the two response times: avg. % improvement
[ICSE 2014]
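The comparison can be made concrete with a small sketch. The slide does not state the exact formula, so this assumes the improvement is the relative reduction in mean response time across the repeated runs. The class name and the sample values are made up for illustration.

```java
/** Hypothetical helper for comparing response times before and after a fix. */
final class ImprovementCalculator {

    /** Arithmetic mean of the measured response times, one value per run. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Percentage reduction in mean response time; assumes a non-zero baseline. */
    static double percentImprovement(double[] before, double[] after) {
        double baseline = mean(before);
        return (baseline - mean(after)) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double[] before = {120.0, 118.0, 122.0}; // made-up response times in ms
        double[] after = {60.0, 59.0, 61.0};     // made-up response times after the fix
        System.out.printf("%.1f%% improvement%n", percentImprovement(before, after));
    }
}
```

In the study each array would hold 30 values, one per execution of the test suite.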
Performance anti-patterns have medium to large impacts
31
[Figure: bar chart. x-axis: One-by-one processing; y-axis: % of improvement in response time (0% to 100%)]
[ICSE 2014]
Extending static anti-pattern detection framework
32
Source code
• DBChecker looks for both functional and performance anti-patterns
• DBChecker is able to detect 5 more anti-patterns that we see in practice
[ICSE 2016]
DBChecker is adopted in practice
33
• DBChecker is adopted by our industrial partner
• DBChecker is executed on a daily basis for application quality assurance
• We documented the lessons we learned during the adoption process
[ICSE 2016]
Lessons learned: Handling a large number of detection results
34
• Developers have limited time to fix detected problems
• Most existing static anti-pattern detection tools do not prioritize the detected instances for the same anti-pattern
[ICSE 2016]
35
Solution: Prioritizing based on DB tables
Example tables: User, Time zone
• Problems related to large or frequently accessed tables are ranked higher (more likely to be performance bottlenecks)
• Problems related to tables that many other tables depend on are ranked higher
[ICSE 2016]
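A minimal sketch of such a ranking follows. The slide does not say how the criteria are measured or combined, so the metrics here (row count as the size measure, number of dependent tables as the dependency measure) and the tie-breaking order are assumptions. All names and numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: order detected instances by facts about the table they touch. */
final class FindingRanker {

    /** One detected anti-pattern instance plus assumed metrics of its table. */
    static final class Finding {
        final String location;
        final String table;
        final long tableRows;      // assumed size metric
        final int dependentTables; // assumed dependency metric

        Finding(String location, String table, long tableRows, int dependentTables) {
            this.location = location;
            this.table = table;
            this.tableRows = tableRows;
            this.dependentTables = dependentTables;
        }
    }

    /** Returns a new list: larger tables first, then tables with more dependents. */
    static List<Finding> rank(List<Finding> findings) {
        List<Finding> ranked = new ArrayList<>(findings);
        Comparator<Finding> bySize =
                Comparator.comparingLong((Finding f) -> f.tableRows).reversed();
        Comparator<Finding> byDependents =
                Comparator.comparingInt((Finding f) -> f.dependentTables).reversed();
        ranked.sort(bySize.thenComparing(byDependents));
        return ranked;
    }

    public static void main(String[] args) {
        List<Finding> findings = Arrays.asList(
                new Finding("TimeZoneDao.java:12", "time_zone", 24L, 0),
                new Finding("UserDao.java:40", "user", 1_000_000L, 5));
        for (Finding f : rank(findings)) {
            System.out.println(f.table + " <- " + f.location);
        }
    }
}
```

With these made-up numbers, the instance on the user table is listed before the one on the time zone table, matching the intuition that a small lookup table is less likely to be a bottleneck.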
36
Part I: Empirical Study
• How is ORM code maintained?
Part II: Approaches for Improving the Performance of Database Access Code
• Automatically tuning cache configurations
• Statically detecting ORM anti-patterns
• Dynamically detecting redundant data anti-patterns
ORM often requests redundant data
37
Our goal is to identify redundant data anti-patterns by comparing the data needed in the application with the data requested by ORM
[Diagram: needed data in the application vs. requested data]
[TSE 2016]
Finding redundant data anti-patterns
38
• Source code → static analysis → database-accessing methods
• Source code → compile → executable → exercising the application → executed methods and requested data from the DB
• Needed data in the code is compared with requested data from the DB
[TSE 2016]
Discovered three types of redundant data anti-patterns
39
• Update all: data is not updated in the code, but is updated by ORM-generated SQL
• Select all: some data in the same table is not needed, but is retrieved
• Excessive data: data in some tables is not used in the code, but is retrieved
[TSE 2016]
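At its core, each of the three types is a difference between what the ORM touched and what the code needed. A minimal sketch, assuming both sides have already been collected as sets of "table.column" names; the class name and the sample column names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch: data requested through ORM but never needed by the code. */
final class RedundantDataFinder {

    /** Returns the requested items that are absent from the needed set. */
    static Set<String> redundant(Set<String> requested, Set<String> needed) {
        Set<String> extra = new LinkedHashSet<>(requested); // copy, inputs stay untouched
        extra.removeAll(needed);
        return extra;
    }

    public static void main(String[] args) {
        // Columns selected by the ORM-generated SQL (made-up example).
        Set<String> requested = new LinkedHashSet<>(
                Arrays.asList("user.id", "user.name", "user.address"));
        // Columns the application code actually reads (made-up example).
        Set<String> needed = new HashSet<>(Arrays.asList("user.name"));
        System.out.println(redundant(requested, needed));
    }
}
```

Applied to selected columns this corresponds to select all, applied to columns in UPDATE statements versus columns modified in the code it corresponds to update all, and applied at table granularity it corresponds to excessive data.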
Assessing anti-pattern impact by fixing the anti-patterns
40
• Code with anti-patterns (Select * from User…) → execute test suite 30 times → response time
• Code without anti-patterns (Select user name from User…) → execute test suite 30 times → response time after fixing the anti-patterns
• Compare the two response times: avg. % improvement
[TSE 2016]
Performance improvement after removing anti-patterns
41
[Figure: bar chart. x-axis groups: Select all, Update all, Excessive data; series: BL, EA, PC; y-axis: % of improvement in response time (0% to 100%); bar labels as extracted: 4.00%, 5.40%, 4.00%, 30%, 0%, 0%, 31%, 0%, 92%]
Anti-patterns have different impacts on different workloads, but removing them can give significant performance improvements in general
[TSE 2016]
50
Summary of our literature review
• Most tools and prior studies do not focus on database performance anti-patterns
• Most prior studies focus on improving SQL queries, but developers nowadays work with database abstraction frameworks