Log Engineering: Towards Systematic Log Mining to Support the Development of Ultra-large Scale...
1
Weiyi Shang
Supervisor: Dr. Ahmed E. Hassan
Log Engineering: Towards Systematic Log Mining to Support the Development of Ultra-large Scale Systems
2
Automated profiling & instrumentation are widely used in software engineering
Overhead
No domain knowledge
Large scale
3
Logs are a valuable source of information about system execution
Field information
Developer experience
foo() { … Log_statement(“operation started”); …}
4
Overview of log mining
Software System → Log collection → Log transformation → Log analysis → Goal
5
Finding 1. Little research focuses on logging statements that reside in the source code.
Finding 2. Little research focuses on logs generated during the development of a system.
• Types of logs:
  • Platform logs: Hadoop logs [Tan et al.]
  • Application logs: Dell DVD store logs [Jiang et al.]
• Sources of logs:
  • Logs from the field: [Kavulya et al.]
  • Logs during development: [Jiang et al.]
6
Finding 3. Prior research primarily uses ad hoc log transformation techniques.
• Abstracted logs: Log events [Jiang et al.]
• Vectors or sets: Pairs [Jiang et al.], Sequence [Jiang et al.], Suffix arrays [Nagappan et al.], Time series [Bitincka et al.]
• Graphs: State machines [Tan et al.], Directed graph [Nagappan et al.]
• Matrices: [Lou et al.]
7
Finding 4. Prior log mining research does not address the scalability challenges.
• Simple calculation: Filtering [Salfner et al.]
• Directed graph-based algorithms: [Nagappan et al.]
• Static analysis: [Yuan et al.]
• Model checking: [Beschastnikh et al.]
• Visualization: [De Pauw et al.]
• Statistical methods: PCA [Xu et al.]
• Data mining techniques: Co-occurrence analysis [Lou et al.]
• Machine learning techniques: Prediction [Salfner et al.]
• Other analysis techniques: Compression [Hassan et al.]
8
Finding 5. There exists limited software log mining research to support software development activities
• Log mining platforms: [Bitincka et al.]
• Log improvements: [Yuan et al.]
• Log mining for system administration
  • Anomaly detection [Xu et al.]
  • System monitoring [Rabkin et al.]
  • Workload recovery and capacity planning [Kavulya et al.]
• Log mining for software engineering
  • Program comprehension: [Beschastnikh et al.]
  • Software testing: [Jiang et al.]
  • Empirical studies: [Yuan et al.]
9
Thesis statement
Logs are a valuable yet rarely explored source of knowledge about a software system and its operation. There is little research regarding the understanding and evolution of logs.
Systematic and scalable log mining approaches are needed to support various software development activities (e.g., code quality improvement, large scale testing and deployment of ultra-large scale applications).
10
Part 1: Study the challenges associated with understanding and evolving logging statements
What are the challenges in understanding logging statements? [Submitted to ICSM 2014]
How do logging statements evolve? [WCRE 2011, JSEP]
Part 2: Log engineering approaches to support software development activities
Prioritizing code review and testing efforts using logs and their churn. [EMSE]
Verifying deployment of Big Data Analytics applications using logs. [ICSE 2013]
12
Motivation: Log understanding is challenging
User mailing lists
Hadoop, Cassandra, Zookeeper
14 inquiries asked about 5 types of information
[Bar chart: # inquiries per type of information (Meaning, Cause, Context, Solution, Impact); bar labels as extracted: 2, 11, 1, 6, 1]
[ICSM 2014 in submission]
13
Approach: Attaching development knowledge to logs
Code commit
Issue reports
Source code
/*…*/
Call graph
Code comments
[ICSM 2014 in submission]
14
Development knowledge can resolve real-life inquiries
Development knowledge can provide help in resolving 9 out of 14 real-life inquiries from the user mailing list
[Stacked bar chart: # answered inquiries vs. # not answered inquiries per type of information (Meaning, Cause, Context, Solution, Impact)]
[ICSM 2014 in submission]
16
Motivation: How to keep log processing apps in sync with logs?
Release 1 Release 2 Release 3
[WCRE 2011 best paper, JSEP]
17
Approach: Studying log evolution at the execution level
Data Collection
Log Abstraction
System Deployment
time=1, Trying to launch, TaskID=01A
time=$t, Trying to launch, TaskID=$id
Enterprise Application (EA)
Log Events
[WCRE 2011 best paper, JSEP]
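The abstraction step on this slide (a concrete log line becomes a log event with placeholders) can be sketched as follows. This is a minimal illustration, not the tool used in the study: the two regular-expression rules and the function names are assumptions written to match the single example line on the slide.

```python
import re

# Hypothetical rules (not from the study): each maps a dynamic
# "key=value" field to the placeholder shown on the slide.
RULES = [
    (re.compile(r"\btime=\d+"), "time=$t"),
    (re.compile(r"\bTaskID=\w+"), "TaskID=$id"),
]


def abstract_line(line: str) -> str:
    """Replace the dynamic values in one log line with placeholders."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line


def abstract_log(lines):
    """Map every log line to its log event and count occurrences."""
    counts = {}
    for line in lines:
        event = abstract_line(line.strip())
        counts[event] = counts.get(event, 0) + 1
    return counts
```

Comparing the set of log events produced by two releases is then a matter of comparing the keys of two such dictionaries.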
18
Approach: Studying log evolution at the code level
Source code
Generating Abstract Syntax Tree
Identifying logging statements
Log.info("time=%d, Trying to launch, TaskID=%s", time, taskid);
time=$t, Trying to launch, TaskID=$id
Logging statements
[WCRE 2011 best paper, JSEP]
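A rough sketch of turning a logging statement into a log template is shown below. The slide describes an approach based on an abstract syntax tree; this sketch swaps in a regular expression as a simplification, and it assumes calls of the form `Log.<level>("format", args...)` with only `%d` and `%s` specifiers. Placeholders are named after the argument variables, so the output reads `$time` and `$taskid` rather than the `$t` and `$id` on the slide.

```python
import re

# Assumption: logging calls look like Log.<level>("format", arg1, arg2, ...)
LOG_CALL = re.compile(
    r'Log\.(\w+)\s*\(\s*"((?:[^"\\]|\\.)*)"\s*(?:,\s*([^)]*))?\)'
)


def extract_templates(source: str):
    """Return (level, template) pairs for each logging statement found."""
    templates = []
    for level, fmt, args in LOG_CALL.findall(source):
        names = [a.strip() for a in args.split(",")] if args else []
        remaining = iter(names)
        # Replace each printf-style specifier with $<argument name>;
        # "?" marks a specifier that has no matching argument.
        template = re.sub(r"%[ds]", lambda m: "$" + next(remaining, "?"), fmt)
        templates.append((level, template))
    return templates
```

Running this over two versions of the same file and comparing the returned templates gives a simple view of which logging statements were added, deleted, or modified.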
19
Results
How do logs evolve over time?
Growing & changing
Document & track
What types of modifications happen to logs?
What information is conveyed by the short-lived logs?
Quantity, Type, Content
8 types
Are mostly avoidable
Implementation-level details
Fragile
Maintenance effort
[WCRE 2011 best paper, JSEP]
Approach: Building statistical models for post-release defects
21
Traditional metrics → Logistic Regression Model
Traditional metrics + Log-related metrics → Logistic Regression Model
• Are log-related metrics significant in the models?
• How much explanatory power improvement can log-related metrics provide over traditional metrics?
[EMSE]
22
Approach: Defining log-related metrics
Log-related metrics
  Product: Log density, Average logging level
  Process: Log add density, Log delete density, Co-change of log and bug fix
Traditional metrics
  Product: Lines of code
  Process: Pre-release defects, Total prior commits
[EMSE]
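As an illustration of one product metric from this slide, log density can be approximated as the share of source lines that contain a logging statement. The slide does not give the exact definition, so the formula, the regular expression for logging calls, and the function name below are all assumptions.

```python
import re

# Assumption: logging statements are calls such as LOG.info(...) or
# logger.debug(...); real projects may use other names and levels.
LOG_STMT = re.compile(
    r"\b(?:LOG|log|logger)\.(?:trace|debug|info|warn|error|fatal)\s*\("
)


def log_density(source: str) -> float:
    """Logging lines divided by non-blank lines of code (assumed definition)."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    log_lines = sum(1 for ln in lines if LOG_STMT.search(ln))
    return log_lines / len(lines)
```

A per-file value like this could be fed into a defect model alongside lines of code and the other traditional metrics.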
23
Results
There is a relationship between logging characteristics and software quality.
• In 7 out of 8 studied releases, at least one log-related metric statistically significantly improves the model built with only traditional metrics.
• The log-related metrics provide up to 40% improvement over the explanatory power of the traditional metrics.
0.16.0 to 0.19.0, 3.0 to 4.0
[EMSE]
25
How to verify the deployment of Big Data Analytics Apps?
Small sample data and pseudo cloud
Big data and real-life cloud
Data sample
How to verify
[ICSE 2013 distinguished paper]
26
Traditional approach for verifying BDA apps
Keyword scan
Many false positives!!
Large results, too much effort to manually examine
[ICSE 2013 distinguished paper]
27
Overview of our approach
Small sample data and pseudo cloud
Big data and real-life cloud
Data sample
Underlying platform Underlying platform
Execution sequences
Execution sequences
Execution sequence delta
[ICSE 2013 distinguished paper]
Comparing small and large runs
28
Logs from testing run with small data
Execution sequence: E1, E2, E3, E5, E6
Logs from run with large data
Execution sequences: E1, E2, E3, E5, E6; E1, E2, E3, E7, E5, E6
Execution sequence delta: E1, E2, E3, E7, E5, E6
[ICSE 2013 distinguished paper]
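The comparison on this slide amounts to reporting the execution sequences that appear in the large run but never in the small run. A minimal sketch, assuming each sequence is already a list of event names (the function name is hypothetical):

```python
def sequence_delta(small_run, large_run):
    """Return sequences seen in the large run but not in the small run."""
    seen = {tuple(seq) for seq in small_run}
    delta = []
    for seq in large_run:
        key = tuple(seq)
        if key not in seen:
            seen.add(key)  # report each new sequence only once
            delta.append(list(seq))
    return delta


small = [["E1", "E2", "E3", "E5", "E6"]]
large = [
    ["E1", "E2", "E3", "E5", "E6"],
    ["E1", "E2", "E3", "E7", "E5", "E6"],
]
print(sequence_delta(small, large))
# prints [['E1', 'E2', 'E3', 'E7', 'E5', 'E6']]
```

Only the delta needs manual inspection, which is where the effort reduction comes from.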
29
How precise is our approach?
Precision: Fewer false positives
How much effort reduction does our approach provide?
Effort reduction: Reduce logs for manual inspection by over 86%
[ICSE 2013 distinguished paper]
30
Thesis contribution
• We demonstrate the challenges of understanding logs.
• We show that logging statements continually evolve.
• We show that there is a relationship between logging characteristics and software defects.
• We propose approaches that leverage logs to verify the deployment of Big Data Analytics applications.
33
Where else can we find the requested information?
Code commit Issue reports
Source code
/*…*/
Code comments
Call graph
fetch failure
From method checkAndInformJobTracker of file ShuffleScheduler.java
34
Where else can we find the requested information?
Code commit Issue reports
Source code
/*…*/
Code comments
Call graph
fetch failure
Notify the JobTracker after every read error, if `reportReadErrorImmediately' is true or after every `maxFetchFailuresBeforeReporting' failures
35
Where else can we find the requested information?
Code commit Issue reports
Source code
/*…*/
Code comments
Call graph
fetch failure
Called by method copyFailed in class ShuffleScheduler
36
Where else can we find the requested information?
Code commit Issue reports
Source code
/*…*/
Code comments
Call graph
fetch failure
Allow shuffle retries and read-error reporting to be configurable. Contributed by Amareshwari Sriramadasu.
37
Where else can we find the requested information?
Code commit Issue reports
Source code
/*…*/
Code comments
Call graph
fetch failure
MAPREDUCE-1171.… This is caused by a behavioral change in hadoop 0.20.1. ……One solution I could see is "Provide a config option... ”…
38
Where else can we find the requested information?
Code commit Issue reports
Source code
/*…*/
Code comments
Call graph
fetch failure
Meaning: There is a data reading error.
Cause: One of the possible reasons is a configuration.
Context: The event happens during the shuffle period, while copying data.
Impact: The event impacts the jobtracker.
Solution: Changing a configuration option would solve the issue.
Amareshwari Sriramadasu is the expert to go to.
39
Step 1: Log abstraction reduces the size of logs
Log abstraction → Log linking → Simplifying sequences
Example of log lines
Execution events
[Jiang et al., JSME 2008]
40
Step 2: Log linking provides context for logs
Log abstraction → Log linking → Simplifying sequences
Example of log lines
Execution events
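One way to picture log linking is to group log lines that share an identifier, so that each group forms one execution sequence. The slide does not say which identifier is used; the sketch below assumes a task ID of the form `task <id>`, and the pattern and function name are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical identifier pattern: lines mention "task <id>".
TASK_ID = re.compile(r"\btask (\w+)")


def link_by_task(lines):
    """Group log lines by task ID, keeping their original order."""
    sequences = defaultdict(list)
    for line in lines:
        match = TASK_ID.search(line)
        if match:  # lines without a task ID are left unlinked
            sequences[match.group(1)].append(line)
    return dict(sequences)
```

Each value in the returned dictionary is the sequence of events for one task, which gives every log line the context of what happened before and after it.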
41
Step 3: Sequence simplification deals with repeated logs
Log abstraction → Log linking → Simplifying sequences
Repeated logs:
task t1 read file A.
task t1 read file A.
task t1 read file A.
Remove repetition and order of events
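Removing both repetition and order is equivalent to treating each execution sequence as a set of events. A minimal sketch under that reading (the function name is an assumption):

```python
def simplify(sequence):
    """Drop repeated events and ignore their order."""
    return frozenset(sequence)
```

With this representation, two sequences that differ only in how often or in what order an event occurs compare as equal, so they are not reported as a difference between the small and large runs.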