Log Engineering: Towards Systematic Log Mining to Support the Development of Ultra-large Scale...

Post on 14-Apr-2017


1

Weiyi Shang

Supervisor: Dr. Ahmed E. Hassan

Log Engineering: Towards Systematic Log Mining to Support the Development of Ultra-large Scale Systems

2

Automated profiling & instrumentation are widely used in software engineering

Overhead

No domain knowledge

Large scale

3

Logs are a valuable source of information about system execution

Field information

Developer experience

foo() { … Log_statement(“operation started”); …}

4

Overview of log mining

Software System

Log collection → Log transformation → Log analysis

Goal

5

Finding 1. Little research focuses on logging statements that reside in the source code.

Finding 2. Little research focuses on logs generated during the development of a system.

• Types of logs:
  • Platform logs: Hadoop logs [Tan et al.]
  • Application logs: Dell DVD store logs [Jiang et al.]

• Sources of logs:
  • Logs from the field: [Kavulia et al.]
  • Logs during development: [Jiang et al.]

6

Finding 3. Prior research primarily uses ad hoc log transformation techniques.

• Abstracted logs: Log events [Jiang et al.]
• Vectors or sets: Pairs [Jiang et al.], Sequence [Jiang et al.], Suffix arrays [Nagappan et al.], Time series [Bitincka et al.]
• Graphs: State machines [Tan et al.], Directed graph [Nagappan et al.]
• Matrices: [Lou et al.]

7

Finding 4. Prior log mining research does not address the scalability challenges.

• Simple calculation: filtering [Salfner et al.]
• Directed graph-based algorithms: [Nagappan et al.]
• Static analysis: [Yuan et al.]
• Model checking: [Beschastnikh et al.]
• Visualization: [De Pauw et al.]
• Statistical methods: PCA [Xu et al.]
• Data mining techniques: Co-occurrence analysis [Lou et al.]
• Machine learning techniques: Prediction [Salfner et al.]
• Other analysis techniques: Compression [Hassan et al.]

8

Finding 5. There is limited log mining research that supports software development activities.

• Log mining platforms: [Bitincka et al.]
• Log improvements: [Yuan et al.]
• Log mining for system administration
  • Anomaly detection [Xu et al.]
  • System monitoring [Rabkin et al.]
  • Workload recovery and capacity planning [Kavulia et al.]
• Log mining for software engineering
  • Program comprehension: [Beschastnikh et al.]
  • Software testing: [Jiang et al.]
  • Empirical studies: [Yuan et al.]

9

Thesis statement

Logs are a valuable yet rarely explored source of knowledge about a software system and its operation. There is little research on understanding logs and how they evolve. Systematic and scalable log mining approaches are needed to support various software development activities (e.g., code quality improvement, large-scale testing, and deployment of ultra-large scale applications).

10

Part 1: Study the challenges associated with understanding and evolving logging statements

Part 2: Log engineering approaches to support software development activities

What are the challenges in understanding logging statements? [Submitted to ICSM 2014]

How do logging statements evolve? [WCRE 2011, JSEP]

Prioritizing code review and testing efforts using logs and their churn. [EMSE]

Verifying deployment of Big Data Analytics applications using logs. [ICSE 2013]

11

Part 1: Study the challenges associated with understanding and evolving logging statements

Part 2: Log engineering approaches to support software development activities

What are the challenges in understanding logging statements?

How do logging statements evolve?

Prioritizing code review and testing efforts using logs and their churn.

Verifying deployment of Big Data Analytics applications using logs.

12

Motivation: Log understanding is challenging

User mailing lists

Hadoop, Cassandra, Zookeeper

14 inquiries asked about 5 types of information

[Figure: bar chart of # inquiries by type of information requested: Meaning, Cause, Context, Solution, Impact. Bar values in extraction order: 2, 11, 1, 6, 1.]

[ICSM 2014 in submission]

13

Approach: Attaching development knowledge to logs

Code commit

Issue reports

Source code

/*…*/

Call graph

Code comments

[ICSM 2014 in submission]

14

Development knowledge can resolve real-life inquiries

Development knowledge can provide help in resolving 9 out of 14 real-life inquiries from the user mailing list

[Figure: bar chart of # answered and # not answered inquiries by type of information: Meaning, Cause, Context, Solution, Impact]

[ICSM 2014 in submission]

15

Part 1: Study the challenges associated with understanding and evolving logging statements

Part 2: Log engineering approaches to support software development activities

What are the challenges in understanding logging statements?

How do logging statements evolve?

Prioritizing code review and testing efforts using logs and their churn.

Verifying deployment of Big Data Analytics applications using logs.

16

Motivation: How to keep log processing apps in sync with logs?

Release 1 Release 2 Release 3

[WCRE 2011 best paper, JSEP]

17

Approach: Studying log evolution at the execution level

Data Collection

Log Abstraction

System Deployment

time=1, Trying to launch, TaskID=01A

time=$t, Trying to launch, TaskID=$id

Enterprise Application (EA)

LogEvents

[WCRE 2011 best paper, JSEP]
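
The log abstraction step shown on this slide can be sketched roughly as follows. This is a minimal illustration, not the approach used in the study: the `RULES` table and the `abstract_log_line` function are hypothetical names, and the two regular expressions only cover the `time=` and `TaskID=` fields from the slide's example line.

```python
import re

# Hypothetical rules: each maps a "key=value" field to a placeholder.
# The field names come from the example log line on the slide; a real
# system would need its own rules.
RULES = [
    (re.compile(r"\btime=\d+"), "time=$t"),
    (re.compile(r"\bTaskID=\w+"), "TaskID=$id"),
]


def abstract_log_line(line: str) -> str:
    """Replace dynamic values in a log line with placeholders."""
    for pattern, template in RULES:
        line = pattern.sub(template, line)
    return line


print(abstract_log_line("time=1, Trying to launch, TaskID=01A"))
# prints: time=$t, Trying to launch, TaskID=$id
```

Lines that differ only in their dynamic values map to the same log event, which is what makes events comparable across releases.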

18

Approach: Studying log evolution at the code level

Source code

Generating Abstract Syntax Tree

Identifying logging statements

Logging statements

Log.info("time=%d, Trying to launch, TaskID=%s", time, taskid);

time=$t, Trying to launch, TaskID=$id

[WCRE 2011 best paper, JSEP]
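
As a rough sketch of the code-level step, the snippet below turns a logging statement into a template. It is an assumption-laden simplification: the slide describes working from an abstract syntax tree, whereas this example swaps in a regular expression, and `LOG_CALL`, `FORMAT_SPEC`, and `statement_to_template` are hypothetical names. Placeholders are named after the call's arguments, so the output reads `$time` and `$taskid` rather than the slide's `$t` and `$id`.

```python
import re
from typing import Optional

# Matches a simplified logging call: Log.<level>("format", arg1, arg2, ...).
# Illustrative assumption only: it ignores string concatenation,
# multi-line calls, and escaped quotes.
LOG_CALL = re.compile(r'Log\.\w+\s*\(\s*"([^"]*)"\s*((?:,\s*\w+\s*)*)\)')
FORMAT_SPEC = re.compile(r"%[ds]")


def statement_to_template(statement: str) -> Optional[str]:
    """Turn a logging statement into a template with named placeholders."""
    match = LOG_CALL.search(statement)
    if match is None:
        return None
    fmt, arg_text = match.groups()
    args = [a.strip() for a in arg_text.split(",") if a.strip()]
    names = iter(args)
    # Each format specifier is replaced by "$" plus the next argument name;
    # "arg" is a fallback when there are more specifiers than arguments.
    return FORMAT_SPEC.sub(lambda _: "$" + next(names, "arg"), fmt)


stmt = 'Log.info ("time=%d, Trying to launch, TaskID=%s", time, taskid);'
print(statement_to_template(stmt))
# prints: time=$time, Trying to launch, TaskID=$taskid
```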

19

Results

How do logs evolve over time? (Quantity)

Growing & changing

What types of modifications happen to logs? (Type)

8 types

Are mostly avoidable

What information is conveyed by the short-lived logs? (Content)

Implementation-level details

Fragile

Maintenance effort

Document & track

[WCRE 2011 best paper, JSEP]

20

Part 1: Study the challenges associated with understanding and evolving logging statements

Part 2: Log engineering approaches to support software development activities

What are the challenges in understanding logging statements?

How do logging statements evolve?

Prioritizing code review and testing efforts using logs and their churn.

Verifying deployment of Big Data Analytics applications using logs.

21

Approach: Building statistical models for post-release defects

Traditional metrics → Logistic Regression Model

Traditional metrics + Log-related metrics → Logistic Regression Model

• Are log-related metrics significant in the models?
• How much explanatory power improvement can log-related metrics provide over traditional metrics?

[EMSE]

22

Approach: Defining log-related metrics

Log-related metrics
• Product: Log density, Average logging level
• Process: Log add density, Log delete density, Co-change of log and bug fix

Traditional metrics (Product, Process)
• Lines of code
• Pre-release defects
• Total prior commits

[EMSE]
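
To make one of these metrics concrete, here is a minimal sketch of a log density calculation. The slide only names the metric, so the definition used here (logging-statement lines divided by non-blank lines of code) is an assumption, as are the `LOG_LINE` pattern and the `log_density` function name.

```python
import re

# Assumption: a line counts as a logging statement if it calls one of these
# hypothetical logger methods. Adapt the pattern to the logging library in use.
LOG_LINE = re.compile(
    r"\b(?:LOG|log|logger)\.(?:trace|debug|info|warn|error|fatal)\s*\("
)


def log_density(source: str) -> float:
    """Assumed definition: logging-statement lines / non-blank lines of code."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    logging_lines = sum(1 for ln in lines if LOG_LINE.search(ln))
    return logging_lines / len(lines)


example = """
void launch(Task t) {
    LOG.info("Trying to launch " + t.id());
    t.start();
}
"""
print(log_density(example))  # prints: 0.25
```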

23

There is a relationship between logging characteristics and software quality.

Results

• In 7 out of 8 studied releases, at least one log-related metric is statistically significant in enhancing the model with only traditional metrics.

• The log-related metrics provide up to 40% improvement over the explanatory power of the traditional metrics.

0.16.0 to 0.19.0 3.0 to 4.0

[EMSE]

24

Part 1: Study the challenges associated with understanding and evolving logging statements

Part 2: Log engineering approaches to support software development activities

What are the challenges in understanding logging statements?

How do logging statements evolve?

Prioritizing code review and testing efforts using logs and their churn.

Verifying deployment of Big Data Analytics applications using logs.

25

How to verify the deployment of Big Data Analytics Apps?

Small sample data and pseudo cloud

Big data and real-life cloud

Data sample

How to verify

[ICSE 2013 distinguished paper]

26

Traditional approach for verifying BDA apps

Keyword scan

Many false positives!!

Large results, too much effort to manually examine

[ICSE 2013 distinguished paper]

27

Overview of our approach

Small sample data and pseudo cloud

Big data and real-life cloud

Data sample

Underlying platform Underlying platform

Execution sequences

Execution sequences

Execution sequence delta

[ICSE 2013 distinguished paper]

Comparing small and large runs

28

Logs from testing run with small data

Logs from run with large data

Execution sequence

E1, E2, E3, E5, E6

Execution sequence

E1, E2, E3, E5, E6

E1, E2, E3, E7, E5, E6

Execution sequence delta

E1, E2, E3, E7, E5, E6

[ICSE 2013 distinguished paper]
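
The comparison on this slide can be sketched as a set difference over execution sequences. This is a minimal illustration using the slide's own example events; the function name `sequence_delta` and the list-of-lists representation are assumptions.

```python
def sequence_delta(small_run, large_run):
    """Return sequences seen in the large run but never in the small run."""
    seen = {tuple(seq) for seq in small_run}
    delta = []
    for seq in large_run:
        key = tuple(seq)
        if key not in seen:
            seen.add(key)  # report each new sequence only once
            delta.append(list(seq))
    return delta


small = [["E1", "E2", "E3", "E5", "E6"]]
large = [
    ["E1", "E2", "E3", "E5", "E6"],
    ["E1", "E2", "E3", "E7", "E5", "E6"],
]
print(sequence_delta(small, large))
# prints: [['E1', 'E2', 'E3', 'E7', 'E5', 'E6']]
```

Only the sequence containing E7 remains for manual inspection, since the other one already appeared in the small test run.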

29

How precise is our approach?

Precision: Fewer false positives

How much effort reduction does our approach provide?

Effort reduction: Reduce logs for manual inspection by over 86%

[ICSE 2013 distinguished paper]

30

Thesis contribution

• We demonstrate the challenges of understanding logs.

• We show that logging statements continually evolve.

• We show that there is a relationship between logging characteristics and software defects.

• We propose approaches that leverage logs to verify the deployment of Big Data Analytics applications.

31

32

33

Where else can we find the requested information?

Code commit Issue reports

Source code

/*…*/

Code comments

Call graph

fetch failure

From method checkAndInformJobTracker of file ShuffleScheduler.java

34

Where else can we find the requested information?

Code commit Issue reports

Source code

/*…*/

Code comments

Call graph

fetch failure

Notify the JobTracker after every read error, if `reportReadErrorImmediately' is true or after every `maxFetchFailuresBeforeReporting' failures

35

Where else can we find the requested information?

Code commit Issue reports

Source code

/*…*/

Code comments

Call graph

fetch failure

Called by method copyFailed in class ShuffleScheduler

36

Where else can we find the requested information?

Code commit Issue reports

Source code

/*…*/

Code comments

Call graph

fetch failure

Allow shuffle retries and read-error reporting to be configurable. Contributed by Amareshwari Sriramadasu.

37

Where else can we find the requested information?

Code commit Issue reports

Source code

/*…*/

Code comments

Call graph

fetch failure

MAPREDUCE-1171.… This is caused by a behavioral change in hadoop 0.20.1. ……One solution I could see is "Provide a config option... ”…

38

Where else can we find the requested information?

Code commit Issue reports

Source code

/*…*/

Code comments

Call graph

fetch failure

Meaning: There is a data reading error.
Cause: One of the possible reasons is a configuration.
Context: The event happens during the shuffle period, while copying data.
Impact: The event impacts the jobtracker.
Solution: Changing a configuration option would solve the issue.

Amareshwari Sriramadasu is the expert to go to.

39

Step 1: Log abstraction reduces the size of logs

Log abstraction → Log linking → Simplifying sequences

Example of log lines

Execution events

Jiang et al. JSME 2008

40

Step 2: Log linking provides context for logs

Log abstraction → Log linking → Simplifying sequences

Example of log lines

Execution events
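
One way to picture log linking is grouping execution events that share an identifier. The slide does not say which value is used to link logs, so linking on a shared ID such as a task ID is an assumption here, and `link_events` is a hypothetical name.

```python
from collections import defaultdict


def link_events(events):
    """Group (link_id, event) pairs into per-id execution sequences.

    Assumption: each abstracted log line carries some identifier
    (for example a task ID) that ties related lines together.
    """
    sequences = defaultdict(list)
    for link_id, event in events:
        sequences[link_id].append(event)
    return dict(sequences)


events = [("t1", "E1"), ("t2", "E1"), ("t1", "E2"), ("t2", "E3")]
print(link_events(events))
# prints: {'t1': ['E1', 'E2'], 't2': ['E1', 'E3']}
```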

41

Step 3: Sequence simplification deals with repeated logs

Log abstraction → Log linking → Simplifying sequences

Repeated logs:

task t1 read file A.
task t1 read file A.
task t1 read file A.

Remove repetition and order of events
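
Removing repetition and order can be sketched by reducing each sequence to its set of distinct events. This is a minimal illustration; the name `simplify` and the choice of a sorted tuple as the canonical form are assumptions.

```python
def simplify(sequence):
    """Drop repeated events and ignore ordering.

    Returns a sorted tuple of distinct events so that two sequences that
    differ only in repetition or order compare equal.
    """
    return tuple(sorted(set(sequence)))


repeated = ["task t1 read file A."] * 3
print(simplify(repeated))
# prints: ('task t1 read file A.',)

print(simplify(["E2", "E1", "E1"]) == simplify(["E1", "E2"]))
# prints: True
```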