Data Analytics in Software Engineering
![Page 1: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/1.jpg)
Data Analytics in Software Engineering
Christian Kaestner
1
![Page 2: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/2.jpg)
Learning Goals
• Understand the importance of data-driven decision making, also during software engineering
• Collect and analyze measurements
• Design evaluation strategies to evaluate the effectiveness of interventions
• Understand the potential of data analytics at scale for QA data
2
![Page 3: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/3.jpg)
3
![Page 4: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/4.jpg)
4
![Page 5: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/5.jpg)
5
![Page 6: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/6.jpg)
What about Software Engineering?
6
![Page 7: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/7.jpg)
7
![Page 8: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/8.jpg)
How would you approach these questions with data?
• Where to focus testing effort?
• Is our review practice effective?
• Is the expensive static analysis tool paying off?
• Should we invest in security training?
8
![Page 9: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/9.jpg)
Beliefs vs. Evidence?
• “40% of major decisions are based not on facts, but on the manager’s gut” [Accenture survey among 254 US managers in industry]
• E.g., strong beliefs in a survey among 564 Microsoft engineers:
  • Code Reviews improve code quality
  • Coding Standards improve code quality
  • Static Analysis tools improve code quality
• Controversial beliefs from the same survey:
  • Code Quality depends on the programming language
  • Fixing Defects is riskier than adding new features
  • Geographically distributed teams produce code of as good quality as non-distributed teams
9
Devanbu, P., Zimmermann, T., & Bird, C. (2016, May). Belief & evidence in empirical software engineering. In Proceedings of the 38th International Conference on Software Engineering (pp. 108-119). ACM.
![Page 10: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/10.jpg)
Sources of Beliefs
10
![Page 11: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/11.jpg)
Software Engineering is becoming more like modern medicine?
11
![Page 12: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/12.jpg)
Measurement and Metrics
• Discussed throughout the semester
• Everything is measurable
• Define measures, be critical (precision, accuracy, …)
• Be systematic in data collection (prefer automation)
12
![Page 13: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/13.jpg)
How would you approach these questions with data?
• Where to focus testing effort?
• Is our review practice effective?
• Is the expensive static analysis tool paying off?
• Should we invest in security training?
13
![Page 14: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/14.jpg)
Evaluate Effectiveness of an Intervention
• Controlled experiments
  • Compare a group with the intervention against a control group without it
  • Randomized controlled trials, AB testing, …
  • Ideally blinded
• Natural experiments, quasi-experiments
  • Compare similar groups that naturally differ only in the intervention
  • No randomized assignment of the treatment condition
• Time series analyses
  • Compare measures before and after the intervention, preferably across groups with the intervention at different times
14
![Page 15: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/15.jpg)
On Experiments
• Understand experimental methods and their limitations
  • Choose an appropriate design (e.g., quasi-experiment vs. time series vs. controlled experiment)
  • Appropriate to the research question and available subjects
• Design carefully, control confounds, avoid biases
• Use appropriate statistics to draw conclusions
• This requires sound understanding of quantitative research methods
• Many pitfalls
15
![Page 16: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/16.jpg)
16
![Page 17: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/17.jpg)
17
![Page 18: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/18.jpg)
18
![Page 19: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/19.jpg)
19
![Page 20: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/20.jpg)
20
![Page 21: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/21.jpg)
21
![Page 22: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/22.jpg)
Abundance of Data
22
![Page 23: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/23.jpg)
Abundance of Data
• Code history
• Developer activities
• Bug trackers
• Sprint backlog, milestones
• Continuous integration logs
• Static analysis and technical debt dashboards
• Test traces; dynamic analyses
• Runtime traces
• Crash reports from customers
• Server load, stats
• Customer data, interactions
• Support requests, customer reviews
• Working hours
• Team interactions in Slack/issue tracker/email/…
• …
23
![Page 24: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/24.jpg)
Measurement is Hard
Example: Performance
24
![Page 25: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/25.jpg)
Twitter Case Study
25
![Page 26: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/26.jpg)
Timer Overhead
• Measurement itself consumes time
26
[Figure: timeline of a measured request — the event starts and ends, but the time reported also includes saving the end time, memory access, and interaction with the operating system]
Rule of thumb: the measured event should be 100-1000x longer than the measurement overhead.
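The overhead caveat can be checked empirically. A minimal Python sketch (the slides themselves use R; the function names here are illustrative): estimate the timer's own cost by timing back-to-back clock reads, then flag measurements that are too short relative to that overhead.

```python
import time

def timer_overhead(samples: int = 1000) -> float:
    """Estimate the cost of the timing call itself by timing back-to-back reads."""
    deltas = []
    for _ in range(samples):
        t1 = time.perf_counter()
        t2 = time.perf_counter()
        deltas.append(t2 - t1)
    return min(deltas)  # best case approximates the pure timer cost

def measure(event, overhead: float):
    """Time one event; flag it as unreliable if it is too short
    relative to the timer overhead (the 100x rule of thumb)."""
    start = time.perf_counter()
    event()
    elapsed = time.perf_counter() - start
    reliable = elapsed >= 100 * overhead
    return elapsed, reliable
```

On most systems `time.perf_counter()` is the highest-resolution clock available; the same idea applies to any timer API.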
![Page 27: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/27.jpg)
Confounding variables
27
![Page 28: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/28.jpg)
Confounding variables
• Background processes
• Hardware differences
• Temperature differences
• Input data; random?
• Heap size
• System interrupts
• Single vs. multi-core systems
• Garbage collection
• Memory layout
• …
28
![Page 29: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/29.jpg)
Handling confounding variables
• Keep constant
• Randomize
  • -> Repeated measurements
• -> Large, diverse benchmarks
• Measure and compute influence ex-post
29
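Randomizing the measurement order is easy to do in code. A Python sketch (an illustrative helper, not from the slides): shuffle the order in which treatments are measured, so that drift over time — temperature, background load — is not confounded with the treatment.

```python
import random

def randomized_schedule(treatments, repetitions, seed=None):
    """Build a randomized measurement order instead of AAABBB or ABABAB,
    so time-dependent confounds affect all treatments equally on average."""
    schedule = [t for t in treatments for _ in range(repetitions)]
    rng = random.Random(seed)  # seed for a reproducible schedule
    rng.shuffle(schedule)
    return schedule
```

Each entry in the returned list names the treatment to run next; the seed makes the schedule reproducible across experiment reruns.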
![Page 30: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/30.jpg)
Common approach: best result
• Repeat measurement
• Report best result (or second best, or worst)
30
![Page 31: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/31.jpg)
Common approach: Mean values
• Repeat measurement (how often?)
• Report average
• Basic assumptions: Law of large numbers and central limit theorem
31
(cc 3.0) Wikimedia
![Page 32: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/32.jpg)
Means
• Arithmetic mean
• Median: the value in the middle
  • On even-sized data sets, the arithmetic mean of the two values in the middle
  • Robust against outliers
• Truncated mean
  • Remove 10% outliers (on both ends), then take the arithmetic mean
• Geometric mean• …
32
median(c(1,4,6,10)) = 5
median(c(-5,3,4,6,50)) = 4
mean(c(1,4,6,10)) = 5.25
mean(c(-5,3,4,6,50)) = 11.6
$\bar{x}_{\text{arithm}} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \dots + x_n}{n}$
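The slide's examples are in R; the less common variants translate to a few lines of Python (a sketch — `truncated_mean` and `geometric_mean` are hand-rolled here for illustration, and recent versions of the `statistics` module also ship a `geometric_mean`):

```python
import math
import statistics

def truncated_mean(xs, fraction=0.1):
    """Arithmetic mean after dropping `fraction` of the values on each end."""
    xs = sorted(xs)
    k = int(len(xs) * fraction)
    return statistics.mean(xs[k:len(xs) - k] if k else xs)

def geometric_mean(xs):
    """n-th root of the product; defined only for positive values."""
    return math.exp(statistics.mean(math.log(x) for x in xs))
```

`statistics.mean` and `statistics.median` cover the first two bullets directly and reproduce the R results shown above.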
![Page 33: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/33.jpg)
Median
• Median instead of arithmetic mean, if
  • ordinal data ("distance" has no meaning)
• only few measurements
• asymmetric distributions
• expecting outliers
33
![Page 34: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/34.jpg)
But
• How many measurements?
  • Are 3, 10, or 50 sufficient? Or 100 or 10000?
  • (To find the Higgs boson, several million measurements were necessary)
• Measuring order?
  • AAABBB or ABABAB?
• Iterate in a single batch or multiple batches?
• Are measurements independent?
• Is the average good enough?
34
![Page 35: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/35.jpg)
Visualize data
• Get an overview
• Visually inspect distribution and outliers
35
![Page 36: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/36.jpg)
Histograms
36
hist(c)
![Page 37: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/37.jpg)
Reporting distributions
• Boxplots show
  • Median as a thick line
  • Quartiles as a box (50% of all values are in the box)
  • Whiskers
  • Outliers as dots
• Cumulative probability distributions
• Visual representation of distributions
37
boxplot(c)
plot(ecdf(c))
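R's `plot(ecdf(c))` draws the empirical cumulative distribution; the underlying function is tiny. A Python sketch for intuition (illustrative, not a library API):

```python
def ecdf(sample):
    """Empirical CDF: returns F, where F(x) is the fraction of observations <= x."""
    xs = sorted(sample)
    n = len(xs)
    def F(x):
        # count of values <= x; bisect on the sorted list would also work
        return sum(1 for v in xs if v <= x) / n
    return F
```

Plotting F over the range of the data gives the step curve that `plot(ecdf(c))` shows.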
![Page 38: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/38.jpg)
Error Models and Probability Distributions
38
![Page 39: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/39.jpg)
Intuition: Error Model
• 1 random error, influence +/- 1
• Real mean: 10
• Measurements: 9 (50%) und 11 (50%)
• 2 random errors, each +/- 1
• Measurements: 8 (25%), 10 (50%) und 12 (25%)
• 3 random errors, each +/- 1
• Measurements : 7 (12.5%), 9 (37.5), 11 (37.5), 12 (12.5)
39
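The intuition above can be reproduced exactly by enumerating every combination of errors. A Python sketch (`error_model` is an illustrative name, not from the slides):

```python
from itertools import product
from collections import Counter

def error_model(true_value, k):
    """Exact distribution of measured values under k independent +/-1 errors."""
    outcomes = Counter(true_value + sum(errs)
                       for errs in product((-1, +1), repeat=k))
    total = 2 ** k
    return {value: count / total for value, count in sorted(outcomes.items())}
```

As k grows, the binomially distributed error sum approaches the normal distribution of the next slide — the central limit theorem in miniature.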
![Page 40: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/40.jpg)
Normal distributions
40
![Page 41: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/41.jpg)
Standard deviation
41
$s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sqrt{\frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \dots + (x_n-\bar{x})^2}{n}}$
CC BY 2.5 Mwtoews
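The formula on this slide (the population standard deviation, dividing by n) translates directly into code; a Python sketch, with `statistics.pstdev` implementing the same definition:

```python
import math
import statistics

def population_stddev(xs):
    """s = sqrt((1/n) * sum((x_i - mean)^2)), as in the formula above."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
```

For the sample standard deviation (dividing by n-1, as used for confidence intervals later), `statistics.stdev` is the stdlib counterpart.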
![Page 42: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/42.jpg)
Confidence intervals (formal)
42
![Page 43: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/43.jpg)
Confidence intervals
43
[Figure: scatter plot of individual measurements (values roughly -5 to 25) over 100 samples, with the running mean]
Collect data until the confidence interval has an expected size, e.g., +/- 10%
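The "collect until the interval is tight enough" loop can be sketched in a few lines of Python. Assumptions to note: the function names are illustrative, and the normal quantile is used as an approximation of the t quantile (fine for larger samples; for small n a proper t quantile, e.g., from scipy, is more accurate):

```python
from statistics import NormalDist, mean, stdev

def ci_halfwidth(xs, level=0.95):
    """Approximate half-width of the confidence interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # ~1.96 for 95%
    return z * stdev(xs) / len(xs) ** 0.5

def collect_until_precise(measure, rel_width=0.10, min_n=5, max_n=10_000):
    """Keep measuring until the CI is within +/- rel_width of the mean."""
    xs = [measure() for _ in range(min_n)]
    while ci_halfwidth(xs) > rel_width * abs(mean(xs)) and len(xs) < max_n:
        xs.append(measure())
    return xs
```

The `max_n` cap guards against measurements whose variance never lets the interval shrink to the target.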
![Page 44: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/44.jpg)
Confidence intervals
• Results of independent measurements are normally distributed (central limit theorem)
• Confidence level 95% => with 95% probability, the real mean is within the interval*
  • Mean of the measurements vs. real mean of the statistical population
44
> t.test(data, conf.level=.95)
…
95 percent confidence interval:
8.870949 10.739207
*Technically more correct: When repeating the experiment very often, in 95% of the repetitions the real mean will be within the confidence interval of that measurement
![Page 45: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/45.jpg)
Accuracy vs Precision
45
Precision: distribution around the mean (repeatability)
  Source of measurement error, usually not attributable
Accuracy: deviation of the measured mean from the real mean
  i.e., can we trust the results?
Resolution: smallest measurable difference
![Page 46: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/46.jpg)
Random vs. Systematic Errors
• Systematic errors: errors of experimental design or measurement technique
  • CPU speed: measuring at different temperatures
  • Forgot to reset a counter for a repeated measurement
  • -> Small variance over repeated measurements
  • -> Experience helps to exclude them during design
  • -> Affect accuracy
• Random errors
  • Cannot be controlled
  • Handled with stochastic methods
  • -> Affect precision
46
![Page 47: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/47.jpg)
Comparing Measurements
47
![Page 48: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/48.jpg)
Comparing measurement results
• GenCopy faster than GenMS?
• GenCopy faster than SemiSpace?
48
![Page 49: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/49.jpg)
Comparing Distributions
49
![Page 50: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/50.jpg)
Different effect size, same deviations
50
small overlap => significant difference
large overlap => no significant difference
![Page 51: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/51.jpg)
Same effect size, different deviations
51
small overlap => significant difference
large overlap => no significant difference
![Page 52: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/52.jpg)
Dependent vs. independent measurements
• Pairwise (dependent) measurements
  • Before/after comparison
  • With the same benchmark + environment
  • e.g., is the new operating system/disk drive faster?
• Independent measurements
  • Repeated measurements
  • Input data regenerated for each measurement
52
![Page 53: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/53.jpg)
Significance level
• Statistical chance of an error
  • Define before executing the experiment
  • Use commonly accepted values, based on the cost of a wrong decision
• Common:
  • 0.05 significant
  • 0.01 very significant
• Statistically significant result does not imply proof
• Statistically significant result does not imply an important result
• Covers only the alpha error (more later)
53
![Page 54: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/54.jpg)
Compare confidence interval
• Rule of thumb: If the confidence intervals do not overlap, the difference is significant
54
![Page 55: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/55.jpg)
t test
• Requires: normally distributed metric data
  • Very large data sets almost always follow a normal distribution
• Compares two measurement series
• Basic idea:
  • Assume that both measurements are from the same base population (follow the same distribution)
  • The t-test computes the chance that both samples come from the same distribution
  • If the probability is smaller than 5% (for significance level 0.05), the assumption is considered refuted
55
![Page 56: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/56.jpg)
t test with R
56
> t.test(x, y, conf.level=0.9)

        Welch Two Sample t-test

data:  x and y
t = 1.9988, df = 95.801, p-value = 0.04846
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
 0.3464147 3.7520619
sample estimates:
mean of x mean of y
 51.42307  49.37383

> t.test(x-y, conf.level=0.9) (paired)
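The Welch statistic that R reports can be computed by hand. A Python sketch of the t value and the degrees of freedom (illustrative; the p-value additionally needs the t distribution's CDF, e.g., from scipy, which is omitted here):

```python
from statistics import mean, variance

def welch_t(x, y):
    """Welch's two-sample t statistic and degrees of freedom
    (the test behind R's default t.test(x, y))."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x), variance(y)   # sample variances (n-1 denominator)
    se2 = vx / nx + vy / ny             # squared standard error of the difference
    t = (mean(x) - mean(y)) / se2 ** 0.5
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df
```

Unlike the classic Student t-test, Welch's variant does not assume equal variances in the two groups, which is why R uses it by default.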
![Page 57: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/57.jpg)
• For causation:
  • Provide a theory (from domain knowledge, independent of the data)
  • Show correlation
  • Demonstrate the ability to predict new cases (replicate/validate)

http://xkcd.com/552/
57
![Page 58: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/58.jpg)
58
![Page 59: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/59.jpg)
Big Code Data Science
59
![Page 60: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/60.jpg)
Abundance of Data
• Code history
• Developer activities
• Bug trackers
• Sprint backlog, milestones
• Continuous integration logs
• Static analysis and technical debt dashboards
• Test traces; dynamic analyses
• Runtime traces
• Crash reports from customers
• Server load, stats
• Customer data, interactions
• Support requests, customer reviews
• Working hours
• Team interactions in Slack/issue tracker/email/…
• …
60
![Page 61: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/61.jpg)
Large Datasets now accessible
• Huge codebases in Google, Facebook, Microsoft, …
• Public activities of open source projects, including hobby projects and industrial systems (e.g., GitHub: 27M contributors, 80M projects, 1B traces, 10 years)
• Lots of data: code, commits, commit messages, issues, bug-fixing patches, discussions, reviews, pull requests, teams, build logs, static analysis logs, coverage history, performance history
• Lots of noise: multitasking, interruptions, offline communication, project and team cultures, …
61
![Page 62: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/62.jpg)
Data Science on Big Code
• Answer larger, more general questions:
  • What team size is most productive or produces the highest quality?
• Is multitasking causing buggy code?
• Do co-located teams perform better?
• Does code review improve quality?
• Find trends in big noisy data sets using advanced statistics
• Find even small relationships with natural experiments: Compare similar projects that differ only in one aspect (given the size, there will be many pairs for most questions)
62
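A natural-experiment comparison starts from matched pairs. A toy Python sketch with hypothetical project records (exact matching on a few covariates; real studies use far richer matching and statistical controls):

```python
from collections import defaultdict

def matched_pairs(projects, covariates, treatment):
    """Pair records identical on the covariates but differing in the treatment,
    e.g., projects of the same language and size with vs. without code review."""
    groups = defaultdict(lambda: {True: [], False: []})
    for p in projects:
        key = tuple(p[c] for c in covariates)
        groups[key][bool(p[treatment])].append(p)
    pairs = []
    for bucket in groups.values():
        pairs.extend(zip(bucket[True], bucket[False]))
    return pairs
```

Comparing an outcome (say, defect counts) within each pair isolates the treatment from the matched covariates; with datasets the size of GitHub, most questions yield many such pairs.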
![Page 63: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/63.jpg)
Example Results
• “Geographically distributed teams produce code whose quality (defect occurrence) is just as good as teams that are not geographically distributed”
  • No statistical difference detected at Microsoft
• “Defect probability increases if teams consist of members with large organizational distance”
  • Key predictor for defect density found at Microsoft
• “Multitaskers are more productive in open source projects, but not beyond 5 projects”
  • Confirmed on GitHub data by CMU faculty Vasilescu
63
![Page 64: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/64.jpg)
Example: Badges
64A. Trockman, S. Zhou, C. Kästner, and B. Vasilescu. Adding Sparkle to Social Coding: An Empirical Study of Repository Badges in the npm Ecosystem. In Proceedings of the 40th International Conference on Software Engineering (ICSE), New York, NY: ACM Press, May 2018.
![Page 65: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/65.jpg)
Experimenting in Production
65
![Page 66: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/66.jpg)
Canary Testing and AB Testing
66
![Page 67: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/67.jpg)
Testing in Production
• Beta tests
• AB tests
• Tests across hardware/software diversity (e.g., Android)
• “Most updates are unproblematic”
• “Testing under real conditions, with real workloads”
• Avoid expensive redundant test infrastructure
67
![Page 68: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/68.jpg)
Pipelines
68
![Page 69: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/69.jpg)
Release cycle of Facebook’s apps
69
![Page 70: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/70.jpg)
Real DevOps Pipelines are Complex
• Incremental rollout, reconfiguring routers
• Canary testing
• Automatic rolling back changes
Chunqiang Tang, Thawan Kooburat, Pradeep Venkatachalam, Akshay Chander, Zhe Wen, Aravind Narayanan, Patrick Dowell, and Robert Karl. Holistic Configuration Management at Facebook. Proc. of SOSP: 328-343 (2015).
70
![Page 71: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/71.jpg)
• Scripts to change system configurations (configuration files, install packages, versions, …); declarative vs imperative
• Usually put under version control
Configuration management, Infrastructure as Code
(Puppet)

    $nameservers = ['10.0.2.3']
    file { '/etc/resolv.conf':
      ensure  => file,
      owner   => 'root',
      group   => 'root',
      mode    => '0644',
      content => template('resolver/resolv.conf.erb'),
    }

(Ansible)

    - hosts: all
      sudo: yes
      tasks:
        - apt: name={{ item }}
          with_items:
            - ldap-auth-client
            - nscd
        - shell: auth-client-config -t nss -p lac_ldap
        - copy: src=ldap/my_mkhomedir dest=/…
        - copy: src=ldap/ldap.conf dest=/etc/ldap.conf
        - shell: pam-auth-update --package
        - shell: /etc/init.d/nscd restart

71
![Page 72: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/72.jpg)
Monitoring
• Many standard and custom tools for monitoring, aggregation and reporting
• Logging infrastructure at scale
• Open source examples:
  • collectd for gathering and storing statistics
• Monit checks whether process is running
• Nagios monitoring infrastructure, highly extensible
72
![Page 73: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/73.jpg)
(Netflix)
https://www.slideshare.net/jmcgarr/continuous-delivery-at-netflix-and-beyond 73
![Page 74: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/74.jpg)
74
![Page 75: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/75.jpg)
Why DevOps when testing in production
• Ability to quickly change configurations for different users
• Track configuration changes
• Track metrics at runtime in production system
• Track results per configuration; analysis dashboard to test effects
• Induce realistic fault scenarios (ChaosMonkey…)
• Ability to roll back bad changes quickly
75
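Quickly changing configurations per user is the core mechanism behind AB testing. A Python sketch of deterministic hash-based bucketing (illustrative, not any specific vendor's API): hash the user and experiment name into [0, 1) and compare against the treatment share.

```python
import hashlib

def ab_group(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically assign a user to group 'A' (control) or 'B' (treatment).
    The same user always lands in the same group, and rolling back a bad
    change only requires lowering the treatment share to 0."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map hash prefix to [0, 1]
    return "B" if bucket < treatment_share else "A"
```

Because assignment is a pure function of user and experiment, no assignment table is needed; metrics logged per request can be joined back to the group after the fact.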
![Page 76: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/76.jpg)
76
![Page 77: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/77.jpg)
Summary
• Pursue data-supported decisions, rather than relying on “belief”
• Learn from scientific methods, experiments, statistics
  • Experimental designs
• Biases, confounding variables
• Measurements, systematic vs random errors
• Big code provides new opportunities
• Measurement in production with DevOps
• Measurement is essential for software engineering professionals
77
![Page 78: Data Analytics in Software Engineering](https://reader030.fdocuments.in/reader030/viewer/2022012407/616a201d11a7b741a34f146a/html5/thumbnails/78.jpg)
Some slides with input from
• Bogdan Vasilescu, ISR/CMU
• Thomas Zimmermann, Microsoft Research
  • https://speakerdeck.com/tomzimmermann
• Greg Wilson, Mozilla
  • https://www.slideshare.net/gvwilson/presentations
78