
Identifying Logging Practices in Open Source Microservices Projects

Marco Túlio Resende Zuquim Alves 1

1 Bacharelado em Engenharia de Software
Instituto de Ciências Exatas e Informática (ICEI)
Pontifícia Universidade Católica de Minas Gerais (PUC-MG)
Edifício Fernanda, Rua Cláudio Manoel, 1.162, Funcionários

30.140-100 — Belo Horizonte — MG — Brazil

[email protected]

Abstract. Nowadays, most software projects have migrated from the old monolithic to microservice architectures. Although this system architecture is now quite popular, there are not many studies that describe how observability practices are being employed. Knowing that good observability leads to a better project development through the DevOps cycle and that logging is the most straightforward practice among the observability pillars, it is interesting to know its current adoption level and common practices. In this empirical study, we cloned 10,918 of the most stargazed GitHub Python repositories in search of open source projects that use a logging library and have at least one Docker/Kubernetes associated file. Our goal is to understand what is the adoption level of observability by trying to identify popular logging practices in the community. We were able to find 1,166 projects fitting our research requirements. A custom parser with regular expressions identified and saved logger statements from every Python file in each selected repository. We discovered that projects adopting certain licenses tend to have a higher logger statements/LLOC ratio. Other discoveries include, but are not limited to: over 99% of Python open source projects using Docker use the builtin logging library exclusively or in parallel with another one; the repository age does not affect its logger statements/LLOC ratio significantly; logging verbosity levels debug and info are used almost twice as much as warning and error. We hope our study provides the community with useful data about this topic, possibly contributing to the improvement of techniques that stimulate its applications.

Software Engineering Bachelor's Program (Bacharelado em Engenharia de Software) - PUC Minas
Undergraduate Final Project (Trabalho de Conclusão de Curso - TCC)

Content advisor (TCC I): José Laerte Pires Xavier Junior – [email protected]
Academic advisor (TCC I): Lesandro Ponciano dos Santos – [email protected]
TCC II advisor: Hugo Bastos de Paula – [email protected]

Belo Horizonte, November 15, 2020

1. Introduction

DevOps is an approach where business owners, development, operations, and quality assurance teams collaborate to deliver reliable software continuously, enabling faster releases including improvements based on customer feedback. DevOps encompasses the entire software delivery life cycle, from idea to applications running in production. Continuous Delivery is a DevOps capability responsible, among other things, for the release pipeline.

After DevOps emerged, the microservices architecture (MSA) was the first one to completely adopt the Continuous Delivery practices [Waseem et al. 2020]. Microservices are programs with a single task that also include all the connectivity to the outside world as well as the runtime requirements to execute the task [Yousif 2016, Al-Debagy and Martinek 2018]. Considering the difference in complexity between the old (monolithic) and the new (MSA, widely adopted by the DevOps community) architectural system styles, one of MSA's challenges is observability [Pina et al. 2018, Waseem et al. 2020].

In this context, observability is the measure of how well internal states of a system can be inferred from information obtained from its external outputs [Kasun Indrasiri 2018]. Observability is composed of three main pillars: logging, monitoring, and distributed tracing [Kasun Indrasiri 2018]. Efficient observability of microservices is of utmost importance to ensure that the deployed services are working as expected. Good observability practices are also important to, through automation, generate useful feedback data from the current cycle that can be used to further improve the system in the next DevOps cycle. There are studies that cover the monitoring aspect of observability in the context of MSA and DevOps [Heinrich et al. 2017, Miglierina and Tamburri 2017, Barna et al. 2017, Trihinas et al. 2018], but not logging nor distributed tracing [Waseem et al. 2020]. In this work, we try to fill this gap concerning logging in MSA.

In this paper, we aim to investigate the adoption level of observability — especially regarding logging — and the most common practices used in MSA open source projects publicly available on GitHub. In order to achieve this goal, we intend to uncover the adoption level of logging in these projects and the most common observability practices in the open source community. To accomplish that, the following research questions were formulated:

(RQ1) How are containerized apps being logged?
(RQ2) What kind of events are logged?
(RQ3) What is the frequency of MSA projects adopting observability?

To the best of our knowledge, there is no other work that quantifies observability practices — logging, more specifically — of microservices in open source projects. The proposed work will shed some light on the matter, quantifying the adoption level and the common practices adopted in the open source community. We hope that, by answering these research questions, we will help other researchers produce more detailed insights about what may be the best practices for observability of microservices. The findings may bring the matter of observability of microservices to the attention of DevOps practitioners, prompting more empirical studies about such practices and the development of new methods or tools that could further improve the DevOps culture.

The rest of the paper is organized as follows. Section 2 provides the reader with basic background knowledge about the discussed subjects. Section 3 discusses related work. Section 4 presents the proposed methodology in detail. Section 5 reports the research methods and results, Section 6 discusses threats to validity, and Section 7 concludes the paper.


2. Background

This section provides an overview of DevOps, MSA, and observability. Understanding these concepts, technologies, and practices is fundamental for understanding the main focus of our research.

2.1. DevOps

The DevOps concept is focused on building a collaborative culture between teams that are historically separated [Loukides 2012]. It is said that a legitimate response to "adapt or die", in the software development universe, is "I'll do DevOps!" [Schlossnagle 2017]. The DevOps approach originated at the 2008 Agile Conference in Toronto, where P. Debois proposed a set of practices and processes to bridge the gap between development and operations teams, providing a quick response to customer demands [Debois 2008]. As a consequence, the developed software is reliably deployed as fast as possible [Mala 2019].

Development teams were often pressured to deliver new releases cheaply and fast, without paying much attention to maintainability or stability. Meanwhile, operations teams were frequently uncomfortable with new releases, as they were responsible for keeping services running and stable, and ended up introducing complicated formal processes that resulted in less collaboration and long cycle times [Dornenburg 2018].

DevOps can be illustrated as a continuous cycle composed of 3 phases: Continuous Integration, Continuous Delivery, and Continuous Feedback. Continuous Integration is composed of 2 stages, Create and Verify, where the project is in fact built and tested. Continuous Delivery is composed of 3 stages, Package, Release, and Configure, where the project is put together, deployed, and adjusted through automated procedures. Continuous Feedback, composed of 2 stages, Monitor and Plan, is the last phase, but also the beginning of a new cycle, with the complete analysis of the project and new definitions, based on those analyses, for the planning of the next cycle. Next, we explain the DevOps cycle, illustrated in Figure 1 (retrieved from the AWS Marketplace website1).

Plan: The plan stage is the end of the Continuous Feedback phase and the beginning of the Dev cycle. Most of the planning input comes from the feedback provided by the end of the Ops cycle, which is monitoring. It begins with the agile planning and management of the project, in which a set of functionalities is defined, establishing the value, criteria, and goals for each iteration of each phase of the project.

Create: The create stage, the first of the Continuous Integration phase, is when the project is truly built. During this stage, the project's source code, infrastructure design, process automation, test definitions, security implementations, versioning, and virtual infrastructure deployment are done.

Verify: In the verify stage, all the previously defined tests are executed, finishing the Continuous Integration phase. There are many kinds of tests that can be done, including but not limited to: unit tests, integration tests, and mutation tests. All of it is done through automation mechanisms in charge of reviewing, validating, testing, and reporting.

Package: The package stage is the beginning of the Continuous Delivery phase. In this stage, the application, its dependencies, documents, and provisioning artifacts are put together to be released.

1 https://bit.ly/aws-mktplace-devops

Figure 1. DevOps cycle illustration 1

Release: In the release stage, the application is deployed and all of the work related to the automated deployment requirements and instructions defined in the package stage is executed. Depending on the project, this can vary from a single Docker container to hundreds of thousands of different containerized apps, controlled by an orchestrator such as Kubernetes, working together to deliver whatever the system was conceived to do.

Configure: The configure stage is the last one of the Continuous Delivery phase. It describes how all operations related to the software life cycle happen continuously, using tools and scripts, to automate the optimization of operations scenarios and mitigate failures caused by human errors.

Monitor: The monitor stage is the last in the DevOps cycle, and the first in the Continuous Feedback phase. It is when the logging, monitoring, data measurement, and analysis of the project happen. Based on the results obtained, the application, its dependencies, scripts, and other artifacts will have a new update plan — with needs and risks in evidence — at the beginning of the new cycle. This is the key stage for observability in the DevOps cycle.

2.2. Microservices Architecture

In the traditional architectural style, the monolithic architecture, applications are built as a single, autonomous unit. Although this architecture is simpler, making it easier to develop at first, the general maintenance of these applications is not so easy. Simple changes may impact the whole system, causing the need to rebuild and deploy a new version, which can be expensive both in time and resources.

MSA consists of multiple small applications, each with its own architecture [Al-Debagy and Martinek 2018, Bucchiarone et al. 2018, De Lauretis 2019] — e.g. a web server, a database management system, a logging server. In order to ensure loose coupling and truly benefit from MSA, each of the microservices has its own specific database. This is called a polyglot persistence architecture. Each microservice exposes an Application Programming Interface (API) and consumes data provided by other services' APIs. Its modular architecture makes it easier to maintain, test, and understand.

Even though some monolithic systems may have a slightly better throughput performance in comparison to MSA [Al-Debagy and Martinek 2018], the latter has already proven itself when it comes to scalability, reliability, maintainability, and cost reduction [De Lauretis 2019, Gos and Zabierowski 2020, Villamizar et al. 2016, Bucchiarone et al. 2018].

In October 2017, the International Data Corporation predicted that by 2021, 80% of enterprise applications would have migrated to cloud platforms — Platform as a Service (PaaS) — and over 95% of microservices would be deployed in containers [Larrucea et al. 2018]. MSA has synergy with DevOps given its main characteristic, which is being a set of small granular services that can be integrated through lightweight communication mechanisms. It also supports incremental non-breaking change as a principle, an example of an evolutionary architecture [Ford et al. 2017]. This separation is achieved via advanced DevOps practices like machine provisioning, automated testing, and automated deployments.

2.3. Observability

Observability is the measure of how well internal states of a system can be inferred from knowledge of its external outputs [Kasun Indrasiri 2018]. It is achieved by the combination of three different methods: logging, monitoring, and tracing.

Logging is a common programming practice in software development and has become the main way of recording strategic runtime information for analysis [Zhu et al. 2015]. It is the combination of multiple kinds of output generated by a software system during its operation. Logging is a valuable source of information for developers, not only during the development process but also after the application is deployed [Yuan et al. 2012]. Logs are the main source of information for a variety of analysis tasks, such as anomaly detection, failure diagnosis, test analysis, and performance issue identification.
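To make the practice concrete, the sketch below shows the kind of logging statement this study searches for, written with Python's builtin logging library. The module name, logger name, and function are illustrative only and do not come from any analyzed repository.

import logging

logger = logging.getLogger("payment-service")   # hypothetical logger name
logging.basicConfig(level=logging.INFO)

def charge(order_id, amount):
    # An "info" event records normal progress of the operation.
    logger.info("charging order %s: %.2f", order_id, amount)
    try:
        ...  # call to the payment gateway would go here
    except ConnectionError:
        # exception() logs at ERROR level and attaches the stack trace.
        logger.exception("payment gateway unreachable for order %s", order_id)
        raise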

Monitoring is the systematic collection and analysis of data to register the progress of an activity. To make it possible, one needs to understand which metrics and values determine whether the system is behaving correctly or not [Schlossnagle 2017]. Monitoring has been commonly used in the industry for decades: tools such as Cacti, Nagios, MRTG, RRDTools, SolarWinds, and Zabbix have been well known to operations teams since network services exist [Hernantes et al. 2015, Filipe et al. 2018].

While monitoring encompasses instrumenting an application and then collecting, aggregating, and analyzing metrics to improve the understanding of the system behaviour as a whole, the goal of tracing is to follow a program's flow and data progression.


Tracing is derived from logs and is a fundamental process in software engineering to collect data about the behavior of a system or application. It is a different view of how a system behaves, which takes into consideration the ordering of events and the impact of one on another [Kasun Indrasiri 2018]. Distributed tracing is a way to quickly understand what is happening with the data in transit through a group of microservices, for each request being made, in distributed systems. To trace a request through the distributed system, one should search and combine logs that share the same request ID in order to understand and trace the request path [Zhou et al. 2018]. Storing, categorizing, and tracing runtime events is crucial to improve software quality in each cycle of agile development [Capizzi et al. 2020].

3. Related Work

The systematic identification, analysis, and classification of the literature on MSA in DevOps done by Waseem et al. (2020) intended to understand how MSA is employed in DevOps. The researchers did a systematic mapping study (SMS) in which 47 studies were analysed. Among the main findings of this SMS, it is worth citing: the increase in the number of publications on MSA in the DevOps context; 24 identified problems with their solutions regarding implementing MSA in DevOps; 38 identified MSA patterns; 15 quality attributes (QAs) that are positively and negatively affected; and 50 identified tools that support building MSA based systems in DevOps. The SMS unveiled that monitoring hundreds of microservices and diagnosing an issue in one of them is difficult work, but the studies found by their SMS only discuss the monitoring of MSA, leaving a gap for further investigations on logging and distributed tracing.

In the work of Trihinas et al. (2018), the authors propose a framework called Unicorn to solve a set of challenges every MSA system faces in the DevOps context. The Unicorn framework intends to address the following challenges: monitoring and diagnostics, auto-scaling and optimizations, orchestration in hybrid cloud deployments, and security enforcement and privacy protection. Even though it does address one of the pillars of observability — monitoring — as stated previously, this work does not treat other key challenges in this context: logging and, consequently, distributed tracing [Waseem et al. 2020].

The work of Yuan et al. (2012) investigates the logging practices of four old and popular open source projects: Apache httpd, OpenSSH, PostgreSQL, and Squid. Among the various findings of the study, some highlights are: logging is a pervasive practice during software development; logging is at least as important as other parts of the code; and logging code is quite significant in the software evolution despite its relatively small presence in the code. The study is useful for the present paper since it executes a significant analysis of logging techniques and logging events in mature open source projects.

In the work of Cito et al. (2017), the authors perform an empirical analysis of open source projects using the Docker ecosystem on GitHub. After analyzing and comparing over 70,000 Dockerfiles with samples from the Top-100 and Top-1,000 most popular projects using Docker, they uncover some interesting information regarding the quality of the projects — e.g. most issues (28.6%) are caused by missing version pinning. We find this work relevant for our research since it also analyzes the most popular Docker-using open source projects on GitHub.

4. Proposed Methodology

The work proposed in this article is a quantitative research, since it uses predefined metrics to identify the adoption level of observability practices in MSA projects in the open source community.

4.1. Procedures

In this section, we describe our approach to identify the adoption level and the most used observability practices — focusing on logging — applied to MSA in open source projects. In order to achieve this, we studied several GitHub projects, analysing the logging practices adopted by them. We inspected key files that may present valuable evidence for our study.

The study was performed in 5 phases. Each phase is further explained in Section 4.2. In Phase 1, we created a list of the 1,000 most popular GitHub repositories written primarily in Python and containing Docker files. In Phase 2, we collected data from the 1,000 repositories in the list obtained in Phase 1 by cloning and saving the interesting files from each one of the projects in an orderly way. Phases 3, 4, and 5 occurred simultaneously, after the data collection, filtering, and ordering from Phase 2. In Phase 3, we analyzed the saved files to identify the logging techniques used in each project. In Phase 4, the saved files were analyzed to identify what kind of events are being logged by the project. In Phase 5, we quantified the adoption level of observability — at least regarding logging — of each one of the cloned projects.

We restricted our GitHub data collection to repositories which have Python as their main programming language and explicitly use the Docker containerization platform. The Python programming language is one of the most popular programming languages (second most popular on GitHub since 2019)2 and should provide a great number of repositories for analysis. Docker is, according to IBM3, the world's leading containerization platform, and it is widely utilized in microservices architecture systems [Al-Debagy and Martinek 2018, Maheshwari et al. 2018]. The fact that Docker uses specifically named configuration files, such as Dockerfile and docker-compose.yml, accelerates the data analysis, since we will be focusing on a limited variety of key files, configurations, and libraries.

A tool was developed in order to collect, inspect, and analyse all of the proposed data. It was primarily built using Python (version ≥ 3.6), but also has bits of shell script. The Python libraries used in this development are: gitpython, logzero, pandas, radon5, and requests.

4.2. Phases

1. GitHub data ingestion
2. GitHub data selection
3. Analysis: logging techniques
4. Analysis: logging events
5. Analysis: observability adoption level

2 https://octoverse.github.com/
3 https://www.ibm.com/downloads/cas/BBKLLK1L


4.2.1. GitHub data ingestion

Having realized that GitHub will only return the first 1,000 results for each query made, we built the list of interesting GitHub repositories by making numerous GraphQL queries until we obtained about 1,000 repositories that satisfy the requirements for this study. The filtered results were ordered in descending order by the number of stars — the GitHub counter attribute stargazers.

4.2.2. GitHub data selection

Having the repository list, we collected data from GitHub by doing the following stepsfor each one of the repositories:

1. Clone the repository;
2. Save the interesting files;
3. Delete the remaining repository data.

The cloning step was made using the Git application through the gitpython library. After cloning a repository, the total lines of code (LOC) of each project file was calculated using the radon4 library and the results were registered in a CSV file. Then, to save the interesting files, we searched the whole project directory looking for requirements.txt files and files with a name matching the RegEx [Dd]ocker.* using the find5 command, and files which contain the log expression — which may be a clue of logging usage, such as a logger object or the logging builtin Python library — in any line using the grep6 command. The files identified in the search were copied/moved to a safe separate path. When the LOC analysis and file filtering processes were complete, the cloned repository root directory was deleted from the local storage in order to save storage space.
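The following sketch illustrates this per-repository routine in Python, assuming gitpython is installed. Function and directory names (collect, workdir, archive) are hypothetical and stand in for the actual tool's internals.

import re
import shutil
from pathlib import Path

from git import Repo  # gitpython

DOCKER_NAME = re.compile(r"[Dd]ocker.*")

def collect(repo_url, workdir, archive):
    target = Path(workdir) / repo_url.rstrip("/").split("/")[-1]
    Repo.clone_from(repo_url, target)            # 1. clone the repository
    for path in target.rglob("*"):
        if not path.is_file():
            continue
        interesting = (
            path.name == "requirements.txt"
            or DOCKER_NAME.match(path.name)
            or (path.suffix == ".py"
                and "log" in path.read_text(errors="ignore"))
        )
        if interesting:                          # 2. save the interesting files
            dest = Path(archive) / target.name / path.relative_to(target)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)
    shutil.rmtree(target)                        # 3. delete the remaining data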

4.2.3. Analysis: logging techniques

To answer (RQ1), we identified common logging techniques in the interesting files previously saved by searching for some clues. To identify the most common logging libraries, we investigated the Python files' import statements. We also checked for logger object setup arguments, which should contain information such as: handler naming conventions, verbosity levels, logging file names, logging file paths, logging file verbosity, logging file rotation, logging file max size, among others. It would also be interesting to evaluate whether the projects use logging persistence approaches other than text file logging, such as databases. The related findings were stored in a CSV file for quantification and further analysis via data visualizations.
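A minimal sketch of the import-statement inspection is shown below. The library list corresponds to the nine libraries later summarized in Table 3; the regular expression itself is an illustrative simplification, not the exact patterns file used in the study.

import re
from pathlib import Path

LOGGING_LIBS = ["logging", "loguru", "structlog", "logbook",
                "eliot", "logzero", "daiquiri", "pysimplelog", "twiggy"]
IMPORT_RE = re.compile(
    r"^\s*(?:import|from)\s+(" + "|".join(LOGGING_LIBS) + r")\b")

def logging_imports(py_file):
    """Return the set of known logging libraries imported by one Python file."""
    found = set()
    for line in Path(py_file).read_text(errors="ignore").splitlines():
        match = IMPORT_RE.match(line)
        if match:
            found.add(match.group(1))
    return found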

4.2.4. Analysis: logging events

To answer (RQ2), every previously saved Python file was scanned looking for more specific evidence concerning the kinds of events being logged. First, we searched for logger object calls and tried to identify their respective context, storing these findings in an orderly manner along with important information such as project name, file path, file name, line number, context, verbosity level, message, logger object instance, and any other associated information. Then, after all the files had been searched and the CSV file or database was complete, we quantified the findings in order to identify patterns between them through tables or data visualizations.

4 https://radon.readthedocs.io/en/latest/intro.html#raw-metrics
5 https://www.gnu.org/software/findutils/manual/html_mono/find.html
6 https://www.gnu.org/software/grep/manual/grep.html

4.2.5. Analysis: observability adoption level

To answer (RQ3), project files were searched for indications that the project makes any use of observability. We registered the logging density of each one of the projects by calculating the LOC of logging code and then dividing it by the total LOC of the project [Yuan et al. 2012]. Beyond the evidence concerning logging techniques and logging events, it would be interesting to find indications that the project also makes explicit use of monitoring or distributed tracing by looking at its dependencies and related microservices in Docker-related files.
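The density metric itself is a simple ratio; the helper below only restates the definition above and is not part of the study's tooling.

def logging_density(logging_loc, total_loc):
    """Fraction of a project's LOC that is logging code (Yuan et al. 2012)."""
    return logging_loc / total_loc if total_loc else 0.0

For example, a project with 120 lines of logging code out of 8,000 total lines has a logging density of 1.5%.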

5. Research Methods

5.1. Initial data survey

To begin the analysis, the 991 most popular public Python repositories were identified via GitHub's GraphQL API and downloaded locally (70 GB) — via Python-scripted multi-threaded git clone operations — for a careful first inspection. Through Unix shell scripting — using tools such as awk8, egrep7, find6, and uniq9 — these 991 repositories were filtered down to 197.

After the first filter, done via regular expression filtering (^from log|^import log) to find which of these make use of common Python logging libraries, 523 repositories were found. These 523 repositories were filtered to find which ones have at least one MSA associated file/directory — Dockerfile, docker-compose.yml, docker-compose.yaml, and .kube — resulting in 197 repositories. Finally, all the Python files from these 197 repositories which import a logging library were listed and saved in a file containing 10,624 file paths to be inspected.

5.2. Data collection

To overcome the GitHub API limitation of providing a maximum of 1,000 items per search query, 11 manual GraphQL search queries were made using the same query, but each with a manually adjusted variation of the stars search filter. Each query result was appended to the CSV file, and the next query was made with the stars filter set based on the last repository's stargazers count from the previous query result. Using this strategy — doing some manual filtering when the current result set had its first repositories overlapping the previous one's last ones — it was possible to paginate the most popular Python repositories well beyond the first 1,000. After the 11th query run, we ended up with a list of 10,918 repositories in our CSV file.
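A sketch of this stars-window workaround is given below, using the requests library against GitHub's GraphQL endpoint. The selected fields, token handling, window sizes, and the popular_python_repos function are assumptions for illustration; in practice each query string is still paged up to GitHub's 1,000-result cap before the filter is adjusted, and boundary duplicates are removed manually.

import requests

GRAPHQL_URL = "https://api.github.com/graphql"
QUERY = """
query ($searchQuery: String!) {
  search(query: $searchQuery, type: REPOSITORY, first: 100) {
    nodes {
      ... on Repository { nameWithOwner stargazerCount }
    }
  }
}
"""

def popular_python_repos(token, upper_bound=1_000_000, rounds=11):
    headers = {"Authorization": f"bearer {token}"}
    repos = []
    for _ in range(rounds):
        search = f"language:Python stars:1..{upper_bound} sort:stars-desc"
        reply = requests.post(
            GRAPHQL_URL,
            json={"query": QUERY, "variables": {"searchQuery": search}},
            headers=headers,
        ).json()
        nodes = reply["data"]["search"]["nodes"]
        repos.extend(nodes)
        # The next window is capped at the lowest star count seen so far.
        upper_bound = nodes[-1]["stargazerCount"]
    return repos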

With the new repository list, we ran our readjusted multi-threaded script to clone them all. It took about 24 hours to complete and we ended up with around 685 GB of repository data. The number of successfully cloned repositories was 10,911 — a 99.94% success rate. One clone was manually aborted, since the repository was too large and was taking an abnormal time to clone. Four were not found; we suspect they were archived or set to private in the time window between the CSV construction and the cloning moment. Two were not downloaded due to an error. We chose to ignore them, since we already had enough data to start working.

We needed a set of repositories which could be considered as MSA — due to the usage of a Docker or Kubernetes file / configuration directory — and that also use a common logging library. Considering that finding a file/directory is much faster than reading every *.py file and looking for a substring in it, we first filtered our 10,911 repositories by running the find6 command with 4 different argument sets.

We found 3,322 Dockerfile files and 932 docker-compose.y* files, but we did not find a single Kubernetes configuration directory (.kube) nor confmap file. Filtering these results, we found how many repositories have at least one Docker associated file. In order to do so, we used the awk7 command to parse the output text from both non-empty file lists — paths_dockerfile and paths_docker-compose — and extract only the full repository name (owner/repo). To properly filter out duplicates using the uniq8 command, we used sort9 to sort the list output by awk. The total number of repositories using at least one Docker file was 1,726.

In order to discover which of these repositories were using at least one common logging library, we searched the web for libraries commonly used for this purpose. Having a list of Python logging libraries, we built a patterns file, considering the most common techniques for importing modules in Python.

To apply the last filter to our list of repositories, we investigated it using awk8, egrep7, sort10, and uniq9. After running the composite command, we ended up with a list of 1,168 repositories, which is 10.70% of the original list built by the GraphQL queries.

This finding helps us understand the frequency with which MSA repositories are adopting observability (RQ3). Considering the 1,726 MSA repositories initially identified and the 1,168 ones using at least one logging library, we have that 67.67% of MSA repositories implement at least logging, the most basic of the three pillars of observability.

5.3. Data analysis

A shell script was developed to encode the source code to the ASCII format and solve several encoding issues. Then, the radon5 framework was used as a command line tool (radon raw -s -j -O <output-path> <repository-path>) in order to collect the total logical lines of code (LLOC) of each one of the 1,168 selected repositories. Some repositories took a long time (some over 40 hours) to finish the analysis.
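Once radon has written its JSON report, per-repository LLOC totals can be aggregated as sketched below. The "lloc" key reflects radon's raw-metrics report; treat the exact JSON layout as an assumption rather than a guaranteed interface.

import json
from pathlib import Path

def repository_lloc(radon_json_path):
    """Sum the logical lines of code reported by radon for one repository."""
    report = json.loads(Path(radon_json_path).read_text())
    return sum(
        metrics.get("lloc", 0)
        for metrics in report.values()
        if isinstance(metrics, dict)   # skip files radon failed to analyse
    )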

Table 2 shows the microservice artifact counters for the selected repositories. An interesting finding was that there were no common Kubernetes-associated artifacts, such as a .kube directory or a confmap file, but there were many Docker files — especially of the main Dockerfile type. This may suggest that many projects either use native cloud orchestration, which is provided by the main cloud providers without the need for configuration files, or do not use container orchestration at all.

7 https://www.freebsd.org/cgi/man.cgi?query=awk
8 https://www.gnu.org/software/coreutils/manual/html_node/uniq-invocation.html
9 https://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html

In order to cover the maximum number of possibilities regarding the logging libraries, we searched the internet — articles from journals, proceedings, and technology blogs and forums — and found 8 commonly used libraries, not counting the builtin logging Python library.

Table 3 summarizes the logging library usage. We found that, among these popular repositories, it is not so common to use a third party logging library instead of the builtin one. Three of them — daiquiri, pysimplelog, and twiggy — are not being used by any of the selected projects. We also found that most of the repositories using a third party logging library were also using the builtin logging library alongside it.

Table 1. Summary of LLOC counters for the selected repositories

Lowest Median Average Highest Std. Dev.

72 7,110 25,071 3,506,112 114,469

Table 2. Number of repositories using each MSA artifact

MSA artifact Quantity Percentage

Dockerfile            1,118    95.72%
docker-compose.y*     390      33.39%
.kube                 0        0.00%
confmap               0        0.00%

Total 1,508

Table 3. Number of repositories using each logging library

Library        Number of repositories using it    Percentage
logging        1,160                              99.32%
loguru         14                                 1.20%
structlog      13                                 1.11%
logbook        9                                  0.77%
eliot          2                                  0.17%
logzero        2                                  0.17%
daiquiri       0                                  0.00%
pysimplelog    0                                  0.00%
twiggy         0                                  0.00%

In order to collect all of the logger object calls with their respective contents, a Python script was developed to search all *.py source code files for logger statement calls.


logger_statement_regex = (
    r"^\s*(\w+\.)*"
    r"(?P<logger_object>\w+)"
    r"\."
    r"(?P<logger_verbosity>"
    r"([Ff][Aa][Tt][Aa][Ll])"
    r"|"
    r"([Cc][Rr][Ii][Tt]"
    r"([Ii][Cc]([Aa][Ll])?)?)"
    r"|"
    r"([Ee][Xx][Cc][Ee][Pp][Tt]"
    r"([Ii][Oo][Nn])?)"
    r"|"
    r"([Ee][Rr]{2}[Oo][Rr])"
    r"|"
    r"([Ww][Aa][Rr][Nn]"
    r"([Ii][Nn][Gg])?)"
    r"|"
    r"([Ii][Nn][Ff][Oo])"
    r"|"
    r"([Dd][Ee][Bb][Uu][Gg])"
    r"|"
    r"([Ll][Oo][Gg])"
    r")\("
)

Listing 1. Regular expression to match logger calls

Since a single logger call may span multiple lines, we used a regular expression to identify these calls, as presented in Listing 1.

The strategy used was to capture what would be a typical logger statement call: first, N space characters preceding a logger object — which can be named virtually anything and may or may not be an attribute of another object (such as self) — immediately followed by a verbosity level method or a log method, which works in a similar way.

While the logging methods named after a verbosity level primarily receive the message string to be logged, the log call statement requires another argument prior to the message string: the verbosity level. The script identified 7,187 (3.14%) logger calls using the log statement.

Considering the language's builtin logging library, we have a set of possible verbosity levels, each equivalent to an integer representing the severity of the logging message: debug (10), info (20), warning — which also has the deprecated form warn — (30), error — which has the same level as exception and is often used in similar situations — (40), and critical — which has the same level as fatal and is often used in the same kind of situation — (50).
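These correspondences can be checked directly against the builtin logging module: WARN and FATAL are aliases for WARNING and CRITICAL, and exception() logs at the ERROR level while also recording the stack trace.

import logging

assert logging.DEBUG == 10
assert logging.INFO == 20
assert logging.WARNING == 30 and logging.WARN == logging.WARNING
assert logging.ERROR == 40
assert logging.CRITICAL == 50 and logging.FATAL == logging.CRITICAL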

The script reads the source code line by line searching for a match with the regular expression. In case of a positive match, it tests the line again with a very similar regular expression, with a small difference at the end. If the line represents a complete logger call — a single line logging statement with the object call and arguments (including the logging message string) — it is stripped of any preceding blank spaces and stored in a row. If it is not a single line logging statement, the script saves the first part of the statement and keeps scanning the following lines, appending each one to the previous (stripped of blank spaces and line break characters), until it identifies the end of the statement, which is the closing parenthesis; the complete statement is then stored in the data set.
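A simplified sketch of this accumulation step is shown below. Parenthesis counting stands in for the study's second, "complete statement" regular expression and deliberately ignores parentheses inside string literals; collect_statements and its arguments are illustrative names.

def collect_statements(lines, start_regex):
    """Gather logger statements, joining calls that span multiple lines."""
    statements, buffer = [], ""
    for line in lines:
        if buffer:
            buffer += line.strip()            # continue an open statement
        elif start_regex.match(line):
            buffer = line.strip()             # a new logger call begins here
        if buffer and buffer.count("(") <= buffer.count(")"):
            statements.append(buffer)         # statement is closed, store it
            buffer = ""
    return statements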

A summary of the counters for the logging statements found can be seen in the first row of Table 4; a summary of the logging statements per LLOC ratio can be found in the second row of the same table. The fact that we have a repository with an 18.60% logging statements per LLOC ratio, the highest of them all, while the median is 0.86%, indicates that it is probably an outlier.

In case the method used by the logger object is log, the script identifies, through another regular expression, the verbosity level defined among the arguments used in the call. On the first positive match, it registers that logger call with the matched level. If there is no match, the verbosity level for this call is registered as OTHER, meaning a non-standard verbosity level. Of the 7,187 log statements, the script's regular expression could only match and properly reclassify 2,106 (29.30%) verbosity levels, discarding 5,081 (2.22% of all logging statements) with unidentified verbosity levels.
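The sketch below illustrates this reclassification for calls made through the generic log() method: the first argument is inspected for a standard level name, and anything else is kept as OTHER. The regular expression here is an illustrative stand-in, not the one used by the study's script.

import re

LEVEL_RE = re.compile(
    r"\.log\(\s*(?:logging\.)?(DEBUG|INFO|WARNING|WARN|ERROR|CRITICAL|FATAL)\b",
    re.IGNORECASE,
)

def reclassify(statement):
    """Return the verbosity level named in a log() call, or OTHER."""
    match = LEVEL_RE.search(statement)
    return match.group(1).upper() if match else "OTHER"

For example, reclassify('logger.log(logging.DEBUG, "msg")') returns "DEBUG", while a call passing a variable or a numeric level is reported as OTHER.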

Table 4. Summary: sum of logging statements and its frequency per LLOC

         Lowest    Median    Average    Highest    Std. Dev.
Sum      0         57.00     198.06     8,273      508.53
Ratio    0.00%     0.86%     1.48%      18.60%     1.84%

Table 5 and Figure 2 present some of the findings related to the logging statements collected. We found that the debug and info levels are used twice as much as the more severe ones. Further analysis is necessary to draw conclusions, but it may be a sign of a general problem in the logging strategy adopted by repositories that have a similar verbosity grouping. Making excessive use of logging can eventually mask the most valuable information for maintenance or even security analysis [Zhu et al. 2015].

Considering that most of the logging messages contained variables and string formatting, so far it was not possible to identify extensive logging patterns for the messages collected. We used Orange3, a Python data science software, and some data science related Python libraries, such as matplotlib, numpy, pandas, pickle, and wordcloud, to uncover possible findings with our data through some text processing and visualization techniques.
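As one illustration, the per-level word clouds in Figures 3 to 7 can be produced with the wordcloud and matplotlib libraries mentioned above. The CSV file name and the column names ("verbosity", "message") are assumptions about the collected data set, not its actual schema.

import matplotlib.pyplot as plt
import pandas as pd
from wordcloud import WordCloud

statements = pd.read_csv("logger_statements.csv")   # hypothetical file name

for level in ["debug", "info", "warning", "error", "critical"]:
    messages = statements.loc[statements["verbosity"] == level, "message"]
    text = " ".join(messages.dropna())
    cloud = WordCloud(width=800, height=400,
                      background_color="white").generate(text)
    plt.figure()
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(f"{level} word cloud")
    plt.savefig(f"wordcloud_{level}.png")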


Figure 2. Frequency of use for each verbosity level

Table 5 presents the number of unique words for each verbosity level as well as the total number of unique words found. It is interesting to notice that, even though 'debug' is the second most frequently used verbosity level, it has by far the fewest unique words. One can assume that this verbosity level, instead of more descriptive sentences written in a human-like manner, uses more technical or code related words that are probably repeated very often, even across different projects. We also checked the length of each logging statement — as the number of characters used as arguments by the logging statement call — but the results showed no meaningful difference nor correlation in length between the verbosity levels.

Table 5. Cumulative values by verbosity levels for logging statements and uniquewords found

                      Debug     Info      Warning   Error     Critical
Logging statements    66,208    74,330    36,660    43,836    2,881
Percentage            28.91%    32.46%    16.01%    19.14%    1.26%
Unique words          3,185     41,226    23,185    44,386    28,726
Percentage            3.08%     39.82%    22.39%    42.87%    27.75%

Figures 3 to 7 are word clouds formed by the most frequently used words in the debug, info, warning, error, and critical verbosity levels, respectively. They help us better understand the kinds of events being logged for each verbosity level, which is part of the answer to RQ2. Studies show that the logging verbosity levels 'fatal' and 'debug' usually have a clear meaning, but 'error', 'warning', and 'info' are frequently confused among developers [Anu et al. 2019].


Figure 3. Debug word cloud

Figure 4. Info word cloud

Figure 5. Warning word cloud


Figure 6. Error word cloud

Figure 7. Critical word cloud

It can be seen that 'debug' messages were used to present results, describe events that were happening, show URLs being accessed or processed, and show program settings. This is compatible with the intended function of a debug logging strategy, which is to present the execution path and context of a program [Anu et al. 2019]. The 'info' messages present a lot of verbs in the -ing form, such as 'creating', 'starting', 'waiting', 'deleting', 'updating', 'loading', and 'checking'. This is also consistent with what is expected from an 'info' log message. From the warning word cloud, it can be seen that function deprecation is by far the most addressed issue. Error messages deal mostly with unknown, unexpected, or missing resources. 'API' is also usually present in these messages, which can be explained by the fact that this study focuses on MSA. 'Create', 'update', and 'check' were also very common among error messages, which needs further investigation. Finally, 'configuration' is the main issue addressed by 'critical' messages. A thorough investigation of the word rankings shows a high degree of intersection between the 'error' and 'critical' word clouds, which can be observed from the frequent use of words such as 'unexpected', 'connect'/'connection', 'unknown', and 'configuration'/'config'.

Another interesting finding is that open source projects using certain software licenses have a tendency to have a higher ratio of logger statements per LLOC than others.


Looking at the bar graph in Figure 8, it is clear that Eclipse Public License 2.0 and Creative Commons Zero v1.0 Universal are the ones with the highest ratios, considering the artifacts collected in this study, with an average of around 9.08% and 6.58% of logical lines of code being logger statements, respectively. This is 3 to 4 times higher than other licenses. We were unable to find any license-specific logging best practices document that could justify this difference.

Figure 8. Percentage of logger statements per LLOC by license

Figure 9 is a scatter plot representing the correlation between repositories' age and their logger statements per LLOC ratio. It was produced after the execution of an outlier removal procedure using the Local Outlier Factor method with 10% contamination and 2 neighbors, using the Manhattan metric.
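This outlier removal step maps directly onto scikit-learn's LocalOutlierFactor with the parameters reported above; the input file and column names below are assumptions for illustration.

import pandas as pd
from sklearn.neighbors import LocalOutlierFactor

repos = pd.read_csv("repository_metrics.csv")        # hypothetical file name
features = repos[["age_days", "logger_statements_per_lloc"]]

lof = LocalOutlierFactor(n_neighbors=2, contamination=0.1, metric="manhattan")
inliers = lof.fit_predict(features) == 1             # -1 marks outliers
repos_without_outliers = repos[inliers]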

As presented in Figure 9, there is no significant difference in logger statements per LLOC density with repository age variation. One hypothesis being tested was whether an older repository would be more mature and, therefore, make use of a more comprehensive logging practice. It was only noticeable that much older repositories, beyond 4,000 days old (above 10 years), tend to have a slightly lower average ratio than the newer ones. This can be explained by the natural evolution of the DevOps culture in the past years, as well as the evolution of programming languages' support for logging.


Figure 9. Correlation between repositories' age and their logger statements / LLOC ratio

6. Threats to validity

One of the main threats to the validity of this study concerns the spoken language of the open source projects. We noticed a significant number of repositories in Mandarin, but our study only focused on the analysis of English words.

Focusing the analysis on the Python language could be regarded as a limitation of this work. Although Python, as previously stated, is one of the most popular programming languages and has a considerably good adoption level of MSA using Docker, there are other very popular languages in the same situation, such as JavaScript — mainly through the Node.js runtime — and GoLang.

Our graphic representing the sentiment analysis made from the collected logging statements may not be precise. According to Wilson et al., the contextual polarity of the sentence in which a certain word is used can differ from the same word's prior polarity [Wilson et al. 2005], meaning that a word with a positive prior polarity, depending on the phrase, can end up with a negative contextual polarity, and vice versa.

Our logging statement parsing method may be considered a limitation of our study. It is possible that methods completely unrelated to logging adopt a name pattern similar to the one implemented in our regular expression, which was based on the official Python builtin logging library standard. So it is possible that, among the collected data, we have contents from statements not related to the topic. However, a visual validation was made with chunks of samples, and we are confident that this noise is not significant.

7. Conclusion

In this paper, we have presented several analyses made using different empirical methods with the purpose of identifying logging practices in popular open source Python projects using Docker publicly available on GitHub. Our intention was, by downloading the repositories, searching for specific files, and inspecting the Python source code, to analyse the selected data in order to answer some research questions. We would like to know how containerized apps are being logged (RQ1), what kind of events are being logged (RQ2), and what is the frequency with which MSA projects are adopting logging, the most fundamental observability practice (RQ3).

We believe that we were able to answer all of our research questions about Python open source projects using Docker, which is what we used as evidence of MSA. We found that they almost exclusively (99.32% of them) use the language's builtin logging library (RQ1), and that most of them use the builtin definitions for verbosity levels. We were able to find the most frequently used terms for each one of these verbosity levels using visualisation techniques, such as word clouds (RQ2). We also found that 67.67% of the Python projects adopting Docker import at least one logging library in some part of the source code (RQ3).

On the path to answering these questions, we were able to uncover other interesting findings. We found that open source software projects adopting certain licenses (Eclipse Public License 2.0 and Creative Commons Zero v1.0 Universal) are prone to have higher logging statements per LLOC ratio averages than projects adopting other ones. Through data visualization, we found that the repository age does not affect its logger statements per LLOC ratio significantly, since only repositories older than 4,000 days had a slightly lower ratio average. Also, the logging verbosity levels DEBUG and INFO are used almost twice as much as WARNING and ERROR.

We hope that our findings may bring the logging practices of microservices to the attention of DevOps practitioners, possibly encouraging more empirical studies about this topic. It could also lead to the development of new methods or tools that help improve MSA applications and the DevOps culture through observability.

References

[Al-Debagy and Martinek 2018] Al-Debagy, O. and Martinek, P. (2018). A comparative review of microservices and monolithic architectures. In 2018 IEEE 18th International Symposium on Computational Intelligence and Informatics (CINTI), pages 000149–000154, Budapest, Hungary. IEEE.

[Anu et al. 2019] Anu, H., Chen, J., Shi, W., Hou, J., Liang, B., and Qin, B. (2019). An approach to recommendation of verbosity log levels based on logging intention. In 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 125–134, Cleveland, Ohio, USA. IEEE.

[Barna et al. 2017] Barna, C., Khazaei, H., Fokaefs, M., and Litoiu, M. (2017). Delivering elastic containerized cloud applications to enable devops. In 2017 IEEE/ACM 12th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS), pages 65–75, Buenos Aires, Argentina. IEEE.

[Bucchiarone et al. 2018] Bucchiarone, A., Dragoni, N., Dustdar, S., Larsen, S. T., and Mazzara, M. (2018). From monolithic to microservices: An experience report from the banking domain. IEEE Software, 35(3):50–55.

[Capizzi et al. 2020] Capizzi, A., Distefano, S., and Mazzara, M. (2020). From devops to devdataops: Data management in devops processes. In Bruel, J.-M., Mazzara, M., and Meyer, B., editors, Software Engineering Aspects of Continuous Development and New Paradigms of Software Production and Deployment, pages 52–62, Cham. Springer International Publishing.

[De Lauretis 2019] De Lauretis, L. (2019). From monolithic architecture to microservices architecture. In 2019 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pages 93–96.

[Debois 2008] Debois, P. (2008). Agile infrastructure and operations: How infra-gile are you? In Agile 2008 Conference, pages 202–207, Toronto, Canada. IEEE.

[Dornenburg 2018] Dornenburg, E. (2018). The path to devops. IEEE Software, 35(5):71–75.

[Filipe et al. 2018] Filipe, R., Correia, J., Araujo, F., and Cardoso, J. (2018). On black-box monitoring techniques for multi-component services. In 2018 IEEE 17th International Symposium on Network Computing and Applications (NCA), pages 1–5.

[Ford et al. 2017] Ford, N., Parsons, R., and Kua, P. (2017). Building Evolutionary Architectures: Support Constant Change. O'Reilly Media, Inc., 1st edition.

[Gos and Zabierowski 2020] Gos, K. and Zabierowski, W. (2020). The comparison of microservice and monolithic architecture. In 2020 IEEE XVIth International Conference on the Perspective Technologies and Methods in MEMS Design (MEMSTECH), pages 150–153.

[Heinrich et al. 2017] Heinrich, R., van Hoorn, A., Knoche, H., Li, F., Lwakatare, L. E., Pahl, C., Schulte, S., and Wettinger, J. (2017). Performance engineering for microservices: Research challenges and directions. In Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering Companion, ICPE '17 Companion, page 223–226, New York, NY, USA. Association for Computing Machinery.

[Hernantes et al. 2015] Hernantes, J., Gallardo, G., and Serrano, N. (2015). IT infrastructure-monitoring tools. IEEE Software, 32(4):88–93.

[Kasun Indrasiri 2018] Indrasiri, K. and Siriwardena, P. (2018). Observability. In: Microservices for the Enterprise. Apress.

[Larrucea et al. 2018] Larrucea, X., Santamaria, I., Colomo-Palacios, R., and Ebert, C. (2018). Microservices. IEEE Software, 35(3):96–100.

[Loukides 2012] Loukides, M. (2012). What is DevOps? O'Reilly Media, Inc.

[Maheshwari et al. 2018] Maheshwari, S., Deochake, S., De, R., and Grover, A. (2018). Comparative study of virtual machines and containers for devops developers. CoRR, abs/1808.08192.

[Mala 2019] Mala, D. (2019). Integrating the Internet of Things Into Software Engineering Practices. Advances in Systems Analysis, Software Engineering, and High Performance Computing (2327-3453). IGI Global.

[Miglierina and Tamburri 2017] Miglierina, M. and Tamburri, D. A. (2017). Towards omnia: A monitoring factory for quality-aware devops. In Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering Companion, ICPE '17 Companion, page 145–150, New York, NY, USA. Association for Computing Machinery.


[Pina et al. 2018] Pina, F., Correia, J., Filipe, R., Araujo, F., and Cardoso, J. (2018). Nonintrusive monitoring of microservice-based systems. In 2018 IEEE 17th International Symposium on Network Computing and Applications (NCA), pages 1–8.

[Schlossnagle 2017] Schlossnagle, T. (2017). Monitoring in a devops world: Perfect should never be the enemy of better. Queue, 15(6):35–45.

[Trihinas et al. 2018] Trihinas, D., Tryfonos, A., Dikaiakos, M. D., and Pallis, G. (2018). Devops as a service: Pushing the boundaries of microservice adoption. IEEE Internet Computing, 22(3):65–71.

[Villamizar et al. 2016] Villamizar, M., Garces, O., Ochoa, L., Castro, H., Salamanca, L., Verano, M., Casallas, R., Gil, S., Valencia, C., Zambrano, A., and Lang, M. (2016). Infrastructure cost comparison of running web applications in the cloud using AWS Lambda and monolithic and microservice architectures. In 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pages 179–182.

[Waseem et al. 2020] Waseem, M., Liang, P., and Shahin, M. (2020). A systematic mapping study on microservices architecture in devops. Journal of Systems and Software, 170:110798.

[Wilson et al. 2005] Wilson, T., Wiebe, J., and Hoffmann, P. (2005). Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 347–354, Vancouver, British Columbia, Canada. Association for Computational Linguistics.

[Yousif 2016] Yousif, M. (2016). Microservices. IEEE Cloud Computing, 3(5):4–5.

[Yuan et al. 2012] Yuan, D., Park, S., and Zhou, Y. (2012). Characterizing logging practices in open-source software. In Proceedings of the 34th International Conference on Software Engineering, ICSE '12, page 102–112. IEEE Press.

[Zhou et al. 2018] Zhou, X., Peng, X., Xie, T., Sun, J., Ji, C., Li, W., and Ding, D. (2018). Fault analysis and debugging of microservice systems: Industrial survey, benchmark system, and empirical study. IEEE Transactions on Software Engineering, 47(2):243–260.

[Zhu et al. 2015] Zhu, J., He, P., Fu, Q., Zhang, H., Lyu, M. R., and Zhang, D. (2015). Learning to log: Helping developers make informed logging decisions. In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, volume 1, pages 415–425.