Michele Italy Talk

Michele Fattoruso User Support for Distributed Computing 19 December 2016 The Continuous Integration System at Fermi National Accelerator Laboratory

Transcript of Michele Italy Talk

Page 1: Michele Italy Talk

Michele Fattoruso, User Support for Distributed Computing, 19 December 2016

The Continuous Integration System at Fermi National Accelerator Laboratory

Page 2: Michele Italy Talk

What will I be talking about?

12/19/16 Michele Fattoruso | The Continuous Integration system at Fermi National Accelerator Laboratory2

● What Fermilab is

● What we do at Fermilab

● What continuous integration is

● Why we need it

● What my contribution is

Page 3: Michele Italy Talk

What is Fermilab?

Fermilab is one of the world’s leading high-energy physics research facilities. It is owned by the U.S. Department of Energy (DOE) and covers 6,800 acres (27.5 km²) in Batavia, Illinois, where more than 1,750 employees work, including scientists and engineers from all around the world.

Fermilab collaborates with more than 20 countries on physics experiments based in the United States and elsewhere.

Page 4: Michele Italy Talk

What is Fermilab known for?

The photo below shows Fermilab's accelerator complex, which comprises seven particle accelerators and storage rings.

It produces the world's most powerful high-energy neutrino beam and provides proton and neutrino beams for various experiments.

Photo labels: Tevatron, Main Injector, Wilson Hall

Page 5: Michele Italy Talk

How does the accelerator complex work?

● The radio-frequency quadrupole accelerator (3.3 m)

● Fermilab's linear accelerator (LINAC) (152 m)

● Fermilab Booster (457 m)

● The Recycler (3.2 km)

● The Main Injector (3.2 km)

Low-Energy Neutrino Experiments: MicroBooNE

High-Energy Neutrino Experiments: MINOS, MINERvA, NOvA, DUNE

Muon Experiments: Muon g-2, Mu2e

Page 6: Michele Italy Talk

The DUNE Experiment

Page 7: Michele Italy Talk

How do experiments process the data?

● The detectors produce huge amounts of raw information that must be processed, analyzed and compared.

● Programs with millions of lines of code are used to convert the signals received from the detectors.

● Huge programs are hard to maintain and update.

● Errors, if not corrected promptly, can lead to a chain reaction of further errors.

Page 8: Michele Italy Talk

How bad can a little bug be?

Ariane 5 flight 501:

On Tuesday, 4 June 1996, the Ariane 5, a giant rocket capable of placing a pair of three-ton satellites in orbit, exploded on its launch day.

All it took was the conversion of a 64-bit floating-point number into a 16-bit integer, which caused an overflow.

The Ariane 5 cost nearly 8 billion dollars.
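The kind of conversion involved can be illustrated with a minimal Python sketch. It is not a reconstruction of the flight software; the value `40000.0` and the function names are illustrative assumptions. It only shows how a number that fits comfortably in a 64-bit float stops making sense once it is squeezed into 16 bits, and how an explicit range check catches the problem.

```python
import ctypes

INT16_MIN, INT16_MAX = -32768, 32767


def unchecked_to_int16(value: float) -> int:
    """Truncate to an integer and keep only the low 16 bits, like an unchecked cast."""
    return ctypes.c_int16(int(value)).value


def checked_to_int16(value: float) -> int:
    """Convert to a 16-bit signed integer, refusing values that do not fit."""
    as_int = int(value)
    if not INT16_MIN <= as_int <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return as_int


if __name__ == "__main__":
    reading = 40000.0  # illustrative value, larger than INT16_MAX
    print(unchecked_to_int16(reading))  # wraps around to a negative number
    try:
        checked_to_int16(reading)
    except OverflowError as err:
        print("rejected:", err)
```

A test that feeds out-of-range values to the conversion is exactly the sort of check an automated test suite can run on every change.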

Page 9: Michele Italy Talk

Why does Fermilab need a continuous testing environment?

Some Fermilab experiments ran into problems when testing their code in a limited environment:

● Updating the software revision caused the program to behave unexpectedly.

Sometimes even good code development practice can overlook a hidden bug.

● The more code you write without testing, the more paths you have to check for errors.

● The CI project helps keep the code healthy at all times.

Page 10: Michele Italy Talk

Bad habits in code development


Page 11: Michele Italy Talk

What is Continuous Integration?

Continuous Integration (CI) was first named and proposed in 1991 as a software engineering practice for merging and integrating all developers’ working copies into a shared mainline several times a day. The concept has since evolved to include an automatic build and test after each integration, in a continuous cycle of builds.

Page 12: Michele Italy Talk

Benefits of Continuous Integration

At a high level, Continuous Integration allows software development teams to:

● Reduce risk by integrating software changes many times a day, which facilitates the early detection of defects

● Reduce repetitive manual processes, saving time, cost and effort

● Avoid last-minute chaos at release dates, when everyone tries to check in their slightly incompatible versions

● Detect integration bugs early, when small change sets make them easy to track down

● Spend less time debugging and more time adding features

● Bring products to market faster by finding issues when they are young and small, rather than waiting until they are large and more difficult to fix

Page 13: Michele Italy Talk

The Fermilab CI System Components

The continuous integration (CI) system has the following major components:

● CI hardware comprising a server and a set of distributed CI slave nodes on which workflows are executed.

● CI application and workflow automation engine based on the Jenkins CI system.

● Jenkins CI configuration that defines the elementary CI workflow(s) to be run by Jenkins.

● Set of scripts that run within the workflows to drive test operations or execute elements of workflows.

● Web-based test result reporting system that provides access to all test result information via a simple, intuitive interface.

Page 14: Michele Italy Talk

The Continuous Integration system

The Continuous Integration system at Fermilab consists of a build master and several build slaves configured with different platforms:

SLF6 - 8 cores - 32 GB RAM

SLF7 - 32 cores - 128 GB RAM

SLF7 - 32 cores - 128 GB RAM

SLF7 - 32 cores - 64 GB RAM

SLF6 - 32 cores - 64 GB RAM

SLF6 - 32 cores - 64 GB RAM

SLF5 - 16 cores - 64 GB RAM

SLF6 - 32 cores - 64 GB RAM

SLF6 - 32 cores - 64 GB RAM

OSX 10.12 - 2 cores - 16 GB RAM

OSX 10.11 - 2 cores - 16 GB RAM

OSX 10.10 - 2 cores - 16 GB RAM

OSX 10.11 - 2 cores - 16 GB RAM

OSX 10.10 - 2 cores - 16 GB RAM

9 Linux machines

● 1 SLF5

● 5 SLF6

● 3 SLF7

5 Mac OS X machines

● 2 OSX 10.10 Yosemite

● 2 OSX 10.11 El Capitan

● 1 OSX 10.12 Sierra

Page 15: Michele Italy Talk

The CI System Workflow

The CI system code is divided into three packages:

1. generic_ci, which contains the core of the system

2. <experiment>_ci, which contains the scripts related to the specific experiment, plus the workflow and test configuration files

3. reporting, which contains the web application code.

● Each experiment uses the same basic workflow shown in the figure.

● The build workflow specification is defined in a configuration file that is read at run-time.

● Non-default workflows can be selected via trigger parameters.
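As a rough illustration of a workflow specification read at run-time, here is a minimal Python sketch. The INI layout, the section names (`workflow.default`, `workflow.quick`), the phase names and the `load_workflow` helper are all assumptions made for this example; the slides do not show the real configuration format.

```python
import configparser

# Hypothetical configuration text; the real file format is not shown in the slides.
EXAMPLE_CONFIG = """
[workflow.default]
phases = setup checkout build unit_test install ci_tests

[workflow.quick]
phases = setup checkout build
"""


def load_workflow(config_text: str, workflow: str = "default") -> list[str]:
    """Return the ordered phase names for the requested workflow."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = f"workflow.{workflow}"
    if not parser.has_section(section):
        raise KeyError(f"unknown workflow: {workflow}")
    return parser.get(section, "phases").split()
```

In this sketch a trigger parameter would simply be the `workflow` argument: `load_workflow(EXAMPLE_CONFIG, "quick")` selects the shorter, non-default workflow.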

Page 16: Michele Italy Talk

Jenkins CI Experiment startup Script


Setup of the Continuous Integration

Checkout of CI Code

Run the Main Script

Page 17: Michele Italy Talk

The triggering process

To run a build on the CI system, we offer two different methods:

1. Pushing a change to the develop branch of an experiment code module

2. Running a trigger script in the generic_ci package.

The push command will automatically run a CI build through the use of hooks.

Page 18: Michele Italy Talk

What tests can the system execute?

UNIT TEST

● Automated piece of code that invokes a single logical unit of the system.

● Checks a single assumption about the behavior of that logical unit.

INTEGRATION TEST

● Integrates/combines the unit-tested modules and tests their behavior as a combined unit.

● Its goal is to test the interfaces among the units/modules.

● Verifies that the (major) parts of a system work well together.

WHAT FEATURES DOES A TEST NEED?

● Trustworthy: The output should always reflect the real status of the code.

● Fully automated: Tests are executed automatically, without user input.

● Fast execution: Tests run quickly, so feedback arrives as soon as possible.

● Independent: Tests do not depend on each other, so they can run in parallel.
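The difference between the two kinds of test can be sketched in Python with the standard `unittest` module. The functions `parse_hits` and `total_hits` are hypothetical stand-ins invented for this example, not part of any experiment's code.

```python
import unittest


def parse_hits(line: str) -> list[int]:
    """Hypothetical unit: turn a comma-separated string into detector hit counts."""
    return [int(field) for field in line.split(",") if field.strip()]


def total_hits(counts: list[int]) -> int:
    """Hypothetical unit: sum the hit counts."""
    return sum(counts)


class ParseHitsUnitTest(unittest.TestCase):
    def test_parses_three_fields(self):
        # Unit test: one logical unit, one assumption about its behavior.
        self.assertEqual(parse_hits("1, 2,3"), [1, 2, 3])


class PipelineIntegrationTest(unittest.TestCase):
    def test_parse_then_sum(self):
        # Integration test: checks the interface between the two units.
        self.assertEqual(total_hits(parse_hits("1, 2,3")), 6)
```

Both tests are automated, fast, and independent of each other, so a runner such as `python -m unittest` can execute them without user input.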

Page 19: Michele Italy Talk

What tests can the system execute?

CI TESTS

A CI test is an automated procedure that checks the status of the experiments' processing. The system currently supports:

1. REGRESSION TEST:

● The code still performs correctly even after it was changed.

2. REPRODUCIBILITY TEST:

● Given the same input, the code will “always” generate the same output.

3. BACKWARD COMPATIBILITY TEST:

● Previously developed functionality will still work with the new release.

4. VALIDATION TEST:

● The new code produces meaningful results.
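A reproducibility test can be sketched as follows, assuming the job's output is available as bytes that can be hashed. The `simulate` function is a hypothetical stand-in for an experiment job, and comparing SHA-256 digests is one possible comparison strategy, not necessarily the one the CI system uses.

```python
import hashlib
import random


def digest(data: bytes) -> str:
    """Return a SHA-256 fingerprint of the output."""
    return hashlib.sha256(data).hexdigest()


def simulate(seed: int, n_events: int = 1000) -> bytes:
    """Hypothetical stand-in for an experiment job; seeded, so it is deterministic."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n_events))


def is_reproducible(job, *args, runs: int = 2) -> bool:
    """Run the job several times on the same input and compare the output digests."""
    digests = {digest(job(*args)) for _ in range(runs)}
    return len(digests) == 1
```

For example, `is_reproducible(simulate, 42)` holds because the random generator is seeded; a job that used an unseeded generator would fail the same check.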

Page 20: Michele Italy Talk

Reported statistics

● Elapsed Time (s): The time spent to complete the test.

● Max RSS (10s of kB): The maximum resident memory used by the CI test.

● Scaled CPU: The hypothetical time the CI test would need if it ran on a single CPU at 100% load.
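Statistics of this kind can be collected on Unix-like systems with Python's standard `resource` and `subprocess` modules. This is a minimal sketch, not the CI system's own measurement code: `run_and_measure` is a hypothetical helper, and it reports raw CPU time only, because the slides do not say how the scaled CPU figure is normalized.

```python
import resource
import subprocess
import time


def run_and_measure(command: list[str]) -> dict:
    """Run a command and report elapsed time, CPU time and peak memory of child processes."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    completed = subprocess.run(command, check=False)
    elapsed = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "exit_code": completed.returncode,
        "elapsed_s": elapsed,
        # CPU seconds (user + system) consumed by the child process.
        "cpu_s": (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime),
        # Peak resident set size over all waited-for children so far;
        # the unit is kilobytes on Linux and bytes on macOS.
        "max_rss": after.ru_maxrss,
    }
```

Because `ru_maxrss` covers every child the calling process has waited for, a runner that wants per-test figures would need to launch each test from a fresh process.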

Page 21: Michele Italy Talk

How we set up a new experiment

Experimenters can request the CI service through a web portal. The request is then fulfilled through the following steps:

1. Meet with the experiment leader to understand the requirements

2. Request all the documentation necessary to build a workflow that configures the experiment software

3. Set up the <experiment>_ci repository to contain the necessary configuration files

4. Set up a new instance of the web application

5. Start writing the workflow. The default structure is the following:

a. set up the build environment;

b. check out the code;

c. build the code;

d. run unit tests (if any);

e. install the code;

f. run integration tests (if any).

6. Deliver the product to the experimenters
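The default structure of step 5 amounts to running phases in a fixed order and stopping at the first failure. A minimal Python sketch, assuming each phase is a callable that returns `True` on success; the phase names and the `run_workflow` helper are hypothetical.

```python
from typing import Callable

# Phase names mirror the default structure (a to f); they are illustrative labels.
DEFAULT_PHASES = ["setup", "checkout", "build", "unit_test", "install", "integration_test"]


def run_workflow(actions: dict[str, Callable[[], bool]]) -> list[tuple[str, str]]:
    """Run the phases in order and stop at the first failure.

    Phases without an action (for example, no unit tests) are reported as skipped.
    """
    report = []
    for phase in DEFAULT_PHASES:
        action = actions.get(phase)
        if action is None:
            report.append((phase, "skipped"))
            continue
        ok = action()
        report.append((phase, "ok" if ok else "failed"))
        if not ok:
            break
    return report
```

Stopping at the first failed phase is a design choice of this sketch: there is little point installing code that did not build.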

Page 22: Michele Italy Talk

My Personal Contribution

Mail Report System

Customizable mail report system to accommodate the experiments' needs.

Blamelist

Developed a feature that lets the system notify the user who broke the build.

Skipped Phases

Developed a feature that lets the user skip a phase on specified operating systems.

Warning Status

Developed a feature that lets the system distinguish between build statuses. Warning means that reproducibility tests or validation tests failed.

Automatic update of Reference Files

The system automatically updates the reference files used by the experiment.

Documentation

Wrote the whole documentation for the system.
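The warning status can be pictured as a small classification rule. The sketch below follows the statement that a warning means reproducibility or validation tests failed; treating any other failure as a failed build is an assumption of this example, and the names `BuildStatus`, `WARNING_KINDS` and `classify` are hypothetical.

```python
from enum import Enum


class BuildStatus(Enum):
    SUCCESS = "success"
    WARNING = "warning"
    FAILED = "failed"


# Test kinds whose failure downgrades the build to a warning.
WARNING_KINDS = {"reproducibility", "validation"}


def classify(failed_kinds: set[str]) -> BuildStatus:
    """Map the set of failed test kinds to an overall build status."""
    if not failed_kinds:
        return BuildStatus.SUCCESS
    if failed_kinds <= WARNING_KINDS:
        return BuildStatus.WARNING
    # Assumption: any other kind of failure marks the whole build as failed.
    return BuildStatus.FAILED
```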

Page 23: Michele Italy Talk

The Web Application Monitor


http://dbweb6.fnal.gov:8080/TestCI/app/view_builds/index

Skipped Phases

Warning Tests

Page 24: Michele Italy Talk

Mail Report System


Page 25: Michele Italy Talk

The Mail alert System

Build Information

Tests Results

Commits in the last day

Page 26: Michele Italy Talk

Update reference files


Page 27: Michele Italy Talk

NOvA CI Jenkins Dashboard

List of running and completed CI builds

Page 28: Michele Italy Talk

NOvA CI Console Output Jenkins dashboard


Page 29: Michele Italy Talk

What our users think about Continuous Integration

We ask our users to give feedback on the system, to request new features, and to ask for anything that could be useful for their experiment or for all the experiments that use or will use the CI system.

NOvA Feedback

A few weeks after NOvA started to use our CI system, we received the first feedback. The CI system helped them through a complicated transition process, letting them identify problems as soon as possible and sparing them days of debugging.

Page 30: Michele Italy Talk

What would have happened without CI


Page 31: Michele Italy Talk


QUESTIONS?

Page 32: Michele Italy Talk

THANK YOU for your ATTENTION!

Page 33: Michele Italy Talk

Backup Slides


Page 34: Michele Italy Talk

Jenkins for the Continuous Integration

Jenkins is an open source automation tool written in Java, with plugins built for continuous integration. Jenkins is used to build and test your software projects continuously, making it easier for developers to integrate changes to the project and for users to obtain a fresh build. It also allows you to continuously deliver your software by integrating with a large number of testing and deployment technologies.

Advantages of Jenkins include:

● It is an open source tool with great community support.

● It is easy to install.

● It has 1000+ plugins to ease your work. If a plugin does not exist, you can code it and share it with the community.

● It is free of cost.

● It is built with Java and hence is portable to all the major platforms.

Page 35: Michele Italy Talk

Jenkins Distributed Architecture

Jenkins uses a Master-Slave architecture to manage distributed builds.

Jenkins Master

● Schedules and dispatches builds.

● Monitors the slaves (possibly taking them online and offline as required).

● Records and presents the build results.

Jenkins Slave

● Listens for requests from the Jenkins Master instance.

● Does what it is told to do, which involves executing build jobs dispatched by the Master.

Page 36: Michele Italy Talk

CI Tests Configuration

Page 37: Michele Italy Talk

Workflow Configuration


Page 38: Michele Italy Talk

The Flight Explosion


Page 39: Michele Italy Talk

How the code is automatically tested

It’s possible to automatically run a CI build after each commit on the repository through the use of hooks.

Hooks are scripts that a version control tool executes before or after events such as commit, push, and receive. Git hooks are a built-in feature, so there is nothing to download. Client-side Git hooks run locally, while server-side hooks such as post-receive run on the remote repository.

These hook scripts are limited only by a developer's imagination. Some example hook scripts include:

● commit-msg: Check the commit message for spelling errors.

● pre-receive: Enforce project coding standards.

● post-commit: Email/SMS team members about a new commit.

● post-receive: Push the code to production.

Every repository has a hooks folder (Git: .git/hooks; SVN: the hooks directory of the repository on the server) with a script for each hook you can bind to. You're free to change or update these scripts as necessary, and the version control tool will execute them when those events occur.

For our purpose we used a post-receive hook:

For Git, this hook is invoked by git-receive-pack on the remote repository, which happens when a git push is done from a local repository. It executes on the remote repository once, after all the refs have been updated.
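A post-receive hook receives one line per updated ref on standard input, in the form `<old-sha> <new-sha> <ref-name>`. The Python sketch below shows how such a hook could pick out pushes to the develop branch. It is not the hook used at Fermilab: `TRIGGER_REF`, `refs_to_build`, `trigger_build` and `main` are hypothetical names, and the trigger itself is a placeholder that only prints.

```python
import sys

# Assumption: builds are triggered by pushes to the develop branch.
TRIGGER_REF = "refs/heads/develop"


def refs_to_build(lines):
    """Yield (old_sha, new_sha) for every update to the watched branch, skipping deletions."""
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        old_sha, new_sha, ref = parts
        deleted = set(new_sha) == {"0"}  # an all-zero new SHA means the ref was deleted
        if ref == TRIGGER_REF and not deleted:
            yield old_sha, new_sha


def trigger_build(new_sha: str) -> None:
    """Placeholder: a real hook would call the CI trigger mechanism here."""
    print(f"would trigger a CI build for {new_sha}")


def main(stream) -> int:
    """Process the ref updates read from the given stream; return how many builds were triggered."""
    count = 0
    for _old_sha, new_sha in refs_to_build(stream):
        trigger_build(new_sha)
        count += 1
    return count
```

To use it as an actual hook, the file would be saved as an executable `post-receive` script in the remote repository's hooks folder and would end with a call to `main(sys.stdin)`.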

Page 40: Michele Italy Talk

Reported statistics

● Wall Clock Time (s): How many seconds were spent to complete the phase

● Each dot represents a different build

Page 41: Michele Italy Talk

DUNE experiment

Page 42: Michele Italy Talk

Blame list Feature
