Michele Italy Talk
Transcript of Michele Italy Talk
Michele Fattoruso, User Support for Distributed Computing, 19 December 2016
The Continuous Integration System at Fermi National Accelerator Laboratory
What will I be talking about?
12/19/16 Michele Fattoruso | The Continuous Integration system at Fermi National Accelerator Laboratory2
● What Fermilab is
● What we do at Fermilab
● What continuous integration is
● Why we need it
● What my contribution is
What is Fermilab?
Fermilab is one of the world’s leading
high-energy physics research facilities.
It is owned by the U.S. Department of Energy
(DoE) and covers 6,800 acres (27.5 km²) in
Batavia, Illinois, where more than 1,750
employees work, including scientists
and engineers from all around the world.
Fermilab collaborates with more than 20
countries on physics experiments based in the
United States and elsewhere.
What is Fermilab known for?
The photo below shows Fermilab's accelerator complex, which comprises seven particle accelerators and storage rings.
It produces the world's most powerful high-energy neutrino beam and provides proton and neutrino beams for various experiments.
Photo labels: Tevatron, Main Injector, Wilson Hall
How does the Accelerator Complex work?
● The radio-frequency quadrupole accelerator (3.3 m)
● Fermilab's linear accelerator (LINAC) (152 m)
● Fermilab Booster (457 m)
● The Recycler (3.2 km)
● The Main Injector (3.2 km)
Low-Energy Neutrino Experiments: MicroBooNE
High-Energy Neutrino Experiments: MINOS, MINERvA, NOvA, DUNE
Muon Experiments: Muon g-2, Mu2e
The DUNE Experiment
● The detectors produce huge amounts of raw information that must be processed,
analyzed and compared.
● Programs with millions of lines of code are used to convert the signals received from the
detectors.
● Huge programs are hard to maintain and update.
● Errors, if not corrected promptly, can lead to a chain reaction of further errors.
How do experiments process the data?
Ariane 5 flight 501:
On Tuesday, 4 June 1996, the Ariane 5, a giant rocket capable of placing a pair of three-ton
satellites into orbit, exploded on its launch day.
All it took was the conversion of a 64-bit floating-point number into a 16-bit integer,
which caused an overflow.
Developing the Ariane 5 had cost nearly 8 billion dollars.
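The kind of conversion involved can be illustrated with a short Python sketch. It is not the flight software (which was written in Ada); the function names and the sample values are made up for illustration. It only shows why a floating-point value outside the 16-bit signed range cannot be stored safely, and what a range check looks like.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, refusing values that do not fit."""
    n = int(value)  # truncates toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return n

def to_int16_wrapped(value: float) -> int:
    """Show what remains if only the low 16 bits are kept (two's complement)."""
    return (int(value) + 32768) % 65536 - 32768

print(to_int16_checked(1234.5))    # 1234
print(to_int16_wrapped(40000.0))   # -25536, a silently wrong value
```

An automated test that feeds out-of-range values to such a conversion is cheap to write and run on every change.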
How bad can a little bug be?
Why does Fermilab need a continuous testing environment?
Some Fermilab experiments ran into problems when testing their code
in a limited environment:
● Updating the software to a new revision caused the program to behave
unexpectedly.
● Sometimes even good code development practices can overlook a hidden bug.
● The more code you write without testing, the more paths you have to check for errors.
● The CI project helps keep the code healthy at all times.
Bad habits in code development
Continuous Integration (CI) was first named and proposed in 1991 as a software
engineering practice for merging and integrating all developers’ working copies into a
shared mainline several times a day. The concept has since evolved to include an automatic
build and test after each integration, in a continuous cycle of builds.
What is Continuous Integration?
Benefits of Continuous Integration
At a high level, Continuous Integration allows software development teams to:
● Reduce risk by integrating software changes many times a day, which
facilitates the early detection of defects
● Reduce repetitive manual processes, saving time, cost and effort
● Avoid last-minute chaos at release dates, when everyone tries to check in
their slightly incompatible versions
● Detect integration bugs early, when small change sets make them easy
to track down.
● Spend less time debugging and more time adding features
● Bring products to market faster, by finding issues when they are young
and small, not waiting until they are large and more difficult to fix.
The Fermilab CI System Components
The continuous integration (CI) system has the following major components:
● Hardware CI system comprised of a server and a set of distributed CI slave
nodes on which workflows are executed.
● CI application and workflow automation engine based on the Jenkins CI
system
● Jenkins CI configuration that defines the elementary CI workflow(s) to be
run by Jenkins.
● Set of scripts that run within the workflows that drive test operations or
execute elements of workflows.
● Web-based test result reporting system that provides access to all test
result information via an intuitive and simple interface.
The Continuous integration system
The Continuous Integration system at Fermilab consists of a build master and
several build slaves configured with different platforms:
SLF6 - 8 cores - 32 GB RAM
SLF7 - 32 cores - 128 GB RAM
SLF7 - 32 cores - 128 GB RAM
SLF7 - 32 cores - 64 GB RAM
SLF6 - 32 cores - 64 GB RAM
SLF6 - 32 cores - 64 GB RAM
SLF5 - 16 cores - 64 GB RAM
SLF6 - 32 cores - 64 GB RAM
SLF6 - 32 cores - 64 GB RAM
OSX 10.12 - 2 cores - 16 GB RAM
OSX 10.11 - 2 cores - 16 GB RAM
OSX 10.10 - 2 cores - 16 GB RAM
OSX 10.11 - 2 cores - 16 GB RAM
OSX 10.10 - 2 cores - 16 GB RAM
9 Linux Machines
● 1 SLF5
● 5 SLF6
● 3 SLF7
5 Mac OS X Machines
● 2 OSX-10.10 Yosemite
● 2 OSX-10.11 El Capitan
● 1 OSX-10.12 Sierra
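As a quick cross-check of the totals above, the node list can be tallied with a few lines of Python. The tuples are copied from the slide; the variable names are only illustrative and not part of the CI code.

```python
from collections import Counter

# (platform, cores, RAM in GB), one entry per build slave listed on the slide
NODES = [
    ("SLF6", 8, 32), ("SLF7", 32, 128), ("SLF7", 32, 128), ("SLF7", 32, 64),
    ("SLF6", 32, 64), ("SLF6", 32, 64), ("SLF5", 16, 64), ("SLF6", 32, 64),
    ("SLF6", 32, 64),
    ("OSX 10.12", 2, 16), ("OSX 10.11", 2, 16), ("OSX 10.10", 2, 16),
    ("OSX 10.11", 2, 16), ("OSX 10.10", 2, 16),
]

per_platform = Counter(platform for platform, _, _ in NODES)
linux_nodes = sum(n for p, n in per_platform.items() if p.startswith("SLF"))
mac_nodes = sum(n for p, n in per_platform.items() if p.startswith("OSX"))

print(dict(per_platform))
print("Linux:", linux_nodes, "Mac:", mac_nodes)
```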
The CI System Workflow
The CI system code is divided into three packages:
1. generic_ci, which contains the core of the system.
2. <experiment>_ci, which contains the scripts related to the specific experiment and the workflow and test configuration files.
3. reporting, which contains the web application code.
● Each experiment uses the same basic workflow shown in the figure
● The build workflow specification is defined in a configuration file that is read at run-time.
● Non-default workflows can be selected via trigger parameters.
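The idea of a workflow read from a configuration file at run time, with a non-default workflow chosen by a trigger parameter, can be sketched as follows. The file format, the section names (`default_workflow`, `build_only`) and the phase names are assumptions made for this example; the slides do not show the real configuration syntax.

```python
import configparser

# Hypothetical workflow configuration; the real file layout is not shown in the talk.
EXAMPLE_CFG = """
[default_workflow]
phases = setup_environment checkout build unit_test install ci_tests

[build_only]
phases = setup_environment checkout build
"""

def load_phases(cfg_text: str, workflow: str = "default_workflow") -> list:
    """Return the ordered list of phases for the requested workflow."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if not parser.has_section(workflow):
        raise KeyError(f"unknown workflow: {workflow}")
    return parser[workflow]["phases"].split()

print(load_phases(EXAMPLE_CFG))                 # default workflow
print(load_phases(EXAMPLE_CFG, "build_only"))   # selected via a trigger parameter
```

Keeping the workflow in data rather than in code means an experiment can change its phases without touching the core package.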
Jenkins CI Experiment startup Script
Setup of the Continuous Integration
Checkout of CI Code
Run the Main Script
The triggering process
To run a build on the CI system, we offer two different methods:
1. Pushing a change to the develop branch of an experiment code module.
2. Running a trigger script in the generic_ci package.
The push command will automatically run a CI build through the use of hooks.
What tests can the system execute?
UNIT TEST
● Automated piece of code that invokes a single logical unit of the system.
● Checks a single assumption about the behavior of that logical unit.
INTEGRATION TEST
● Integrates/combines the unit-tested modules and tests their behavior as a combined unit.
● Its goal is to test the interfaces among the units/modules.
● Verifies that the (major) parts of a system work well together.
WHAT FEATURES DOES A TEST NEED?
● Trustworthy: The output should always reflect the real status of the code.
● Fully automated: Tests are executed automatically, without user input.
● Fast execution: Tests are fast, so feedback arrives as soon as possible.
● Independent: Tests are independent, so they can run in parallel.
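The difference between the two kinds of test can be seen in a small Python sketch. The functions `adc_to_energy` and `total_energy` and the gain value are hypothetical stand-ins for experiment code, not taken from any Fermilab package.

```python
def adc_to_energy(adc_counts, gain=0.5):
    """Hypothetical unit: convert raw ADC counts to an energy value."""
    if adc_counts < 0:
        raise ValueError("ADC counts cannot be negative")
    return adc_counts * gain

def total_energy(hits):
    """Hypothetical unit: sum the energy of all hits in an event."""
    return sum(adc_to_energy(h) for h in hits)

def test_adc_to_energy_unit():
    # Unit test: one logical unit, one assumption about its behavior.
    assert adc_to_energy(10) == 5.0

def test_total_energy_integration():
    # Integration test: checks that the two units work together.
    assert total_energy([10, 20, 30]) == 30.0

# Both tests are automated, fast, and independent of each other.
test_adc_to_energy_unit()
test_total_energy_integration()
print("all tests passed")
```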
What tests can the system execute?
CI TESTS
A CI test is an automated procedure that checks the status of the experiments' processing. The system currently supports:
1. REGRESSION TEST:
● The code still performs correctly after it has been changed.
2. REPRODUCIBILITY TEST:
● Given the same input, the code will “always” generate the same output.
3. BACKWARD COMPATIBILITY TEST:
● Previously developed functionality still works with the new release.
4. VALIDATION TEST:
● The new code produces meaningful results.
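A minimal sketch of the first two test types, assuming the output of a job can be reduced to a fingerprint and compared with a stored reference. The function `process_event` is a made-up stand-in for an experiment job, and hashing the output is an illustrative choice; the slides do not describe how the real comparison is done.

```python
import hashlib
import random

def process_event(seed):
    """Stand-in for an experiment job: deterministic for a given input."""
    rng = random.Random(seed)
    return [round(rng.gauss(0.0, 1.0), 6) for _ in range(5)]

def fingerprint(output):
    """Reduce an output to a short string that is easy to compare and store."""
    return hashlib.sha256(repr(output).encode()).hexdigest()

def reproducibility_test(seed):
    # Same input, two runs: the outputs must be identical.
    return fingerprint(process_event(seed)) == fingerprint(process_event(seed))

def regression_test(seed, reference_fingerprint):
    # Current output must match the reference produced by an earlier release.
    return fingerprint(process_event(seed)) == reference_fingerprint

# In a real setup the reference would be read from a reference file.
reference = fingerprint(process_event(42))
print(reproducibility_test(42), regression_test(42, reference))
```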
Reported statistics
● Elapsed Time (s): The time spent to complete the test.
● Max RSS (10s of kb): The maximum resident memory used by the CI test.
● Scaled CPU: The hypothetical time that the CI test would need if run on a single CPU at 100% load.
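Statistics of this kind can be collected with the Python standard library, as in the sketch below. It assumes a Unix-like system (the `resource` module is not available on Windows) and approximates the scaled CPU figure by the user plus system CPU time; the real system may compute or scale it differently. Note that `ru_maxrss` is reported in kilobytes on Linux and in bytes on macOS.

```python
import resource
import time

def measure(func, *args):
    """Run func(*args) and return its result with simple resource statistics."""
    wall_start = time.perf_counter()
    usage_start = resource.getrusage(resource.RUSAGE_SELF)
    result = func(*args)
    usage_end = resource.getrusage(resource.RUSAGE_SELF)
    elapsed = time.perf_counter() - wall_start
    cpu = ((usage_end.ru_utime - usage_start.ru_utime)
           + (usage_end.ru_stime - usage_start.ru_stime))
    stats = {
        "elapsed_s": elapsed,              # wall-clock time of the call
        "cpu_s": cpu,                      # CPU time, used here as a stand-in for scaled CPU
        "max_rss": usage_end.ru_maxrss,    # peak resident memory of the process so far
    }
    return result, stats

result, stats = measure(sum, range(100000))
print(result, stats)
```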
How we set up a new experiment
Experimenters can request the CI service through a web portal. The request is then fulfilled through the following steps:
1. Meet with the experiment leader to understand the requirements.
2. Request all the documentation necessary to build a workflow that configures the experiment software.
3. Set up the <experiment>_ci repository to contain the necessary configuration files.
4. Set up a new instance of the web application.
5. Start writing the workflow. The default structure is the following:
a. set up the build environment;
b. check out the code;
c. build the code;
d. run unit tests (if any);
e. install the code;
f. run integration tests (if any).
6. Deliver the product to the experimenters.
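The default workflow structure in step 5 amounts to running a fixed sequence of phases and stopping at the first failure. The sketch below illustrates that idea only; the `run_workflow` function and the placeholder actions are hypothetical, and stopping at the first failed phase is an assumption about the behavior, not something stated in the slides.

```python
def run_workflow(phases):
    """Run (name, action) pairs in order and stop at the first failing phase."""
    results = []
    for name, action in phases:
        ok = action()
        results.append((name, "ok" if ok else "failed"))
        if not ok:
            break  # later phases depend on this one, so they are not run
    return results

# Placeholder actions: each real phase would call the experiment's own scripts.
DEFAULT_PHASES = [
    ("set up the build environment", lambda: True),
    ("check out the code", lambda: True),
    ("build the code", lambda: True),
    ("run unit tests", lambda: True),
    ("install the code", lambda: True),
    ("run integration tests", lambda: True),
]

for name, status in run_workflow(DEFAULT_PHASES):
    print(f"{name}: {status}")
```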
My Personal Contribution
Mail Report System: Customizable mail report system that accommodates the experiments' needs.
Blamelist: Developed a feature that lets the system notify the user who broke the build.
Skipped Phases: Developed a feature that lets the user skip a phase on a defined OS.
Warning Status: Developed a feature that lets the system distinguish the status of the build. Warning means that reproducibility tests or validation tests failed.
Automatic update of Reference Files: The system automatically updates the reference files used by the experiment.
Documentation: Wrote the whole documentation for the system.
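The skipped-phases and warning-status features can be pictured with a short sketch. The function names, the data layout, and the rule that any other failed test marks the build as failed are assumptions for illustration; only the rule that reproducibility or validation failures yield a warning comes from the slide.

```python
WARNING_TESTS = {"reproducibility", "validation"}

def phases_to_run(phases, platform, skip_on):
    """Drop the phases that the experiment asked to skip on this platform."""
    skipped = skip_on.get(platform, set())
    return [p for p in phases if p not in skipped]

def build_status(failed_phases, failed_tests):
    """Classify a build as 'failed', 'warning' or 'success'."""
    if failed_phases:
        return "failed"
    if any(t not in WARNING_TESTS for t in failed_tests):
        return "failed"   # assumption: other test failures break the build
    if failed_tests:
        return "warning"  # only reproducibility or validation tests failed
    return "success"

print(phases_to_run(["build", "unit_test", "install"],
                    "OSX 10.12", {"OSX 10.12": {"unit_test"}}))
print(build_status([], ["reproducibility"]))
```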
The Web Application Monitor
http://dbweb6.fnal.gov:8080/TestCI/app/view_builds/index
Skipped Phases
Warning Tests
Mail Report System
The Mail alert System
Build Information
Test Results
Commits in the last day
Update reference files
NOvA CI Jenkins Dashboard
List of running and completed CI builds
NOvA CI Console Output Jenkins dashboard
We ask our users to give feedback on their use of the system, to request new features, and to ask for anything that could be useful for their experiment or for all the experiments that use or will use the CI system.
A few weeks after NOvA started to use our CI system, we received the first feedback. The CI system helped them through a complicated transition process by identifying problems as early as possible, which saved them days of debugging.
What our users think about Continuous Integration
NOvA Feedback
What would have happened without CI
QUESTIONS?
THANK YOU for your
ATTENTION!
Backup Slides
Jenkins for the Continuous Integration
Jenkins is an open source automation tool written in Java with plugins built for Continuous Integration purposes. Jenkins is used to build and test your software projects continuously, making it easier for developers to integrate changes to the project and easier for users to obtain a fresh build. It also allows you to continuously deliver your software by integrating with a large number of testing and deployment technologies.
Advantages of Jenkins include:
● It is an open source tool with great community support.
● It is easy to install.
● It has 1000+ plugins to ease your work. If a plugin does not exist, you can code it and share it with the community.
● It is free of cost.
● It is built with Java and hence is portable to all the major platforms.
Jenkins Distributed Architecture
Jenkins uses a Master-Slave architecture to manage distributed builds.
Jenkins Master
● Schedules and dispatches builds.
● Monitors the slaves (possibly taking them online and offline as required).
● Records and presents the build results.
Jenkins Slave
● Listens for requests from the Jenkins Master instance.
● Executes the build jobs dispatched by the Master.
CI Tests Configuration
Workflow Configuration
The Flight Explosion
How the code is automatically tested
It’s possible to automatically run a CI build after each commit on the repository through the use of hooks.
Hooks are scripts that a version control tool executes before or after events such as commit, push, and
receive. Git hooks are a built-in feature - no need to download anything. Client-side Git hooks run locally, while server-side hooks run on the remote repository.
These hook scripts are only limited by a developer's imagination. Some example hook scripts include:
● commit-msg: Check the commit message for spelling errors.
● pre-receive: Enforce project coding standards.
● post-commit: Email/SMS team members of a new commit.
● post-receive: Push the code to production.
Every repository has a hooks folder (Git: .git/hooks, SVN: the hooks directory of the repository on the server) with a script for each hook you
can bind to. You're free to change or update these scripts as necessary, and the version control tool will
execute them when those events occur.
For our purpose we used a post-receive hook:
For Git, this hook is invoked by git-receive-pack on the remote repository, which happens when a git push is
done from a local repository. It executes on the remote repository once, after all the refs have been updated.
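A post-receive hook receives one line per updated ref on its standard input, in the form `<old-value> <new-value> <ref-name>`. The sketch below shows how a hook written in Python could pick out pushes to the develop branch. The function name `refs_to_trigger` is made up, and the actual call to the CI trigger script is left out because the slides do not show it.

```python
def refs_to_trigger(stdin_lines, branch="develop"):
    """Return the new commit ids pushed to the given branch."""
    wanted = f"refs/heads/{branch}"
    triggered = []
    for line in stdin_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore malformed lines
        old_value, new_value, ref_name = parts
        # An all-zero new value means the branch was deleted: nothing to build.
        if ref_name == wanted and set(new_value) != {"0"}:
            triggered.append(new_value)
    return triggered

# Example input as Git would pass it to the hook (commit ids are invented).
example = [
    "1111111111111111111111111111111111111111 "
    "2222222222222222222222222222222222222222 refs/heads/develop",
    "3333333333333333333333333333333333333333 "
    "4444444444444444444444444444444444444444 refs/heads/feature/x",
]
print(refs_to_trigger(example))
```

In a real hook the lines would be read from `sys.stdin`, and each returned commit id would be handed to the trigger script.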
Reported statistics
● Wall Clock Time (s): How many seconds were spent to complete the phase.
● Each dot represents a different build.
DUNE experiment
Blame list Feature