Network activity in EGEE-III SA2 - TERENA · SA2 Global view. SA2: Network activity in EGEE-III. 7....
EGEE-III INFSO-RI-222667
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE and gLite are registered trademarks
Network activity in EGEE-III: SA2
Xavier Jeannin (CNRS/UREC), SA2 Activity Manager
7th NRENs and Grids Workshop (Dublin) 1/2 September 2008
Agenda
• EGEE size and statistics
• SA2 Network activity
  – Technical Network Liaison Committee (TNLC)
  – EGEE Network Operations Center (ENOC)
  – EGEE-III projects
      LHCOPN support / operational model
      Trouble matching and correlation
      Tools for troubleshooting
      Grid site networking needs
      Advanced network services
      IPv6
      Trouble ticket standardization
• European Grid Initiative, National Grid Initiative
  – Lessons learnt from EGEE
  – Network activity in EGI/NGI
• Conclusion
EGEE: the largest multi-disciplinary research Grid infrastructure in the world
[Figure: two charts, number of sites (axis 0 to 300) and number of cores (axis 0 to 80,000), quarterly from April 2004 to April 2008]
Users and resources distribution (Feb 2008)
Courtesy of Bob Jones
Highlights of EGEE-II - Applications
Courtesy of Erwin Laure
• >270 VOs from several scientific domains
  – Astronomy & Astrophysics
  – Civil Protection
  – Computational Chemistry
  – Computational Fluid Dynamics
  – Computer Science/Tools
  – Condensed Matter Physics
  – Earth Sciences
  – Fusion
  – High Energy Physics
  – Life Sciences
• Further applications under evaluation
Applications are moving from testing to routine, daily usage
SA2 in EGEE-III
• Total of 375 FTEs in EGEE-III
  – 9010 person months (vs. 11165 PMs in EGEE-II; ~20% less)
  – Grand total combining funded and unfunded contributions
      No difference for execution of program of work!
• Network activity SA2 = 14 persons + TNLC, 159 PMs
[Pie chart, share per activity: NA1 2%, NA2 5%, NA3 8%, NA4 19%, NA5 1%, SA1 49%, SA2 2%, SA3 9%, JRA1 5%]
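As a quick check of the figures on this slide, the following sketch recomputes the "~20% less" reduction and the SA2 share from the quoted person-month totals. It uses only the numbers given above; the variable and function names are illustrative.

```python
# Person-month (PM) totals quoted on the slide.
EGEE_II_PM = 11165
EGEE_III_PM = 9010
SA2_PM = 159


def percent(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Relative drop in effort from EGEE-II to EGEE-III.
reduction = percent(EGEE_II_PM - EGEE_III_PM, EGEE_II_PM)
# Share of the EGEE-III effort allocated to SA2.
sa2_share = percent(SA2_PM, EGEE_III_PM)

print(f"Effort reduction EGEE-II -> EGEE-III: {reduction:.1f}%")  # 19.3%
print(f"SA2 share of EGEE-III effort: {sa2_share:.1f}%")          # 1.8%
```

Both results agree with the rounded values on the slide (about 20% less effort, and an SA2 share of about 2%).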
SA2 – EGEE-III
SA2 Global view
[Diagram of SA2 tasks and partners]
Support for the ENOC
IPv6 (GARR, CNRS)
Operational procedures (CNRS)
LCG Support (CNRS)
Operational tools and maintenance (RRC-KI, CNRS)
Overall networking coordination
ENOC running
TT exchange standard (GRNET)
Advanced network services (GRNET)
TNLC
Monitoring (DFN)
Site networking needs (RedIRIS)
Troubleshooting (DFN)
Technical Network Liaison Committee
• Technical Network Liaison Committee (TNLC)
  – Facilitate cooperation between EGEE on the one hand and GÉANT2 and the NRENs on the other
  – CERN; CNRS, France; DANTE, UK (the GÉANT2 operator); RRC-KI, Russia; DFN-Verein, Germany; GARR, Italy; GRNET, Greece; RedIRIS, Spain...
• Main themes
  – Monitoring (E2ECU, monitoring LHCOPN/EGI)
  – Standardization of network trouble tickets (assessment of the impact of a trouble ticket on the grid)
  – Advanced network services (AMPS/SLA, new advanced network services)
EGEE’08 conference
• NRENs are invited to take part in the TNLC
Role of the ENOC
• ENOC ensuring E2E connectivity for Grid sites
• Assess the impact on the Grid of network trouble
• Troubleshoot problems
  – Provide support to users
  – Identify the faulty domain
• Assess the network connectivity of the Grid sites
[Diagram: end-to-end path Grid site 1, RC 1, NREN A, GÉANT2, NREN B, RC 2, Grid site 2. GÉANT2 is operated by DANTE; NREN A, NREN B, RC 1 and RC 2 are each operated by their own NOC. The ENOC ensures E2E connectivity for Grid sites on the whole path.]
The ENOC
– A single point of contact between EGEE and the NRENs, where EGEE and the network can exchange operational information
– A network support unit in GGUS (the trouble ticket system of EGEE)
[Diagram: the ENOC sits between the EGEE side (sites, users, GGUS, support units) and the network side (NRENs, GÉANT2)]
• Interface with network providers:
  – Collect tickets from NRENs
  – Assess impact on the grid infrastructure
  – Forward to GGUS the tickets that seem relevant
• Interface with the EGEE user support:
  – Receive tickets assigned to the ENOC by the GGUS 1st level support
  – Troubleshoot them, provided that the ENOC has access to suitable monitoring tools
  – Contact identified faulty domains, or reassign the ticket to the associated site if this is a local network issue
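The two ENOC interfaces described above can be sketched as a small routing function per direction. This is only an illustration under stated assumptions: the ticket fields, the `Action` names and the "archive" outcome for tickets with no Grid impact are hypothetical and do not come from the ENOC or GGUS software.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Set


class Action(Enum):
    FORWARD_TO_GGUS = "forward to GGUS"
    ARCHIVE = "no Grid site affected, keep for reference"  # assumed outcome
    TROUBLESHOOT = "troubleshoot with ENOC monitoring tools"
    CONTACT_DOMAIN = "contact the faulty network domain"
    REASSIGN_TO_SITE = "reassign to the site (local network issue)"


@dataclass
class NrenTicket:
    """Ticket collected from an NREN (hypothetical fields)."""
    nren: str
    affected_sites: List[str]


@dataclass
class GgusTicket:
    """Ticket assigned to the ENOC by GGUS 1st level support (hypothetical fields)."""
    site: str
    faulty_domain: Optional[str] = None  # None until a domain is identified
    local_issue: bool = False


def handle_nren_ticket(ticket: NrenTicket, grid_sites: Set[str]) -> Action:
    """Assess Grid impact: forward only tickets touching a known Grid site."""
    if grid_sites.intersection(ticket.affected_sites):
        return Action.FORWARD_TO_GGUS
    return Action.ARCHIVE


def handle_ggus_ticket(ticket: GgusTicket) -> Action:
    """Route a user-support ticket once the ENOC has looked at it."""
    if ticket.local_issue:
        return Action.REASSIGN_TO_SITE
    if ticket.faulty_domain is None:
        return Action.TROUBLESHOOT
    return Action.CONTACT_DOMAIN
```

For example, `handle_nren_ticket(NrenTicket("NREN A", ["SITE-1"]), {"SITE-1", "SITE-2"})` returns `Action.FORWARD_TO_GGUS`, since the ticket touches a Grid site.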
Assess the network connectivity of the Grid sites
• Specific tools developed: Downcollector, see https://ccenoc.in2p3.fr/
[Chart: number of connectivity troubles detected on EGEE Grid certified sites per month, August 2007 to March 2008 (axis 0 to 1000), sorted per supposed location: WAN/MAN, LAN / non-network (power…), unknown. Also plotted: number of sites with at least one network trouble. 282 certified Grid sites.]
Support of LHCOPN
The LHC Optical Private Network
15 PB of data per year generated by the LHC
http://ccenoc.in2p3.fr/ASPDrawer/
Support of LHCOPN
• SA2 objectives in the LHCOPN context are:
  – Define the operational model
      Define accurately the responsibilities of each actor
      Ensure a problem resolution is not delayed by an unsuitable operational model
      Ensure the LHCOPN is well monitored
  – Set up communication channels between this network and the EGEE Grid (scheduled downtimes, incidents etc.)
• LHCOPN operational model:
  – Federative model, responsibility shared by the Tier 1s and the Tier 0
  – Approach: define the actors and their relationships, where to find the information, and the procedures. Every actor agrees on the operational model and is aware of their role and of the procedure they should apply
  – Draft: operational model wiki
LHCOPN Operational model
Trouble matching and correlation (RRC-KI)
• Trouble matching and correlation for the ENOC
  – From a discovered incident, find the related network trouble ticket
  – Better trouble localisation
  – Different methods will be tested
• First method
  – Another monitoring tool (SmokePing) has been set up, located in Russia
  – The results of this tool and those from the ENOC (Downcollector, Lyon) are matched up
  – The two tools are located in two different places in order to improve the knowledge of the network topology
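One way to read the first method is as a comparison of what two vantage points see. The sketch below is an assumption about how such matching could work, not the method actually implemented by RRC-KI: each probe reports the set of sites it cannot reach, and the overlap hints at where the trouble lies. The site names and verdict strings are hypothetical.

```python
from typing import Dict, Set


def localise(down_from_lyon: Set[str], down_from_russia: Set[str]) -> Dict[str, str]:
    """Match the unreachable-site sets of two probes and guess a location.

    A site unreachable from both vantage points is probably affected close
    to the site itself; a site unreachable from only one probe points to
    the path between that probe and the site.
    """
    verdicts: Dict[str, str] = {}
    for site in sorted(down_from_lyon | down_from_russia):
        if site in down_from_lyon and site in down_from_russia:
            verdicts[site] = "near the site (site LAN or its access network)"
        elif site in down_from_lyon:
            verdicts[site] = "on the path from Lyon"
        else:
            verdicts[site] = "on the path from Russia"
    return verdicts


# Hypothetical observations from the two probes.
result = localise({"SITE-A", "SITE-B"}, {"SITE-B", "SITE-C"})
for site, verdict in result.items():
    print(f"{site}: trouble probably {verdict}")
```

With only two probes the verdict stays coarse; it narrows the search rather than pinpointing the faulty domain.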
Network Operational Database
Tools for troubleshooting (DFN)
• Tools for efficient troubleshooting
  – Launch tests on demand from the Grid site under central server control: ping, traceroute, DNS lookup, nmap and bandwidth measurements.
[Diagram: an ENOC supervisor or site administrator uses the central ENOC monitoring server to drive light local perfSONAR sensors at Grid site A and Grid site B; steps numbered 1 to 5]
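To illustrate on-demand tests under central control, the sketch below shows how a site-side sensor might turn a request from the central server into a command line, accepting only a fixed set of tests. It is not the perfSONAR plug-in: the test names, the choice of `dig` for DNS lookup and `iperf` for bandwidth, and the specific options are assumptions made for this example.

```python
import re
from typing import Dict, List

# Allowed tests and the command prefix each one maps to (assumed choices).
ALLOWED_TESTS: Dict[str, List[str]] = {
    "ping": ["ping", "-c", "4"],   # send four echo requests
    "traceroute": ["traceroute"],
    "dns": ["dig", "+short"],      # DNS lookup
    "nmap": ["nmap", "-sn"],       # host discovery only, no port scan
    "bandwidth": ["iperf", "-c"],  # iperf client towards the target
}

# Plain hostnames or IPv4 addresses only; IPv6 literals are not handled here.
_HOST_RE = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9.-]{0,251}[A-Za-z0-9])?$")


def build_command(test: str, target: str) -> List[str]:
    """Return the argument list for an allowed test against a valid target."""
    if test not in ALLOWED_TESTS:
        raise ValueError(f"test not allowed: {test!r}")
    if not _HOST_RE.match(target):
        raise ValueError(f"invalid target: {target!r}")
    # Build a new list so the table of allowed tests is never modified.
    return ALLOWED_TESTS[test] + [target]


print(build_command("ping", "ccenoc.in2p3.fr"))
# ['ping', '-c', '4', 'ccenoc.in2p3.fr']
```

The resulting list can be passed to `subprocess.run` without a shell, so a malformed target cannot inject extra commands.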
Tools for troubleshooting (DFN)
• Active measurement on demand: a lightweight perfSONAR version with a specific plug-in
• Looking for beta-tester sites
• NRENs can take advantage of the deployment of this software
  – To troubleshoot their own grid nodes
Grid site networking needs (RedIRIS)
• Establish empirically the network needs of a site according to the type of
  – Site (Tier 0, 1, 2, 3)
  – Experiment computed at the site
• Working plan
  – Review of the status of Tier 2 / Tier 3 sites in Spain
  – Translate the requirements and needs into network parameters to be measured
  – Brief review of the different network performance and monitoring tools that the tiers agree to deploy
  – Pilot / service definition for deploying perfSONAR
  – Definition of performance and monitoring tests
  – Test phase, results and conclusions
Advanced network services (GRNET)
• Enable access for applications to the advanced services provided by the NRENs
• SLA automation in a multi-domain environment through AMPS (Advance Multi-domain Provisioning System)
  – Overcome the lack of automated mechanisms
• SLA monitoring in EGEE
  – Automate the monitoring procedure and generate alarms
  – perfSONAR
• Investigate the new advanced network services that will soon be available
  – Dynamic lightpath?
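The "automate the monitoring procedure and generate alarms" item can be pictured as a threshold check of measurements against an SLA. The sketch below is a minimal illustration: the three metrics and the threshold values are hypothetical, and fetching the measurements (for instance from perfSONAR) is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sla:
    """Hypothetical SLA thresholds for one network path."""
    max_rtt_ms: float
    max_loss_pct: float
    min_bandwidth_mbps: float


def check_sla(sla: Sla, rtt_ms: float, loss_pct: float,
              bandwidth_mbps: float) -> List[str]:
    """Compare one set of measurements with the SLA and return the alarms."""
    alarms: List[str] = []
    if rtt_ms > sla.max_rtt_ms:
        alarms.append(f"RTT {rtt_ms} ms exceeds {sla.max_rtt_ms} ms")
    if loss_pct > sla.max_loss_pct:
        alarms.append(f"packet loss {loss_pct}% exceeds {sla.max_loss_pct}%")
    if bandwidth_mbps < sla.min_bandwidth_mbps:
        alarms.append(
            f"bandwidth {bandwidth_mbps} Mb/s below {sla.min_bandwidth_mbps} Mb/s"
        )
    return alarms


# Hypothetical thresholds and one measurement sample.
sla = Sla(max_rtt_ms=50.0, max_loss_pct=1.0, min_bandwidth_mbps=100.0)
for alarm in check_sla(sla, rtt_ms=80.0, loss_pct=0.5, bandwidth_mbps=90.0):
    print("ALARM:", alarm)
```

In this sample the round-trip time and the bandwidth break the SLA, so two alarms are raised; the packet loss is within bounds.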
IPv6 follow-up (GARR/CNRS)
• Set up all elements needed to handle IPv6 in EGEE
  – Middleware, testbed
      gLite internal dependencies, IPv6 compliance
        • DPM-LFC, BDII
      External dependencies
        • Assessment of IPv6 compliance of external modules
        • Deep test for important external modules: Grid-FTP …
  – Validation process of EGEE (SA3)
  – IPv6 knowledge dissemination
      Training courses, presentations
• Assess and make available an operational EGEE IPv6 site, according to which IPv6 gLite modules are available
Trouble ticket exchange (CNRS/GRNET)
• Defined by the TNLC (GARR, GRNET, RRC-KI, SRCE)
• Standard trouble tickets allow a better
  – Location of the problem
  – Assessment of the impact of trouble on the grid
• The translation can be done in
  – The ENOC: a central server translates each NREN’s ticket into a standard ticket
  – The NREN domain
• Software will be available soon
• The translator can easily be adapted to the requirements of NRENs willing to deliver standard tickets directly
• Standard trouble tickets will benefit both the NRENs and the Grid projects
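The translation step can be sketched as a per-NREN field mapping onto one common ticket structure. The fields of `StandardTicket`, the NREN identifiers and the source field names below are all hypothetical; the actual standard defined by the TNLC is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StandardTicket:
    """Common ticket format (illustrative fields only)."""
    origin: str
    ticket_id: str
    status: str
    location: str
    description: str


# For each NREN: standard field name -> field name in that NREN's own ticket.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "nren-a": {"ticket_id": "id", "status": "state",
               "location": "pop", "description": "summary"},
    "nren-b": {"ticket_id": "ref", "status": "status",
               "location": "site", "description": "text"},
}


def translate(origin: str, raw: Dict[str, object]) -> StandardTicket:
    """Translate one NREN-specific ticket into the standard format."""
    mapping = FIELD_MAPS[origin]  # KeyError for an NREN without a mapping
    fields = {std: str(raw[src]) for std, src in mapping.items()}
    return StandardTicket(origin=origin, **fields)


ticket = translate("nren-a", {"id": 42, "state": "open",
                              "pop": "Lyon", "summary": "fibre cut"})
print(ticket)
```

Under this design, adapting the translator to another NREN amounts to adding one entry to `FIELD_MAPS`, and the same function can run either centrally at the ENOC or inside the NREN domain.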
The European Grid Initiative
• There must be no gap in the support of the production grid
• Need to prepare a permanent Grid infrastructure
• Coordinate the integration and interaction between National Grid Infrastructures (NGIs)
• Experimental/research tasks should switch to production phases
• Establish at EGI level a sustainable collaboration between Grid and network people
• A major stake for NRENs
Lessons learnt from EGEE
• Future European Grid Initiative network activity:
  – Troubleshooting activity should be reduced to a minimum (only big issues)
  – Interaction (process, trouble sharing) and integration (operation design, monitoring…) with the Grid are essential at project level
  – Trouble ticket handling should be turned into a knowledge database and used as part of network quality monitoring
  – Network monitoring is an open subject in EGI-NGI
  – The NGI/EGI will federate several grid projects and therefore handle more sites and more networks
  – Future possibilities offered by networks to the Grid should not be missed: dynamic lightpath provisioning (Internet2, Phosphorus…), IPv6 compliance
  – Network quality control should be fostered (statistics, MoU checking, feedback to network providers…)
Network activity in EGI/NGI
• Network activity key objectives in EGI/NGI
  – Interface between the European Grid Infrastructure and network providers
  – Monitor the quality of the networks used by Grid projects:
      Public: educational and research networks
      Private: non-educational network providers (commercial…)
      Dedicated: LHCOPN, LHC Optical Private Network…
  – Ensure that applications’ network requirements are fulfilled / monitoring
  – Put new network technologies forward in the Grid process
Conclusion
• Trouble ticket standardization
• Tools for troubleshooting
  – Lightweight perfSONAR deployed on grid sites
• Network monitoring for EGI
• Collaboration with NRENs
  – Around specific topics (network monitoring of grid sites, trouble tickets, assessment of the impact of trouble on the grid)
  – Through the TNLC
• Establish a future collaboration between NRENs and NGI/EGI
Thank you.