Computing & Networking User Group Meeting
Roy Whitney
Andy Kowalski
Sandy Philpott
Chip Watson
17 June 2008
Users and JLab IT
• Ed Brash is User Group Board of Directors’ representative on the IT Steering Committee.
• Physics Computing Committee (Sandy Philpott)
• Helpdesk and CCPR requests and activities
• Challenges
  – Constrained budget
    • Staffing
    • Aging infrastructure
  – Cyber Security
Computing and Networking Infrastructure
Andy Kowalski
CNI Outline
• Helpdesk
• Computing
• Wide Area Network
• Cyber Security
• Networking and Asset Management
Helpdesk
• Hours: 8am-12pm M-F
  – Submit a CCPR via http://cc.jlab.org/
  – Dial x7155
  – Send email to [email protected]
• Windows XP, Vista, and RHEL5 Supported Desktops
  – Migrating older desktops
• Mac Support?
Computing
• Email Servers Upgraded
  – Dovecot IMAP Server (Indexing)
  – New File Server and IMAP Servers (Farm Nodes)
• Servers Migrating to Virtual Machines
• Printing
  – Centralized Access via jlabprt.jlab.org
  – Accounting Coming Soon
• Video Conferencing (working on EVO)
Wide Area Network
• Bandwidth
  – 10Gbps WAN and LAN backbone
  – Offsite Data Transfer Servers
    • scigw.jlab.org (bbftp)
    • qcdgw.jlab.org (bbcp)
Cyber Security Challenge
• The threat: the sophistication and volume of attacks continue to increase.
  – Phishing Attacks
    • Spear phishing/whaling are now being observed at JLab.
• Federal, including DOE, requirements to meet the cyber security challenges require additional measures.
• JLab uses a risk-based approach that balances achieving the mission with addressing the threat.
Cyber Security
• Managed Desktops
– Skype Allowed From Managed Desktops On Certain Enclaves
• Network Scanning
• Intrusion Detection
• PII/SUI (CUI) Management
Networking and IT Asset Management
• Network Segmentation/Enclaves
  – Firewalls
• Computer Registration
  – https://reggie.jlab.org/user/index.php
• Managing IP Addresses
  – DHCP
    • Assigns all IP addresses (most static)
    • Integrated with registration
• Automatic Port Configuration
  – Rolling out now
  – Uses registration database
Scientific Computing
Chip Watson & Sandy Philpott
Farm Evolution Motivation
• Capacity upgrades
  – Re-use of HPC clusters
• Movement to Open Source
  – O/S upgrade
  – Change from LSF to PBS
Farm Evolution Timetable
Nov 07: Auger/PBS available – RHEL3 - 35 nodes
Jan 08: Fedora 8 (F8) available – 50 nodes
May 08: Friendly-user mode; IFARML4,5
Jun 08: Production
– F8 only; IFARML3 + 60 nodes from LSF
– IFARML alias
Jul 08: IFARML2 + 60 nodes from LSF
Aug 08: IFARML1 + 60 nodes from LSF
Sep 08: RHEL3/LSF->F8/PBS Migration complete
– No renewal of LSF or RHEL for cluster nodes
Farm F8/PBS Differences
• Code must be recompiled
  – 2.6 kernel
  – gcc 4
• Software installed locally via yum
  – cernlib
  – MySQL
• Time limits: 1 day default, 3 days max
• stdout/stderr to ~/farm_out
• Email notification
Farm Future Plans
• Additional nodes from HPC clusters
  – CY08: ~120 4g nodes
  – CY09-10: ~60 6n nodes
  – Purchased as budgets allow
• Support for 64 bit systems when feasible & needed
Storage Evolution
• Deployment of Sun x4500 “thumpers”
• Decommissioning of Panasas (old /work server)
• Planned replacement of old cache nodes
Tape Library
• Current STK “Powderhorn” silo is nearing end-of-life
  – Reaching capacity & running out of blank tapes
  – Doesn’t support upgrade to higher-density cartridges
  – Officially end-of-life December 2010
• Market trends
  – LTO (Linear Tape Open) standard has proliferated since 2000
  – LTO-4 is 4x the density, capacity/$, and bandwidth of 9940B: 800 GB/tape, $100/TB, 120 MB/s
  – LTO-5, out next year, will double capacity and increase bandwidth 1.5x: 1600 GB/tape, 180 MB/s
  – LTO-6 will be out prior to the 12 GeV era: 3200 GB/tape, 270 MB/s
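As a quick sanity check on these capacity figures, a back-of-the-envelope sketch (my own arithmetic, not from the slides; the helper names are hypothetical, and decimal units are assumed, i.e. 1 PB = 1,000,000 GB):

```python
# LTO-4 media math using the slide's figures: 800 GB/tape, ~$100/TB.
# Assumptions: decimal units (1 PB = 1,000,000 GB); helper names are illustrative.
LTO4_GB_PER_TAPE = 800

def tapes_needed(petabytes):
    """Cartridges required to hold the given capacity, rounded up."""
    gigabytes = petabytes * 1_000_000
    return -(-gigabytes // LTO4_GB_PER_TAPE)  # ceiling division

def media_cost_usd(petabytes, usd_per_tb=100):
    """Approximate media cost at the quoted $/TB."""
    return petabytes * 1000 * usd_per_tb

print(tapes_needed(2))     # 2500 cartridges for a 2 PB target
print(media_cost_usd(2))   # 200000, i.e. ~$200k in media
```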
Tape Library Replacement
• Competitive procurement now in progress
  – Replace old system, support 10x growth over 5 years
• Phase 1 in August
  – System integration, software evolution
  – Begin data transfers, re-use 9940B tapes
• Tape swap through January
• 2 PB capacity by November
• DAQ to LTO-4 in January 2009
• Old silo gone in March 2009
End result: breakeven on cost by the end of 2009!
Long Term Planning
• Continue to increase compute & storage capacity in the most cost-effective manner
• Improve processes & planning
  – PAC submission process
  – 12 GeV Planning…
E.g.: Hall B Requirements

Event Simulation                 2012      2013      2014      2015      2016
SPECint_rate2006 sec/event       1.8       1.8       1.8       1.8       1.8
Number of events                 1.00E+12  1.00E+12  1.00E+12  1.00E+12  1.00E+12
Event size (KB)                  20        20        20        20        20
% Stored Long Term               10%       25%       25%       25%       25%
Total CPU (SPECint_rate2006)     5.7E+04   5.7E+04   5.7E+04   5.7E+04   5.7E+04
Petabytes / year (PB)            2         5         5         5         5

Data Acquisition                 2012      2013      2014      2015      2016
Average event size (KB)          20        20        20        20        20
Max sustained event rate (kHz)   0         0         10        10        20
Average event rate (kHz)         0         0         10        10        10
Average 24-hour duty factor (%)  0%        0%        50%       60%       65%
Weeks of operation / year        0         0         0         30        30
Network (n*10gigE)               1         1         1         1         1
Petabytes / year                 0.0       0.0       0.0       2.2       2.4

1st Pass Analysis                2012      2013      2014      2015      2016
SPECint_rate2006 sec/event       1.5       1.5       1.5       1.5       1.5
Number of analysis passes        0         0         1.5       1.5       1.5
Event size out / event size in   2         2         2         2         2
Total CPU (SPECint_rate2006)     0.0E+00   0.0E+00   0.0E+00   7.8E-03   8.4E-03
Silo Bandwidth (MB/s)            0         0         900       900       1800
Petabytes / year                 0.0       0.0       0.0       4.4       4.7

Totals                           2012      2013      2014      2015      2016
Total SPECint_rate2006           5.7E+04   5.7E+04   5.7E+04   5.7E+04   5.7E+04
SPECint_rate2006 / node          600       900       1350      2025      3038
# nodes needed (current year)    95        63        42        28        19
Petabytes / year                 2         5         5         12        12
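The bottom-line rows of the Hall B planning numbers can be reproduced from the per-event figures. A minimal sketch (my own arithmetic, not from the slides; assumes decimal units with 1 PB = 1e15 bytes, and node counts rounded to the nearest integer):

```python
# DAQ volume for 2015: 20 KB/event at a 10 kHz average rate,
# 60% duty factor, 30 weeks of operation per year.
bytes_per_year = 20e3 * 10e3 * 0.60 * 30 * 7 * 86400
daq_pb = bytes_per_year / 1e15
print(round(daq_pb, 1))  # 2.2, matching the Petabytes/year entry

# Nodes needed each year: total requirement (5.7E+04 SPECint_rate2006)
# divided by the projected per-node rating for 2012..2016.
TOTAL_SPECINT = 5.7e4
per_node = [600, 900, 1350, 2025, 3038]
nodes = [round(TOTAL_SPECINT / r) for r in per_node]
print(nodes)  # [95, 63, 42, 28, 19], matching the "# nodes needed" row
```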
LQCD Computing
• JLab operates 3 clusters with nearly 1100 nodes, primarily for LQCD plus some accelerator modeling
• National LQCD Computing Project (2006-2009: BNL, FNAL, JLab; USQCD Collaboration)
• LQCD II proposal 2010-2014 would double the hardware budget to enable key calculations
• JLab Experimental Physics and LQCD computing share staff (operations & software development) and the tape silo, providing efficiencies for both programs