CHEP 98 - 3 September 98
PCSF - A PC-based simulation facility running Windows NT
Frédéric Hemmer, CERN-IT/PDP
Overview
Configuration & pictures
Applications
Data access
Specific work & solutions
Key issues
Conclusions
People involved
A. Baran, J.P. Baud, C. Boissat, N. Buncic, J. Bunn, C. Charbonnier, F. Collin, V. Dore, V. Faine, S. Jarp, I. McLaren, S. O’Neale, A. Pfeiffer, H. Tang, A. Simmins, C. Von Praun, J. Wessels, R. Yaari
and all those that I forgot to mention ...
Goals
Make PC+NT a standard option for Physics Data Processing, starting with simulation
Establish a minimum management model for NT farm management
Address scalability issues
Gain Windows NT experience
Milestones
Joined RD47 in Autumn 96
Price inquiry issued in 12/96
Hardware delivered 4/97
Ready to use 6/97
RD47 report 10/97
Expansion 5/98
Configuration (1)
Server running NT 4.0 Server SP3
• 1 dual-capable Ppro @ 200 MHz, 96 MB, with 9 GB data disk (with mirroring). LSF central queues.
Server running NT Terminal Server Beta 2
• 1 dual Ppro @ 200 MHz, 128 MB, with 4 GB data disk. Runs IIS 3.0 and is accessible from outside CERN. It also hosts the ASPs for Web access.
Servers running NT 4.0 Workstation SP3
• 9 dual Ppros @ 200 MHz, 64 MB, 2*4 GB
• 25 dual PIIs @ 300 MHz, 128 MB, 2*4 GB
All equipped with boot proms
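As a rough tally of the worker nodes listed on this slide, the totals can be added up as in the sketch below. It leaves out the two servers and assumes that "dual" means two CPUs are installed per node and that "2*4GB" means two 4 GB disks; both readings are assumptions.

```python
# Worker nodes from the slide:
# (node count, CPUs per node, MB of RAM per node, GB of disk per node)
WORKERS = [
    (9, 2, 64, 2 * 4),    # dual Pentium Pro @ 200 MHz
    (25, 2, 128, 2 * 4),  # dual Pentium II @ 300 MHz
]


def farm_totals(workers):
    """Sum nodes, CPUs, memory (MB) and disk (GB) over all worker groups."""
    return {
        "nodes": sum(n for n, _, _, _ in workers),
        "cpus": sum(n * c for n, c, _, _ in workers),
        "ram_mb": sum(n * m for n, _, m, _ in workers),
        "disk_gb": sum(n * d for n, _, _, d in workers),
    }


print(farm_totals(WORKERS))
# {'nodes': 34, 'cpus': 68, 'ram_mb': 3776, 'disk_gb': 272}
```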
Configuration (2)
Machines interconnected with four 3Com 3000 100BaseT switches
Display/Keyboard/Mouse connected to a Raritan multiplexor
PC Duo for remote admin access
• There were problems with other products
All running LSF 3.0 (LSF 3.2 does not work; support is weak)
Completely integrated with NICE
Racking evolution
[Photographs: racking in 1997 and in 1998]
Applications
ATLAS Dice simulation
NA45 1996 reconstruction
CMS reconstruction with Objectivity being tested
LHCB simulation code ready
ATLAS reconstruction being ported
ATLAS/Marseille event filter prototype scalability tests (see poster)
Data access
[Diagram: NT PCs connected over the network to Unix RFIO servers and a Unix tape server; labels: RFIO, stagexxx commands]
PCSF Usage
[Chart: NCU hours per week, used vs. idle; y-axis 0 to 8000; x-axis week numbers 43 to 52 and 1 to 33]
Specific work so far
Installation (Remote Boot, Winstall, NICE replicas, Install Server)
User codes, CERNLIB, SHIFT
Job Starter
PC MGR
WNTS
Web Interface
Installation
Disk cloning + change SID
• Fastest method, but not very automated
Remote boot
• Remote boot install procedures with virtual disk
• Use unattended setup, installs Winstall and other things
• Third party packages installed through Winstall
Boot prom support on some hardware
Porting
Usually porting code from Unix to NT is easy (NA45 code ported in 1 week)
Usually porting production environment from Unix to NT is difficult (shell scripts)
Porting build environment is difficult, better to use native tools (Dev Studio)
Mixing Unix and NT build environment, revision control, etc.
Jobstarter
Initially inherited from Unix LSF CERN JobStarter
Rewritten in C++, using PcMgrSvc for drive mapping
Check execution preconditions
Clean up at normal and abnormal job end
Kill popup dialog windows
Excel & Winzip in batch
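The precondition check and the clean-up at normal and abnormal job end can be illustrated with a small job wrapper. This is a minimal Python sketch, not the CERN JobStarter (which the slide says is written in C++). The function name `run_job`, its parameters and the two preconditions it checks are hypothetical; drive mapping through PcMgrSvc and killing popup dialogs are left out.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_job(command, scratch_root, min_free_bytes=0, timeout=None):
    """Run `command` in a private scratch directory and always clean up.

    Returns the job's exit code, or None if a precondition failed or the
    job was killed after exceeding `timeout` seconds.
    """
    scratch_root = Path(scratch_root)
    # Precondition checks (hypothetical examples): the scratch area must
    # exist and have at least `min_free_bytes` available.
    if not scratch_root.is_dir():
        return None
    if shutil.disk_usage(scratch_root).free < min_free_bytes:
        return None

    workdir = Path(tempfile.mkdtemp(prefix="job_", dir=scratch_root))
    try:
        result = subprocess.run(command, cwd=workdir, timeout=timeout)
        return result.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before raising this.
        return None
    finally:
        # Executed on normal end, non-zero exit and timeout alike.
        shutil.rmtree(workdir, ignore_errors=True)


if __name__ == "__main__":
    code = run_job([sys.executable, "-c", "print('hello from the job')"],
                   tempfile.gettempdir())
    print("exit code:", code)
```

Putting the clean-up in a `finally` clause is what makes the same code path run whether the job ends normally or not.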
PcMgrSvc/Ctl
Checks
• Status of monitored processes/services
• Amount of scratch space
• Drive mapping(s)
Map/Unmap drives
Sync. with time servers
Generate alarms on request
Gets all parameters from registry
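The three checks listed above can be sketched as a single function that returns a list of alarms. This is an illustration in Python, not the PcMgrSvc code. The function name `check_node`, the parameter keys and the alarm texts are hypothetical. A plain dictionary stands in for the registry settings, and the running processes and current drive mappings are passed in by the caller rather than queried from the system.

```python
import shutil


def check_node(params, running_processes, mapped_drives):
    """Return a list of alarm strings; an empty list means all checks passed.

    `params` stands in for the settings that the slide says come from the
    registry (all key names here are hypothetical).
    """
    alarms = []

    # Status of monitored processes/services.
    for name in params["monitored_processes"]:
        if name not in running_processes:
            alarms.append(f"process not running: {name}")

    # Amount of scratch space.
    free = shutil.disk_usage(params["scratch_path"]).free
    if free < params["min_scratch_bytes"]:
        alarms.append(f"low scratch space: {free} bytes free")

    # Drive mapping(s): each required drive must point at the expected share.
    for drive, share in params["required_drives"].items():
        if mapped_drives.get(drive) != share:
            alarms.append(f"drive {drive} not mapped to {share}")

    return alarms
```

A caller would run this periodically and forward any non-empty result to whatever alarm mechanism is in use.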
Web Interface
As a solution to
• Remote access from outside CERN
• Access from non NT hosts
Implemented as ASPs with VB
Requires IIS on the server
Web Interface - Overview
Windows NT Terminal Server
Key Issues
AFS access
LSF support
Boot proms, equipment interoperability
Code reintegration (Physics & CERNLIB)
Think Windows
Scalability & Management (home grown solution vs. commercial apps.)
Next Steps
Finish and understand remote boot issues
Complete remote boot - remote install
AFS Integration
Build up resilience
Investigate how to use the new WfM, DMI, PXE, ACPI, etc. initiatives
Investigate whether WSH is an alternative
Investigate NT’s I/O capabilities
Conclusions
PC+NT has proven to work in a batch environment, and is now an option for Physics Data Processing
Farm management is less of a concern after having built a few tools (alternatives would be to use SMS or TNG), but some work is still needed
Scalability has started to be addressed, but the relatively small number of nodes does not help here
Considerable NT experience has been gained