PCSF - A PC based simulation facility running Windows NT


CHEP 98 - 3 September 98


Frédéric Hemmer, CERN-IT/PDP


Overview

Configuration & pictures

Applications

Data access

Specific work & solutions

Key issues

Conclusions


People involved

A. Baran, J.P. Baud, C. Boissat, N. Buncic, J. Bunn, C. Charbonnier, F. Collin, V. Dore, V. Faine, S. Jarp, I. McLaren, S. O’Neale, A. Pfeiffer, H. Tang, A. Simmins, C. Von Praun, J. Wessels, R. Yaari

and all those that I forgot to mention ...


Goals

Make PC+NT a standard option for Physics Data Processing, starting with simulation

Establish a minimum management model for NT farm management

Address scalability issues

Gain Windows NT experience


Milestones

Joined RD47 in Autumn 96

Price inquiry issued in 12/96

Hardware delivered 4/97

Ready to use 6/97

RD47 report 10/97

Expansion 5/98


Configuration (1)

Server running NT 4.0 Server SP3
• 1 dual-capable Ppro @ 200 MHz, 96 MB, with 9 GB data disk (with mirroring). LSF central queues.

Server running NT Terminal Server Beta 2
• 1 dual Ppro @ 200 MHz, 128 MB, with 4 GB data disk. Runs IIS 3.0 and is accessible from outside CERN. It also hosts the ASPs for Web access.

Servers running NT 4.0 Workstation SP3
• 9 dual Ppros @ 200 MHz, 64 MB, 2*4 GB
• 25 dual PIIs @ 300 MHz, 128 MB, 2*4 GB

All equipped with boot PROMs
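As a quick check on the scale of the farm, the NT 4.0 Workstation nodes listed above can be tallied in a few lines of Python. This is only an illustrative sketch: the `NodeGroup` and `totals` names are invented for the example, the two servers are left out, and "2*4 GB" is read as 8 GB of disk per node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """One homogeneous group of worker nodes (hypothetical helper type)."""
    count: int
    cpus_per_node: int
    memory_mb: int
    disk_gb: int


# Figures taken from the slide; both groups are dual-CPU with 2*4 GB disks.
WORKERS = [
    NodeGroup(count=9, cpus_per_node=2, memory_mb=64, disk_gb=8),    # Ppro @ 200 MHz
    NodeGroup(count=25, cpus_per_node=2, memory_mb=128, disk_gb=8),  # PII @ 300 MHz
]


def totals(groups):
    """Return (nodes, cpus, memory_mb, disk_gb) summed over all groups."""
    nodes = sum(g.count for g in groups)
    cpus = sum(g.count * g.cpus_per_node for g in groups)
    memory_mb = sum(g.count * g.memory_mb for g in groups)
    disk_gb = sum(g.count * g.disk_gb for g in groups)
    return nodes, cpus, memory_mb, disk_gb


print(totals(WORKERS))  # (34, 68, 3776, 272)
```

That is, 34 worker nodes with 68 CPUs in total, which is the "relatively small number of nodes" the conclusions refer to.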


Configuration (2)

Machines interconnected with four 3Com 3000 100BaseT switches

Display/Keyboard/Mouse connected to a Raritan multiplexor

PC Duo for remote admin access
• There were problems with other products

All running LSF 3.0
• LSF 3.2 does not work, support weak

Completely integrated with NICE


Racking evolution

[Photographs: racking in 1997 and in 1998]


Applications

ATLAS Dice simulation

NA45 1996 reconstruction

CMS reconstruction with Objectivity being tested

LHCB simulation code ready

ATLAS reconstruction being ported

ATLAS/Marseille event filter prototype scalability tests (see poster)


Data access

[Diagram: NT PCs connected over the network to Unix RFIO servers (accessed via RFIO) and to a Unix tape server (accessed via stagexxx commands)]


PCSF Usage

[Chart: idle and used NCU hours per week (scale 0 to 8000), weeks 43 to 52 followed by weeks 1 to 33]


Specific work so far

Installation (Remote Boot, Winstall, NICE replicas, Install Server)

User codes, CERNLIB, SHIFT

Job Starter

PC MGR

WNTS

Web Interface


Installation

Disk cloning + change SID
• Fastest method, but not very automated

Remote boot
• Remote boot install procedures with virtual disk
• Use unattended setup, installs Winstall and other things
• Third party packages installed through Winstall

Boot PROM support on some hardware


Porting

Usually porting code from Unix to NT is easy (NA45 code ported in 1 week)

Usually porting production environment from Unix to NT is difficult (shell scripts)

Porting build environment is difficult, better to use native tools (Dev Studio)

Mixing Unix and NT build environment, revision control, etc.


Jobstarter

Initially inherited from Unix LSF CERN JobStarter

Rewritten in C++, using PcMgrSvc for drive mapping

Check execution preconditions

Clean up at normal and abnormal job end

Kill popup dialog windows

Excel & Winzip in batch
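The job starter duties listed above (check preconditions, run the job, clean up whether it ends normally or abnormally) could be sketched roughly as follows. This is a minimal Python illustration, not the PCSF C++ implementation: the names `preconditions_ok` and `run_job`, the choice of "scratch directory exists and has enough free space" as the precondition, and the timeout handling are all assumptions made for the example. Drive mapping through PcMgrSvc and killing popup dialog windows are not modelled.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def preconditions_ok(scratch_root, min_free_bytes):
    """Hypothetical precondition: scratch area exists and has enough free space."""
    scratch_root = Path(scratch_root)
    if not scratch_root.is_dir():
        return False
    return shutil.disk_usage(scratch_root).free >= min_free_bytes


def run_job(command, scratch_root, min_free_bytes=0, timeout=None):
    """Run `command` in a private work directory under `scratch_root`.

    Returns the job's exit code, -1 if the job timed out (a convention
    chosen for this sketch), or None if the preconditions failed and the
    job was never started.
    """
    if not preconditions_ok(scratch_root, min_free_bytes):
        return None
    workdir = Path(tempfile.mkdtemp(prefix="job_", dir=scratch_root))
    try:
        completed = subprocess.run(command, cwd=workdir, timeout=timeout)
        return completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return -1
    finally:
        # Clean up after both normal and abnormal job end.
        shutil.rmtree(workdir, ignore_errors=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        code = run_job([sys.executable, "-c", "print('hello from the job')"], root)
        print("exit code:", code)
```

Putting the cleanup in a `finally` clause is the point of the sketch: the work directory is removed on every exit path, which mirrors the "normal and abnormal job end" requirement.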


PcMgrSvc/Ctl

Checks
• Status of monitored processes/services
• Amount of scratch space
• Drive mapping(s)

Map/Unmap drives

Sync. with time servers

Generate alarms on request

Gets all parameters from registry
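The three checks listed for PcMgrSvc can be illustrated with a small Python sketch. It is not the actual service: the `check_node` function and the parameter names are hypothetical, a plain dictionary stands in for the registry, the set of running services is passed in rather than queried from the operating system, and a drive counts as "mapped" simply if its path exists.

```python
import shutil
from pathlib import Path


def check_node(params, running_services):
    """Return a list of alarm strings; an empty list means the node looks healthy.

    `params` stands in for the values the real service reads from the registry.
    `running_services` is the set of service names currently running.
    """
    alarms = []

    # 1. Status of monitored processes/services
    for name in params["monitored_services"]:
        if name not in running_services:
            alarms.append(f"service not running: {name}")

    # 2. Amount of scratch space
    scratch = Path(params["scratch_dir"])
    if not scratch.is_dir():
        alarms.append(f"scratch directory missing: {scratch}")
    elif shutil.disk_usage(scratch).free < params["min_scratch_bytes"]:
        alarms.append(f"scratch space low: {scratch}")

    # 3. Drive mapping(s)
    for drive in params["mapped_drives"]:
        if not Path(drive).exists():
            alarms.append(f"drive not mapped: {drive}")

    return alarms


if __name__ == "__main__":
    example_params = {
        "monitored_services": ["lsf"],   # illustrative names only
        "scratch_dir": ".",
        "min_scratch_bytes": 0,
        "mapped_drives": ["."],
    }
    print(check_node(example_params, running_services={"lsf"}))
```

Keeping every threshold and name in one parameter table, as the slide describes for the registry, means the same checking code can run unchanged on every node.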


Web Interface

As a solution to
• Remote access from outside CERN
• Access from non NT hosts

Implemented as ASPs with VB

Requires IIS on the server


Web Interface - Overview


Windows NT Terminal Server


Key Issues

AFS access

LSF support

Boot PROMs, equipment interoperability

Code reintegration (Physics & CERNLIB)

Think Windows

Scalability & Management (home grown solution vs. commercial apps.)


Next Steps

Finish and understand remote boot issues

Complete remote boot - remote install

AFS Integration

Build up resilience

Investigate how to use the new WfM, DMI, PXE, ACPI, etc. initiatives

Investigate whether WSH is an alternative

Investigate NT’s I/O capabilities


Conclusions

PC+NT has proven to work in a batch environment, and is now an option for Physics Data Processing

Farm management is less of a concern after having built a few tools (alternatives would be to use SMS or TNG), but some work is still needed

Scalability has started to be addressed, but the relatively small number of nodes does not help here

Considerable NT experience has been gained