AMS Computing Y2001-Y2002
AMS Technical Interchange Meeting
MIT Jan 22-25, 2002
Vitali Choutko, Alexei Klimentov
Outline
- AMS Production Farm: requirements, architecture, prototyping, tests of HW and SW components, HW and SW evaluation for AMS02
- Ground Segment
- Data Transmission SW
- Y2002 Milestones
AMS Ground Centers
[Diagram: AMS ground segment and data flow]
- POCC / POIC @ MSFC, AL: RT data, commanding, monitoring, NRT analysis; TReK workstations, "voice" loop, video distribution; HOSC web server and xterm access
- Science Operations Center: NRT data processing, primary storage, archiving, distribution, science analysis; production farm, analysis facilities, PC farm, data server
- AMS Remote Centers: MC production, data mirror archiving
- GSE and GSE data server at the AMS stations: buffer data and retransmit to SOC; commands archive
- External communications carry commands, monitoring and H&S data, flight ancillary data, and selected AMS science data; AMS data, NASA data and metadata flow to the SOC
AMS Production Farm (requirements)
A complex system consisting of computing components including I/O nodes, worker nodes, data storage and networking switches; it should perform as a single system.
Requirements:
- Reliability – high (24h/day, 7 days/week)
- Performance goal – process data "quasi-online" (with a typical delay < 1 day)
- Disk space – 12 months of data kept "online"
- Minimal human intervention (automatic data handling, job control and book-keeping)
- System stability – months
- Scalability
- Price/performance
AMS Production Farm (considerations)
Considerations based on AMS01 data processing experience and MC production in Y2000-2001:
- Uniform node architecture (dual-CPU Pentiums and AMDs)
- Uniform operating system (RedHat Linux)
- Computing capacity equivalent to 400 x 450 MHz PII processors (including 20% contingency and reprocessing)
- Total of 10 TByte of data stored online
- Two types of computers:
  - "Processing node" with cheap IDE disks used for transient data storage
  - "Server node" with IDE and SCSI RAID disks for persistent data storage
Y2001 Milestones
HW evaluation to choose the platform and architecture (the "official" AMS02 simulation/reconstruction code was used for the benchmarking)
Functional goal: AMS01 STS91 data rerun and AMS02 MC production using the production farm prototype and SW
AMS02 Benchmarks
Execution time of the AMS "standard" jobs compared across CPU types (normalized to the dual-CPU PII 450 MHz) 1)
1) V.Choutko, A.Klimentov, AMS Note 2001-11-01
Brand, CPU, Memory                              | OS/Compiler             | "Sim" | "Rec"
Intel PII dual-CPU 450 MHz, 512 MB RAM          | RH Linux 6.2 / gcc 2.95 | 1     | 1
Intel PIII dual-CPU 933 MHz, 512 MB RAM         | RH Linux 6.2 / gcc 2.95 | 0.54  | 0.54
Compaq quad α-ev67 600 MHz, 2 GB RAM            | RH Linux 6.2 / gcc 2.95 | 0.58  | 0.59
AMD Athlon 1.2 GHz, 256 MB RAM                  | RH Linux 6.2 / gcc 2.95 | 0.39  | 0.34
Intel Pentium IV 1.5 GHz, 256 MB RAM            | RH Linux 6.2 / gcc 2.95 | 0.44  | 0.58
Compaq dual-CPU PIV Xeon 1.7 GHz, 2 GB RAM      | RH Linux 6.2 / gcc 2.95 | 0.32  | 0.39
Compaq dual α-ev68 866 MHz, 2 GB RAM            | Tru64 Unix / cxx 6.2    | 0.23  | 0.25
Elonex Intel dual-CPU PIV Xeon 2 GHz, 1 GB RAM  | RH Linux 7.2 / gcc 2.95 | 0.29  | 0.35
AMD Athlon 1800MP dual-CPU 1.53 GHz, 1 GB RAM   | RH Linux 7.2 / gcc 2.95 | 0.24  | 0.23
8-CPU Sun Fire 880, 750 MHz, 8 GB RAM           | Solaris 5.8 / C++ 5.2   | 0.52  | 0.45
24-CPU Sun UltraSPARC-III+, 900 MHz, 96 GB RAM  | RH Linux 6.2 / gcc 2.95 | 0.43  | 0.39
Compaq dual α-ev68 866 MHz, 2 GB RAM            | RH Linux 7.1 / gcc 2.95 | 0.22  | 0.23
AMS01 STS91 Data Rerun (performance)
AMS02 Benchmarks (summary)
- The α-ev68 866 MHz and the AMD Athlon MP 1800+ have nearly the same performance and are the best candidates for the "AMS processing node" (a system based on the α-ev68 costs about twice as much as a comparable AMD Athlon system).
- Although the PIV Xeon has lower performance, with a resulting ~15% overhead compared to the AMD Athlon MP 1800+, the high-reliability requirement for the "AMS server node" dictates the choice of a Pentium machine.
- SUN and Compaq SMP systems are candidates for the AMS analysis computer (the choice is postponed until L-12 months).
Conclusion: the total power of the AMS02 processing farm must be equivalent to 50 AMD Athlon MP 1800+ computers.
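A rough back-of-envelope check of that farm-size figure, combining the capacity target from the considerations slide (400 x 450 MHz PII processors) with the Athlon benchmark factors above; the averaging of the "Sim"/"Rec" factors and the dual-CPU-per-box assumption are ours, not from the slides:

```python
# Back-of-envelope check of the "equivalent to 50 AMD Athlon MP 1800+ computers" figure.
# Inputs come from the slides; averaging the factors and assuming dual-CPU boxes are ours.
target_pii450_cpus = 400              # capacity target, in 450 MHz PII CPU equivalents
athlon_time_factor = 0.235            # average of 0.24 ("Sim") and 0.23 ("Rec")
cpus_per_box = 2                      # assumed dual-CPU Athlon MP boxes

pii_equiv_per_box = cpus_per_box / athlon_time_factor   # ~8.5 PII-450 CPUs per box
boxes_needed = target_pii450_cpus / pii_equiv_per_box
print(round(boxes_needed))            # ~47 boxes, consistent with the quoted ~50
```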
Production Farm (“AMS processing node” architecture)
Processor: dual-CPU 1.5+ GHz
Chipset: currently AMD
Memory: 1 GB RAM
System disk: LVD SCSI
Disk controller: 3Ware IDE RAID
Disks (transient storage): 6 x 120+ GB IDE
Ethernet adapters: "public" 100 Mbit/sec, "AMS private" 2 x 1 Gbit/sec
Production Farm (“AMS server node” architecture)
Processor: dual-CPU 1.4+ GHz
Chipset: currently Intel
Memory: 1 GB RAM
System disk: LVD SCSI
Disk controller: IPC SCSI RAID
Disks (permanent storage): 8 x 180+ GB SCSI
Disk controller: 3Ware IDE RAID
Disks (transient storage): 7 x 120+ GB IDE
Ethernet adapters: "public" 100 Mbit/sec, "AMS private" 2 x 1 Gbit/sec
Production Farm HW Tape Drive (“raw” data backup)
IBM LTO Ultrium (connected to the "server node" prototype):
- Data transfer (write): RAID 5 array -> tape: 11 MByte/sec
- Data transfer (read): tape -> null device: 19 MByte/sec; tape -> RAID 5 array: 11 MByte/sec
- Tape capacity: 200 GB
(see also http://cscct.home.cern.ch/cscct/ultrium)
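As a rough sense of scale (our arithmetic, not from the slides), the measured rates imply the following per-cartridge backup and read-back times:

```python
# Time to fill or read back one 200 GB LTO Ultrium cartridge at the measured rates.
capacity_mb = 200 * 1024      # tape capacity from the slide, in MByte
write_mb_s = 11.0             # RAID 5 array -> tape
read_mb_s = 19.0              # tape -> null device

print(f"write ~{capacity_mb / write_mb_s / 3600:.1f} h")   # ~5.2 h per full tape
print(f"read  ~{capacity_mb / read_mb_s / 3600:.1f} h")    # ~3.0 h per full tape
```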
AMS Science Operation Center Computing Facilities (A.Klimentov, Jan 15, 2002)
[Diagram: SOC computing facilities]
- Production Farm: cells #1 ... #8, each built from dual-CPU (2 x 2 GHz+) PC Linux nodes plus a PC Linux server (2 x 2 GHz, SCSI RAID), interconnected by Gigabit switches (1 Gbit/sec)
- Archiving and Staging: tape servers and disk servers
- Data Server and Analysis Facilities: 2 x SMP machines (Compaq, SUN), disk servers, Gigabit switch (1 Gbit/sec)
- MC Data Server: receives simulated data
- Inputs: AMS data, NASA data, metadata
AMS Computing Y2001 (SW)
- AMS production process/process communication and control SW (PPCC) and monitoring: client/server CORBA technology (V.Choutko); process monitoring package (M.Boschini, V.Choutko, A.Klimentov)
- Data handling: ORACLE DB to store metadata and catalogues (M.Boschini, A.Klimentov)
- Data transmission package: based on bbftp (A.Elin, A.Klimentov, AMS Note 2001-11-02)
AMS Production Highlights
- Excellent HW stability (uptime of more than 3 months)
- AMS01 STS91 data rerun (10 Linux boxes, 19 CPUs); average efficiency 95% (CPU time / elapsed time)
- Process communication and control via CORBA; LSF for process submission
- Oracle server on an AS4100 Alpha and Oracle clients on Linux; the Oracle RDBMS holds the Tag DB with 100M entries, the Conditions DB with 100K entries, and the bookkeeping (production status, runs history, file catalogues)
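To illustrate the LSF-based submission mentioned above (this is not the authors' actual control script; the queue name, job name, executable and run identifier are placeholders), a production job could be submitted roughly like this:

```python
# Illustrative LSF submission of one reconstruction job (hypothetical names throughout).
# bsub's -q (queue), -J (job name) and -o (output file) options are standard LSF.
import subprocess

def submit_rerun_job(run_id: str) -> None:
    cmd = [
        "bsub",
        "-q", "ams_prod",                 # hypothetical production queue
        "-J", f"sts91_rerun_{run_id}",    # job name used for bookkeeping
        "-o", f"logs/{run_id}.out",       # collect stdout/stderr
        "./amsrec", run_id,               # hypothetical reconstruction executable
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    submit_rerun_job("01234")
```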
Data Transmission SW
High-rate data transfer between MSFC and POCC/SOC, POCC and SOC, and SOC and the master-copy repository(ies) will become of paramount importance (tests with TReK between MIT and CERN 2); TReK is the best candidate for AMS commanding and for transferring data samples).
What should be used for the bulk data transfer? Why not FTP, ncftp, etc.? We want:
- to speed up data transfer
- to encrypt sensitive data and not encrypt bulk data
- to run in batch mode with automatic retry in case of failure
...
We started to look around and came up with bbftp in September (bbftp was developed in BaBar and is used to transmit data from SLAC to IN2P3 at Lyon); we adapted it for AMS and wrote service and control programs 1).
1) A.Elin, A.Klimentov, AMS Note 2001-11-02
2) P.Fisher, A.Klimentov, AMS Note 2001-05-02
Data Transmission SW (the inside details)
Server:
- copy data files between directories (optional)
- scan the data directories and make the list of files to be transmitted
- purge successfully transmitted files and do the book-keeping of transmission sessions
Client:
- periodically connect to the server and check whether new data are available
- bbftp the new data and update the transmission status in the catalogues
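A minimal sketch of that client loop follows; it is not the actual AMS control program. The host, directories, user name, the "files.list" mechanism and the bbftp option set are placeholders/assumptions, and the in-memory set stands in for the Oracle transmission catalogue.

```python
# Minimal sketch of the transmission client loop described above (not the AMS code).
# Host, directories, user name and the exact bbftp options are placeholders/assumptions.
import subprocess
import time

SERVER = "ams-gse.example.org"     # hypothetical host running the server-side program
REMOTE_DIR = "/gse/outgoing"       # hypothetical directory scanned by the server
LOCAL_DIR = "/soc/raw"             # hypothetical local destination
catalogue = set()                  # stands in for the Oracle transmission catalogue

def bbftp_get(remote_path: str, local_path: str) -> bool:
    """Fetch one file with bbftp; the '-e' control command and '-u' user are illustrative."""
    cmd = ["bbftp", "-u", "amsdaq", "-e", f"get {remote_path} {local_path}", SERVER]
    return subprocess.run(cmd).returncode == 0

def new_files_on_server() -> list[str]:
    """Assume the server-side program publishes its 'files to be transmitted' list."""
    if not bbftp_get(f"{REMOTE_DIR}/files.list", "/tmp/files.list"):
        return []
    with open("/tmp/files.list") as fh:
        return [line.strip() for line in fh if line.strip()]

while True:                                    # periodically connect to the server
    for name in new_files_on_server():         # check whether new data are available
        if name not in catalogue and bbftp_get(f"{REMOTE_DIR}/{name}", f"{LOCAL_DIR}/{name}"):
            catalogue.add(name)                # update transmission status in the catalogue
    time.sleep(300)                            # e.g. every 5 minutes
```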
Data Transmission SW (tests)
Location                 | Line (Mbit/sec) | Program | Rate (Mbit/sec)
Prevessin -> Meyrin      | 10              | ftp     | 5.8
                         |                 | bbftp   | 7.8
                         |                 | bbcp    | 8.0
Prevessin -> Prevessin   | 100             | ftp     | 21.0
                         |                 | bbftp   | 40.0
                         |                 | bbcp    | 42.0
Prevessin -> Milano 1)   | 16              | bbftp   | 6.0
Server and client: dual-CPU Intel PIII, Linux OS; bbftp release 2.1.2. Transmitted AMS01 "raw" data and AMS01 data summary files (Ntuples); duration 12-24 h.
1) M.Boschini installed bbftp in INFN Milano
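A quick reading of the 100 Mbit/sec test (our arithmetic, not from the slides): bbftp roughly doubles the plain-ftp throughput and reaches about 40% of the nominal line capacity.

```python
# Line utilization and speed-up relative to plain ftp on the 100 Mbit/sec test above.
line_mbit = 100.0
rates = {"ftp": 21.0, "bbftp": 40.0, "bbcp": 42.0}   # Mbit/sec, from the table

for prog, rate in rates.items():
    print(f"{prog}: {rate / line_mbit:.0%} of line, {rate / rates['ftp']:.1f}x ftp")
# ftp: 21% of line, 1.0x | bbftp: 40% of line, 1.9x | bbcp: 42% of line, 2.0x
```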
AMS Computing Y2001
The Y2001 milestones have been fulfilled
AMS Computing Y2002
- Build the AMS02 "production cell" and use it for MC production
- Build the AMS02 "analysis cell"
- AMS02 process and data control SW (migrate from open-source CORBA to the licensed version)
- "bbftp" tests between MIT and CERN, and between GSC@MSFC and MIT/CERN
- Evaluate an archiving and staging system for AMS (Jan 2002: 4 TB)
AMS Computing Y2002 ("production cell")
[Diagram: one production cell - "Server Node #1" (dual-CPU Intel, IDE RAID, SCSI RAID) plus "Processing Nodes #1-#5" (dual-CPU AMD, IDE RAID), connected by a 1 Gbit/sec AMS private network and a 100 Mbit/sec link to the CERN backbone; analysis programs run against the server's SCSI RAID]
Processing Nodes 1-5:
- Dual-CPU Athlon 1900+
- 1 GB RAM
- 3Ware IDE RAID, 6 x 120 GB Western Digital disks
- 1 Gbit/sec ethernet, 2 x 100 Mbit/sec ethernet
Server Node 1:
- Dual-CPU Xeon or PIII
- 1 GB RAM
- 3Ware IDE RAID, 7 x 120 GB Western Digital disks
- IPC SCSI RAID, 8 x 160 GB WD disks
- 1 Gbit/sec ethernet, 2 x 100 Mbit/sec ethernet
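For scale (our arithmetic from the disk counts above, ignoring RAID overhead), one such cell provides roughly:

```python
# Raw (pre-RAID) disk capacity of one Y2002 production cell, from the counts above.
processing_nodes = 5
transient_per_node_gb = 6 * 120     # 3Ware IDE RAID on each processing node
server_ide_gb = 7 * 120             # server node IDE RAID (transient storage)
server_scsi_gb = 8 * 160            # server node SCSI RAID (permanent storage)

total_gb = processing_nodes * transient_per_node_gb + server_ide_gb + server_scsi_gb
print(f"~{total_gb / 1000:.1f} TB raw per cell")    # ~5.7 TB before RAID overhead
```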
AMS Computing Y2002 (“analysis cell”)
Two dual-CPU AMD Athlon machines dedicated to AMS analysis and Geant4 simulation.
The architecture is similar to the "AMS processing node" (but with a 4-channel IDE RAID controller and 4 x 120 GB WD HDDs).
Y2002 Milestones
- AMS computers upgrade (1Q)
- AMS "production cell" (1Q)
- AMS "analysis cell" (2Q)
- Data transmission tests (2Q)
- Evaluation of archiving and staging systems (technical meeting with CASPUR Feb/Mar, system choice 3Q)
- AMS data handling and PPCC SW, licensed CORBA package (3Q)
Growth of computers and data storage in the Science Operations Center