A TCP/IP transport layer for the DAQ of the CMS Experiment

Miklos Kozlovszky, for the CMS TriDAS collaboration. CERN, European Organization for Nuclear Research. ACAT03 - December 2003


Transcript of A TCP/IP transport layer for the DAQ of the CMS Experiment

Page 1: A TCP/IP transport layer for the DAQ of the CMS Experiment

A TCP/IP transport layer for the DAQ of the CMS Experiment

Miklos Kozlovszky

for the CMS TriDAS collaboration

CERN, European Organization for Nuclear Research

ACAT03 - December 2003

Page 2: A TCP/IP transport layer for the DAQ of the CMS Experiment

CMS & Data Acquisition

Collision rate: 40 MHz

Level-1 maximum trigger rate: 100 kHz

Average event size: 1 Mbyte

No. of In-Out units: 1000

Readout network bandwidth: 1 Terabit/s

Event filter computing power: 5 × 10^6 MIPS

Data production: Tbyte/day
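The readout bandwidth follows from the trigger rate and the event size. A quick check in Python, assuming decimal units (1 Mbyte = 10^6 bytes) and ignoring protocol overhead; the function name is only illustrative:

```python
# Back-of-envelope check of the readout network bandwidth quoted on the slide.
LEVEL1_RATE_HZ = 100_000      # Level-1 maximum trigger rate: 100 kHz
EVENT_SIZE_BYTES = 1_000_000  # average event size: 1 Mbyte (decimal units assumed)


def required_bandwidth_bits_per_s(rate_hz: int, event_bytes: int) -> int:
    """Aggregate bandwidth needed to move every accepted event, in bit/s."""
    return rate_hz * event_bytes * 8


bw = required_bandwidth_bits_per_s(LEVEL1_RATE_HZ, EVENT_SIZE_BYTES)
print(f"{bw / 1e12:.1f} Tbit/s")  # prints "0.8 Tbit/s"
```

The result, 0.8 Tbit/s of payload, is consistent with the quoted 1 Terabit/s once headroom is included.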

[Figure: CMS DAQ block diagram. Blocks shown: Level 1 Trigger, Detector Frontend, Readout Systems, Event Manager, Builder Networks, Filter Systems, Computing Services, Run Control]

Page 3: A TCP/IP transport layer for the DAQ of the CMS Experiment

Event builder: Physical system interconnecting data sources with data destinations. It has to move all the data fragments of each event to the same destination.

Event fragments: The data fragments of an event are stored in separate physical memory systems.

Full events: The full data of an event are stored in one physical memory system associated with a processing unit.
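The definitions above can be sketched as a small in-memory model. This is an illustration only, not the CMS implementation: the function name, the tuple layout and the rule that picks a destination (event id modulo the number of destinations) are all assumptions made for the example.

```python
from collections import defaultdict


def build_events(fragments, n_sources, n_destinations):
    """Assemble full events from fragments.

    fragments: iterable of (event_id, source_id, payload_bytes).
    Returns {destination: [(event_id, full_event_bytes), ...]}.
    """
    partial = defaultdict(dict)  # event_id -> {source_id: payload}
    built = defaultdict(list)    # destination -> completed events
    for event_id, source_id, payload in fragments:
        partial[event_id][source_id] = payload
        if len(partial[event_id]) == n_sources:
            # All fragments of this event have arrived: concatenate them
            # in source order and hand the event to a single destination.
            parts = partial.pop(event_id)
            full_event = b"".join(parts[s] for s in range(n_sources))
            # Hypothetical assignment rule, chosen only for illustration.
            built[event_id % n_destinations].append((event_id, full_event))
    return dict(built)


frags = [(0, 0, b"a"), (1, 0, b"c"), (0, 1, b"b"), (1, 1, b"d")]
result = build_events(frags, n_sources=2, n_destinations=2)
```

With two sources and two destinations, event 0 ends up complete at destination 0 and event 1 at destination 1, whatever the arrival order of the fragments.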

[Figure: NxM EVB, with units numbered 1, 2, 3, ... 512]

512 data sources for 1 MByte events; ~1000s of HLT processing nodes

Building the events

Page 4: A TCP/IP transport layer for the DAQ of the CMS Experiment

• Distributed DAQ framework developed within CMS.

• Construct homogeneous applications for heterogeneous processing clusters.

• Multi-threaded (important to take advantage of SMP efficiently).

• Zero-copy message passing for the event data.

• Peer-to-peer communication between the applications.

• I2O for data transport, and SOAP for configuration and control.

• Hardware and transport independence.
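The zero-copy idea in the list above can be illustrated in a few lines of Python. This is only an analogy for passing references to event data instead of copying it; it does not show how XDAQ manages its buffers, and the frame layout (a 6-byte header) is invented for the example.

```python
# A frame held in one buffer; header length is an assumption for the example.
frame = bytearray(b"HEADERpayload")
HEADER_LEN = 6

view = memoryview(frame)
payload = view[HEADER_LEN:]   # a slice of a memoryview shares memory, no copy

# Writing to the underlying buffer is visible through the slice,
# which shows that no separate copy of the payload was made.
frame[HEADER_LEN] = ord("P")
first_payload_byte = payload[0]
shares_buffer = payload.obj is frame
```

Handing `payload` to another component costs the same regardless of its size, which is the property that matters for large event fragments.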

[Figure: XDAQ layered architecture. Labels shown: OS and Device Drivers; HTTP, TCP, Ethernet, Myrinet, PCI; XDAQ; Util/DDM; Processing; Sensor readout. TCP is marked as the subject of the presentation.]

XDAQ Framework

Page 5: A TCP/IP transport layer for the DAQ of the CMS Experiment

• Reuse old, “cheap” Ethernet for DAQ

• Transport layer requirements

– Reliable communication

– Hide the complexity of TCP

– Efficient implementation

– Simplex communication via sockets

– Configurable

• Support of blocking and non-blocking I/O

TCP/IP Peer Transport Requirements

Page 6: A TCP/IP transport layer for the DAQ of the CMS Experiment

• Pending Queues (PQ)

– Thread-safe PQ management

– One PQ for each destination

– Independent sending through sockets

• A single “select” call is used both to receive packets and to send the blocked data.
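The non-blocking scheme described above can be sketched as follows. This is a minimal Python illustration, not the XDAQ C++ code: the class and method names (`NonBlockingSender`, `add_destination`, `frame_send`, `poll`) are hypothetical, and error handling (closed or reset connections) is left out.

```python
import select
import socket
import threading
from collections import deque


class NonBlockingSender:
    """One pending queue per destination, served by a single select() call."""

    def __init__(self):
        self._lock = threading.Lock()  # makes queue management thread safe
        self._pending = {}             # socket -> deque of unsent bytes

    def add_destination(self, sock):
        sock.setblocking(False)
        with self._lock:
            self._pending[sock] = deque()

    def frame_send(self, sock, frame):
        """Called by the application: queue the frame and return at once."""
        with self._lock:
            self._pending[sock].append(frame)

    def poll(self, timeout=0.1):
        """One select() pass: receive what has arrived, send what is queued."""
        with self._lock:
            socks = list(self._pending)
            writers = [s for s, q in self._pending.items() if q]
        readable, writable, _ = select.select(socks, writers, [], timeout)
        received = []
        for s in readable:
            data = s.recv(65536)
            if data:
                received.append((s, data))
        for s in writable:
            with self._lock:
                frame = self._pending[s].popleft()
            try:
                sent = s.send(frame)
            except BlockingIOError:
                sent = 0
            if sent < len(frame):
                # Partial send: keep the remainder at the head of the queue
                # so that byte order towards this destination is preserved.
                with self._lock:
                    self._pending[s].appendleft(frame[sent:])
        return received
```

Because each destination has its own queue, a slow receiver only delays its own frames; the other sockets keep draining on every pass of `poll`.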

Implementation of the non-blocking mode

[Figure: an XDAQ Application calls Framesend; frames 1, 2, 3, 4, 5 ... n are held in Pending Queues (numbered up to #n), which are served by Select]

Page 7: A TCP/IP transport layer for the DAQ of the CMS Experiment

[Figure: communication via the transport layer. Components shown: Applications (XDAQ), XDAQ Executive, XDAQ Framework, Peer Transport Layer, ptATCP, ptATCP Port(s), Sender Object(s), Receiver Object(s), Input SAP(s), Output SAP(s), OS, Driver(s), NIC (FE), NIC (GE), NIC (10GE). Arrow legend: creation of object, sending, receiving, other communication.]

Communication via the transport layer

Page 8: A TCP/IP transport layer for the DAQ of the CMS Experiment

Throughput optimisation

[Figure: single-rail and multi-rail connections between App 1 and App 2]

• Operating System tuning (kernel options+buffers)

• Jumbo Frames

• Transport protocol options

• Communication techniques

– Blocking vs. Non-Blocking I/O

– Single/Multi-rail

– Single/Multi-thread

– TCP options (e.g. Nagle algorithm)

– ….
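Two of the options listed above, the Nagle algorithm and the socket buffers, can be set per socket. A minimal sketch using the standard Python `socket` module; the function name and the buffer size are illustrative assumptions, and the slides do not state which values were used in the measurements.

```python
import socket


def tune_socket(sock, disable_nagle=True, buffer_bytes=256 * 1024):
    """Apply illustrative TCP tuning options to a socket and return it."""
    if disable_nagle:
        # TCP_NODELAY switches off the Nagle algorithm, so small writes
        # are sent immediately instead of being coalesced.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Request larger send and receive buffers; the kernel may clamp or
    # adjust these values according to its own limits.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock


s = tune_socket(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
nagle_disabled = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
s.close()
```

System-wide limits (the "kernel options + buffers" item) are configured outside the application and bound what a per-socket request like this can obtain.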

Page 9: A TCP/IP transport layer for the DAQ of the CMS Experiment

Test network

Cluster size: 8x8

CPU: 2x Intel Xeon (2.4 GHz), 512 KB cache

I/O system: PCI-X: 4 buses (max 6)

Memory: Two-way interleaved DDR: 3.2 GB/s (512 MB)

NICs: 1 Intel 82540EM GE; 1 Broadcom NetXtreme BCM 5703x GE; 1 Intel Pro 2546EB GE (2 port)

OS: Linux RedHat 2.4.18-27.7 (SMP)

Switches: 1 BATM-T6 Multi Layer Gigabit Switch (medium range); 2 Dell Power Connect 5224 (medium range)

Page 10: A TCP/IP transport layer for the DAQ of the CMS Experiment

[Figure: Throughput per Node (MB/s), axis from 0 to 140, versus Fragment Size (Byte), axis ticks at 100, 1000, 10000, 100000. Curves: link BW (1Gbps); 8x8 EVB [P4 e1000 Powerconnect 5224]; 32x32 EVB [P3 AceNIC FastIron8000]]

Conditions:

• XDAQ + Event Builder

– No Readout Unit inputs

– No Builder Unit outputs

– No Event Manager

• PC: dual P4 Xeon

• Linux 2.4.19

• NIC: e-1000

• Switch: Powerconnect 5224

• Standard MTU (1500 Bytes)

• Each BU builds 128 events

• Fixed fragment sizes

Result: For fragment size > 4 kB:

• Throughput per node ~100 MB/s, i.e. 80% utilisation
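The 80% figure can be checked against the 1 Gbps link bandwidth. The helper below is an illustrative sketch that assumes decimal units (1 Gbps = 125 MB/s); the function name is not from the presentation.

```python
def link_utilisation(throughput_mb_s, link_gbps=1.0):
    """Fraction of the raw link bandwidth used by the measured throughput."""
    capacity_mb_s = link_gbps * 1000 / 8  # 1 Gbps corresponds to 125 MB/s
    return throughput_mb_s / capacity_mb_s


print(f"{link_utilisation(100):.0%}")  # prints "80%"
```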

Working point

Event Building on the cluster

Page 11: A TCP/IP transport layer for the DAQ of the CMS Experiment

Two Rail Event Builder measurements

Test case:

Bare Event Builder (2x2)

• No RU inputs

• No BU outputs

• No Event Manager

Options:

• Non blocking TCP

• Jumbo frames (MTU 8000)

• Two rail

• One thread

RU working point (16 kB): Throughput/node = 240 MB/s, i.e. 95% bandwidth
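A rough estimate shows why jumbo frames and a second rail raise the ceiling. The sketch assumes IPv4 and TCP headers without options (20 + 20 bytes), standard Ethernet framing overhead (14-byte header, 4-byte FCS, 8-byte preamble, 12-byte inter-frame gap) and two 1 Gbps rails; the function names are illustrative.

```python
ETH_OVERHEAD = 14 + 4 + 8 + 12  # header + FCS + preamble + inter-frame gap (bytes)
IP_TCP_HEADERS = 20 + 20        # IPv4 + TCP headers, no options assumed (bytes)


def tcp_payload_efficiency(mtu):
    """Fraction of the wire time that carries TCP payload for full-size frames."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


def max_goodput_mb_s(mtu, link_gbps=1.0, rails=1):
    """Upper bound on payload throughput over the given number of rails."""
    return tcp_payload_efficiency(mtu) * link_gbps * 1000 / 8 * rails


for mtu in (1500, 8000):
    print(mtu, f"{tcp_payload_efficiency(mtu):.3f}",
          f"{max_goodput_mb_s(mtu, rails=2):.1f} MB/s")
```

Under these assumptions the payload efficiency rises from about 0.949 at MTU 1500 to about 0.990 at MTU 8000, so two rails can carry a little under 250 MB/s. The measured 240 MB/s is 96% of the raw 250 MB/s, in line with the quoted 95%.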

Page 12: A TCP/IP transport layer for the DAQ of the CMS Experiment

• Achieved 100 MB/s per node in the 8x8 configuration (1 rail).

• Improvements seen with the use of two rails, non-blocking I/O and Jumbo frames. In the 2x2 configuration, over 230 MB/s was obtained.

• High CPU load.

• We are also studying other networking and traffic shaping options.

Conclusions