LFC Replication Tests LCG 3D Workshop Barbara Martelli.


Page 1: LFC Replication Tests LCG 3D Workshop Barbara Martelli.

LFC Replication Tests

LCG 3D Workshop

Barbara Martelli

Page 2: Objectives of LFC Replication Tests.

• Understand if and how Streams replication impacts LFC behaviour.

• Understand if the achievable throughput, in entries inserted per second, is suitable for LHCb needs.

• Understand if the achievable sustained rate, in entries inserted per second, is suitable for LHCb needs.

• Measure the replication delay for a particular entry.

• Measure the maximum throughput achievable in our configuration.

• Measure the maximum sustained rate achievable in our configuration.

• Compare read performance between the present setup and the Streamed setup (we hope it will improve with a replica).

Page 3: LHCb Access Pattern on LFC.

• At the moment the LFC is used for DC06 MC production, stripping and analysis.

• It is really difficult to estimate the future access pattern, but we can take a snapshot of what happens today.

• Read access (end 2006): 10M PFNs expected; read access is mainly for analysis. An average user starts O(100) jobs, and each job contacts the LFC twice: once for DIRAC optimization and once to create the XML POOL slice that the application will use to access data.

• Every 15 minutes, 1000 users are expected to submit jobs, each user contacting the LFC 200 times (100 jobs × 2 contacts): 24 × 4 × 1000 × 200 ≈ 20M LFC requests per day for analysis, i.e. about 200 Hz of read-only requests.

• Write access (today): MC production does 10-15 inserts per day. DC06 transfers about 40 MB/s from CERN to the T1s with a file size of about 100 MB, i.e. one replicated file every ~3 seconds; for every 30 files processed, 2 are created.

• So we can expect about 1 Hz of write access. A back-of-envelope check of these rates is sketched below.
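A quick check of this arithmetic (a sketch in Python; all the numbers come from the slide above):

```python
# Back-of-envelope check of the LFC request rates quoted above.
SECONDS_PER_DAY = 24 * 3600

# Read access: 1000 users every 15 minutes, 200 LFC contacts per user
# (100 jobs x 2 contacts per job).
slots_per_day = 24 * 4                      # 15-minute slots in a day
reads_per_day = slots_per_day * 1000 * 200  # ~19.2M requests per day
print("read rate: %.0f Hz" % (float(reads_per_day) / SECONDS_PER_DAY))

# Write access: 40 MB/s of transfers with ~100 MB files gives ~0.4
# replicated files per second; with newly created files on top (2 per
# 30 processed), the slide rounds this to about 1 Hz.
print("write rate: ~%.1f Hz" % (40.0 / 100.0))
```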

Page 4: LFC Local Test Description (Feasibility Test).

• 40 LFC clients, 40 LFC daemon threads, Streams pool.

• Client's actions (a sketch of this logic follows the list):

  • Check whether the LFN already exists in the database (SELECT from cns_file_metadata).

  • If yes: add an SFN for that LFN (INSERT into cns_file_replica).

  • If not: add both the LFN and the SFN (INSERT into cns_file_metadata, then INSERT into cns_file_replica).

• For each LFN, 3 SFNs are inserted.
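A minimal sketch of the client's insert logic, assuming cx_Oracle and simplified column names (fileid, name, sfn) plus a hypothetical fileid sequence; the real LFC schema is richer than this:

```python
# Sketch of the feasibility-test client: check the LFN, then insert SFNs.
import cx_Oracle

def insert_entry(conn, lfn, sfns):
    cur = conn.cursor()
    # Check whether the LFN already exists in cns_file_metadata.
    cur.execute("SELECT fileid FROM cns_file_metadata WHERE name = :lfn",
                lfn=lfn)
    row = cur.fetchone()
    if row is None:
        # LFN not found: insert it first (cns_fileid_seq is hypothetical).
        cur.execute("INSERT INTO cns_file_metadata (fileid, name) "
                    "VALUES (cns_fileid_seq.NEXTVAL, :lfn)", lfn=lfn)
        cur.execute("SELECT fileid FROM cns_file_metadata WHERE name = :lfn",
                    lfn=lfn)
        row = cur.fetchone()
    # Add one cns_file_replica row per SFN (the test inserts 3 per LFN).
    for sfn in sfns:
        cur.execute("INSERT INTO cns_file_replica (fileid, sfn) "
                    "VALUES (:fid, :sfn)", fid=row[0], sfn=sfn)
    conn.commit()
```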

Page 5: LFC Master HW Configuration.

[Diagram: the two RAC nodes rac-lhcb-01 and rac-lhcb-02, behind Gigabit switches on the private LHCb link, attached to a Dell 224F array (14 x 73 GB disks) managed by ASM.]

• 2-node RAC on Oracle 10gR2; each node is a dual Xeon 3.2 GHz with 4 GB memory, running RHEL 4 (kernel 2.6.9-34.ELsmp).

• 14 Fibre Channel disks (73 GB each); HBA QLogic QLA2340, Brocade FC switch.

• Disk storage managed with Oracle ASM (striping and mirroring).

Page 6: LFC Slave Configuration.

• LFC read-only replica.

• Dual Xeon 2.4 GHz, 2 GB RAM; Oracle 10gR2 (Oracle RAC, but used as a single instance); RHEL 3 (kernel 2.4.21).

• 6 x 250 GB disks in RAID 5; HBA QLogic QLA2340, Brocade FC switch.

• Disk storage formatted with OCFS2.

Page 7: Performance.

• About 75 transactions per second on each cluster node.

• Inserted and replicated 1700k entries in 4 hours (118 inserts per second).

• Almost real-time replication with Oracle Streams, without significant delays (<< 1 s).

Page 8: Performance (continued).

• CPU load on the cluster nodes is far from saturation.

Page 9: CERN to CNAF LFC Replication.

• At CERN: 2 LFC servers connected to the same LFC master DB backend (single instance).

• At CNAF: 1 LFC server connected to the replica DB backend (single instance).

• Oracle Streams sends entries from the master DB at CERN to the replica DB at CNAF.

• Population clients: a Python script that starts N parallel clients; the clients write entries and replicas into the master LFC at CERN (see the driver sketch below).

• Read-only clients: a Python script that reads entries from both the master and the replica LFC.
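A minimal sketch of such a population driver, with a hypothetical insert_entry() helper standing in for the real LFC client calls (the two master hostnames are taken from the testbed slide; the LFN/SFN patterns are illustrative):

```python
# Sketch of the population driver: N parallel clients writing to the master.
from multiprocessing import Process

N_CLIENTS = 20
ENTRIES_PER_CLIENT = 1000
MASTER_HOSTS = ["lxb0716.cern.ch", "lxb0717.cern.ch"]  # the two R-W servers

def insert_entry(host, lfn, sfns):
    """Placeholder for the real LFC insert (e.g. the LFC Python bindings)."""
    pass

def client(cid):
    # Split the clients evenly between the two master LFC servers.
    host = MASTER_HOSTS[cid % len(MASTER_HOSTS)]
    for i in range(ENTRIES_PER_CLIENT):
        lfn = "/grid/lhcb/test/client%d/file%d" % (cid, i)
        sfns = ["srm://se%d.example/file%d.%d" % (cid, i, k) for k in range(3)]
        insert_entry(host, lfn, sfns)

if __name__ == "__main__":
    procs = [Process(target=client, args=(c,)) for c in range(N_CLIENTS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```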

Page 10: LFC Replication Testbed.

[Testbed diagram: population clients write to the two LFC R-W servers lxb0716.cern.ch and lxb0717.cern.ch, backed by the master DB on the LFC Oracle server rls1r1.cern.ch at CERN; Oracle Streams propagates the changes over the WAN to the replica DB on lfc-streams.cr.cnaf.infn.it, which backs the LFC read-only server lfc-replica.cr.cnaf.infn.it queried by the read-only clients.]

Page 11: Test 1: 40 Parallel Clients.

• 40 parallel clients, equally divided between the two master LFC servers.

• Inserted 3700 replicas per minute during the first two hours.

• Very good performance at the beginning, but after a few hours the master fell into a Flow Control state.

• Flow Control means that the master is notified by the consumer that the update rate is too fast; the master slows down to avoid Spill Over at the consumer side.

• Spill Over means that the buffer of the Streams queue is full, so Oracle has to write entries to disk (the persistent part of the queue), which decreases performance.

• The apply side of Streams replication (the slave) is usually slower than the master side, so we argue that it is necessary to decrease the insert rate to achieve good sustained performance. A monitoring sketch follows.
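One way to spot this condition (a sketch, assuming cx_Oracle, a strmadmin account and a hypothetical DSN; V$STREAMS_CAPTURE reports the capture state, which reads 'PAUSED FOR FLOW CONTROL' while flow control is active):

```python
# Poll the Streams capture state on the master to detect flow control.
import time
import cx_Oracle

conn = cx_Oracle.connect("strmadmin/password@lfc_master")  # hypothetical DSN
cur = conn.cursor()
while True:
    cur.execute("SELECT capture_name, state FROM v$streams_capture")
    for name, state in cur:
        print("%s: %s" % (name, state))  # watch for PAUSED FOR FLOW CONTROL
    time.sleep(60)
```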

Page 12: Test 2: 20 Parallel Clients.

• 20 parallel clients, equally divided between the two master LFC servers.

• Inserted 3000 replicas per minute (50 replicas per second).

• Apply parallelism increased: 4 parallel apply processes on the slave (see the sketch below).

• After some hours the rate decreases, but it reaches a stable state at 33 replicas per second: a sustained rate of 33 replicas per second was achieved.

• No flow control has been detected on the master.
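In Oracle Streams the apply parallelism is set through DBMS_APPLY_ADM.SET_PARAMETER; a sketch of the change made on the slave (the apply process name LFC_APPLY and the DSN are assumptions):

```python
# Raise the number of parallel apply servers on the slave to 4.
import cx_Oracle

conn = cx_Oracle.connect("strmadmin/password@lfc_replica")  # hypothetical DSN
cur = conn.cursor()
cur.execute("""
    BEGIN
      DBMS_APPLY_ADM.SET_PARAMETER(
        apply_name => 'LFC_APPLY',   -- assumed apply process name
        parameter  => 'parallelism',
        value      => '4');
    END;""")
```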

Page 13: Replicated Entries.

[Chart: SFN replication rate in Hz (y-axis, 0 to 60) over the 7 days of the test (x-axis, days 1 to 7).]

Page 14: Conclusions.

• Even though this test setup is less powerful than the production one, the sustained insertion rate is higher than LHCb needs.

• We need to test random read access to understand if and how replication impacts response time.

• It could be interesting to understand the best replication rate achievable with this setup, even if not requested by the experiments.