Initial Data Access Module & Lustre Deployment Tan Li.

Date posted: 21-Dec-2015
Outline

• Disk I/O test for netqos03 and netqos04

• Initial design for file I/O module
    - Data read with different functions and buffer sizes
    - Data read with fread() with different wait times and buffer sizes
    - Some conclusions

• Intro to Lustre setup

• Lustre deployment for the new servers

Initial Design for Data Access

Current data access module (block sizes: 100K, 1M, 10M, 100M, 500M for a 100G file)

Initial design for file I/O module

1. Header file: ftp_io.h

2. Data access functions:

    int ftp_open(char *path, int block_size, int mode);
    int ftp_read(int infile_fd, char *out_buf, int block_size);
    int ftp_write(int outfile_fd, char *in_buf, int block_size);
    int ftp_close(int close_fd, int block);

Usage of ftp_open(): the block size is passed to the function so that it can choose the open method (open, fopen, or open with O_DIRECT). The close method used by ftp_close() must match the open method chosen by ftp_open(). mode=0 opens the file for reading; mode=1 opens it for writing.

Initial design for file I/O module

ftp_open() decision flow:

1. Block size > 400K?
    - No: use open/fopen.
    - Yes: use open with O_DIRECT.
2. Mode = 0 or 1?
    - Mode 0: open read-only.
    - Mode 1: open write-only.
3. Return the file descriptor.
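
The open-method selection for ftp_open() could be sketched roughly as follows. This is only an illustration, not the module's actual code: the names sketch_open_flags and sketch_ftp_open are hypothetical, "400K" is assumed to mean 400 KiB, the creation flags for write mode are an assumption, and only open() is used (no fopen branch) because ftp_open() returns an int descriptor.

```c
#define _GNU_SOURCE /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <fcntl.h>

/* Assumption: the "400K" threshold on the slide is read as 400 KiB. */
#define DIRECT_THRESHOLD (400 * 1024)

/* Hypothetical helper: choose open(2) flags from block size and mode.
 * mode 0 = read, mode 1 = write, as described for ftp_open(). */
int sketch_open_flags(int block_size, int mode)
{
    assert(mode == 0 || mode == 1);

    /* O_CREAT | O_TRUNC for writing is an assumption, not from the slides. */
    int flags = (mode == 0) ? O_RDONLY : (O_WRONLY | O_CREAT | O_TRUNC);

    /* Large blocks bypass the page cache; small blocks use buffered I/O. */
    if (block_size > DIRECT_THRESHOLD)
        flags |= O_DIRECT;
    return flags;
}

/* Hypothetical stand-in for ftp_open(): returns a descriptor or -1. */
int sketch_ftp_open(const char *path, int block_size, int mode)
{
    return open(path, sketch_open_flags(block_size, mode), 0644);
}
```

Keeping the flag choice in a separate function makes it easy for a matching close routine to apply the same block-size rule.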

Initial design for file I/O module: problem with O_DIRECT when writing data

When writing data with O_DIRECT, the block size must be a multiple of 512 bytes on our platform, so the last few bytes of a file whose size is not a multiple of 512 cannot be written this way.

Possible solutions:
1. Use the regular write() to output the remaining data.
2. Integrate the open function into the read and write functions.
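
A minimal sketch of the first solution, assuming Linux and the 512-byte alignment mentioned above: the aligned prefix is written through an O_DIRECT descriptor, then the file is reopened without O_DIRECT and the remaining bytes are appended with a regular write(). The function names are hypothetical, and short writes are treated as errors to keep the example brief.

```c
#define _GNU_SOURCE /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DIRECT_ALIGN 512 /* from the slide; other platforms may need more */

/* Largest multiple of align that is <= len. */
size_t aligned_prefix(size_t len, size_t align)
{
    assert(align > 0);
    return len - (len % align);
}

/* Hypothetical helper: write len bytes of data to path.
 * Returns 0 on success, -1 on any error or short write. */
int write_direct_with_tail(const char *path, const char *data, size_t len)
{
    size_t head = aligned_prefix(len, DIRECT_ALIGN);
    size_t tail = len - head;
    void *buf = NULL;
    ssize_t n;
    int fd;

    if (head > 0) {
        /* O_DIRECT also requires an aligned user-space buffer. */
        if (posix_memalign(&buf, DIRECT_ALIGN, head) != 0)
            return -1;
        memcpy(buf, data, head);
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
        if (fd < 0) {
            free(buf);
            return -1;
        }
        n = write(fd, buf, head);
        close(fd);
        free(buf);
        if (n != (ssize_t)head)
            return -1;
    }

    /* Reopen without O_DIRECT and append the unaligned remainder. */
    fd = open(path, O_WRONLY | O_CREAT | (head > 0 ? O_APPEND : O_TRUNC), 0644);
    if (fd < 0)
        return -1;
    n = (tail > 0) ? write(fd, data + head, tail) : 0;
    close(fd);
    return (n == (ssize_t)tail) ? 0 : -1;
}
```

For a 1000-byte payload this writes 512 bytes directly and 488 bytes through the page cache. A real module would stream the data block by block instead of copying the whole aligned prefix into one buffer.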

Data reading test on fread()

1. Test results from the Linux time tool
2. Test results from nmon (recording data every two seconds)

Data reading test on fread(): some conclusions

The bandwidth grows as the buffer size increases, especially when the buffer size changes from 100K to 1000K (a roughly threefold gain).

The bandwidth is not sensitive to the wait time until the wait time reaches some threshold, and the larger the buffer size, the less sensitive the bandwidth is to the delay.

CPU utilization is 0% when the buffer size is below 100K, and it grows as the buffer size increases.
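
A read test of this kind could be sketched as below. This is not the test program behind the results above, only an assumed shape for it: the function name timed_fread and its parameters are hypothetical, the wait is inserted after every fread() call, and timing uses the monotonic clock instead of the time tool.

```c
#define _POSIX_C_SOURCE 200809L /* clock_gettime, nanosleep */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: read the whole file with fread() using a buffer of
 * buf_size bytes, sleeping wait_us microseconds after each read.
 * Returns the number of bytes read (or -1 on error) and stores the
 * elapsed wall-clock time in *seconds. */
long long timed_fread(const char *path, size_t buf_size, long wait_us,
                      double *seconds)
{
    assert(buf_size > 0 && wait_us >= 0 && seconds != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    char *buf = malloc(buf_size);
    if (buf == NULL) {
        fclose(fp);
        return -1;
    }

    struct timespec pause = { wait_us / 1000000, (wait_us % 1000000) * 1000 };
    struct timespec t0, t1;
    long long total = 0;
    size_t n;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    while ((n = fread(buf, 1, buf_size, fp)) > 0) {
        total += (long long)n;
        if (wait_us > 0)
            nanosleep(&pause, NULL); /* simulated per-block delay */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    int failed = ferror(fp);
    free(buf);
    fclose(fp);
    if (failed)
        return -1;

    *seconds = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return total;
}
```

Bandwidth is then the returned byte count divided by the elapsed seconds. Repeated runs on the same file are likely to be served from the page cache, so the cache should be dropped, or a file larger than memory used, when comparing buffer sizes.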

iWARP and InfiniBand

Hardware
    InfiniBand: specialized I/O structure
    iWARP: a set of mechanisms over Ethernet that move data management and network protocol processing to the RNIC card

Transport method
    InfiniBand: point-to-point
    iWARP: end-to-end

Compatibility
    InfiniBand: specialized infrastructure
    iWARP: fully compatible with existing Ethernet switching

Vendors
    InfiniBand: only two (Mellanox and QLogic)
    iWARP: a broad range of vendors

RoCEE

RoCEE = InfiniBand over Ethernet (IBoE)

The RDMA over Converged Enhanced Ethernet (RoCEE) protocol proposal is designed to allow the deployment of RDMA semantics on a Converged Enhanced Ethernet fabric by running the IB transport protocol over Ethernet frames. In other words, it takes the InfiniBand transport layer and packages it into Ethernet frames, instead of using the iWARP protocol, for Ethernet-based high-performance cluster networking.

RoCEE

Problem 1: iWARP already delivers the performance benefit that RoCEE offers.
Problem 2: RoCEE is hard to implement.
Problem 3: RoCEE depends on the deployment of 10GbE CEE infrastructure; currently only one vendor (Cisco) offers CEE switches, and at relatively high price points.

Thanks & Questions