Enabling a connection-oriented internet

Page 1: Enabling a connection-oriented internet

1

Enabling a connection-oriented internet

• Outline
• Background
• CHEETAH
    – Testbed
    – Software
    – Applications: GridFTP + Ensight
    – Security
• Extension of work to connection-oriented internet
• Networking research problems

Malathi Veeraraghavan Univ. of Virginia

[email protected]

Talk at the BNL, May 16, 2005

Page 2: Enabling a connection-oriented internet

2

Team & Acknowledgment

• Team (PI/Co-PIs):
    – Malathi Veeraraghavan, Univ. of Virginia
    – Nagi Rao, Bill Wing, Tony Mezzacappa, ORNL
    – John Blondin, NCSU
    – Ibrahim Habib, CUNY

• UVA funding sources:
    – NSF EIN grant ANI-0335190
    – NSF ITR small grant ANI-0312376
    – DOE FG02-04ER25640

Page 3: Enabling a connection-oriented internet

3

UVA team members

• Postdocs:
    – Xuan Zheng
    – Haobo Wang

• Graduate students
    – Xiuduan Fang (GridFTP/PVFS)
    – Zhanxiang Huang (MPLS)
    – Tao Li (testing)
    – Anant P. Mudambi (new transport protocol: FRTP)
    – Xiangfei Zhu (RSVP-TE)

Page 4: Enabling a connection-oriented internet

4

eScience application requirements for the network

• Our eScience partner: TSI project
    – High bandwidth end-to-end for terabyte-sized file transfers
    – End-to-end QoS assurance
        • remote visualization
        • remote computational steering

TSI: Terascale Supernova Initiative
QoS: Quality of Service

Page 5: Enabling a connection-oriented internet

5

“Communications” answer (no networking)

• If
    – number of end points is small
    – costs are not prohibitive
• Purchase high-speed communication links end-to-end
• Problems to solve:
    – Limited to end hosts
    – Disk limitations, bus speed limits, etc.

Page 6: Enabling a connection-oriented internet

6

Important questions for networking

• How many end users/hosts need access to the network?
• Where are they located?
• Role of networking: enable sharing
• Answers for HEP community
    – 300-400 physicists in the US
    – Located in 30 univs/labs
    – 2000 physicists from 150 institutions world-wide

Page 7: Enabling a connection-oriented internet

7

Networking community’s answers to eScience needs

• Use existing networks (Internet2/ESnet); upgrade links
• Improve TCP by changing only the software at the end hosts
    – High-speed TCP, Scalable TCP, FAST
    – Parallel TCP (e.g. GridFTP)
• Pros:
    – Easy to deploy
• Cons:
    – End-to-end QoS cannot be assured

• Deploy new networks (switches) while upgrading links
    – Connection-oriented
    – e.g., Cheetah, Dragon, Canarie’s CA*net4
    – Complements type of service offered by Internet2
• Pros:
    – End-to-end QoS
• Cons:
    – Cost

Page 8: Enabling a connection-oriented internet

8

A little background on networking

• What’s a network (wide-area)?
    – A bunch of communication links interconnected by network SWITCHES

• Purpose of a network switch:
    – enable sharing of communication link resources

[Figure: four end hosts, which generate/store data that needs to be moved, attached to a SWITCH. Direct communication links between end hosts: doesn’t scale.]

Page 9: Enabling a connection-oriented internet

9

The fun in networks research: Type of sharing

• Connectionless (CL) switch (packet switch)
    – No explicit requests to reserve bandwidth prior to data transfer
    – All control for bandwidth sharing is implemented at end hosts, in the TCP software
        • TCP keeps increasing the rate at which it sends packets
        • Congestion detected
        • Rate of sending packets dropped
        • Slowly starts increasing rate
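The increase/detect/drop cycle above can be illustrated with a toy additive-increase, multiplicative-decrease loop. This is only a sketch: the function name, the increase and decrease constants, and the "rate exceeds capacity" congestion signal are assumptions for illustration, not the behavior of any particular TCP implementation.

```python
def aimd_rates(capacity, rounds, increase=1.0, decrease=0.5, start=1.0):
    """Return the sending rate after each round under a toy AIMD rule."""
    rate = start
    history = []
    for _ in range(rounds):
        if rate > capacity:
            # Congestion detected: drop the sending rate.
            rate = rate * decrease
        else:
            # No congestion seen: keep increasing the sending rate.
            rate = rate + increase
        history.append(rate)
    return history


# Hypothetical link of capacity 10 (arbitrary units), observed for 30 rounds.
rates = aimd_rates(capacity=10.0, rounds=30)
```

The resulting sequence is the familiar sawtooth: the rate climbs past the capacity, is cut, and climbs again, which is why a single flow's share keeps varying.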

Page 10: Enabling a connection-oriented internet

10

[Figure: TCP at the end host controls the sending rate into a connectionless packet switch; the sending rate increases, then is decreased.]

• Bandwidth share given to one flow keeps varying depending on how many other flows join and leave within the first flow’s duration
• Socialistic sharing
• Fair but hard to provide rate guarantees

Page 11: Enabling a connection-oriented internet

11

A second type of sharing

• Connection-oriented (CO)
    – Request a reservation
        • If accepted, can be guaranteed the assigned rate
    – Release reservation when done
    – Note: uncertainty in whether BW request will be granted

[Figure: BW-requests pass through two connection-oriented switches (packet or circuit), each with its own bandwidth manager. Distributed bandwidth management scales to large networks. Two ways to share a link are shown: complete bandwidth on a link used for one connection, or multiplexed (lanes on a highway).]
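The request/accept/release cycle with one bandwidth manager per switch can be sketched as below. This is a minimal illustration under stated assumptions: the class and function names are hypothetical, each manager tracks a single link, and a path setup simply asks each hop in turn and rolls back if any hop rejects.

```python
class BandwidthManager:
    """Toy bandwidth manager for one link of a connection-oriented switch."""

    def __init__(self, capacity_mbps):
        self.capacity_mbps = capacity_mbps
        self.reserved = {}  # connection id -> reserved rate in Mb/s

    def available(self):
        return self.capacity_mbps - sum(self.reserved.values())

    def request(self, conn_id, rate_mbps):
        """Accept only if the full requested rate fits; otherwise reject."""
        if conn_id in self.reserved or rate_mbps <= 0:
            return False
        if rate_mbps > self.available():
            return False
        self.reserved[conn_id] = rate_mbps
        return True

    def release(self, conn_id):
        self.reserved.pop(conn_id, None)


def setup_path(managers, conn_id, rate_mbps):
    """Reserve hop by hop; undo earlier hops if any hop rejects the request."""
    accepted = []
    for manager in managers:
        if not manager.request(conn_id, rate_mbps):
            for previous in accepted:
                previous.release(conn_id)
            return False
        accepted.append(manager)
    return True
```

Because each switch decides using only its own link state, no central controller is needed, which is the sense in which distributed bandwidth management scales.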

Page 12: Enabling a connection-oriented internet

12

CHEETAH (Circuit-Switched High-speed End-to-End Transport Architecture)

• Falls in the second category
    – deploying a new network
    – connection-oriented flavor of sharing
• Meets TSI needs
    – High-bandwidth connections for file transfers
    – End-to-end QoS for remote visualization
• Cons
    – High cost

Page 13: Enabling a connection-oriented internet

13

CHEETAH: Topology & equipment

[Figure: CHEETAH testbed topology. Labels recoverable from the diagram:
    – PoPs, each with a Sycamore SN16000: Raleigh PoP (MCNC), Atlanta PoP (SoX/SLR), ORNL PoP
    – OC192 links (NLR, SLR) at the PoPs; one link leads to UltraScience Net
    – Hosts attach through Ethernet switches over GbE (5 GbEs at one site); enterprise networks shown: NCSU, G. Tech
    – SN16000 components: Control (bandwidth manager: dynamic distributed sharing), Gb/s and 10Gb/s Ethernet interface cards, time-division multiplexing optical interface card
    – Maps GbE to equivalent SONET circuit]

Page 14: Enabling a connection-oriented internet

14

CHEETAH concept

• Use off-the-shelf circuit-based gateways
    – that support GMPLS routing and signaling protocols for dynamic circuit setup/release
    – enables the creation of large-scale shared CO networks
• It is not a standalone network
    – Leverages the presence of connectionless IP service (host-to-host IP connectivity; DNS)
• Implement cheetah software to run on end hosts
• Integrate with host applications
    – applications generate requests for bandwidth as needed
    – SHORT-LIVED: increase sharing
    – Hold circuit for a few seconds/minutes and release

Page 15: Enabling a connection-oriented internet

15

Cheetah solution leveraging the presence of the Internet

• Use second NICs at hosts for circuit connectivity, leaving primary NIC for Internet access

[Figure: End host I and End host II are connected by two paths, the connectionless Internet and the circuit-switched network.]

• Two paths available
    – Attempt circuit setup
    – If rejected, fall back to using TCP/IP
• Should we attempt a circuit setup for ALL file transfers?
• Or is there a crossover file size below which we use the TCP/IP network and above which we attempt a circuit setup?

Page 16: Enabling a connection-oriented internet

16

Two metrics: delay and utilization

• For most regions of operation on 1Gbps circuits
    – in wide-area scenarios (50ms prop. delay)
        • delay: crossover size ~10KB
        • utilization: crossover size is ~50MB
    – in local-area scenarios (1ms prop. delay)
        • delay: crossover size is 1s to 10s of MBs
        • utilization: 1s of MBs

Page 17: Enabling a connection-oriented internet

17

Cheetah software on end hosts

[Figure: end-host CHEETAH software. Applications (GridFTP, Web server, remote viz. with Ensight) sit above the end-host CHEETAH software (DNS lookup, routing decision, signaling client). TCP uses NIC I on the primary TCP/IP path; FRTP uses NIC II on the end-to-end CHEETAH circuit.]

• DNS query: to check if far end host is also on cheetah
• Routing decision: to check whether to use the TCP/IP path or attempt a cheetah circuit setup
• Signaling client: to request a circuit
• Fixed-Rate Transport Protocol (FRTP): designed for circuits
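The order of checks on the end host (DNS lookup, routing decision, signaling, fallback) might look like the sketch below. Everything here is an illustrative assumption: the function name, the boolean standing in for the DNS result, the single crossover threshold, and the callable standing in for the signaling client are hypothetical and are not the CHEETAH software's actual interfaces.

```python
def choose_path(file_size_bytes, far_end_on_cheetah, crossover_bytes, request_circuit):
    """Return "circuit" or "tcp" for one file transfer.

    far_end_on_cheetah: outcome of the DNS query for the far-end host.
    crossover_bytes:    file size below which the TCP/IP path is used.
    request_circuit:    callable standing in for the signaling client;
                        returns True if the circuit setup was accepted.
    """
    if not far_end_on_cheetah:
        return "tcp"      # far end has no circuit connectivity
    if file_size_bytes < crossover_bytes:
        return "tcp"      # too small to be worth a circuit setup
    if request_circuit():
        return "circuit"  # setup accepted: use FRTP over the circuit
    return "tcp"          # setup rejected: fall back to TCP/IP


# Hypothetical example: 2 GB file, 50 MB crossover, setup accepted.
path = choose_path(2 * 10**9, True, 50 * 10**6, lambda: True)
```

Passing the signaling step in as a callable keeps the decision logic testable without a real switch to signal.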

Page 18: Enabling a connection-oriented internet

18

Transport protocol problem

• Variability in sender:
    – other processes (e.g. matlab) + disk access (disk head location)
• Variability in receiver: if buffer not emptied out, data loss occurs

[Figure: two hosts connected by a circuit-switched network. On each host, the file transfer and Matlab processes run in user space; the file system and network protocols sit in the kernel, above the network card.]

Page 19: Enabling a connection-oriented internet

19

Effects of mismatch in nature of circuits and nature of hosts

• Choose a high circuit rate and the receive buffer can fill up if the circuit rate is not matched to the receive rate
    – impacts delay + utilization
• Choose a low circuit rate and delay will be higher than necessary
• If sending rate is not matched exactly with circuit rate
    – circuit lies idle; utilization impacted

Page 20: Enabling a connection-oriented internet

20

Transport protocol for end-to-end dedicated circuits

• Requirements & solution:
    – No contention for bandwidth resources in network during user data flow (bandwidth already reserved)
        • No congestion control
    – Contention at end-hosts due to multitasking
        • Flow control: null or window based
    – Reliable transfer: error control
        • Detect/recover from drops in receiver buffer
    – High circuit utilization
        • Keep sending rate fixed to match circuit rate
        • Hence the name Fixed-Rate Transport Protocol (FRTP)
        • Receive rate selection important
    – Disk-to-disk transfer
        • FRTP module is handed a file descriptor instead of a buffer location in main memory

Page 21: Enabling a connection-oriented internet

21

FRTP Implementation I

• Null flow control
    – data blocks can get dropped at receiver
        • disk access variability and multitasking
    – recover through retransmissions
• Implementation in user-space
    – Opens UDP and TCP sockets
    – UDP data channel on unidirectional dedicated circuit
    – TCP control channel on primary Internet path
• Modified SABUL code

Page 22: Enabling a connection-oriented internet

22

FRTP Implementation I (cont.)

• SABUL implementation:
    – Uses busy-wait to maintain fixed low inter-packet times (to achieve fixed sending rate)
    – Drawback: high CPU utilization
• Modified to:
    – send a burst of packets periodically
        • set a periodic timer; when process gets a signal indicating timer expiry, send a batch of packets
    – use data link layer flow control (i.e., Ethernet PAUSE) to prevent bursts
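For the timer-driven variant, the batch size per timer expiry follows from the circuit rate, the timer period, and the packet size. The sketch below shows only that arithmetic; the function names and the example values (1 Gbps circuit, 1 ms timer, 1500-byte packets) are assumptions for illustration and are not taken from the FRTP code.

```python
def packets_per_burst(circuit_rate_bps, timer_interval_us, packet_size_bytes):
    """Whole packets to send on each timer expiry to approximate the circuit rate."""
    bits_per_interval = circuit_rate_bps * timer_interval_us // 1_000_000
    return bits_per_interval // (packet_size_bytes * 8)


def achieved_rate_bps(packets, timer_interval_us, packet_size_bytes):
    """Average sending rate when `packets` packets are sent every interval."""
    return packets * packet_size_bytes * 8 * 1_000_000 // timer_interval_us


# Assumed example: 1 Gbps circuit, 1 ms timer, 1500-byte packets.
burst = packets_per_burst(1_000_000_000, 1_000, 1_500)  # 83 packets
rate = achieved_rate_bps(burst, 1_000, 1_500)           # 996,000,000 bit/s
```

Rounding down to whole packets leaves the average rate slightly under the circuit rate, so the sender never exceeds the reservation on average, although each batch still leaves the host as a burst.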

Page 23: Enabling a connection-oriented internet

23

FRTP Implementation II

• Window-based flow control
    – prevents data blocks from being dropped at receiver due to disk access variability and multitasking
• Implementation in kernel-space
• Uses Web100/Net100 code

Page 24: Enabling a connection-oriented internet

24

Modifications

• Web100/Net100
    – Implement TCP
    – Added hooks to tune parameters at run-time
• FRTP usage of this code/modifications
    – Used tuning capability to set:
        • initial ssthresh to the Bandwidth Delay Product
            – using fixed circuit rate for bandwidth
    – Made code modifications to set:
        • additive increase (AI), multiplicative decrease (MD) factors to 0 (sending rate will not change in congestion avoidance)
            – Sending rate increases to the circuit rate and stays there
• Added advantage:
    – TCP’s self-clocking is a pretty good way to maintain fixed sending rate
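The bandwidth-delay product used for the initial ssthresh is the circuit rate multiplied by the round-trip time. The sketch below works it out for an assumed example: a 1 Gbps circuit, a 100 ms round-trip time (twice a 50 ms propagation delay), and a 1460-byte segment size. The helper names and these values are illustrative assumptions and say nothing about how Web100/Net100 exposes the setting.

```python
def bdp_bytes(circuit_rate_bps, rtt_ms):
    """Bandwidth-delay product in bytes: rate (bit/s) x RTT (s) / 8."""
    return circuit_rate_bps * rtt_ms // 8_000


def bdp_segments(circuit_rate_bps, rtt_ms, mss_bytes=1460):
    """Bandwidth-delay product rounded up to whole segments."""
    return -(-bdp_bytes(circuit_rate_bps, rtt_ms) // mss_bytes)  # ceiling division


# Assumed example: 1 Gbps circuit, 100 ms round-trip time.
window_bytes = bdp_bytes(1_000_000_000, 100)        # 12,500,000 bytes
window_segments = bdp_segments(1_000_000_000, 100)  # 8562 segments
```

With the window pinned at this value and the increase/decrease factors set to 0, one window's worth of data per round trip corresponds to the circuit rate.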

Page 25: Enabling a connection-oriented internet

25

Disk-to-disk transfer requirement

• Sender side actions:
    – read() system call:
        • move a block from disk to user space memory
    – send() system call:
        • write the block to network socket
    – sendfile(): reduces # of copies and # of system calls
• Receive side actions:
    – open the file with the O_LARGEFILE flag
    – calibrate disk write rate limits
    – select file system (xfs, pvfs)
    – if multitasking receiver, use RT schedulers to schedule disk write thread to match circuit rate
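The two sender-side options can be contrasted in a short sketch. It uses Python's standard library purely for illustration (the slides refer to the underlying system calls): `socket.sendfile()` uses `os.sendfile()` where the platform supports it and otherwise falls back to plain sends. The function names and the block size are assumptions.

```python
def copy_with_read_send(path, sock, block_size=65536):
    """read() each block into user space, then send() it on the socket."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)  # disk -> user-space buffer
            if not block:
                break
            sock.sendall(block)         # user-space buffer -> socket
            sent += len(block)
    return sent


def copy_with_sendfile(path, sock):
    """Hand the open file to the socket layer instead of copying block by block."""
    with open(path, "rb") as f:
        # Returns the total number of bytes sent.
        return sock.sendfile(f)
```

Both functions expect a connected, blocking stream socket. The second avoids staging each block in a user-space buffer when the kernel's sendfile path is available.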

Page 26: Enabling a connection-oriented internet

26

GridFTP application

• Disk considerations
    – Hardware solution: RAID striping
        • expensive solution
    – Split large file into small files and store small files on disks of different hosts in a cluster
        • not user-friendly
    – GridFTP striping with PVFS2 - striping across disks of different hosts of a cluster
        • best solution, but both GridFTP and PVFS2 code need modifications to use on dedicated circuits

Page 27: Enabling a connection-oriented internet

27

Hardware solution

• Equip host with a fast CPU, a RAID controller and disk array and a 10Gbps NIC

Data blocks: 1 2 3 4 5 6 7 8 9 10

RAID 0 with 5 disks (Striping), one column per disk:

    1  2  3  4  5
    6  7  8  9  10
    …  …  …  …  …
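The striping layout in the figure is a round-robin assignment of blocks to disks. A minimal sketch, with a hypothetical function name, that reproduces the 10-block, 5-disk example:

```python
def stripe(blocks, num_disks):
    """Assign blocks to disks round-robin, as in RAID 0 striping."""
    disks = [[] for _ in range(num_disks)]
    for index, block in enumerate(blocks):
        disks[index % num_disks].append(block)
    return disks


layout = stripe(list(range(1, 11)), 5)
# layout == [[1, 6], [2, 7], [3, 8], [4, 9], [5, 10]]
```

Consecutive blocks land on different disks, so a sequential read or write keeps all five disks busy at once.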

Page 28: Enabling a connection-oriented internet

28

File splitting

[Figure: at ORNL, a File Splitter divides the original file into file partitions, one per host (Zelda1 to Zelda5), each running GridFTP. The partitions are transferred in parallel to Orbitty1 to Orbitty5 at NCSU, each also running GridFTP, where a File Assembler produces the assembled file. Other labels: Cray, Head Node.]

Page 29: Enabling a connection-oriented internet

29

Implementation of file splitting

• Use GridFTP partial transfer feature
    – But disk space allocated on each host needs to equal the whole file size
• Without GridFTP partial transfer
    – Wrote C programs to partition and assemble the file
    – Use any file transfer tool to transfer the partitions (which are distinct files) in parallel
    – Tools with third party transfer utility desirable
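The partition-and-assemble step can be sketched as follows. The team's tools were C programs that are not shown in the slides; this Python version is only an illustration of the idea, and the function names, the `.partN` naming scheme, and the 1 MiB copy chunk are assumptions.

```python
import os
import shutil


def _copy_n(src, dst, count, chunk=1 << 20):
    """Copy up to `count` bytes from src to dst in bounded chunks."""
    remaining = count
    while remaining > 0:
        data = src.read(min(chunk, remaining))
        if not data:
            break
        dst.write(data)
        remaining -= len(data)


def split_file(path, num_parts):
    """Write num_parts consecutive partitions of `path` as distinct files."""
    size = os.path.getsize(path)
    part_size = -(-size // num_parts)  # ceiling division
    part_paths = []
    with open(path, "rb") as src:
        for i in range(num_parts):
            part_path = f"{path}.part{i}"
            with open(part_path, "wb") as dst:
                _copy_n(src, dst, part_size)
            part_paths.append(part_path)
    return part_paths


def assemble_file(part_paths, out_path):
    """Concatenate the partitions, in order, into out_path."""
    with open(out_path, "wb") as dst:
        for part_path in part_paths:
            with open(part_path, "rb") as src:
                shutil.copyfileobj(src, dst)
```

Each partition is an ordinary file, so any transfer tool can move the partitions in parallel before the receiver runs `assemble_file` on them in the original order.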

Page 30: Enabling a connection-oriented internet

30

GridFTP striped transfer over PVFS2

• PVFS2 (Parallel Virtual File System)
    – Three kinds of roles for nodes in PVFS2
        • Compute node/client: on which applications are run
        • Metadata server: handles metadata operations
        • I/O nodes/server: stores file data for PVFS2 file systems
    – Stripes a file across multiple servers like RAID0
• But
    – The latest version 1.0.1 does not provide any specific utility to inspect data distribution
    – The pvfs2-cp tool ignores the -s option for configuring the stripe size

Page 31: Enabling a connection-oriented internet

31

Our work on PVFS2

• Determined how PVFS2 stripes files across hosts
    – Using the strace command, which provides a trace of system calls
• Analyzed how the file is striped
    – Utility pvfs2-fs-dump gives the order of I/O servers used for file distribution (order obtained from config. file)
• Change pvfs2 code:
    – PVFS2 stripes files starting with a random server (done with the PINT_cached_config_get_next_io() function call in file src/common/misc/pint-cached-config.c): jitter = (rand() % num_io_servers);
    – Change it to jitter = -1 to get a fixed order of data distribution
    – Change the default stripe size (original: 64KBytes)

Page 32: Enabling a connection-oriented internet

32

[Figure: ideal GridFTP striped transfer, driven by globus-url-copy over two control connections.
    – Receiving side: Mode E; SPAS (Listen), returns list of host:port pairs; STOR <FileName>
    – Sending side: Mode E; SPOR (Connect), connect to the host-port pairs; RETR <FileName>
    – Data channels: Host A1 and Host X1 hold Blocks 1 and 4; Host A2 and Host X2 hold Blocks 2 and 5]

But the current GridFTP does not work in this ideal way. The data channel connections between the sending and receiving sides are arbitrary because the processing of SPAS and SPOR commands is nondeterministic.

• Does not match with the dedicated circuit model
• Code being modified

Page 33: Enabling a connection-oriented internet

33

Security: control plane

• Cannot have arbitrary end hosts send bandwidth request messages to network switch
    – Place VPN server/firewall in front of SN16000’s control port
        • provides some DDoS attack protection (ns5)
    – Establish IPsec tunnels between each host and SN16000 through VPN server
        • Openswan software at Linux end hosts
    – Establish IPsec tunnels between SN16000s

Page 34: Enabling a connection-oriented internet

34

Security: data plane

• GridFTP security
    – Between client and server
• Can use ssh, ssl, ipsec between end hosts connected on cheetah circuit

Page 35: Enabling a connection-oriented internet

35

Back to outline

• Outline
• Background
• CHEETAH
    – Testbed
    – Software
    – Applications
    – Security
→ Extension of work to connection-oriented internet
• Network research problems


Page 36: Enabling a connection-oriented internet

36

Networking community’s answers to eScience needs

• Use existing connectionless networks with improved TCP
• Deploy new connection-oriented networks
    – (e.g., cheetah)
• Enable connection-oriented service in already deployed switches
    – MPLS
    – VLANs
• Spend money upgrading links

Page 37: Enabling a connection-oriented internet

37

Extend CHEETAH concept and apps to connection-oriented internet

• On Internet2/ESnet: deployed IP routers (Cisco and Juniper) have:
    – MPLS capability (with RSVP-TE)
    – Connection-oriented service
    – Can map packets from one flow (five-tuple) to a reserved MPLS tunnel
• Within LANs:
    – Ethernet switches have IEEE 802.1q VLAN capability
    – Ingress rate shaping
    – With external control software, can make these switches operate in connection-oriented mode

Page 38: Enabling a connection-oriented internet

38

Many advantages to this approach

• Already deployed (“just” enable!)
• Bandwidth granularity can be low
    – improves bandwidth utilization
• Allows for sharing of link bandwidth between CL and CO traffic

Page 39: Enabling a connection-oriented internet

39

CO “internet”

• Because heterogeneous connections will be needed at least in the short-term
    – MPLS segments (Label Switched Paths)
    – VLAN segments (within enterprises)
    – SONET circuits (popular in commercial world)
    – WDM lightpaths (research testbeds)

Page 40: Enabling a connection-oriented internet

40

Back to outline

• Outline
• Background
• CHEETAH
    – Testbed
    – Software
    – Applications
    – Security
• Extension of work to connection-oriented internet
→ Network research problems


Page 41: Enabling a connection-oriented internet

41

Networking research problems

• Bandwidth sharing modes
    – Low load performance
    – Scheduled vs. immediate-request
    – Multi-level problem
    – Partial-path reservations
    – Fairness

Page 42: Enabling a connection-oriented internet

42

Fixing the bandwidth for the transfer could be a bad thing: low load problem

• Varying bandwidth list scheduling algorithm
    – uses knowledge of file size to make varying bandwidth allocations for transfer
    – catch: requires circuit switches to be reprogrammed multiple times within lifetime of a transfer (circuit)

[Figure: N transfers (1, 2, 3, ..., N) share a link of capacity C.
    – Packet switch: each transfer gets C/N capacity; the lone remaining transfer enjoys full capacity C
    – Circuit switch: each transfer is allocated C/N capacity; the lone remaining transfer continues with capacity allocation C/N]
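The low-load problem in the figure can be made concrete by comparing finish times under the two kinds of sharing. The sketch below uses an idealized model: the packet switch splits capacity equally among the transfers still active, and the circuit switch fixes every transfer at C/N for its lifetime. The function names, the equal-sharing idealization, and the example sizes are assumptions for illustration.

```python
def circuit_finish_times(sizes, capacity):
    """Each transfer holds a fixed share C/N until it completes."""
    share = capacity / len(sizes)
    return [size / share for size in sizes]


def packet_finish_times(sizes, capacity):
    """Idealized equal sharing: active transfers split the capacity equally."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    finish = [0.0] * len(sizes)
    now = 0.0
    sent_each = 0.0  # amount every still-active transfer has sent so far
    active = len(sizes)
    for i in order:
        remaining = sizes[i] - sent_each
        now += remaining * active / capacity
        sent_each = sizes[i]
        finish[i] = now
        active -= 1
    return finish


# Assumed example: three 8 Gb transfers and one 80 Gb transfer on a 1 Gb/s link.
sizes_gb = [8, 8, 8, 80]
circuit = circuit_finish_times(sizes_gb, 1.0)  # [32.0, 32.0, 32.0, 320.0]
packet = packet_finish_times(sizes_gb, 1.0)    # [32.0, 32.0, 32.0, 104.0]
```

The short transfers finish at the same time in both models, but the lone remaining transfer takes 320 s on the fixed C/N circuit and 104 s when it can take over the freed capacity.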

Page 43: Enabling a connection-oriented internet

43

Scheduled vs. immediate-request calls

Session type requests:
• long holding times (2 hours)
• specific rate
• remote visualizations
• scientists participate in sessions
• best served with an advance reservation

File transfer requests:
• file sizes provided, not holding times
• max rate specified but any rate can be allocated
• scientists not involved; just computers

Small files (e.g. 1 GB on 1 Gbps takes 8 sec)
• should be handled in immediate-request mode

Large files (e.g. 1 TB on 1 Gbps takes 2.2 hours)
• should be handled in scheduled mode
• should we allocate 10Gbps and finish in 800 sec?
    – immediate-request? or scheduled?
    – depends on m, the number of 10Gbps circuits
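The holding times quoted above follow directly from size times 8 divided by rate. A small check, assuming decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and a hypothetical helper name:

```python
def transfer_seconds(size_bytes, rate_bps):
    """Time to move size_bytes over a dedicated circuit of rate_bps."""
    return size_bytes * 8 / rate_bps


GB = 10**9
TB = 10**12

small = transfer_seconds(1 * GB, 10**9)            # 8.0 s
large_1g = transfer_seconds(1 * TB, 10**9)         # 8000.0 s, about 2.2 hours
large_10g = transfer_seconds(1 * TB, 10 * 10**9)   # 800.0 s
```

The factor of ten between the last two figures is what makes the 10Gbps question interesting: the same file occupies a much scarcer resource for a much shorter time.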

Page 44: Enabling a connection-oriented internet

44

Multi-level problem

• A new problem: not yes/no but how much?
    – Real-time (interactive) audio-video applications generate data at a certain rate (constant or variable)
        • implication: application requests the required bandwidth from the network, and answer is binary (accept or reject); multiple classes
    – File transfers: “any” bandwidth that the network can provide could be acceptable
        • implication: application requests a MAX bandwidth, but the answer can be multi-level

Page 45: Enabling a connection-oriented internet

45

Partial-path reservations

• Peel off bandwidth (partial-path reservation)
• Put back for CL traffic use when done

[Figure: enterprises connected through a wide-area network (Abilene, ESnet backbone). GbE inside enterprises; 10Gbps in WANs. Typically the access link is the bottleneck. Set up MPLS tunnels dynamically for individual flows on bottleneck links.]

Page 46: Enabling a connection-oriented internet

46

Fairness

• Call admission algorithms
    – Use Markov Decision Process (MDP) tools to balance fairness and overall throughput
    – Long-path and short-path calls
    – Large files (high-BW; high holding time) and short files (low-BW; low holding time) calls
    – Multi-level answer rather than binary accept/reject
    – CO traffic vs. CL traffic
• Both with Fixed bandwidth and Varying bandwidth

Page 47: Enabling a connection-oriented internet

47

Conclusions

• End-to-end dedicated connections appear to be the right answer for many eScience applications
    – But, many networking problems need to be solved to achieve cost reduction through scaling
• Utilization concerns: bandwidth sharing + FRTP
• Specific concerns of TSI: TB file handling
    – PVFS2 and GridFTP
• Web site: http://cheetah.cs.virginia.edu