4/19/2002 1
TCPSplitter: A Reconfigurable Hardware Based TCP Flow Monitor
David V. Schuehler
4/19/2002 4
Why work with TCP?
•Over 85% of Internet traffic is TCP based
•Internet is growing
•TCP is a proven, reliable transport for data delivery
•Provide high speed active networks the ability to work with TCP flows
4/19/2002 5
Why not implement a full TCP stack in hardware?
•Complex protocol stack
•Several interactions on client interface (sockets?)
•Difficult to achieve high performance
•Large memories required for reassembly
•Limited number of simultaneous connections
4/19/2002 6
Solution
•Develop TCP flow monitor - TCPSplitter
•Utilize existing hardware infrastructure (FPX)
•Expand upon Layered Protocol Wrappers
4/19/2002 11
FPX Internal Structure
[Figure: FPX block diagram. Labels: Switch, LineCard, NID, VC, EC, RAD, Module, RAD Program SRAM, SRAM Data, SDRAM Data]
RAD: Reprogrammable Application Device
•Xilinx XCV1000E FPGA
•External SRAM/SDRAM
•Reprogrammable
NID: Network Interface Device
•XCV600E FPGA
•Controls FPX
•Programs RAD
•Forwards traffic
4/19/2002 13
Goals
•High Speed Design
•Small FPGA Footprint
•Simple Client Interface
•Support Large Number of Flows
4/19/2002 14
Challenges
•Dealing with dropped frames
•Packet reordering
•Maintaining state for large number of flows
•Developing an efficient implementation
•Processing data at line rates
•Minimizing resource requirements
4/19/2002 15
Assumptions/Limitations
•All frames must flow through the switch
•Frames traversing in the opposite direction are handled as a separate flow
•In-order processing of frames for each flow
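The second assumption can be illustrated with a direction-sensitive flow key. This is a minimal Python sketch, assuming a flow is identified by its IPv4 address and port 4-tuple and mapped to a table slot by a hash. The slides do not describe the classifier's hash, so `flow_key`, `flow_index`, and the use of CRC-32 are hypothetical.

```python
import socket
import struct
import zlib

def flow_key(src_ip, dst_ip, src_port, dst_port):
    """Pack a directed 4-tuple into 12 bytes; swapping the endpoints changes the key."""
    return struct.pack("!4s4sHH",
                       socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
                       src_port, dst_port)

def flow_index(key, table_bits=8):
    """Reduce a key to a table index (hypothetical hash; 8 bits gives 256 slots)."""
    return zlib.crc32(key) & ((1 << table_bits) - 1)

# The two directions of one connection produce different keys,
# so they are tracked as two separate flows.
forward = flow_key("10.0.0.1", "10.0.0.2", 40000, 80)
reverse = flow_key("10.0.0.2", "10.0.0.1", 80, 40000)
print(forward != reverse, flow_index(forward), flow_index(reverse))
```

Distinct keys can still hash to the same slot; how the real design resolves such collisions is not stated in the slides.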
4/19/2002 16
TCPSplitter Data Flow
[Figure: IP Wrapper, TCP Splitter, and Client Application, linked by arrows labeled IP frames, IP frames, and Byte Stream]
4/19/2002 17
Input Processing
•Flow Classification
•TCP Checksum Engine
•Input State Machine
•Control FIFO
•Frame FIFO
•Output State Machine
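The checksum engine validates each TCP segment during input processing. As a software illustration only (the slides do not show the hardware design), the following Python sketch verifies a TCP checksum with the standard Internet one's-complement sum over the IPv4 pseudo-header and the segment. The function names are hypothetical.

```python
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:           # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total

def tcp_checksum_ok(src_ip: bytes, dst_ip: bytes, segment: bytes) -> bool:
    """Return True if the segment's checksum field is consistent.

    src_ip and dst_ip are 4-byte IPv4 addresses; segment is the TCP header
    plus payload, with the checksum field as received.
    """
    pseudo_header = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(segment))
    # A valid segment sums to 0xFFFF when the checksum field is included.
    return ones_complement_sum(pseudo_header + segment) == 0xFFFF
```

A hardware engine can accumulate the same sum one 32-bit word at a time as the frame streams in, which is consistent with the per-word processing delay reported in the synthesis results.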
4/19/2002 18
Layout
[Figure: TCPProc block diagram. Labels: TCPProc, TCPInput, Frame FIFO, Input State Machine, Checksum Engine, Flow Classifier, Output State Machine, Control FIFO, TCPOutput, Packet Routing, Client Application, IP Input, IP Output]
4/19/2002 19
Packet Routing Decisions
•Forward to outbound IP stack only
•Forward to both Client App and outbound IP stack
•Discard packet
4/19/2002 20
Packet Routing
•Non-TCP packets -> IP stack
•Invalid TCP checksum -> drop
•TCP SYN packets -> IP stack
•(Seq # < Expected Seq #) -> IP stack
•(Seq # > Expected Seq #) -> drop
•Else -> client AND IP stack
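The routing rules on this slide amount to a short decision procedure. The Python sketch below writes them out, assuming the rules are checked in the order listed and that sequence numbers are compared as plain integers (32-bit wraparound is ignored for simplicity). The names `Route` and `route_packet` are illustrative and do not come from the TCPSplitter design.

```python
from enum import Enum

class Route(Enum):
    IP_STACK = "forward to outbound IP stack only"
    CLIENT_AND_IP_STACK = "forward to client and outbound IP stack"
    DROP = "discard packet"

def route_packet(is_tcp, checksum_ok, is_syn, seq, expected_seq):
    """Apply the slide's routing rules in the order they are listed."""
    if not is_tcp:
        return Route.IP_STACK             # non-TCP traffic passes through
    if not checksum_ok:
        return Route.DROP                 # invalid TCP checksum
    if is_syn:
        return Route.IP_STACK             # SYN carries no data for the client
    if seq < expected_seq:
        return Route.IP_STACK             # old data, already seen by the client
    if seq > expected_seq:
        return Route.DROP                 # out of order, dropped by the monitor
    return Route.CLIENT_AND_IP_STACK      # in-order data

print(route_packet(True, True, False, seq=1000, expected_seq=1000))
```

Dropping segments that arrive ahead of the expected sequence number matches the stated assumption of in-order processing for each flow.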
4/19/2002 21
Client Interface
•1 bit Clock
•1 bit Reset
•32 bit Data Word
•1 bit Data Enable
•4 bit Start/End of Data Signals
•2 bit Valid Data Bytes
•N bit Flow Identifier
•2 bit Start/End of Flow Signals
•1 bit TCA
4/19/2002 23
Possible Application 1
Simultaneous update of multiple active network nodes
[Figure: Programmer, Connection Endpoint, Switch A, Switch B, Switch C, Switch D]
4/19/2002 24
Possible Application 2
Dynamic loading of customizable QoS algorithms
[Figure: network of switches A through F between Source and Destination]
4/19/2002 25
Possible Application 3
Monitoring content of all TCP flows for security
[Figure: five Workstations connected through a Switch to the Internet]
4/19/2002 27
Synthesis Results for Xilinx XCV1000E-7
                          TCPSplitter         Full Wrappers (Cell + Frame + IP + TCP + Client)
Space/LUTs                617 (2%)            4954 (20%)
Register bits             503 (2%)            4933 (20%)
Input processing delay    7 clock cycles *    44-68 clock cycles *
* Plus length of packet in 32 bit words
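The footnote means the input processing delay grows with packet size. A small Python sketch of that arithmetic follows, assuming the packet length is rounded up to whole 32-bit words; the function name and the example packet sizes are illustrative.

```python
import math

def input_delay_cycles(packet_bytes, fixed_cycles=7):
    """Fixed delay plus one clock cycle per 32-bit word of the packet."""
    words = math.ceil(packet_bytes / 4)
    return fixed_cycles + words

# TCPSplitter alone (7 fixed cycles) for a 1500-byte packet.
print(input_delay_cycles(1500))                    # 382
# Full wrappers, best and worst case (44 to 68 fixed cycles).
print(input_delay_cycles(1500, fixed_cycles=44))   # 419
print(input_delay_cycles(1500, fixed_cycles=68))   # 443
```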
4/19/2002 29
Current State of Research
•Developed and simulated design
•Handles 256 simultaneous flows
  •33 bits * 256 entries = 1,056 bytes
•Synthesizes at 74MHz
•Simple test client counts TCP data bytes
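A rough upper bound on throughput follows from the 74MHz clock. The Python sketch below assumes the data path moves one 32-bit word per clock cycle: the word width comes from the client interface, but the one-word-per-cycle rate is an assumption, and header overhead and idle cycles are ignored.

```python
def raw_throughput_bps(clock_hz, word_bits=32):
    """Bits per second if one data word is processed every clock cycle."""
    return clock_hz * word_bits

bps = raw_throughput_bps(74_000_000)
print(f"{bps / 1e9:.3f} Gbit/s")   # 2.368 Gbit/s
```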
4/19/2002 30
Future Directions
•Execute design in hardware
•Increase the number of simultaneous flows
  •262,144 flows require only 1 MByte (+) RAM
•Develop more elaborate client applications
•Improve processing performance
•Implement sliding window – passive solution
•Enhance frame generation utility for simulations
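The memory figure for 262,144 flows follows from the 33 bits of state per flow quoted for the current design. The Python sketch below checks that arithmetic and also derives the flow identifier width, assuming flows are indexed directly by a binary identifier; the function name is illustrative.

```python
def flow_table_scaling(flows, bits_per_flow=33):
    """Return (state memory in bytes, flow identifier width in bits)."""
    total_bytes = (flows * bits_per_flow + 7) // 8   # round up to whole bytes
    index_bits = (flows - 1).bit_length()            # bits needed to number the flows
    return total_bytes, index_bits

print(flow_table_scaling(256))       # (1056, 8)
print(flow_table_scaling(262_144))   # (1081344, 18)
```

At 262,144 flows the table takes 1,081,344 bytes, which is 1 MByte (1,048,576 bytes) plus 32,768 bytes for the 33rd bit of each entry. That accounts for the "(+)" on the slide.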
4/19/2002 32
Conclusion
•Runs on reconfigurable hardware platform
•Process packets at Gigabit line rates
•Monitors all TCP flows
•Generates proper byte stream for each flow
•Requires only minimal memory (33 bits/flow)
•Simple client interface demonstrated