TCP STREAM PROCESSING AT GIGABIT LINE RATES
David Vincent Schuehler
Dissertation Defense
Washington University in St. Louis
Department of Computer Science and Engineering
November 3, 2004
TCPProcessor
HARDWARE CIRCUIT
Outline
• Motivation and Background
• Architecture and Related Work
• Live Internet Traffic Processing
• Conclusion and Future Work
Motivation
• Inspect data moving through networks
• Enable application-level data processing
• Secure networks
– Safeguard confidential data
• Detect and prevent intrusions
– Worms, viruses, spam, espionage
• Mitigate denial of service attacks
• Characterize and analyze network traffic
• Operate at multi-gigabit data rates
Transmission Control Protocol
[Figure: data packets moving through the network from source to destination; layout of a single packet: IP header, TCP header, data payload]
• 86% to 90% of all Internet traffic uses TCP
– Web, email, file transfer, remote login, secure communications
• Provides a virtual bit pipe between two end systems
– Retransmission services
– Data reordering services
– Flow control services
– Congestion avoidance services
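The packet layout above (IP header, then TCP header, then data payload) can be illustrated with a small software parser. This is a minimal sketch assuming a bare IPv4 packet with no link-layer header; `TcpSegment` and `parse_ipv4_tcp` are hypothetical names, and the sketch performs no checksum or option validation.

```python
import struct
from typing import NamedTuple


class TcpSegment(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    seq: int
    flags: int      # URG, ACK, PSH, RST, SYN, FIN bits
    payload: bytes


def parse_ipv4_tcp(packet: bytes) -> TcpSegment:
    """Split one IPv4/TCP packet into header fields and its data payload."""
    if packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ip_hdr_len = (packet[0] & 0x0F) * 4            # IHL is counted in 32-bit words
    (total_len,) = struct.unpack_from("!H", packet, 2)
    if packet[9] != 6:                             # IP protocol 6 = TCP
        raise ValueError("not a TCP segment")
    src_ip = ".".join(str(b) for b in packet[12:16])
    dst_ip = ".".join(str(b) for b in packet[16:20])
    src_port, dst_port, seq, _ack, off_flags = struct.unpack_from(
        "!HHIIH", packet, ip_hdr_len)
    tcp_hdr_len = (off_flags >> 12) * 4            # data offset, in 32-bit words
    # Slice up to the IP total length so any link-layer padding is ignored.
    payload = packet[ip_hdr_len + tcp_hdr_len:total_len]
    return TcpSegment(src_ip, dst_ip, src_port, dst_port, seq,
                      off_flags & 0x3F, payload)
```

A segment whose `payload` is empty is a zero-length packet, such as a bare ACK.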
Internet
[Figure: end systems (cell phone, handheld computer, laptop, computers) at a municipality, university, government agency, and corporation connect through a cellular tower, satellite uplink, and Internet service provider to gateway (G) and core (C) routers]
Internet
[Figure: the same network, with spam, virus, and intrusion traffic crossing the gateway and core routers]
Cost of Internet Attacks
Year | Economic impact worldwide (mi2g '04) | Representative attacks (cost)
2003 | $236 billion | Sobig.F ($2B), Blaster ($1.3B), Slammer ($1.2B)
2002 | $118 billion | KLEZ ($9B), Bugbear ($950M)
2001 | $36 billion | Nimda ($635M), Code Red ($2.62B), SirCam ($1.15B)
2000 | $26 billion | Love Bug ($8.75B)
1999 | $20 billion | Melissa ($1.10B)
Economic Damage Estimate
Design Requirements
• Architecture that is fast
– Hardware-based system
– High-performance (multi-gigabit networks)
– Per-flow context storage & retrieval
• Architecture that is scalable
– Performance improves with advances in technology
• In-line traffic processing model
• Implementation using reasonable resources
– FPGA implementation can be done in research lab
• Framework that is flexible
– Integrates with multiple applications
– Multi-device coordination of TCP stream processing
Outline
• Motivation and Background
• Architecture and Related Work
• Live Internet Traffic Processing
• Conclusion and Future Work
TCP-Processor Architecture
[Figure: TCP processing architecture: input buffer, TCP processing engine, state store manager (backed by off-chip memory), packet routing, egress, and stats blocks, together with a data processing circuit]
TCP Processing Engine
[Figure: TCP processing engine internals: frame FIFO, input state machine, checksum engine, flow hash computation, TCP state processing, control & state FIFO, and output state machine, connected to the state store manager]
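The engine includes a checksum engine. TCP uses the 16-bit one's-complement Internet checksum (RFC 1071), and RFC 1624 describes how to patch a checksum after one 16-bit word changes without re-summing the whole packet. The sketch below shows only that arithmetic; a real TCP checksum also covers a pseudo-header (source and destination IP addresses, protocol, and TCP length), which is omitted here, and the function names are hypothetical.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of big-endian words."""
    if len(data) % 2:
        data += b"\x00"                      # pad an odd length with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total > 0xFFFF:                    # fold carries back in (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: complement of the one's-complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF


def incremental_update(old_checksum: int, old_word: int, new_word: int) -> int:
    """RFC 1624 eqn. 3, HC' = ~(~HC + ~m + m'), for one changed 16-bit word."""
    total = (~old_checksum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

A receiver can verify data by checksumming it together with its stored checksum: the result is zero when the data is intact.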
Challenges and Design Choices
• Performance
– Operate at multi-gigabit data rates
– Hardware-based design exploiting pipelining and parallelism
• Flow classification
– Open addressing hash with limited bucket sizes
• Context storage and retrieval
– Requires memory read and write for each packet
– 64-byte per-flow context; use burst read/write operations
• Reassembly of out-of-order packets
– Multiple processing modes (guaranteed and passive)
• TCP processing
– Flow monitoring instead of flow termination
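The flow-classification choice above (open addressing with limited bucket sizes) can be sketched in software. This sketch assumes a fixed number of slots per bucket, with probing confined to the hashed bucket, and uses CRC-32 from Python's `zlib` as a stand-in hash; the TCP-Processor's actual hash function, bucket size, and record layout are not given here, and `FlowTable` and its methods are hypothetical.

```python
import zlib
from typing import List, Optional, Tuple

FlowKey = Tuple[str, str, int, int]  # (src_ip, dst_ip, src_port, dst_port)


class FlowTable:
    """Hash table whose buckets hold a small, fixed number of flow records."""

    def __init__(self, num_buckets: int, bucket_size: int = 4) -> None:
        self.num_buckets = num_buckets
        self.buckets: List[List[Optional[FlowKey]]] = [
            [None] * bucket_size for _ in range(num_buckets)]

    def _index(self, key: FlowKey) -> int:
        # Stand-in hash: CRC-32 of the 4-tuple (an assumption for illustration).
        return zlib.crc32(repr(key).encode()) % self.num_buckets

    def lookup_or_insert(self, key: FlowKey) -> Optional[Tuple[int, int]]:
        """Return (bucket, slot) for key, claiming a free slot for a new flow.

        Returns None when the bucket is full, so the flow goes untracked."""
        index = self._index(key)
        bucket = self.buckets[index]
        free = None
        for slot, stored in enumerate(bucket):
            if stored == key:
                return index, slot
            if stored is None and free is None:
                free = slot
        if free is None:
            return None
        bucket[free] = key
        return index, free

    def remove(self, key: FlowKey) -> None:
        """Free the record, e.g. after a TCP FIN or RST."""
        bucket = self.buckets[self._index(key)]
        for slot, stored in enumerate(bucket):
            if stored == key:
                bucket[slot] = None
```

Confining probes to one bucket bounds the work per packet, which matters in hardware: a bounded number of probes means a bounded number of memory accesses, and a bucket of fixed-size records suits burst reads.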
Link Speeds and Packet Rates
Link type | Data rate | 40-byte pkts/sec | 64-byte pkts/sec | 500-byte pkts/sec | 1500-byte pkts/sec
OC-3 | 155 Mbps | 0.48 M | 0.3 M | 38 K | 12 K
OC-12 | 622 Mbps | 1.9 M | 1.2 M | 0.16 M | 52 K
GigE | 1.0 Gbps | 3.1 M | 2.0 M | 0.25 M | 83 K
OC-48 | 2.5 Gbps | 7.8 M | 4.8 M | 0.63 M | 0.21 M
OC-192 / 10 GigE | 10 Gbps | 31 M | 20 M | 2.5 M | 0.83 M
OC-768 | 40 Gbps | 125 M | 78 M | 10 M | 3.3 M
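The table's entries match dividing the line rate by the packet size in bits (for example, 1 Gbps / (40 × 8 bits) ≈ 3.1 M packets/sec). That calculation makes no allowance for framing overhead such as the Ethernet preamble and inter-frame gap or SONET overhead, so achievable rates on real links are somewhat lower. A minimal sketch that regenerates the table under that simplifying assumption:

```python
def packets_per_second(link_bps: float, packet_bytes: int) -> float:
    """Back-to-back packets of one size on a link, ignoring framing overhead."""
    return link_bps / (packet_bytes * 8)


LINKS = {
    "OC-3": 155e6,
    "OC-12": 622e6,
    "GigE": 1e9,
    "OC-48": 2.5e9,
    "OC-192 / 10 GigE": 10e9,
    "OC-768": 40e9,
}

if __name__ == "__main__":
    for name, rate in LINKS.items():
        rates = [packets_per_second(rate, size) for size in (40, 64, 500, 1500)]
        print(f"{name:>16}: " + "  ".join(f"{r / 1e6:7.3f} M" for r in rates))
```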
Systems with TCP Processors
• Load balancing systems
– Content (cookie) based request routing
– Delayed binding technique
– Limited to scanning start of flow
• TCP offload engines
– Move TCP protocol processing to NIC
– Targeting Gigabit NIC market
– Intel, NEC, Adaptec, Lucent, and others
• SSL accelerators
– Offload encryption/decryption
– Protocol translation
• Intrusion detection systems
– Traffic rates < 1 Gbps
– Perform content scanning and some stream reassembly
[Figure: message sequences for a load balancer and an SSL accelerator. Each device exchanges SYN, SYN ACK, ACK, and a request with the end user, then a separate SYN, SYN ACK, ACK, and request with the web server, and relays the response back. For the SSL accelerator, the end-user side is encrypted and the web-server side is not.]
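With delayed binding, the load balancer completes the TCP handshake with the client itself and reads the first request before it opens a connection to a server, as the figure shows. A minimal sketch of the routing decision, assuming HTTP, a hypothetical `server` cookie for stickiness, and a CRC-32 hash of the request as the fallback choice; `cookie_value` and `pick_server` are hypothetical names. Because only the first request is examined, data later in the stream is never inspected, which is the "limited to scanning start of flow" restriction.

```python
import zlib
from typing import List, Optional


def cookie_value(request: bytes, name: str) -> Optional[str]:
    """Return the value of cookie `name` from an HTTP request's headers."""
    head = request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    for line in head.split("\r\n")[1:]:          # skip the request line
        header, _, value = line.partition(":")
        if header.strip().lower() != "cookie":
            continue
        for pair in value.split(";"):
            key, _, val = pair.strip().partition("=")
            if key == name:
                return val
    return None


def pick_server(request: bytes, servers: List[str]) -> str:
    """Choose a back-end server only after the client's first request arrives."""
    sticky = cookie_value(request, "server")     # hypothetical cookie name
    if sticky in servers:
        return sticky
    return servers[zlib.crc32(request) % len(servers)]
```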
Related Work in TCP Processing
• Software-based TCP processing
– Ethereal, tcpdump, etc.: require post-processing
– Snort w/TCP option: larger virtual packets
– Cluster-based online monitoring system (Mao: WIDM'01)
– Bro: rule-based processing (Paxson: Computer Networks'99)
– STAT/STATL: state-based processing (Vigna: DISCEX'00)
– Intel: Xeon as packet processor (Regnier: HotI'03)
• Hardware-based TCP processing
– Georgia Tech: 1 flow/circuit (Necker: FCCM'02)
– University of Oslo: 1 flow/circuit (Li: FPL'03)
– Indiana University and Imperial College: Netflow statistics
– University of Tokyo: multi-flow stream scanning (Sugawara: FPL'04)
– Intel TCP processor: 8k connections, 9 Gbps (Xu: HotChips'03)
• Network processors
– Intel IXP 1200, 2400, 2800, 2850
– Motorola PowerQUICC
Taxonomy of Packet Processors
[Figure: packet processors plotted by data rate versus number of context records, with hardware and software regions: IP lookup/packet forwarding, packet capture, network processors, load balancers/SSL accelerators, TCP termination, other FPGA TCP processors, Snort w/TCP option, Bro/STATL, Intel projects (experimental TCP processor), and the TCP-Processor; one region is labeled as storing little or no state]
Multi-Device Coordination
• Encodes interface signals
• Regenerates waveforms on separate device
• Provides extensible format & self-describing structure
[Figure: a TCP processing circuit on device 1 encodes its interface signals, which are transported to data processing circuit 1 on device 2 and data processing circuit 2 on device 3, where they are decoded]
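The slide describes the encoding only at a high level. One common way to obtain a self-describing, extensible format is type-length-value (TLV) records: each value carries its own type and length, so a receiver can skip record types it does not recognize. The sketch below assumes TLV framing purely for illustration; it is not the actual inter-device wire format, and the type codes and function names are hypothetical.

```python
import struct
from typing import Dict, FrozenSet

# Hypothetical type codes for values carried between devices.
FLOW_ID, SEQ_NUM, PAYLOAD = 1, 2, 3
KNOWN_TYPES: FrozenSet[int] = frozenset({FLOW_ID, SEQ_NUM, PAYLOAD})


def encode(fields: Dict[int, bytes]) -> bytes:
    """Serialize each field as type (1 byte), length (2 bytes), then value."""
    out = bytearray()
    for ftype, value in fields.items():
        out += struct.pack("!BH", ftype, len(value)) + value
    return bytes(out)


def decode(message: bytes, known: FrozenSet[int] = KNOWN_TYPES) -> Dict[int, bytes]:
    """Parse TLV records, skipping types this decoder does not know."""
    fields: Dict[int, bytes] = {}
    offset = 0
    while offset < len(message):
        ftype, length = struct.unpack_from("!BH", message, offset)
        offset += 3
        if offset + length > len(message):
            raise ValueError("truncated record")
        if ftype in known:
            fields[ftype] = message[offset:offset + length]
        offset += length  # the length field lets unknown records be skipped
    return fields
```

An older decoder can therefore receive a message containing a newer record type (99 in the test) and still recover the fields it understands.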
Place & Route Results
• Including protocol wrappers & encoder/decoder
• Target: Xilinx Virtex XCV2000E-8
• FPX platform
• Number of BLOCKRAMs
– 95 out of 160 (59%)
• Number of SLICEs
– 7279 out of 19200 (37%)
• Maximum clock frequency: 85.565 MHz
• Maximum data throughput: 2.7 Gbps
• Maximum packets per second: 2.9M packets/sec
– Min 29 clock cycles per packet (345 ns)
– Throughput limited by memory latency
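The throughput and packet-rate figures follow from clock arithmetic. Assuming a 32-bit datapath that moves one word per clock cycle (an assumption, though one consistent with the 2.7 Gbps figure), peak throughput is the clock rate times the word width, and the packet rate is bounded by the clock rate divided by the minimum cycles spent per packet. The function names are hypothetical.

```python
def datapath_throughput_bps(clock_hz: float, width_bits: int) -> float:
    """Peak data rate when one datapath word is transferred every clock cycle."""
    return clock_hz * width_bits


def max_packet_rate(clock_hz: float, cycles_per_packet: int) -> float:
    """Upper bound on packets/sec when each packet needs a minimum number of cycles."""
    return clock_hz / cycles_per_packet


CLOCK_HZ = 85.565e6
print(datapath_throughput_bps(CLOCK_HZ, 32) / 1e9)  # about 2.74 (Gbps)
print(max_packet_rate(CLOCK_HZ, 29) / 1e6)          # about 2.95 (million packets/sec)
```

Under the same assumption, a packet shorter than about 116 bytes (29 cycles × 4 bytes per cycle) finishes streaming before its 29-cycle minimum elapses, so a stream of small packets is limited by the per-packet bound rather than by the 2.7 Gbps datapath.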
Content Scanning
[Figure: content scanning on Xilinx XCV2000E FPGAs, each with PC100 SDRAM and ZBT SRAM. The TCP circuit (cell, frame, and IP wrappers; TCP-Processor; TCP encode/decode) passes traffic to the scan circuit (wrappers, TCP decode/encode, control cell processor, scan circuit), which queries and updates per-flow state in a state store. Network traffic and a control interface connect to the devices.]
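The scan circuit queries and updates per-flow state in the state store. One reason such state is needed is that a signature can straddle a packet boundary. A minimal software sketch, assuming in-order delivery and plain substring matching (not the hardware's matching engine): for each flow it keeps the last (longest signature length − 1) bytes and prepends them to the next payload. `StreamScanner` is a hypothetical name.

```python
from typing import Dict, Hashable, List


class StreamScanner:
    """Finds byte signatures in per-flow TCP streams, even across packet boundaries."""

    def __init__(self, signatures: List[bytes]) -> None:
        self.signatures = signatures
        # Keep just enough trailing bytes per flow to complete the longest signature.
        self.keep = max(len(s) for s in signatures) - 1
        self.tail: Dict[Hashable, bytes] = {}  # per-flow scan state

    def scan(self, flow: Hashable, payload: bytes) -> List[bytes]:
        """Scan one in-order payload for `flow`; return the signatures found in it."""
        carried = self.tail.get(flow, b"")
        window = carried + payload
        hits = []
        for sig in self.signatures:
            pos = window.find(sig)
            while pos != -1:
                # Report only matches that end in the new payload; matches lying
                # wholly inside the carried bytes were reported for an earlier packet.
                if pos + len(sig) > len(carried):
                    hits.append(sig)
                    break
                pos = window.find(sig, pos + 1)
        self.tail[flow] = window[-self.keep:] if self.keep else b""
        return hits
```

Storing only a short tail per flow keeps the state small, which matters when the context must be fetched from memory for every packet.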
Outline
• Motivation and Background
• Architecture and Related Work
• Live Internet Traffic Processing
• Conclusion and Future Work
Washington University Network
• 384 Mbps total Internet bandwidth
– 300 Mbps Internet
– 84 Mbps Internet2
• Approx. 19,000 active end systems
• Approx. 10,000 students
• Traffic analyzed for 5-week period
– Aug 20th to Sep 24th
– Over 1000 charts generated
• Selected highlights presented
Washington University Network
[Figure: Washington University network link to the Internet / Internet2, with traffic directed to the TCP Processor]
Live Internet Traffic Analysis
[Figure: live traffic analysis setup on a standalone WUGS-20 switch (FPX-in-a-Box): GigE line cards carry WashU Internet traffic, and switch ports hold the TCP Processor, PortTracker circuit, and Scan circuit, with G-Link switch control and an external stats monitor; remaining ports are empty or unused]
Data Collection
• Statistics are sent to the StatsCollector application from hardware circuits
• StatsCollector spools raw data to disk files and retransmits stats
• An SNMP agent publishes the statistics in a standard format
• Multi-Router Traffic Grapher (MRTG) queries the SNMP agent and generates traffic charts
• A Perl script reads raw data files and calls gnuplot to generate charts
[Figure: data flow from StatsCollector to the SNMP agent and MRTG (real-time processing) and to gnuplot, with sample packets-versus-time charts]
Current Live Traffic
Collected Statistics
• TCP Statistics: Configuration Information, SSM New Connections, SSM End Connections, SSM Reused Connections, SSM Active Connections, INB Input Words, INB Input Packets, INB Dropped Packets, INB Output Packets, ENG TCP Packets, ENG SYN Packets, ENG FIN Packets, ENG RST Packets, ENG Zero Length Packets, ENG Retransmitted Packets, ENG Out-of-Sequence Pkts, ENG Bad Checksums, RTR TCP Data Bytes, RTR Client Packets, RTR Bypass Packets, EGR Client Packets In, EGR Bypass Packets In, EGR TCP Checksum Update, EGR Packets Out
• Port Statistics: FTP, SSH, Telnet, SMTP, TIM, Nameserv, Whois, Login, DNS, TFTP, Gopher, Finger, HTTP, POP, SFTP, SQL, NNTP, NetBIOS, SNMP, BGP, GACP, IRC, DLS, LDAP, HTTPS, DHCP, Lower, Upper
• Scan Statistics: String 1, String 2, String 3, String 4
• Protocol Statistics: Cells In, Cells Dropped, Cells Bypass, Cells Out, Frame Words In, Frame Packets In, IP Packets Dropped, IP Packet Fragments, IP Packets In, IP Words In, IP Packets Bypass, IP Words Bypass, IP Bad Checksum
Typical Daily Traffic Pattern
Lowest activity
Highest activity
IP and TCP Traffic Rates
>90% TCP packets
Zero Length TCP Packets
20-40% zero length pkts
Fragmented IP Packets
0.25% fragmented
Packet Sequencing
3x-4x more retransmitted
Packet Sequencing (cont)
3%-4% Retransmitted 1% Out of Seq
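Counting retransmitted and out-of-sequence packets requires tracking, per flow, the next expected sequence number. The sketch below shows one plausible classification, assuming the expected value is already available from the flow's context record; it compares sequence numbers modulo 2^32 so the counts stay correct when a flow's sequence space wraps around. It is an illustration rather than the TCP-Processor's exact rules: for instance, it treats any packet that starts before the expected sequence number as a retransmission, even if it also carries some new data. The function names are hypothetical.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32 bits and wrap around


def seq_before(a: int, b: int) -> bool:
    """True if sequence number a comes before b, modulo 2**32."""
    return a != b and (b - a) % SEQ_MOD < (1 << 31)


def classify(seq: int, expected: int) -> str:
    """Label a data packet by its first sequence number versus the next expected one."""
    if seq == expected:
        return "in-order"
    if seq_before(seq, expected):
        return "retransmitted"    # starts with data the flow has already carried
    return "out-of-sequence"      # leaves a gap: earlier data has not arrived yet
```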
Worm/Virus Detection
• Search for digital signatures
• MyDoom (appeared 1/26/04)
– Spreads via email attachment
– Opens back door via ports 3127-3198
– Contains SMTP engine to replicate itself
– Contains denial of service attack (25% operational)
– At peak, 1 in 12 emails contained the virus
• Netsky (appeared 3/1/04)
– Spreads via email attachment
– Scans drives C through Z looking for email addresses
– Contains SMTP engine to replicate itself
MyDoom Virus Detection
Netsky Virus Detection
Denial of Service Attack
• TCP SYN attack
– 8 minutes in duration
– 71,000 TCP pkts/sec avg (34,000 normal)
– 40,000 TCP SYN pkts/sec avg (2,000 normal)
• IP attack (non-TCP traffic)
– 3.5 minutes in duration
– 91,000 IP pkts/sec peak (36,000 normal)
– 57,000 non-TCP pkts/sec peak (2,000 normal)
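Both attacks show up as large multiples of normal rates (40,000 versus 2,000 SYN packets/sec is a 20x increase). A rate detector could flag such jumps automatically. A minimal sketch, assuming per-interval event rates, an exponentially weighted moving average (EWMA) as the baseline, and a hypothetical 5x threshold; `RateDetector` and its parameters are illustrative, not part of the TCP-Processor.

```python
class RateDetector:
    """Flags intervals whose event rate exceeds `factor` times a running baseline."""

    def __init__(self, initial_baseline: float, factor: float = 5.0,
                 alpha: float = 0.1) -> None:
        self.baseline = initial_baseline
        self.factor = factor
        self.alpha = alpha  # EWMA smoothing weight applied to normal samples

    def observe(self, rate: float) -> bool:
        """Return True if `rate` is anomalous; otherwise fold it into the baseline."""
        if rate > self.factor * self.baseline:
            return True  # leave the baseline untouched during an anomaly
        self.baseline += self.alpha * (rate - self.baseline)
        return False
```

Anomalous samples are kept out of the baseline so that a sustained attack does not raise the threshold for its own detection; the threshold and smoothing weight would need tuning against real traffic.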
Attack Difficult to Detect
TCP: 10:25 to 10:34am
IP: 10:37 to 10:41am
Both Attacks Visible
TCP attack
Non-TCP attack
TCP SYN Attack
20x increase in SYN packets
Attack Directed at SSH Port
counter saturated
True spike at 2.4 M pkts
Non-TCP Attack
29x increase in non-TCP packets
Flow Classification and Attacks
• State store contains 1 million records
• Record removed after TCP FIN or RST
• Stale records are not aged out
• 500,000 to 800,000 active records normal
• DoS attack can cause flow saturation
• Table quickly settles back to normal range
Active State Store Records
400,000 new flows
Outline
• Motivation and Background
• Architecture and Related Work
• Live Internet Traffic Processing
• Conclusion and Future Work
Insights
• 20%-40% zero-length packets
– Increase from 18% to 22% (Shalunov: Internet2'01)
– Implies larger amount of one-way traffic
– Optimization skips processing of these packets
• 5% out-of-order packets
– Agrees with earlier results (Jaiswal: Infocom'03)
• Flow classification tables need to be larger
– Flow table ½ to ¾ full during normal processing
– 1M-entry table saturated during attack
• Automated response systems required
– Short-lived attacks difficult to address manually
Contributions
• Developed architecture for TCP-Processor
– Hardware-based system
– High-performance (multi-gigabit networks)
– Per-flow context storage & retrieval
• Implemented TCP-Processor in reprogrammable hardware
– Operates at 85 MHz on Xilinx Virtex 2000E FPGA
– Maximum throughput of 2.7 Gbps
– Maximum 2.9M packets/sec
• Created inter-device protocol for TCP applications
– Multi-device coordination of TCP stream processing
– Interfaces with TCP-Processor
– Self-describing/extensible transport protocol
• Analyzed live Internet traffic
– Insight into Internet traffic profiles
• Supported academic and commercial endeavors
Future Work
• Packet defragmentation
• Flow classification
• Packet storage manager
• 10Gbps and 40Gbps data processing
• Histogram (packet size, packet type, etc)
• Event rate detection
• Traffic sampling and real-time analysis
• Application integration
Acknowledgments
• Advisor & committee
– John Lockwood (advisor)
– Chris Gill
– Ron Loui
– Ron Indeck
– Dave Schimmel
• ARL faculty & staff
– Jon Turner
– Patrick Crowley
– Fred Kuhns
– John DeHart
• CSE faculty & staff
• ARL & FPX students
• NTS
– Steve Wiese
• Global Velocity
– Matthew Kulig
• Reuters (formerly Bridge)
– Scott Parsons
– Deb Grossman
– John Leighton
• Recommendations
– Scott Parsons
– Don Bertier
– Andy Cox
– Chris Gray
• Reviewers
– Tanya Yatzeck
– James Hartley
• Family
– Jerry & Lois (parents)
– Chris & Kreslyn
– Nancy, Jeff & Nathan
• Friends
Questions