TCP STREAM PROCESSING AT GIGABIT LINE RATES David Vincent Schuehler Dissertation Defense Washington...

Post on 12-Jan-2016


TCP STREAM PROCESSING AT GIGABIT LINE RATES

David Vincent Schuehler
Dissertation Defense

Washington University in St. Louis
Department of Computer Science and Engineering

November 3, 2004

TCP Processor

HARDWARE CIRCUIT


Outline

• Motivation and Background

• Architecture and Related Work

• Live Internet Traffic Processing

• Conclusion and Future Work


Motivation

• Inspect data moving through networks
• Enable application level data processing
• Secure networks
– Safeguard confidential data
• Detect and prevent intrusions
– Worms, viruses, spam, espionage
• Mitigate denial of service attacks
• Characterize and analyze network traffic
• Operate at multi-gigabit data rates

Transmission Control Protocol

[Figure: data packets moving through the network from source to destination. Layout of a single packet: IP header and TCP header (header), followed by the data payload (payload).]

• 86% to 90% of all Internet traffic uses TCP
– Web, email, file transfer, remote login, secure communications
• Provides virtual bit pipe between two end systems
– Retransmission services
– Data reordering services
– Flow control services
– Congestion avoidance services

Internet

[Figure: Internet topology. End systems and sites (cell phone and cellular tower, hand held computer, satellite uplink, laptop, computers, municipality, university, government agency, corporation, Internet service provider) attach through gateway routers (G) to a mesh of core routers (C).]

Internet

[Figure: Internet topology of end systems, gateway routers (G), and core routers (C), with spam, virus, and intrusion traffic marked inside the network.]


Cost of Internet Attacks

Year   Economic impact worldwide (mi2g ’04)   Representative attacks (cost)
2003   $236 billion                           Sobig.F ($2B), Blaster ($1.3B), Slammer ($1.2B)
2002   $118 billion                           Klez ($9B), Bugbear ($950M)
2001   $36 billion                            Nimda ($635M), Code Red ($2.62B), SirCam ($1.15B)
2000   $26 billion                            Love Bug ($8.75B)
1999   $20 billion                            Melissa ($1.10B)

Economic Damage Estimate

Design Requirements

• Architecture that is fast
– Hardware-based system
– High-performance (multi-gigabit networks)
– Per-flow context storage & retrieval
• Architecture that is scalable
– Performance improves with advances in technology
• In-line traffic processing model
• Implementation using reasonable resources
– FPGA implementation can be done in research lab
• Framework that is flexible
– Integrates with multiple applications
– Multi-device coordination of TCP stream processing

Outline

• Motivation and Background

• Architecture and Related Work

• Live Internet Traffic Processing

• Conclusion and Future Work


TCP-Processor Architecture

[Figure: TCP processing architecture. Blocks shown: Input Buffer, TCP Processing Engine, State Store Manager, Packet Routing, Egress, Stats, Off-Chip Memory, and a Data Processing Circuit.]

TCP Processing Engine

[Figure: TCP Processing Engine. Blocks shown: Input State Machine, Frame FIFO, Checksum Engine, TCP State Processing, Flow Hash Computation, Control & State FIFO, Output State Machine, and the State Store Manager.]

Challenges and Design Choices

• Performance
– Operate at multi-gigabit data rates
– Hardware-based design exploiting pipelining and parallelism
• Flow classification
– Open addressing hash with limited bucket sizes
• Context storage and retrieval
– Requires memory read and write for each packet
– 64-byte per-flow context - use burst read/write operations
• Reassembly of out-of-order packets
– Multiple processing modes (guaranteed and passive)
• TCP processing
– Flow monitoring instead of flow termination
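The flow classification and context storage choices above can be illustrated in software. The following is a minimal sketch, not the circuit's actual design: the CRC-32 hash, the bucket depth of 4, and names such as `FlowTable` and `flow_key` are assumptions made for illustration. Only the idea of limited bucket sizes and the 64-byte per-flow context come from the slide.

```python
import struct
import zlib

CONTEXT_BYTES = 64   # per-flow context size stated on the slide
BUCKET_SLOTS = 4     # assumed bucket depth; the slide only says "limited"


def flow_key(src_ip, dst_ip, src_port, dst_port):
    """Pack the TCP 4-tuple (IPv4 addresses as 32-bit ints) into 12 bytes."""
    return struct.pack("!IIHH", src_ip, dst_ip, src_port, dst_port)


class FlowTable:
    """Hash table whose buckets hold at most BUCKET_SLOTS flow records."""

    def __init__(self, num_buckets):
        self.num_buckets = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # CRC-32 stands in for whatever hash the hardware computes.
        return self.buckets[zlib.crc32(key) % self.num_buckets]

    def lookup_or_insert(self, key):
        """Return the flow's context, or None if its bucket is already full."""
        bucket = self._bucket(key)
        for stored_key, context in bucket:
            if stored_key == key:
                return context
        if len(bucket) >= BUCKET_SLOTS:
            return None  # bounded search: this flow cannot be tracked
        context = bytearray(CONTEXT_BYTES)
        bucket.append((key, context))
        return context

    def remove(self, key):
        """Drop the record for a flow, if present."""
        bucket = self._bucket(key)
        bucket[:] = [(k, c) for k, c in bucket if k != key]
```

Bounding the bucket size keeps the worst-case lookup cost fixed, which matters when every packet needs one context read and one context write. The price is that a flow landing in a full bucket goes untracked.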


Link Speeds and Packet Rates

Link type          Data rate   40-byte pkts/sec   64-byte pkts/sec   500-byte pkts/sec   1500-byte pkts/sec
OC-3               155 Mbps    .48 M              .3 M               38 K                12 K
OC-12              622 Mbps    1.9 M              1.2 M              .16 M               52 K
GigE               1.0 Gbps    3.1 M              2.0 M              .25 M               83 K
OC-48              2.5 Gbps    7.8 M              4.8 M              .63 M               .21 M
OC-192 / 10 GigE   10 Gbps     31 M               20 M               2.5 M               .83 M
OC-768             40 Gbps     125 M              78 M               10 M                3.3 M
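The packet rates in this table follow from dividing the link data rate by the packet size in bits. The short sketch below reproduces them to within rounding. It assumes back-to-back packets with no framing overhead or inter-packet gap, and the function and dictionary names are illustrative.

```python
# Nominal data rates in bits per second, as listed in the table.
LINKS = {
    "OC-3": 155e6,
    "OC-12": 622e6,
    "GigE": 1.0e9,
    "OC-48": 2.5e9,
    "OC-192 / 10 GigE": 10e9,
    "OC-768": 40e9,
}
PACKET_SIZES = (40, 64, 500, 1500)  # bytes


def packets_per_second(link_bps, packet_bytes):
    """Upper bound on packet rate, ignoring framing overhead and gaps."""
    return link_bps / (packet_bytes * 8)


def rate_table():
    """Map each link name to its packet rates for PACKET_SIZES."""
    return {
        name: [packets_per_second(bps, size) for size in PACKET_SIZES]
        for name, bps in LINKS.items()
    }


if __name__ == "__main__":
    for name, rates in rate_table().items():
        print(name, ["%.2f M" % (r / 1e6) for r in rates])
```

The 40-byte column is the stressful case for a per-packet design: at OC-768 it leaves 8 ns per packet.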


Systems with TCP Processors

• Load balancing systems
– Content (cookie) based request routing
– Delayed binding technique
– Limited to scanning start of flow
• TCP offload engines
– Move TCP protocol processing to NIC
– Targeting Gigabit NIC market
– Intel, NEC, Adaptec, Lucent, and others
• SSL Accelerators
– Offload encryption/decryption
– Protocol translation
• Intrusion Detection Systems
– Traffic rates < 1 Gbps
– Perform content scanning and some stream reassembly

[Figure: a load balancer between end user and web server. A SYN / SYN ACK / ACK handshake and a request are shown on each side, followed by the responses.]

[Figure: an SSL accelerator between end user and web server, with the same handshake, request, and response exchange on each side. One side is labeled Encrypted and the other Not Encrypted.]

Related Work in TCP Processing

• Software-based TCP processing
– Ethereal, tcpdump, etc – require post processing
– Snort w/TCP option – larger virtual packets
– Cluster-based online monitoring system (Mao: WIDM’01)
– Bro – rule based processing (Paxson: Computer Networks’99)
– STAT/STATL – state based processing (Vigna: DISCEX’00)
– Intel – Xeon as packet processor (Regnier: HotI’03)
• Hardware-based TCP processing
– Georgia Tech – 1 flow/circuit (Necker: FCCM’02)
– University of Oslo – 1 flow/circuit (Li: FPL’03)
– Indiana University and Imperial College – Netflow statistics
– University of Tokyo – multi-flow stream scanning (Sugawara: FPL’04)
– Intel TCP processor – 8k connections, 9 Gbps (Xu: HotChips’03)
• Network processors
– Intel IXP 1200, 2400, 2800, 2850
– Motorola PowerQUICC

Taxonomy of Packet Processors

[Figure: packet processors arranged by data rate and by number of context records, with hardware and software regions. Labeled items: TCP-Processor, Experimental TCP Processor, Other FPGA TCP Processors, Intel projects, Network Processors, TCP Termination, Load Balancer / SSL Accelerator, IP Lookup / Packet Forwarding, Packet Capture, Snort w/TCP option, BRO/STATL. Group labels: "Software based systems" and "Store little or no state".]

Multi-Device Coordination

• Encodes interface signals

• Regenerates waveforms on separate device

• Provides extensible format & self describing structure

[Figure: a TCP processing circuit on Device 1, data processing circuit 1 on Device 2, and data processing circuit 2 on Device 3, linked by encode, transport, and decode stages.]

Place & Route Results

• Including Protocol Wrappers & Encoder/Decoder
• Target Xilinx Virtex XCV2000E-8
• FPX Platform
• Number of BLOCKRAMs
– 95 out of 160 (59%)
• Number of SLICEs
– 7279 out of 19200 (37%)
• Maximum clock frequency: 85.565 MHz
• Maximum data throughput: 2.7 Gbps
• Maximum packets per second: 2.9M packets/sec
– Min 29 clock cycles per packet (345 ns)
– Throughput limited by memory latency
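The throughput and packet-rate figures above can be related to the clock frequency with a little arithmetic. The sketch below is a back-of-the-envelope check, not a statement of how the numbers were derived: the 32-bit datapath width (`WORD_BITS`) is an assumption, chosen because it reproduces the 2.7 Gbps figure, and the function names are illustrative.

```python
CLOCK_HZ = 85.565e6        # maximum clock frequency from the slide
CYCLES_PER_PACKET = 29     # minimum clock cycles per packet from the slide
WORD_BITS = 32             # assumption: one 32-bit word moves per clock cycle


def max_throughput_bps(clock_hz=CLOCK_HZ, word_bits=WORD_BITS):
    """Peak data rate if one word is transferred on every clock cycle."""
    return clock_hz * word_bits


def max_packets_per_second(clock_hz=CLOCK_HZ, cycles=CYCLES_PER_PACKET):
    """Peak packet rate if every packet costs the minimum cycle count."""
    return clock_hz / cycles


def min_packet_time_ns(clock_hz=CLOCK_HZ, cycles=CYCLES_PER_PACKET):
    """Time spent on one packet at the minimum cycle count, in nanoseconds."""
    return cycles / clock_hz * 1e9


if __name__ == "__main__":
    print("throughput: %.2f Gbps" % (max_throughput_bps() / 1e9))
    print("packet rate: %.2f M/s" % (max_packets_per_second() / 1e6))
    print("per packet: %.0f ns" % min_packet_time_ns())
```

Under these assumptions the throughput is about 2.74 Gbps and the packet rate about 2.95 M packets/sec, in line with the slide. The per-packet time comes out near 339 ns, slightly below the 345 ns printed on the slide, and the slide does not say how its figure was computed.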


Content Scanning

[Figure: content scanning across two Xilinx XCV2000E FPGAs, each attached to PC100 SDRAM (64-bit data) and ZBT SRAM (36-bit data). The TCP circuit contains the cell, frame, and IP wrappers, the TCP-Processor, and TCP encode/decode blocks. The scan circuit contains the cell, frame, and IP wrappers, a control cell processor, TCP decode/encode blocks, the scan circuit, and a state store with query-state and update-state interfaces. Network traffic and a control interface enter the devices.]

Outline

• Motivation and Background

• Architecture and Related Work

• Live Internet Traffic Processing

• Conclusion and Future Work


Washington University Network

• 384 Mbps total Internet bandwidth
– 300 Mbps Internet
– 84 Mbps Internet2
• Approx 19,000 active end systems
• Approx 10,000 students
• Traffic analyzed for 5 week period
– Aug 20th to Sep 24th
– Over 1000 charts generated
• Selected highlights presented

Washington University Network

[Figure: campus network diagram showing the Internet / Internet2 connection and the link to the TCP Processor.]

Live Internet Traffic Analysis

[Figure: live traffic analysis setup. A WUGS-20 switch with ports 0 through 7 carries WashU Internet traffic; ports hold GigE line cards, the TCP Processor, the Port Tracker circuit, and the Scan circuit, and the remaining ports are marked empty or unused. Also labeled: G-Link switch control, an external stats monitor, "WUGS-20 Standalone", and "FPX-in-a-Box".]

Data Collection

• Statistics are sent to the StatsCollector application from hardware circuits
• StatsCollector spools raw data to disk files and retransmits stats
• An SNMP agent publishes the statistics in a standard format
• MRTG (Multi-Router Traffic Grapher) queries the SNMP agent and generates traffic charts
• A Perl script reads raw data files and calls gnuplot to generate charts

[Figure: data flow from StatsCollector to the SNMP agent and MRTG, and to gnuplot. Both paths end in packets-versus-time charts, and part of the diagram is labeled "Real-time processing".]

Current Live Traffic


Collected Statistics

TCP Statistics
Configuration Information, SSM New Connections, SSM End Connections, SSM Reused Connections, SSM Active Connections, INB Input Words, INB Input Packets, INB Dropped Packets, INB Output Packets, ENG TCP Packets, ENG SYN Packets, ENG FIN Packets, ENG RST Packets, ENG Zero Length Packets, ENG Retransmitted Packets, ENG Out-of-Sequence Pkts, ENG Bad Checksums, RTR TCP Data Bytes, RTR Client Packets, RTR Bypass Packets, EGR Client Packets In, EGR Bypass Packets In, EGR TCP Checksum Update, EGR Packets Out

Port Statistics
FTP, SSH, Telnet, SMTP, TIM, Nameserv, Whois, Login, DNS, TFTP, Gopher, Finger, HTTP, POP, SFTP, SQL, NNTP, NetBIOS, SNMP, BGP, GACP, IRC, DLS, LDAP, HTTPS, DHCP, Lower, Upper

Scan Statistics
String 1, String 2, String 3, String 4

Protocol Statistics
Cells In, Cells Dropped, Cells Bypass, Cells Out, Frame Words In, Frame Packets In, IP Packets Dropped, IP Packet Fragments, IP Packets In, IP Words In, IP Packets Bypass, IP Words Bypass, IP Bad Checksum

Typical Daily Traffic Pattern

Lowest activity

Highest activity


IP and TCP Traffic Rates

>90% TCP packets


Zero Length TCP Packets

20-40% zero length pkts


Fragmented IP Packets

.25% Fragmented


Packet Sequencing

3x-4x more retransmitted


Packet Sequencing (cont)

3%-4% Retransmitted 1% Out of Seq
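Counting retransmitted and out-of-sequence packets requires a rule for comparing each segment's sequence number with the next one expected on its flow. The sketch below shows one simple rule using 32-bit modular comparison. It is an illustration under stated simplifications, not the circuit's logic: it ignores SYN/FIN sequence-number consumption and partially overlapping segments, and the names `classify` and `FlowMonitor` are hypothetical.

```python
MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


def seq_before(a, b):
    """True if sequence number a precedes b in 32-bit modular arithmetic."""
    return a != b and ((b - a) % MOD) < (1 << 31)


def classify(expected_seq, seq):
    """Label a segment relative to the next expected sequence number."""
    if seq == expected_seq:
        return "in-order"
    if seq_before(seq, expected_seq):
        return "retransmitted"    # starts in data that was already seen
    return "out-of-sequence"      # starts beyond the expected point


class FlowMonitor:
    """Tracks one direction of one flow and counts segment categories."""

    def __init__(self, initial_seq):
        self.expected = initial_seq % MOD
        self.counts = {"in-order": 0, "retransmitted": 0, "out-of-sequence": 0}

    def observe(self, seq, payload_len):
        kind = classify(self.expected, seq)
        self.counts[kind] += 1
        if kind == "in-order":
            # Only in-order data advances the expected sequence number.
            self.expected = (seq + payload_len) % MOD
        return kind
```

The modular comparison matters because a long-lived flow can wrap past 2**32. A plain `<` would misclassify every segment sent just after the wrap.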


Worm/Virus Detection

• Search for digital signatures
• MyDoom (appeared 1/26/04)
– Spread via email attachment
– Opens back door via ports 3127-3198
– Contains SMTP engine to replicate itself
– Contains denial of service attack (25% operational)
– At peak, 1 in 12 emails contained virus
• Netsky (appeared 3/1/04)
– Spread via email attachment
– Scans drives C through Z looking for email addresses
– Contains SMTP engine to replicate itself
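Searching a TCP stream for a signature differs from searching single packets because a signature can straddle a packet boundary, so the matcher's progress has to be saved per flow between packets. The sketch below illustrates that idea in software with a Knuth-Morris-Pratt matcher. It is a swapped-in technique for illustration, not the scan circuit's design, and the signature `b"VIRUS"` in the usage lines is a placeholder rather than a real worm signature.

```python
def build_failure(pattern):
    """KMP failure table: longest proper prefix that is also a suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


class StreamScanner:
    """Matches one signature across packet boundaries, per flow."""

    def __init__(self, signature):
        if not signature:
            raise ValueError("signature must be non-empty")
        self.signature = signature
        self.fail = build_failure(signature)
        self.state = {}  # flow id -> number of signature bytes matched so far

    def scan(self, flow_id, payload):
        """Feed the next in-order payload; return matches completed in it."""
        k = self.state.get(flow_id, 0)   # query saved state for this flow
        hits = 0
        for byte in payload:
            while k and byte != self.signature[k]:
                k = self.fail[k - 1]
            if byte == self.signature[k]:
                k += 1
            if k == len(self.signature):
                hits += 1
                k = self.fail[k - 1]
        self.state[flow_id] = k          # update saved state for this flow
        return hits
```

A short usage example, with two interleaved flows:

```python
scanner = StreamScanner(b"VIRUS")
scanner.scan("flow-a", b"xxVI")    # 0 matches, but 2 bytes are remembered
scanner.scan("flow-b", b"RUS")     # 0 matches: a different flow
scanner.scan("flow-a", b"RUSyy")   # 1 match, completed across the boundary
```

The scanner assumes payloads arrive in order for each flow, which is the service a stream-reassembly front end provides.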


MyDoom Virus Detection


Netsky Virus Detection


Denial of Service Attack

• TCP SYN Attack
– 8 minutes in duration
– 71,000 TCP pkts/sec avg (34,000 normal)
– 40,000 TCP SYN pkts/sec avg (2,000 normal)
• IP attack (non TCP traffic)
– 3.5 minutes in duration
– 91,000 IP pkts/sec peak (36,000 normal)
– 57,000 Non-TCP pkts/sec peak (2,000 normal)
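The figures above show why per-category counters expose these attacks more clearly than aggregate packet rates. The sketch below computes the ratios from the slide's numbers and applies a simple threshold test. The threshold factor of 10 and the function names are assumptions made for illustration.

```python
def rate_ratio(observed, baseline):
    """How many times the baseline rate the observed rate is."""
    return observed / baseline


def exceeds(observed, baseline, factor):
    """True when the observed rate is at least factor times the baseline."""
    return observed >= factor * baseline


# Rates in packets per second, taken from the slide.
TOTAL_TCP = rate_ratio(71_000, 34_000)      # all TCP packets, SYN attack
SYN_ONLY = rate_ratio(40_000, 2_000)        # TCP SYN packets, SYN attack
TOTAL_IP = rate_ratio(91_000, 36_000)       # all IP packets, IP attack
NON_TCP_ONLY = rate_ratio(57_000, 2_000)    # non-TCP packets, IP attack

ASSUMED_FACTOR = 10  # hypothetical alarm threshold

if __name__ == "__main__":
    print("total TCP x%.1f, SYN x%.1f" % (TOTAL_TCP, SYN_ONLY))
    print("total IP x%.1f, non-TCP x%.1f" % (TOTAL_IP, NON_TCP_ONLY))
    print("SYN alarm:", exceeds(40_000, 2_000, ASSUMED_FACTOR))
    print("total TCP alarm:", exceeds(71_000, 34_000, ASSUMED_FACTOR))
```

The aggregate rates only roughly double or so during each attack, while the SYN and non-TCP counters rise by factors of 20 and 28.5, so a threshold on the narrower counters fires long before one on the totals would.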


Attack Difficult to Detect

TCP: 10:25 to 10:34am

IP: 10:37 to 10:41am


Both Attacks Visible

TCP attack

Non-TCP attack


TCP SYN Attack

20x increase in SYN packets


Attack Directed at SSH Port

counter saturated

True spike at 2.4 M pkts


Non-TCP Attack

29x increase in non-TCP packets


Flow Classification and Attacks

• State store contains 1 million records

• Record removed after TCP FIN or RST

• Stale records are not aged out

• 500,000 to 800,000 active records normal

• DoS attack can cause flow saturation

• Table quickly settles back to normal range
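The record lifecycle described above can be modeled in a few lines to show why a flood of new flows saturates the table. This is a simplified sketch: leaving flows untracked once the table is full is an assumption, the class name `StateStore` is illustrative, and the model does not attempt to explain how the real table settled back to its normal range, which the slides do not describe.

```python
class StateStore:
    """Fixed-capacity set of active flow records with no aging."""

    def __init__(self, capacity=1_000_000):
        self.capacity = capacity   # 1 million records, per the slide
        self.records = set()
        self.untracked = 0         # flows seen while the table was full

    def on_packet(self, flow_id, fin=False, rst=False):
        """Update the table for one packet of the given flow."""
        if fin or rst:
            # A record is removed only when the flow ends with FIN or RST.
            self.records.discard(flow_id)
            return
        if flow_id in self.records:
            return
        if len(self.records) >= self.capacity:
            self.untracked += 1    # assumption: no eviction when full
            return
        self.records.add(flow_id)


if __name__ == "__main__":
    store = StateStore(capacity=1000)
    # A SYN flood opens many flows that never send FIN or RST.
    for flow in range(1500):
        store.on_packet(flow)
    print(len(store.records), store.untracked)  # 1000 500
```

Because stale records are never aged out, every half-open flow created by an attack occupies a record until something removes it, which is consistent with the saturation reported during the attacks.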


Active State Store Records

400,000 new flows


Outline

• Motivation and Background

• Architecture and Related Work

• Live Internet Traffic Processing

• Conclusion and Future Work


Insights

• 20%-40% zero length packets
– Increase from 18% to 22% (Shalunov: Internet2’01)
– Implies larger amount of 1-way traffic
– Optimization skips processing of these packets
• 5% out of order packets
– Agrees with results from (Jaiswal: Infocom’03)
• Flow classification tables need to be larger
– Flow table ½ to ¾ full during normal processing
– 1M entry table saturated during attack
• Automated response systems required
– Short lived attacks difficult to address manually

Contributions

• Developed Architecture for TCP-Processor
– Hardware-based system
– High-performance (multi-gigabit networks)
– Per-flow context storage & retrieval
• Implemented TCP-Processor in Reprogrammable Hardware
– Operates at 85 MHz on Xilinx Virtex 2000E FPGA
– Maximum throughput of 2.7 Gbps
– Maximum 2.9M packets/sec
• Created inter-device protocol for TCP applications
– Multi-device coordination of TCP stream processing
– Interfaces with TCP-Processor
– Self-describing/extensible transport protocol
• Analyzed live Internet traffic
– Insight into Internet traffic profiles
• Supported academic and commercial endeavors

Future Work

• Packet defragmentation

• Flow classification

• Packet storage manager

• 10Gbps and 40Gbps data processing

• Histogram (packet size, packet type, etc)

• Event rate detection

• Traffic sampling and real-time analysis

• Application integration


Acknowledgments

• Advisor & committee
– John Lockwood (advisor)
– Chris Gill
– Ron Loui
– Ron Indeck
– Dave Schimmel
• ARL faculty & staff
– Jon Turner
– Patrick Crowley
– Fred Kuhns
– John DeHart
• CSE faculty & staff
• ARL & FPX students
• NTS
– Steve Wiese
• Global Velocity
– Matthew Kulig
• Reuters (formerly Bridge)
– Scott Parsons
– Deb Grossman
– John Leighton
• Recommendations
– Scott Parsons
– Don Bertier
– Andy Cox
– Chris Gray
• Reviewers
– Tanya Yatzeck
– James Hartley
• Family
– Jerry & Lois (parents)
– Chris & Kreslyn
– Nancy, Jeff & Nathan
• Friends

Questions