CPSC 826 Internetworking
Applications & Application-Layer Protocols: Synthetic Traffic Generation*
Michele Weigle
Department of Computer Science
Clemson University
September 13, 2004
http://www.cs.clemson.edu/~mweigle/courses/cpsc826
* Slides derived from “How Real Can Network Traffic Be” by Kevin Jeffay, presented at the Univ. of Virginia, 2004.
Generating Synthetic Traffic: Outline
Why do we need to generate synthetic traffic?
Why can’t we just replay a packet trace?
How can HTTP traffic be described?
» History (Mah model, SURGE, thttp, PackMime)
Is just describing HTTP traffic enough?
» a-b-t trace modeling
Generating Synthetic Traffic: Why?
Suppose you develop a new router active queue management (AQM) scheme
How does it affect the performance of current Internet applications?
Simulation!
» In a network simulator (like ns-2)
» In a testbed
[Figure: ISP 1 and ISP 2 clients/servers connected through their edge routers, asking: will AQM help?]
Generating Synthetic Traffic: ns-2 and Testbeds
ns-2
» Network simulator
» www.isi.edu/nsnam/ns/
» Simulate thousands of connections on one computer
Testbeds
» Use lots of computers to simulate even more computers
» Input some traffic
» Monitor how it performs
“Tuning RED for Web Traffic,” Mikkel Christiansen, Kevin Jeffay, Don Smith, and David Ott, SIGCOMM 2000.
Generating Synthetic Traffic: First Approach
“Realistic” traffic generation
» Collect a packet trace from a link of interest (arrival times, packet sizes, …)
» Replay the trace directly, or
» Model the trace and use the model to generate statistically similar traces
Will the resulting traffic be “real” enough?
[Figure: packets observed over time on the link between Clemson and the Internet]
Generating Synthetic Traffic: Source-level traffic generation
Since the network shapes the traffic, what about the traffic is invariant to the network?
» Axiom: The application/user’s behavior is invariant to low-level network processes
The Floyd and Paxson argument: source-level generation of traffic is preferred over packet-level generation
» We desire application-dependent, network-independent models of traffic
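A small illustration of the axiom, under a deliberately narrow assumption: the only network effect modeled here is the maximum segment size (MSS). The application decides that a response is 2,876 bytes; the path decides how many packets carry it. The function name `segment_adu` and the two MSS values are illustrative assumptions, and real packet traces would also differ in timing because of RTT, loss, and congestion control.

```python
def segment_adu(adu_bytes: int, mss: int) -> list[int]:
    """Split one application data unit (ADU) into TCP payload sizes of at most mss bytes."""
    if adu_bytes < 0 or mss <= 0:
        raise ValueError("adu_bytes must be >= 0 and mss must be > 0")
    full, rest = divmod(adu_bytes, mss)
    return [mss] * full + ([rest] if rest else [])


# Same source-level behavior (a 2,876-byte response), two hypothetical paths.
for mss in (1460, 536):
    sizes = segment_adu(2876, mss)
    print(f"MSS {mss}: {len(sizes)} packets {sizes}")
```

This prints 2 packets ([1460, 1416]) for an MSS of 1460 and 6 packets ([536, 536, 536, 536, 536, 196]) for an MSS of 536: the source-level quantity is identical, the packet-level view is not.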
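[Figure: per-flow filtering of the packets on the Clemson/Internet link over time isolates one web flow: a web request, the HTML source, an image request, and the image (sizes shown: 2,500 bytes, 4,800 bytes, 800 b, 1,800 b)]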
We need models of how applications generate traffic
» Models of application protocols plus models of how applications are used by users
Approaches:
» Analytic models
» Empirical models
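The two approaches can be contrasted in a few lines. This is a sketch, not any particular published model: the analytic side assumes a Pareto law with hypothetical parameters (`alpha`, `xmin`), and the empirical side simply resamples sizes observed in a trace (the sample values below are made up).

```python
import random


def analytic_size(rng: random.Random, alpha: float, xmin: float) -> float:
    """Analytic model: draw from Pareto(alpha, xmin) by inverting F(x) = 1 - (xmin/x)**alpha."""
    u = rng.random()                      # u in [0, 1), so 1 - u is in (0, 1]
    return xmin / (1.0 - u) ** (1.0 / alpha)


def empirical_size(rng: random.Random, observed: list[int]) -> int:
    """Empirical model: return a value that was actually observed in the trace."""
    return rng.choice(observed)


rng = random.Random(826)
observed_reply_sizes = [2500, 4800, 800, 1800, 2876]   # hypothetical trace sample (bytes)
analytic = [analytic_size(rng, alpha=1.2, xmin=1000.0) for _ in range(5)]
empirical = [empirical_size(rng, observed_reply_sizes) for _ in range(5)]
print([round(x) for x in analytic])
print(empirical)
```

An analytic model is compact and can produce values beyond those measured; an empirical model reproduces what was measured but never produces a value outside the trace.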
Source-Level Traffic Generation: Approaches
Mah Model
» Mah, “An Empirical Model of HTTP Network Traffic”, INFOCOM 1997.
SURGE
» Barford and Crovella, “Generating Representative Web Workloads for Network and Server Performance Evaluation”, ACM SIGMETRICS 1998.
thttp
» Hernandez-Campos, Jeffay, and Smith, “Tracking the Evolution of Web Traffic: 1995-2003”, IEEE/ACM MASCOTS 2003.
PackMime
» Cao, Cleveland, Gao, Jeffay, Smith, and Weigle, “Stochastic Models for Generating Synthetic HTTP Source Traffic”, INFOCOM 2004.
tmix
» Hernandez-Campos, Smith, and Jeffay, “Generating Realistic TCP Workloads”, CMG 2004.
Approaches: Mah Model (1997)
Based on packet traces
» Late 1995
» 10 Mbps Ethernet
» ~1.7 million HTTP packets traced
» ~5000 HTTP responses
HTTP/1.0
» No persistent connections
» No pipelining
Model is a series of empirical CDFs
Primary random variables:
» Request sizes/Reply sizes
» User think time
» Document size: number of embedded images/page
» Consecutive document retrievals: same server
» Server selection: popularity
[Figure: timeline of a user's request (REQ) and response (RSP) exchanges with servers]
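A model that is "a series of empirical CDFs" is used by inverse-transform sampling: draw u uniformly in [0, 1) and return the smallest observed value whose cumulative probability reaches u. The sketch below combines a few such CDFs into one document retrieval (the HTML document plus its embedded images, then a think time). The class and function names and all sample data are hypothetical; this shows the general idea, not Mah's actual distributions, and it omits server selection and consecutive retrievals from the same server.

```python
import bisect
import random


class EmpiricalCDF:
    """Step-function CDF over observed samples, sampled by inverse transform."""

    def __init__(self, samples):
        if not samples:
            raise ValueError("need at least one sample")
        self.values = sorted(samples)
        n = len(self.values)
        self.probs = [(i + 1) / n for i in range(n)]   # F at each sorted value; last is 1.0

    def cdf(self, x):
        """Fraction of samples <= x."""
        return bisect.bisect_right(self.values, x) / len(self.values)

    def sample(self, rng):
        """Smallest sorted value whose cumulative probability is >= u."""
        u = rng.random()
        return self.values[bisect.bisect_left(self.probs, u)]


def document_retrieval(rng, images_cdf, request_cdf, reply_cdf, think_cdf):
    """One page: the HTML document plus its embedded images, then user think time."""
    n_objects = 1 + images_cdf.sample(rng)
    exchanges = [(request_cdf.sample(rng), reply_cdf.sample(rng)) for _ in range(n_objects)]
    return exchanges, think_cdf.sample(rng)


rng = random.Random(1)
images = EmpiricalCDF([0, 0, 1, 2, 2, 3, 5])            # embedded images per page
requests = EmpiricalCDF([250, 280, 305, 320, 400])      # request sizes (bytes)
replies = EmpiricalCDF([800, 1800, 2500, 2876, 4800])   # reply sizes (bytes)
think = EmpiricalCDF([1.5, 3.0, 7.0, 15.0, 40.0])       # think times (s)
page, pause = document_retrieval(rng, images, requests, replies, think)
print(len(page), page, pause)
```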
Approaches: SURGE (1998)
Intensity measured in “user equivalents”
» The number of web users whose behavior is simulated
» Make request, wait
» ON / OFF process
Primary random variables:
» Request sizes/Reply sizes
» Popularity
» Embedded references
» Temporal locality
» OFF times (active and inactive)
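A "user equivalent" is an ON/OFF process: request a page (ON), then stay idle (OFF), and repeat; offered load is set by how many user equivalents run at once. A minimal sketch under hypothetical simplifications: transfers take zero time, OFF times are exponential, and objects per page are uniform. SURGE itself uses distributions fitted to measurements for each of the variables listed above, and distinguishes active from inactive OFF times.

```python
import random


def user_equivalent(rng: random.Random, duration_s: float,
                    mean_off_s: float, max_objects: int) -> list[tuple[float, int]]:
    """Return (request_time, objects_in_page) events for one simulated user."""
    t, events = 0.0, []
    while t < duration_s:
        n_objects = rng.randint(1, max_objects)    # ON: base file plus embedded references
        events.append((t, n_objects))
        t += rng.expovariate(1.0 / mean_off_s)     # OFF: idle time before the next page
    return events


def workload(n_users: int, duration_s: float, seed: int = 0):
    """Intensity in 'user equivalents': one independent ON/OFF process per user."""
    rng = random.Random(seed)
    return [user_equivalent(rng, duration_s, mean_off_s=5.0, max_objects=4)
            for _ in range(n_users)]


users = workload(n_users=10, duration_s=60.0)
print(sum(len(u) for u in users), "page requests from 10 user equivalents")
```

With transfers taking zero time, doubling `n_users` roughly doubles the number of page requests, which is what makes user equivalents a convenient load knob.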
Approaches: thttp (2003)
Primary random variables:
» Request sizes/Reply sizes
» User think time
» Persistent connection usage
» Number of objects per persistent connection
» Number of embedded images/page
» Number of parallel connections
» Consecutive documents per server
» Number of servers per page
Based on packet traces
» 2003
» 1 Gbps Ethernet
» Millions of HTTP packets traced
» 96 million HTTP responses
HTTP/1.0 and HTTP/1.1
Model is a series of empirical CDFs
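Several thttp variables interact: a page's embedded objects are spread over some number of parallel connections, and each persistent connection may carry several of them. The sketch below assumes a simple round-robin assignment purely for illustration (`assign_objects` is a hypothetical helper); real browsers may schedule requests differently.

```python
def assign_objects(n_objects: int, n_connections: int) -> list[list[int]]:
    """Assign object indices 0..n_objects-1 to parallel persistent connections round-robin.

    Each inner list is the ordered sequence of request/response exchanges one
    persistent connection carries; connections that receive no object are dropped.
    """
    if n_connections < 1:
        raise ValueError("need at least one connection")
    conns: list[list[int]] = [[] for _ in range(n_connections)]
    for k in range(n_objects):
        conns[k % n_connections].append(k)
    return [c for c in conns if c]


# Hypothetical draws: 7 objects on a page, 3 parallel connections.
print(assign_objects(7, 3))   # [[0, 3, 6], [1, 4], [2, 5]]
```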
Approaches: PackMime (2004)
Based on packet traces
» Early 2000 / Late 2000
» 100 Mbps Ethernet / 1 Gbps Ethernet
» Millions of HTTP packets
HTTP/1.0 and HTTP/1.1
Connection-based rather than page-based
Intensity measured in new connections per second
Primary random variables:
» Request sizes/Reply sizes
» Time between requests
» Number of pages requested
» Number of request-response exchanges per page to the same server
» Network-specific: RTT, loss, bottleneck link speed
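Measuring intensity in new connections per second makes the connection arrival rate the generator's main knob. As a sketch, assuming a Poisson arrival process (exponential inter-arrival times): this is a simplification for illustration, not PackMime's own inter-arrival model, and each generated start time would then be paired with per-connection draws of the variables above.

```python
import random


def connection_start_times(rate_per_s: float, duration_s: float, seed: int = 0) -> list[float]:
    """Start times of new connections arriving at rate_per_s over [0, duration_s)."""
    if rate_per_s <= 0:
        raise ValueError("rate_per_s must be positive")
    rng = random.Random(seed)
    t, starts = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)      # exponential gap with mean 1/rate_per_s
        if t >= duration_s:
            return starts
        starts.append(t)


starts = connection_start_times(rate_per_s=50.0, duration_s=100.0, seed=1)
print(len(starts), "connections in 100 s")   # expected value is 50 * 100 = 5,000
```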
Source-Level Traffic Generation: The failure of existing approaches
Dominant approach is to model individual applications
Wide-area traffic is generated by many different applications
Simulation/testbed experiments should generate “traffic mixes”
Does the HTTP source-level model construction paradigm scale to other applications?
[Figure: a traffic mix of Web, file-sharing, other TCP (FTP, email), and other UDP (streaming, DNS, games) traffic]
Constructing Source-Level Models: Steps for simple request/response protocols
Obtain a trace of TCP/IP headers from a network link
» (Current ethics dictate that tracing beyond the TCP header is inappropriate without users’ permission)
Use changes in TCP sequence numbers (and knowledge of HTTP) to infer application data unit (ADU) boundaries
Compute empirical distributions of the ADUs (and higher-level objects) of interest
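The sequence-number step can be sketched for a sequential (non-pipelined) request/response connection: data bytes flowing in one direction form one ADU until data starts flowing the other way. The `Segment` record and its relative sequence numbers (first payload byte in each direction numbered 0) are assumptions made for this example, and the segment sizes are chosen so the totals match the 305-byte request and 2,876-byte response in the HTTP example on the following slides. Sizes come from sequence-number ranges rather than payload counts, so a retransmission inside an ADU does not inflate it.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    time: float   # capture timestamp (s)
    sender: str   # "client" or "server"
    seq: int      # relative sequence number of the first payload byte
    length: int   # payload bytes (0 for SYN, FIN, and pure ACKs)


def infer_adus(segments: list[Segment]) -> list[tuple[str, int]]:
    """Return (sender, size_in_bytes) for each ADU, splitting at direction changes."""
    adus: list[list] = []   # each entry: [sender, lowest_seq, highest_seq_end]
    for s in sorted(segments, key=lambda seg: seg.time):
        if s.length == 0:
            continue                                  # no application bytes
        end = s.seq + s.length
        if adus and adus[-1][0] == s.sender:
            adus[-1][1] = min(adus[-1][1], s.seq)     # same direction: grow this ADU
            adus[-1][2] = max(adus[-1][2], end)
        else:
            adus.append([s.sender, s.seq, end])       # direction changed: new ADU
    return [(sender, hi - lo) for sender, lo, hi in adus]


trace = [
    Segment(0.00, "client", 0, 0),       # SYN
    Segment(0.05, "server", 0, 0),       # SYN-ACK
    Segment(0.05, "client", 0, 0),       # ACK
    Segment(0.06, "client", 0, 305),     # HTTP request
    Segment(0.11, "server", 0, 0),       # ACK of the request
    Segment(0.12, "server", 0, 1460),    # response, first segment
    Segment(0.12, "server", 1460, 1416), # response, second segment
    Segment(0.17, "client", 305, 0),     # ACK of the response
]
print(infer_adus(trace))   # [('client', 305), ('server', 2876)]
```

Pipelined or concurrent exchanges (both sides sending at once) break the direction-change rule and need more careful inference.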
Ex: HTTP Model Construction: HTTP inference from TCP packet headers
[Figure: timeline of one TCP connection between Client and Server]
Client → Server: SYN
Server → Client: SYN-ACK
Client → Server: ACK
Client → Server: DATA (seqno 305, ackno 1)
Server → Client: ACK (seqno 1, ackno 305)
Server → Client: DATA (seqno 1461, ackno 305)
Server → Client: DATA (seqno 2876, ackno 305)
Client → Server: ACK (seqno 305, ackno 2876)
FIN and FIN-ACK exchanged in each direction
[Figure, continued: the client's data totals 305 bytes; the server's data segments total 2,876 bytes]
[Figure, continued: the 305 bytes form the HTTP request and the 2,876 bytes form the HTTP response]
Source-Level Traffic Generation: Do current model generation methods scale?
Implicit assumptions behind application modeling techniques:
» We can identify the application corresponding to a given flow recorded during a measurement period
» We can identify traffic generated by instances of the same application
» We know the operation of the application-level protocol
Ex: The HTTP success story:
» Request sizes/Reply sizes
» User think time
» Persistent connection usage
» Number of objects per persistent connection
» Number of embedded images/page
» Number of parallel connections
» Consecutive documents per server
» Number of servers per page
What’s needed is an application-independent method of constructing source-level traffic models
» We need to be able to construct application-level models of traffic without knowing what applications are being used or how the applications work
» We need to construct source-level models of the application mixes seen in real networks
TCP Connection Signatures: Recording communication “patterns”
[Figure: the same connection drawn twice: a Web browser sends a 305-byte HTTP request and a Web server returns a 2,876-byte HTTP response (seq/ack values 305 and 2876); the exchange is then relabeled generically as Caller and Callee]
[Figure: over time, the Web client sends a 305-byte HTTP request and the Web server replies with a 2,876-byte HTTP response]
Communication pattern was $(a_1, b_1)$
» E.g., (305 bytes, 2,876 bytes)
TCP Connection Signatures: The a-b-t trace model
[Figure: a connection between Caller and Callee over three epochs; in epoch $i$ the caller sends $a_i$ bytes and the callee sends $b_i$ bytes, with quiet times of $t_1$ and $t_2$ seconds between epochs]
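We model a TCP connection as an a-b-t vector:
$$((a_1, b_1, t_1), (a_2, b_2, t_2), \ldots, (a_e, b_e, \bot))$$
where $a_i$ and $b_i$ are the bytes sent by the caller and the callee in epoch $i$, $t_i$ is the quiet time in seconds between epochs $i$ and $i+1$, and $e$ is the number of epochs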
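Given ADUs with timestamps (for example, from sequence-number inference like the HTTP example above), the vector can be assembled epoch by epoch. A minimal sketch, assuming sequential exchanges in which each epoch is one caller ADU optionally followed by one callee ADU ($b_i = 0$ if the callee sends nothing), and that $t_i$ is measured from the end of epoch $i$ to the start of epoch $i+1$. `ADU`, `abt_vector`, and the sample timings are hypothetical; `None` plays the role of ⊥.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ADU:
    sender: str   # "caller" (connection initiator) or "callee"
    size: int     # bytes
    start: float  # time of the first byte (s)
    end: float    # time of the last byte (s)


def abt_vector(adus: list[ADU]) -> list[tuple[int, int, Optional[float]]]:
    """Build ((a1, b1, t1), ..., (ae, be, None)) from time-ordered ADUs."""
    epochs: list[list] = []   # each entry: [a, b, epoch_start, epoch_end]
    for adu in adus:
        if adu.sender == "caller":
            epochs.append([adu.size, 0, adu.start, adu.end])    # a_i opens epoch i
        elif adu.sender == "callee" and epochs and epochs[-1][1] == 0:
            epochs[-1][1] = adu.size                            # b_i answers a_i
            epochs[-1][3] = adu.end
        else:
            raise ValueError(f"unexpected ADU order at t={adu.start}")
    vector = []
    for i, (a, b, _start, end) in enumerate(epochs):
        gap = epochs[i + 1][2] - end if i + 1 < len(epochs) else None   # t_i, or ⊥
        vector.append((a, b, gap))
    return vector


conn = [
    ADU("caller", 305, 0.0, 0.0),
    ADU("callee", 2876, 0.25, 0.5),
    ADU("caller", 410, 2.5, 2.5),
    ADU("callee", 15000, 2.75, 3.0),
]
print(abt_vector(conn))   # [(305, 2876, 2.0), (410, 15000, None)]
```

Connections where the callee speaks first, or where both sides send concurrently, would need a richer representation than this sketch.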
The a-b-t Trace Model: Typical Communication Patterns
[Figure panels: SMTP (send email); Telnet (remote terminal); FTP-DATA (file download)]
Source-Level Trace Replay: Traffic generation in a laboratory testbed
Given a testbed or simulator, can we effectively simulate Abilene?
» Can we simulate “the Internet” in a lab, or inside a modest computer, using a simple dumbbell topology?
» Can we avoid making arbitrary decisions about how we generate synthetic traffic?
[Figure: dumbbell topology: browsers/servers in Cloud 1 and Cloud 2, each behind an Ethernet switch and an edge router; the link between the two edge routers is labeled “Abilene traffic?”]
Testbed:
» 150+ end-systems, 10/100/1,000 Mbps connectivity, dozens of switches and routers
Input trace: A 2-hour Abilene trace from the NLANR repository
» 334 billion bytes, 404 million packets, 5 million TCP connections
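A back-of-the-envelope reading of this input trace, using only the three totals on the slide. The per-connection figures assume all bytes and packets belong to the 5 million TCP connections, which the slide does not state, and a 2-hour average hides short-term burstiness.

```python
TRACE_BYTES = 334e9         # from the slide
TRACE_PACKETS = 404e6       # from the slide
TCP_CONNECTIONS = 5e6       # from the slide
DURATION_S = 2 * 60 * 60    # 2-hour trace

avg_load_mbps = TRACE_BYTES * 8 / DURATION_S / 1e6
avg_packet_bytes = TRACE_BYTES / TRACE_PACKETS
packets_per_conn = TRACE_PACKETS / TCP_CONNECTIONS
bytes_per_conn = TRACE_BYTES / TCP_CONNECTIONS

print(f"average load:           {avg_load_mbps:.0f} Mbps")     # 371 Mbps
print(f"average packet size:    {avg_packet_bytes:.0f} bytes")  # 827 bytes
print(f"packets per connection: {packets_per_conn:.0f}")        # 81
print(f"bytes per connection:   {bytes_per_conn:,.0f}")         # 66,800
```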
[Figure: Anonymized packet header trace → Processing → Source-level trace (set of a-b-t connection vectors) → Workload partitioning → Traffic generators in the TESTBED → Synthetic packet header trace]
How does the synthetic trace compare to the original trace?
Source-Level Trace Replay: Traffic generation in a simulator
[Figure: Anonymized packet header trace → Processing → Source-level trace (set of a-b-t connection vectors) → Translation → Simulated sources in the SIMULATOR → Synthetic packet header trace]
How does the synthetic trace compare to the original trace?
Validation of Generated Traffic: Questions
Can we reproduce source-level properties of the original traffic?
Can we reproduce interesting measures of the original trace?
» Throughput per unit time
» Number of active connections per unit time
» Connection transmission rates
» Long-range dependence in packet and byte arrivals
» …
Can we see interesting differences between UNC traffic and Abilene traffic?
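The first two measures are binned counts that can be computed identically on the original and the synthetic trace and then compared bin by bin. A sketch assuming hypothetical inputs: packets as (timestamp_s, size_bytes) pairs and connections as (start_s, end_s) pairs, with the bin width as a free parameter. Measures such as long-range dependence need more machinery (e.g., Hurst-parameter estimation) and are not shown.

```python
from collections import defaultdict


def throughput_per_bin(packets, bin_s=1.0):
    """Throughput (bits/s) in each time bin; packets are (timestamp_s, size_bytes)."""
    byte_totals = defaultdict(int)
    for ts, size in packets:
        byte_totals[int(ts // bin_s)] += size
    return {b: total * 8 / bin_s for b, total in sorted(byte_totals.items())}


def active_connections_per_bin(connections, bin_s=1.0):
    """Connections alive at any point in each bin; connections are (start_s, end_s)."""
    counts = defaultdict(int)
    for start, end in connections:
        for b in range(int(start // bin_s), int(end // bin_s) + 1):
            counts[b] += 1
    return dict(sorted(counts.items()))


pkts = [(0.1, 1000), (0.9, 500), (1.5, 1500)]
conns = [(0.2, 1.7), (1.1, 1.2), (2.5, 2.6)]
print(throughput_per_bin(pkts))           # {0: 12000.0, 1: 12000.0}
print(active_connections_per_bin(conns))  # {0: 1, 1: 2, 2: 1}
```

Bins with no packets or connections are simply absent from the result, so a comparison should treat missing bins as zero.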
Synthetic Traffic Generation: Summary
Simulation is the backbone of networking research
Too little attention is paid to realistic traffic generation
» How can we derive fundamental truths from today’s simulation results?
Model traffic as patterns of data exchange within TCP connections
» Application-independent, network-independent
Development of new, flexible traffic generators
» With tunable degrees of realism