A Study between Networks and General Purpose Systems for High Bandwidth Video Streaming
John Bresnahan, Ioan Raicu & Gohar Margaryan
June 3rd, 2004
Computer Architecture – Spring Quarter 2004
Computer Science Department
University of Chicago
Computer Architecture Presentation, 6/6/2004
Problem Description & Motivation
• Study interaction between network, memory and CPU
  – Hardware is improving at different rates
  – Flow of bytes between components
  – CPU's involvement in rate of flow
• Predict appropriate hardware for a given high bandwidth workload
• Identify bottlenecks
• Visualize applications in pseudo-realtime
Yearly Performance Improvement
[Figure: Yearly Performance Improvement (0% to 100%) versus Year (1980 to 2020). Series: Memory Bandwidth, Network Bandwidth, I/O Bus Bandwidth, Disk Bandwidth, Memory Latency, CPU Latency.]
Historical Trends
[Figure: Bandwidth (Gb/s, log scale from 0.01 to 1000000) and Latency (nanoseconds, log scale from 0.0001 to 1000) versus Year (1980 to 2020). Series: Memory Bandwidth, Network Bandwidth, I/O Bus Bandwidth, Disk Bandwidth, Memory Latency, CPU Latency.]
Network Application Model
[Diagram of a General Purpose System and the NETWORK. Labeled elements: NIC Buffer, Kernel Buffer, User Buffer, OS, Application, MPEG Decoder, DMA, CPU, and the protocol stack: Transport Layer (TCP), Transport Layer (UDP), Network Layer (IP), Physical/Data Link Layer (Ethernet).]
Our Approach
• Create a discrete event simulator
  – Model network app components
  – Flow of data between components
  – Configurable parameters…
• Empirical study to collect component performance
• Profiling jobs
  – Visualize use of components
    • Achieved throughput, dropped packets
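As a rough illustration of the approach above, here is a minimal discrete event simulator sketch in Python. It is not the authors' simulator: the single bounded buffer, the single server, the function name `simulate`, and all four parameters are assumptions made for illustration only. It shows how an event queue can yield the two quantities named above, achieved throughput (delivered packets) and dropped packets.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Stats:
    delivered: int = 0
    dropped: int = 0


def simulate(arrival_interval, service_time, buffer_slots, n_packets):
    """Push n_packets through one bounded buffer drained by one server.

    All four parameters are hypothetical knobs, not values from the study.
    Times are in arbitrary units; integers avoid floating-point ties.
    """
    # Event queue ordered by (time, sequence number, kind).
    events = [(i * arrival_interval, i, "arrive") for i in range(n_packets)]
    heapq.heapify(events)
    seq = n_packets      # sequence numbers for completion events
    queued = 0           # packets waiting in the buffer
    busy = False         # whether the server is processing a packet
    stats = Stats()

    while events:
        now, _, kind = heapq.heappop(events)
        if kind == "arrive":
            if not busy:
                # Idle server: start processing immediately.
                busy = True
                heapq.heappush(events, (now + service_time, seq, "done"))
                seq += 1
            elif queued < buffer_slots:
                queued += 1
            else:
                # Buffer full: the packet is dropped.
                stats.dropped += 1
        else:  # "done": the server finished one packet
            stats.delivered += 1
            if queued:
                queued -= 1
                heapq.heappush(events, (now + service_time, seq, "done"))
                seq += 1
            else:
                busy = False
    return stats
```

Calling `simulate(1, 2, 1, 10)` models a component that needs twice as long to process a packet as the gap between arrivals, so packets are dropped once the one-slot buffer fills. A fuller model would chain several such stages (NIC, kernel, application) with their own rates.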
Component Modeling
• NIC
  – Tests over 10/100/1000 Mb/s running TCP and UDP
• Memory
  – Cache Burst 32
  – Measures L1 cache, L2 cache, and main memory read, write, and copy throughput & latency
• CPU
  – Network processing
    • Packets/second; CPU cycles per byte of header processing
    • 2 copies: NIC buffer → Kernel buffer → User buffer
    • Iperf over local loopback address
  – MPEG
    • CPU cycles per byte of processing
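The cycles-per-byte metric above can be related to throughput with simple arithmetic. The sketch below assumes the CPU is saturated and is the only limit; the function names are hypothetical and none of the numbers come from the study's measurements.

```python
def cpu_bound_throughput_mbps(clock_hz, cycles_per_byte):
    """Upper bound on throughput (Mb/s) when the CPU is the only limit."""
    if cycles_per_byte <= 0:
        raise ValueError("cycles_per_byte must be positive")
    bytes_per_second = clock_hz / cycles_per_byte
    return bytes_per_second * 8 / 1e6


def cycles_per_byte(clock_hz, packets_per_second, bytes_per_packet):
    """Per-byte CPU cost implied by a packet rate measured on a saturated CPU."""
    return clock_hz / (packets_per_second * bytes_per_packet)
```

For instance, a hypothetical 1 GHz CPU that spends 8 cycles on each byte could move at most 1000 Mb/s through the protocol stack.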
Main Memory Performance
[Figure: MB/sec (0 to 3000) versus Type of Memory (SD-RAM @ 66 MHz, SD-RAM @ 133 MHz, DDR-RAM @ 332 MHz). Series: Memory Bandwidth, Memory Read Throughput, Memory Write Throughput, Memory Copy Throughput.]
TCP, UDP, and Memory Copy Performance
[Figure: Throughput (Mbit/s, 0 to 4000) versus Processor Speed (466 MHz, 1300 MHz, 2166 MHz). Series: Achieved TCP Throughput, Theoretical TCP Throughput, Achieved UDP Throughput, Theoretical UDP Throughput, Memory Copy Throughput.]
TCP and UDP Performance vs. Processing Power
[Figure: Packets/second (thousands) and Cycles/byte (0 to 400) versus Processor Speed (466 MHz, 1300 MHz, 2166 MHz). Series: TCP Processing Power (packets/sec), UDP Processing Power (packets/sec), TCP Processing Overhead (cycles/byte), UDP Processing Overhead (cycles/byte).]
Benchmarks
• MPEG_sw
  – Software decoding (intensive CPU) and variable network traffic
• MPEG_hw
  – Hardware decoding and variable network traffic
• RAW
  – Constant network traffic
Video 1 Required Variable Throughput (Mb/s)
[Figure: Throughput (Mb/s, 0 to 5) versus Time (seconds, 0 to 3). Series: Required Throughput (Mb/s).]
Benchmark Results
sysSIM Validation
[Figure: Throughput (Mb/s, 0 to 4000) versus Processor Speed (466 MHz, 1300 MHz, 2166 MHz). Series: sysSIM UDP Throughput, Achieved UDP Throughput, Theoretical UDP Throughput, Memory Copy Throughput.]
Bottleneck Shifting in Time
[Figure: Throughput (Gb/s, log scale from 0.000001 to 1000000) versus Year (1980 to 2020). Series: Network Bandwidth, Memory Copy Throughput, sysSIM Throughput, UDP Theoretical Throughput (CPU bound).]
Assumptions and Weaknesses
• LAN environment
  – NO out of order arrival of packets
  – NO “lost” packets
  – NO erroneous packets
• Unidirectional traffic
  – OK for modeling UDP, but oversimplification for TCP
• TCP/UDP/IP: 2 copies of data in protocol stack
• Future trends will follow past trends
• Empirical studies sampled only 3 machines
• Many details about network protocol stack and OS left untouched
Related work
• Simulators
  – SimOS: complete machine simulation environment that runs commercial OS
  – M5: simulation system targeting network intensive workloads that runs unmodified commercial OS
  – CSIM: discrete event simulator for describing parallel processor architectures and software mappings
• Visualizations
  – Visualization Tool (VT)
  – FlowScan: A Network Traffic Flow Reporting and Visualization Tool
• Empirical Studies
  – The Architectural Costs of Streaming I/O: A Comparison of Workstations, Clusters, and SMPs
  – Server Network I/O Acceleration: Fundamental to the Data Center of the Future
  – lmbench: Portable Tools for Performance Analysis
Conclusions
• Memory is not a bottleneck yet, but the gap is closing
• CPU is the bottleneck, but at the rate of increase in CPU speeds, it will not be a bottleneck for long
• At the current rate of network speed increases, we do not foresee the network becoming a bottleneck
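The reasoning behind these conclusions can be sketched as a small calculation: the bottleneck is the component with the lowest sustainable throughput, and it shifts over time if components improve at different yearly rates. The Python below is only an illustration. The function names are hypothetical, and the example capacities and growth rates are made-up values, not the study's data.

```python
def project(capacity, yearly_growth, years):
    """Capacity after compounding a fixed yearly improvement rate."""
    return capacity * (1.0 + yearly_growth) ** years


def find_bottleneck(capacities_mbps):
    """Return (component, capacity) for the slowest stage of a pipeline."""
    if not capacities_mbps:
        raise ValueError("need at least one component")
    name = min(capacities_mbps, key=capacities_mbps.get)
    return name, capacities_mbps[name]


def bottleneck_after(capacities_mbps, yearly_growth, years):
    """Bottleneck once every component has grown for the given number of years."""
    projected = {
        name: project(cap, yearly_growth[name], years)
        for name, cap in capacities_mbps.items()
    }
    return find_bottleneck(projected)


# Hypothetical example values (NOT measurements from the study).
EXAMPLE_CAPACITIES_MBPS = {"network": 1000.0, "memory_copy": 2400.0, "cpu": 600.0}
EXAMPLE_YEARLY_GROWTH = {"network": 1.0, "memory_copy": 0.1, "cpu": 0.5}
```

With these made-up inputs the CPU is the bottleneck today, while after five years of compounding the slower-growing memory copy path becomes the limit.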
Solutions and Open Problems
• Multiple memory banks
• TCP offloading / Network processors
• Hardware threads
• Multiple processors (SMP)
• Use high speed cache memory for buffers
• 0-copy scheme
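To see why a 0-copy scheme appears in this list, consider a deliberately simplified model in which every copy of the data in the protocol stack consumes memory copy bandwidth in sequence. The sketch below assumes exactly that; the function name and the numbers are hypothetical, and real systems (for example with DMA) behave in more complicated ways.

```python
import math


def memory_bound_mbps(copy_throughput_mbps, copies):
    """Throughput ceiling imposed by copying the data `copies` times.

    Simplifying assumption: copies happen one after another and each one
    runs at the measured memory copy throughput.
    """
    if copies < 0:
        raise ValueError("copies must be non-negative")
    if copies == 0:
        # A 0-copy path puts no copy-induced limit on throughput.
        return math.inf
    return copy_throughput_mbps / copies
```

Under this model, a hypothetical memory copy throughput of 2400 Mb/s with 2 copies caps the stack at 1200 Mb/s, while removing the copies removes this particular limit.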
References
[1] S. McCanne and S. Floyd. “NS-2 Network Simulator”. http://www.isi.edu/nsnam/ns/.
[2] Mendel Rosenblum, Edouard Bugnion, Scott Devine, and Stephen A. Herrod. “Using the SimOS Machine Simulator to Study Complex Computer Systems.” ACM Transactions on Modeling and Computer Simulation, Vol. 7, No. 1, January 1997, Pages 78–103.
[3] Carl Hein. “CSIM - Parallel Process and Diagrams Simulator”, Lockheed-Martin ATL, 2004, http://www.atl.lmco.com/proj/csim/simulator/csim_doc.html.
[4] Nathan L. Binkert, Erik G. Hallnor, and Steven K. Reinhardt. “Network-Oriented Full-System Simulation using M5”. Sixth Workshop on Computer Architecture Evaluation using Commercial Workloads (CAECW), Feb 2003.
[5] R. H. Arpaci-Dusseau, A. C. Arpaci-Dusseau, D. E. Culler, J. M. Hellerstein, and D. A. Patterson. “The architectural costs of streaming I/O: a comparison of workstations, clusters, and SMPs,” Proc. 4th Symposium on High-Performance Computer Architecture (HPCA-4), pages 90 - 101, February 1998.
[6] Mellanox Technologies. “Comparative I/O Analysis: InfiniBand Compared with PCI-X, Fiber Channel, Gigabit Ethernet, Storage over IP, HyperTransport, and RapidIO.” Mellanox Technologies White Paper, 2001. http://www.mellanox.com/technology/shared/IOcompare_WP_140.pdf.
[7] Andrea Emilio Rizzoli. “A Collection of Modelling and Simulation Resources on the Internet”, April 2004, http://www.idsia.ch/~andrea/simtools.html.
[8] Min Xu, Milo Martin, Doug Burger, and Mark Hill. “WWW Computer Architecture Page”, 2004, http://www.cs.wisc.edu/~arch/www/tools.html.
[9] John R. Mashey. Big Data and the Next Wave of InfraStress Problems, Solutions, Opportunities. Invited Talk, USENIX 1999. http://www.usenix.org/events/usenix99/invited_talks/mashey.pdf.
[10] Ajay Tirumala, Feng Qin, Jon Dugan, Jim Ferguson, Kevin Gibbs. “Iperf”, March 2003, http://dast.nlanr.net/Projects/Iperf/#whatis.
[11] Lawrence A. Rowe, Steve Smoot, Ketan Patel, and Brian Smith. “MPEG Video Software Statistics Gatherer.” Computer Science Division-EECS, Univ. of Calif. at Berkeley, February 1st, 1995.
[12] Vladimir Afanasiev. “Cache Burst 32”. October 17, 2002. http://user.rol.ru/~dxover/cburst/.
[13] A Beginner’s Guide for MPEG-2 Standard. http://www.fh-friedberg.de/fachbereiche/e2/telekom-labor/zinke/mk/mpeg2beg/beginnzi.htm.
[14] J. Postel. “User Datagram Protocol”, Request for Comments 768, Internet Engineering Task Force, August 1980.
[15] DARPA Internet Program. “Transmission Control Protocol”, Request for Comments 793, Internet Engineering Task Force, September 1981.
[16] Alessandro Rubini & Jonathan Corbet. “Linux Device Drivers, 2nd Edition, Chapter 14, Network Drivers”. June 2001.
[17] Glenn Herrin. “Linux IP Networking: A Guide to the Implementation and Modification of the Linux Protocol Stack”. May 31, 2000.