Posted on 28-Dec-2015
An Integrated Experimental Environment for Distributed Systems and Networks
B. White, J. Lepreau, L. Stoller, R. Ricci, S. Guruprasad, M. Newbold, M. Hibler, C. Barb, A. Joglekar
Presented by Sunjun Kim
Jonathan di Costanzo, 2009/04/13
Outline
Motivation
Netbed structure
Validation and testing
Netbed contribution
Conclusion
Background
Researchers need a platform in which they can develop, debug, and evaluate their systems
One lab is not enough: resources are lacking and more computers are needed
Scalability in terms of distance and number of nodes cannot be reached
Developing large-scale experiments requires a huge amount of time
Previous approaches
Simulation (NS): controlled, repeatable environment, but loses accuracy due to abstraction
Live networks (PlanetLab): achieves realism, but experiments are not easy to repeat
Emulation (Dummynet, NSE): controlled packet loss and delay, but manual configuration is tedious
Netbed ideas
Derives from "Emulab Classic", a universally available time- and space-shared network emulator with automatic configuration from an NS script
Adds virtual topologies for network experimentation
Integrates simulation, emulation, and live-network (wide-area node) experimentation in a single framework
Netbed goals
Accuracy: provide an artifact-free environment
Universality: anyone can use anything the way they want
Conservative resource-allocation policy: no multiplexing (virtual machines), so the resources of one node can be fully utilized
Resources
Local-Area Resources
Distributed Resources: PlanetLab
Simulated Resources
Emulated Resources
WAN emulator (ModelNet): not yet integrated, still a work in progress
Netbed structure
Resource
Life cycle
Local-Area resources
3 clusters: 168 PCs in Utah, 48 in Kentucky, and 40 in Georgia
Each node can be used as an edge node, router, traffic-shaping node, or traffic generator
A machine is used exclusively during an experiment
An OS is provided by default but is entirely replaceable
Distributed resources
Also called wide-area resources
50-60 nodes at approximately 30 sites
Provide the characteristics of a live network
Very few nodes, shared among many users
FreeBSD Jail mechanism (a kind of virtual machine) with non-root access
Simulated resources
Based on nse (NS emulation), which enables interaction with real traffic
Provide scalability beyond physical resources: many simulated nodes can be multiplexed on one physical node
Emulated resources
VLANs emulate wide-area links within a local area
Dummynet emulates queueing and bandwidth limitations, introducing delays and packet loss between physical nodes
Traffic-shaping nodes act as Ethernet bridges, transparent to experimental traffic
Life cycle
$ns duplex-link $A $B 1.5Mbps 20ms
Pipeline: Specification → Parsing → Global Resource Allocation → Node Self-Configuration → Experiment Control → Swap Out / Swap In
Accessing Netbed
Experiment creation: a project leader proposes a project on the web; a Netbed staff member accepts or rejects the project; all experiments are then accessible from the web
Experiment management: log on to the allocated nodes or to the usershost (fileserver); the fileserver sends the OS images and the home and project directories to the other nodes
Specification
Experimenters use ns scripts with Tcl and can use as many functions and loops as they want
Netbed defines a small set of ns extensions: the possibility of choosing specific hardware; simulation, emulation, or real implementation; and program objects defined via a Netbed-specific ns extension
A graphical UI can also be used
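As a sketch of what such a specification might look like: the `tb-` commands below follow Emulab's ns-extension naming convention, but the node names and the hardware/OS identifiers (`pc850`, `FBSD-STD`) are illustrative, not taken from the paper.

```tcl
# Standard ns preamble, reused unchanged by Netbed's parser
set ns [new Simulator]
source tb_compat.tcl

set A [$ns node]
set B [$ns node]

# Plain ns: a 1.5 Mb duplex link with 20 ms delay
$ns duplex-link $A $B 1.5Mb 20ms DropTail

# Netbed-specific extensions: pin hardware and OS choices
tb-set-hardware $A pc850
tb-set-node-os $B FBSD-STD

$ns run
```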
Parsing
A front-end Tcl/ns parser recognizes the subset of ns relevant to topology and traffic generation
A database stores an abstraction of everything about the experiment: fixed generated events; information about hardware, users, and experiments; procedures
Global Resource Allocation
Binds abstractions from the database to physical or simulated entities
Best effort to match the specification
On-demand allocation (no reservations)
Two different algorithms for local and distributed nodes (different constraints): simulated annealing and a genetic algorithm
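The slides name simulated annealing without giving its internals; the following is a toy sketch of the idea only, with hypothetical node and switch names, using "inter-switch links used" as the cost to minimize.

```python
import math
import random

def anneal(virtual_nodes, physical_nodes, cost, steps=5000, seed=0):
    """Toy simulated annealing for a virtual-to-physical node mapping.
    `cost(mapping)` scores a candidate (lower is better)."""
    rng = random.Random(seed)
    mapping = {v: rng.choice(physical_nodes) for v in virtual_nodes}
    cur = cost(mapping)
    best, best_cost = dict(mapping), cur
    for step in range(steps):
        temp = 1.0 - step / steps                # linear cooling schedule
        v = rng.choice(virtual_nodes)
        old = mapping[v]
        mapping[v] = rng.choice(physical_nodes)  # random perturbation
        new = cost(mapping)
        # Accept improvements; accept regressions with probability e^(-delta/T)
        if new <= cur or rng.random() < math.exp(-(new - cur) / max(temp, 1e-6)):
            cur = new
            if cur < best_cost:
                best, best_cost = dict(mapping), cur
        else:
            mapping[v] = old                     # undo the rejected move
    return best, best_cost

# Example: two virtual nodes should land on the same switch
switch_of = {"pc1": "sw0", "pc2": "sw0", "pc3": "sw1"}
def inter_switch_links(m):
    return 0 if switch_of[m["a"]] == switch_of[m["b"]] else 1

best, c = anneal(["a", "b"], ["pc1", "pc2", "pc3"], inter_switch_links)
```

Unlike the real mapper, this toy version allows several virtual nodes to share one physical node; assign enforces one-to-one assignment and many more constraints.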
Global Resource Allocation
Over-reservation of the bottleneck: inter-switch bandwidth is too small (2 Gbps), which goes against their conservative policy
Dynamic changes to the topology are allowed: nodes can be added and removed
Consistent naming across instantiations: virtualization of IP addresses and host names
Node Self-Configuration
Dynamic linking and loading from the DB provide the proper context (hostname, disk image, script to start the experiment)
No persistent configuration state: only volatile memory on the node
If required, the current soft state can be stored in the DB as hard state (swap out / swap in)
Node Self-Configuration
Local nodes: all nodes are rebooted in parallel and contact the masterhost, which loads the kernel directed by the database; a second-level boot may be required
Distributed nodes: boot from a CD-ROM, then contact the masterhost; a new FreeBSD Jail is instantiated running the Testbed Master Control Client
Experiment Control
Netbed supports dynamic experiment control: start, stop, and resume processes, traffic generators, and network monitors
Signals between nodes use a publish/subscribe event-routing system
Static events are retrieved from the DB; dynamic events are also possible
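The slides only name a publish/subscribe event-routing system; the class and event names below are a hypothetical sketch of the pattern, not Netbed's API.

```python
from collections import defaultdict

class EventRouter:
    """Toy publish/subscribe router: nodes subscribe to event types
    (e.g. "start-traffic") and publishers need not know the receivers."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        self.subscribers[event_type].append(callback)

    def publish(self, event_type, payload):
        delivered = 0
        for cb in self.subscribers[event_type]:
            cb(payload)          # deliver the event to each subscriber
            delivered += 1
        return delivered

# Two traffic generators subscribe; one published event reaches both
router = EventRouter()
log = []
router.subscribe("start-traffic", lambda p: log.append(("gen1", p)))
router.subscribe("start-traffic", lambda p: log.append(("gen2", p)))
n = router.publish("start-traffic", {"rate": "2Mb"})
```

The decoupling is the point: the static events pulled from the DB and any dynamic events an experimenter injects go through the same routing path.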
Experiment Control
The ns configuration file provides only high-level control
Experimenters can exert low-level control: on local nodes, root privileges (kernel modification and access to raw sockets); on distributed nodes, Jail-restricted root privileges (access to raw sockets with a specific IP address)
Each local node has a separate control network, isolated from the experimental one, enabling control of a node through a tunnel as if logged in on it, without interfering with the experiment
Preemption and Scheduling
Netbed tries to prevent idling using 3 metrics: traffic, use of pseudo-terminal devices, and CPU load average
To be sure, a message is sent to the user, who can disapprove manually
This is a challenge for distributed nodes hosting several Jails
Netbed offers automated batch experiments when no interaction is required, which makes it possible to wait for available resources
Validation
1st row: emulation overhead
Dummynet gives better results than nse
Validation
They expect better results with future improvements of nse
Validation
5 nodes communicate over 10 links
Evaluation of a derivative of DOOM; the goal is to send 30 tics/sec
Testing
Challenges: Netbed depends on physical artifacts (which cannot be cloned), should evaluate arbitrary programs, and must run continuously
Minibed: 8 separate Netbed nodes
Test mode: prevents hardware modifications
Full-test mode: provides isolated hardware
Practical benefits
All-in-one set of tools
Automated and efficient realization of virtual topologies
Efficient use of resources through time-sharing and space-sharing
Increased fault tolerance (resource virtualization)
Practical benefits
Examples:
The "dumbbell" network: setup time cut from 3 h 15 min to 3 min
Improved utilization of a scarce and expensive infrastructure (12 months, 168 PCs in Utah): time-sharing (swapping) served 1064 nodes; space-sharing (isolation) provided 19.1 years of node time
Virtualization of names and IP addresses: no problem with swapping
Experiment creation and swapping
Phases: mapping, reservation, reboot issuing, reboot, miscellaneous
Booting a custom disk image doubles the boot time
Key services
Mapping local resources: assign
Matches the user's requirements
Based on simulated annealing
Tries to minimize the number of switches and the inter-switch bandwidth
Runs in less than 13 seconds
Key services
Mapping distributed resources: wanassign
Different constraints: nodes are fully connected via the Internet; the "last mile" dominates, so node type matters instead of topology; specific topologies may be guaranteed by requesting particular network characteristics (bandwidth, latency, and loss)
Based on a genetic algorithm
Timing: 16 nodes, 100 edges: ~1 sec; 256 nodes, 40 edges/node: 10 min to 2 h
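wanassign's internals are not spelled out beyond "genetic algorithm"; the loop below is a generic, hypothetical sketch of that technique (truncation selection, one-point crossover, random mutation), with a stand-in fitness function rather than real link measurements.

```python
import random

def evolve(pop_size, genome, fitness, generations=200, seed=0):
    """Toy genetic algorithm. `genome(rng)` makes a random candidate
    (a list); `fitness(g)` scores one (higher is better)."""
    rng = random.Random(seed)
    pop = [genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(a))      # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.1:              # occasional mutation
                child[rng.randrange(len(child))] = rng.randrange(10)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Stand-in objective: count of positions equal to 7 (a proxy for
# "how many requested link characteristics the mapping matches")
best = evolve(20, lambda r: [r.randrange(10) for _ in range(6)],
              lambda g: sum(1 for x in g if x == 7))
```

In the real service the genome would encode a node assignment and the fitness would compare measured bandwidth, latency, and loss against the request.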
Key services
Disk reloading: 2 possibilities
Complete disk-image loading
Incremental synchronization (hash tables on files or blocks)
Good: faster (in their specific case); no corruption
Bad: wastes time when similar images are needed repeatedly; slow reloading of freed nodes (reserved for one user)
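A minimal sketch of the incremental-synchronization idea, assuming fixed-size blocks compared by hash; the block size and function names are illustrative, not Netbed's.

```python
import hashlib

BLOCK = 4096  # illustrative block size in bytes

def block_hashes(data, block=BLOCK):
    """Hash each fixed-size block of the image."""
    return [hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]

def changed_blocks(old, new, block=BLOCK):
    """Indices of blocks that must be re-sent to turn `old` into `new`."""
    old_h, new_h = block_hashes(old, block), block_hashes(new, block)
    return [i for i, h in enumerate(new_h)
            if i >= len(old_h) or old_h[i] != h]

# Only the middle block differs, so only block 1 needs transferring
old = b"A" * 3 * BLOCK
new = b"A" * BLOCK + b"B" * BLOCK + b"A" * BLOCK
changed = changed_blocks(old, new)
```

This shows why incremental sync pays off when the old and new images are similar, and why full image loading avoids the per-block comparison cost when they are not.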
Key services
Disk reloading: Frisbee
Performance techniques: uses a domain-specific algorithm to skip unused blocks; delivers images via a custom reliable multicast protocol
117 sec for 80 nodes, writing 550 MB instead of 3 GB
Key services
Scaling of simulated resources
Simulated nodes are multiplexed on one physical node and must keep up with real time, taking into account the user-specified rate of events
Test of live TCP at 2 Mb CBR: an 850 MHz PC with 2 Mb / 50 ms CBR UDP background traffic can support 150 links for 300 nodes
Routing remains a problem in very complex topologies
Example of a new possibility
Batch experiments can be programmed to vary one parameter at a time
The Armada file system from Oldfield & Kotz: 7 bandwidths × 5 latencies × 3 application settings × 4 configurations of 20 nodes
420 tests in 30 hrs (~4.3 min per experiment)
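The 420-run count is just the cross product of the swept parameters (7 × 5 × 3 × 4 = 420); such a batch sweep can be generated mechanically, as sketched below with illustrative parameter values (not Armada's actual settings).

```python
from itertools import product

bandwidths = [1, 2, 4, 8, 16, 32, 64]        # 7 illustrative values (Mb/s)
latencies = [5, 10, 20, 50, 100]             # 5 illustrative values (ms)
app_settings = ["small", "medium", "large"]  # 3 application settings
configs = ["c20a", "c20b", "c20c", "c20d"]   # 4 configs of 20 nodes

# One batch experiment per combination: 7 * 5 * 3 * 4 = 420 runs
experiments = list(product(bandwidths, latencies, app_settings, configs))
```

Each tuple would become one unattended batch submission, which is what lets 420 runs complete in 30 hours at roughly 4.3 minutes each.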
Summary
Netbed integrates 3 test environments
Reuses ns scripts
Quick setup of the test environment
Virtualization techniques provide an artifact-free environment
Enables qualitatively new experimental techniques
Future Work
Reliability/Fault Tolerance
Distributed Debugging: Checkpoint/Rollback
Security “Petri Dish”
Thank you
Any questions?