DICE: Performance Update Eric L. Boyd (Internet2) Joe Metzger (ESnet) Nicolas Simar (G2 – JRA1)

Post on 18-Dec-2015




DICE: Performance Update

Eric L. Boyd (Internet2)

Joe Metzger (ESnet)

Nicolas Simar (G2 – JRA1)

Vision: Performance Information is …

• Available
  – People can find it (Discovery)
  – “Community of trust” allows access across administrative domain boundaries (AA)
• Ubiquitous
  – Widely deployed (Paths of interest covered)
  – Reliable (Consistently configured correctly)
• Valuable
  – Actionable (Analysis suggests course of action)
  – Automatable (Applications act on data)

Getting There: Build & Empower the Community

Decouple the Problem Space:
• Analysis and Visualization
• Performance Data Sharing
• Performance Data Generation

Grow the Footprint:
• Clean APIs and protocols between each layer
• Widespread deployment of measurement infrastructure
• Widespread deployment of common performance measurement tools

[Figure: layered diagram with Analysis & Visualization, Measurement Infrastructure, and Performance Tools layers]


perfSONAR Credits

• perfSONAR is a joint effort:
  – ESnet
  – Fermilab
  – GÉANT2 JRA1
  – Internet2
  – RNP
• Internet2 includes:
  – University of Delaware
  – Georgia Tech
  – Internet2 staff
• GÉANT2 JRA1 includes:
  – Arnes
  – Belnet
  – Carnet
  – Cesnet
  – DANTE
  – DFN
  – FCCN
  – GRNet
  – GARR
  – ISTF
  – PSNC
  – Nordunet (Uninett)
  – Renater
  – RedIRIS
  – Surfnet
  – SWITCH

perfSONAR: Project Activity Meter

• Interactions
  – 1-2 conf calls/week
  – 1 new service/month (accelerating)
  – 3-4 development workshops/year
  – 3-4 paper submissions/year
• Recruitment
  – RNP has joined the effort
  – Outreach to LHC community
  – GaTech beginning six month commitment

perfSONAR: Services (1)

• Measurement Point Service
  – Enables the initiation of performance tests
• Measurement Archive Service
  – Stores performance monitoring results
• Lookup Service
  – Allows the client to discover the existing services and other LS services.
  – Dynamic: services register themselves with the LS and advertise their capabilities; they can also leave, or be removed if a service goes down.
• AuthN/Z Service
  – Internet2 MAT, GN2-JRA5 (eduGAIN)
  – Authorization functionality for the framework
  – Users can have several roles; authorisation is based on the user role.
  – Trust relationships defined between users affiliated with different administrative domains.
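The role-based check described for the AuthN/Z Service can be illustrated with a minimal sketch. The role names, action names, domain names and the trust table below are all hypothetical; this is not the eduGAIN or perfSONAR AA interface, only one way such a decision could be structured.

```python
# Hypothetical role-to-permission table; role and action names are illustrative.
ROLE_PERMISSIONS = {
    "noc-engineer": {"read-archive", "run-test"},
    "guest": {"read-archive"},
}

# Hypothetical trust relationships: for each serving domain, the home
# domains whose users it accepts.
TRUSTED_DOMAINS = {
    "domain-a.example": {"domain-a.example", "domain-b.example"},
}


def is_authorized(user_domain, roles, serving_domain, action):
    """Allow the action if the user's home domain is trusted and any role grants it."""
    if user_domain not in TRUSTED_DOMAINS.get(serving_domain, set()):
        return False
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

A user holding several roles is authorized as soon as one of them grants the requested action, which matches the "several roles" bullet above.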

perfSONAR Services (2)

• Transformation Service

– Transform the data (aggregation, concatenation, correlation, translation, etc).

• Topology Service

– Make the network topology information available to the framework.

– Find the closest MP, provide topology information for visualisation tools

• Resource protector

– Arbitrate the consumption of limited resources between multiple services.

Types of perfSONAR Services

• Core Services
  – Set released by perfSONAR Team
    • e.g. LS, AA, 3 MPs, 2 MAs, RP, ToS, TS
  – Tested for interoperability
  – Serve as examples for affiliated developers
  – Targeted at next generation network needs (e.g. GÉANT2, Internet2 New Network, etc.)
• Affiliated Services
  – Released by perfSONAR partners, lag Core
  – May share development infrastructure (Bugzilla, Website, Mailing Lists)
  – Candidates for migration to Core Services
• Unaffiliated Services

perfSONAR: Core Status Update

• Production release of core services package v1.0 ready (pending licensing completion)
• Core services include:
  – Single domain LS solution (PSNC)
  – RRD MA (PSNC)
• Affiliate services and client applications supporting this version will soon follow:
  – BWCTL MP (DFN)
  – perfSONAR UI (ISTF)
• Ongoing work
  – AA Design (Internet2, JRA1, JRA5)
  – Multi-LS (PSNC, RNP, UDel)
  – ToS (DFN, UDel)

perfSONAR Process Status Update

• We have processes … ;-)
• Release management process implemented (Internet2, RedIRIS, UDel)
• Bugzilla up and running (UDel)
• Migrated from CVS to SVN (Internet2)
• Functional testing under construction (GRnet)
• Monitoring deployed services with Tomcat (ISTF)
• Installation process eased significantly (DANTE, PSNC, UDel)
• www.perfsonar.net under development (Internet2, Renater)
  – Development information will stay on the Wiki
  – Adopter information will migrate to website

perfSONAR: Affiliate Status Update

• Affiliated Services
  – Command Line Interface MP (Ping, OWAMP, Traceroute) (RNP, released)
  – BWCTL MP (DFN, released)
  – SQL MA (PSNC, released)
  – L2-specific MA (DANTE)
  – SSH MP (Looking Glass) (Belnet, released)
  – ABW MP (bandwidth packet capture cards) (Cesnet)
  – NMS MP (SDH status) (DANTE)
  – Hades MA (OWD, Jitter, OWPL) (DFN)
  – Flow Replicator MA (Surfnet,
• User Interfaces
  – Visual PerfSONAR (Carnet)
  – Looking Glass (Belnet)
  – ICE/NeTraMet (RNP)

What You See Is What You Get

• perfsonarUI
  – Retrieval of published data
    • RRD MA
    • Hades MA
  – Visualisation of OWD, IPDV and packet loss between Hades MPs
  – Parsing of arbitrary IPv4 or IPv6 traceroute commands
• CNM – map based
  – GEANT2 + NRENs maps
• VisualperfSONAR
• Looking Glass

RRD MA features

• Wrapper around RRD tool.
• Request/reply interface.
• Write into RRD.
• LS registration.
• Installation scripts.
• Test configuration files available.

Lookup Service Features

• Centralized LS (Creating a distributed LS is ongoing development)
• Service Registration (including updates) functionality
• Service deregistration functionality
• Lookup/query functionality (XQuery/XPath)
• Services keep-alives
  – including database cleanup, scheduled functionality
• Registration component for a service available.
• Installation scripts.
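The registration, deregistration, keep-alive and cleanup features listed above can be sketched as a small in-memory registry. This is an illustration only: the class and method names are hypothetical, the real LS is queried with XQuery/XPath rather than the simple capability match used here, and the TTL value is an assumption.

```python
import time


class LookupRegistry:
    """In-memory service registry with keep-alive expiry (illustrative only)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.services = {}  # service_id -> (capabilities, last_seen)

    def register(self, service_id, capabilities):
        # Registering an existing id acts as an update.
        self.services[service_id] = (set(capabilities), self.clock())

    def keep_alive(self, service_id):
        # Refresh the last-seen time; unknown services must register again.
        if service_id not in self.services:
            return False
        capabilities, _ = self.services[service_id]
        self.services[service_id] = (capabilities, self.clock())
        return True

    def deregister(self, service_id):
        self.services.pop(service_id, None)

    def cleanup(self):
        # Drop services whose last keep-alive is older than the TTL.
        now = self.clock()
        expired = [sid for sid, (_, seen) in self.services.items()
                   if now - seen > self.ttl]
        for sid in expired:
            del self.services[sid]
        return expired

    def lookup(self, capability):
        # Purge stale entries first so clients only see live services.
        self.cleanup()
        return sorted(sid for sid, (caps, _) in self.services.items()
                      if capability in caps)
```

Passing the clock in as a parameter keeps the expiry logic testable without sleeping; a deployed service would more likely run the cleanup on a schedule, as the slide suggests.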

RRD MA deployment Status

[Chart: MA deployment over time. X axis: Time (in months). Y axis: Number of MAs deployed. Series: New MAs deployed, Total MAs deployed.]

PerfSONAR Next steps

• Formal partnership
  – License, Partnership Agreement
  – Interim solution
• Upgrade existing user base (currently using prototype)
• Data exchange policy (measurement peering agreement)
• Consistent offer of services.
  – What services package to suggest to networks.
• L2 status monitoring.


Joe Metzger


Last months (Jan – May) - 1

• Services
  – Lookup Service
    • Centralized
    • Registration / deregistration
    • Lookup query
    • Result code
  – SQL MA
    • Stores data in relational database
    • Supports
      – Utilization
      – L2 status
      – Result code.
  – HADES MA
    • Provides access to the data archive of Hades measurements from GEANT2 network

Last months - 2

• Tools integration
  – Telnet / SSH MP
    • On-demand requests for device-specific information
    • Cisco/Juniper/Quagga support
    • Resource protection mechanisms
      – To verify the parameters sent in the commands
      – To prevent a flood of requests

      – TCP throughput measurements
    • OWAMP
      – OWD, PL measurements
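The two resource protection mechanisms named for the Telnet / SSH MP (verifying command parameters and preventing request floods) can be sketched as follows. The command whitelist, the argument pattern and the rate limits are hypothetical choices for illustration, not the MP's actual rules.

```python
import re
import time

# Hypothetical whitelist: command name -> pattern its single argument must match.
ALLOWED_COMMANDS = {
    "ping": re.compile(r"[A-Za-z0-9.:-]{1,253}"),
    "traceroute": re.compile(r"[A-Za-z0-9.:-]{1,253}"),
}


def validate_request(command, argument):
    """Accept only whitelisted commands whose argument matches the allowed pattern."""
    pattern = ALLOWED_COMMANDS.get(command)
    return pattern is not None and pattern.fullmatch(argument) is not None


class RateLimiter:
    """Allow at most max_requests per client within a sliding time window."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock
        self.history = {}  # client -> timestamps of accepted requests

    def allow(self, client):
        now = self.clock()
        # Keep only the accepted requests that are still inside the window.
        recent = [t for t in self.history.get(client, []) if now - t < self.window]
        allowed = len(recent) < self.max_requests
        if allowed:
            recent.append(now)
        self.history[client] = recent
        return allowed
```

Matching the whole argument against a restrictive pattern rejects shell metacharacters such as ";" before anything is sent to the device.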

Last months - 3

• Tools integration
  – Passive
    • ABW
      – Counts the number of captured packets and bytes and computes used bandwidth
      – Short timescale intervals
    • Tracefile Capture Measurement Point (TCMP)
      – Used for capturing packets of selected flows using either regular Eth cards or special DAG or COMBO6 cards
• SNMP MP
  – Web Service access to the usage of SNMP
  – Get for now and OID discovery
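The ABW computation mentioned above (count captured packets and bytes, then derive used bandwidth over a short interval) reduces to simple arithmetic. The sketch below assumes the capture layer hands over a list of packet sizes in bytes for one interval; the function names are hypothetical.

```python
def used_bandwidth_bps(byte_count, interval_seconds):
    """Convert a byte count observed over one interval into bits per second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return byte_count * 8 / interval_seconds


def summarize(packet_sizes, interval_seconds):
    """Return (packet count, byte count, bits per second) for one capture interval."""
    total_bytes = sum(packet_sizes)
    return (len(packet_sizes), total_bytes,
            used_bandwidth_bps(total_bytes, interval_seconds))
```

For example, three packets of 1500, 1500 and 64 bytes captured in a 0.5 s interval give 3064 bytes, i.e. 3064 × 8 / 0.5 = 49024 bit/s.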

Last months - 4

• Alcatel NMS MP
  – Web Service access to SDH and WDM monitoring parameters such as SES, ES, UAS and also the G.709 metric BBE
  – Acts as a reference implementation
  – Can be used by other NRENs in order to build perfSONAR compliant services which can retrieve data from NMS
• AA
  – Designing and developing a perfSONAR AA service making use of JRA5's eduGAIN
• Topology Service
  – Common schema with SA3


Meet the NRENs sessions

• Powerful tools and useful information
• Design (MA’s, MP’s… approach is good)
• The number of deployed services is high
• Friendly user interfaces
• Tools bring a motivation for installing services for attendees
• Sharing of info between projects is useful

• Need to integrate the tools in a single visualisation application.
• There are too few network nodes running the services
• Not enough data available
• Not enough information available about perfSONAR
• Would like to have libraries/APIs
• Requirement for having its network perfSONAR enabled.

Next disseminations workshops

• SEEREN2 workshop in Heraklion.
• E2E service status services deployment for NRENs next week in Muenchen.
• Three more installation workshops planned over the next 12


Data Exchange for E2E Monitoring – Archive scenario

• NREN in charge of retrieving the data from the NMS/DB to analyse them and pass the information to a Java class.
  – About 700-1000 lines of code for GÉANT – 15 days.
• JRA1
  – Provides the “mySQL MA service” code
  – Maintains it.
  – Provides the script to write into the DB
• JRA4 in charge of the E2E NOC


Connect. Communicate. Collaborate

Year 3 Objectives

• Improving the visualisation and tools features (NOC, PERT, project)
• Integration of AA.
• Services deployments.
• Going operational (with SA3 WI15).
• Mastering the amount of data.
• L1-L2
• Dissemination workshops for NRENs.

Timeline

Phase III
• End of November 2006
• Going operational (SA3 WI-15):
  – RRD MA
  – LS (plus LS registration for the other services)
  – SNMP MP
  – perfsonarUI
  – CNM
  – Hades and RIPE TTM MA
  – BWCTL MP.
  – L2 status MP.
• Novelties
  – Netflow Integration
  – Topology Service
  – VisualperfSONAR
  – Multi-LS
  – Push interface

Phase IV
• End of May 2007
• Going operational (SA3 WI-15):
  – Multi-LS
  – Topology Service
  – Hades MP
  – BWCTL MA
  – VisualperfSONAR
• Novelties
  – First set of services using JRA5 Authentication with some Authorization.
  – Performance anomaly detection

Phase V
• End of November 2007
• Going operational (SA3 WI-15):
  – Authentication Service


Eric Boyd


Result: No more mystery …

• Increase network awareness
  – Set user expectations accurately
• Reduce diagnostic costs
  – Performance problems noticed early
  – Performance problems addressed efficiently
  – Network engineers can see & act outside their turf
• Transform application design
  – Incorporate network intuition into application behavior

Immediate Game-plan:

• Internet2 is leveraged to help provide diagnostic information for “backbone” portion of problem
  – Create *some* diagnostic tools
  – Make Abilene data as public as is reasonable
• Work on efforts to more widely make performance data available (perfSONAR)
  – Contribute to ‘base’ development
  – Integrate ‘our’ diagnostic tools as ‘good’ example MP/MA services

BWCTL (Bandwidth Controller)

• What is it?
  – A resource allocation and scheduling daemon for arbitration of iperf tests
• Typical Solution
  – Run “iperf” or similar tool on two endpoints and hosts on intermediate paths
• Typical road blocks
  – Need permissions on all systems involved
  – Need to coordinate testing with others
  – Need to run software on both sides with specified test parameters
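The idea of a scheduling daemon that arbitrates throughput tests can be illustrated with a toy slot scheduler: each request gets the earliest time slot that does not overlap an existing reservation, so two tests never compete for the same host. This is a simplified sketch under that assumption, with hypothetical names; it is not bwctld's actual scheduling algorithm or protocol.

```python
class SlotScheduler:
    """Grant non-overlapping time slots so throughput tests never run concurrently."""

    def __init__(self):
        self.reservations = []  # sorted list of (start, end) tuples

    def request(self, earliest_start, duration):
        """Reserve and return the start of the first free slot at or after earliest_start."""
        start = earliest_start
        for res_start, res_end in self.reservations:
            if start + duration <= res_start:
                break  # the test fits in the gap before this reservation
            if start < res_end:
                start = res_end  # overlap: try right after this reservation
        self.reservations.append((start, start + duration))
        self.reservations.sort()
        return start
```

A caller asking for a 10-second test at time 5 while a test is reserved for 0 to 10 is simply told to start at 10, which removes the need to coordinate test times by hand.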

BWCTL: 3-Party Flow Diagram


[Diagram: each of the two test hosts runs a bwctld resource broker (master daemon), a bwctld request broker, a bwctld peer agent, and an iperf test process]

OWAMP: One-Way Active Measurement Protocol

• What is it?
  • Measures one-way latency: 1-way ping
  • Control connection used to broker test request based upon policy restrictions and available resources. (Bandwidth/disk limits)
• Specification
  • http://tools.ietf.org/wg/ippm/draft-ietf-ippm-owdp/draft-
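What a "1-way ping" yields can be sketched from per-packet timestamps: each received packet gives one delay sample (receive time minus send time), and packets that never arrive count as loss. The sketch assumes the sender and receiver clocks are synchronized, which one-way delay measurement requires for meaningful values; the function name and data layout are hypothetical and unrelated to the OWAMP wire format.

```python
def one_way_stats(sent, received):
    """Compute one-way delays and the loss fraction.

    sent, received: dicts mapping sequence number -> timestamp in seconds,
    taken on the sending and receiving hosts (clocks assumed synchronized).
    Returns (delays ordered by sequence number, fraction of packets lost).
    """
    if not sent:
        raise ValueError("no packets were sent")
    delays = [received[seq] - sent[seq] for seq in sorted(sent) if seq in received]
    loss = 1 - len(delays) / len(sent)
    return delays, loss
```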


OWAMP Flow Diagram




[Diagram: owampd (resource broker) and two OWD test endpoints]


Thrulay Overview

• Network capacity and delay tester
• Same class of tools as iperf, netperf, nettest, nuttcp, ttcp, etc.
• Unique features not found in other tools:
  – TCP: measures round-trip delay along with goodput
  – UDP: measures:
    • One-way delay, with quantiles
    • Packet loss
    • Packet duplication
    • Reordering
  – UDP: ability to send precisely positioned true Poisson streams (microsecond errors in sending times)
  – Human and machine-readable (ready to be fed to gnuplot)
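A Poisson stream, as mentioned in the UDP feature above, is one whose gaps between consecutive packets are independent and exponentially distributed. The sketch below only generates such a send schedule; it is not thrulay's code, and the function name and parameters are hypothetical. Hitting the scheduled times with microsecond precision is the hard part in practice and is not attempted here.

```python
import random


def poisson_send_times(rate_pps, count, seed=None):
    """Return `count` send times (seconds from start) for a Poisson stream.

    rate_pps is the average packet rate in packets per second; the gaps
    between consecutive packets are exponential with mean 1 / rate_pps.
    """
    rng = random.Random(seed)
    t = 0.0
    times = []
    for _ in range(count):
        t += rng.expovariate(rate_pps)
        times.append(t)
    return times
```

At 1000 packets per second the average gap is 1 ms, but individual gaps vary widely, which is what distinguishes a Poisson stream from a periodic one.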

Thrulay Update

• New release v0.8
• Tests with multiple TCP streams
• Set DSCP (a.k.a. first 6 bits of the TOS byte)
• Report MTU and/or MSS (whichever the OS makes available)
• More UDP statistics: duplication, reordering, quantiles of delay
• SPARC/Solaris support
• Mac OS X support
• IPv6 support
• Non-busy-waiting UDP mode (less precise, but can run more concurrent tests)
• Documentation: manual pages have been added
• Basic client authorization based on IP address
• Integration of TSC timekeeping projects for faster and more precise timestamping
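Because the DSCP occupies the first six bits of the TOS byte, the value handed to the socket layer is the DSCP shifted left by two bits. The sketch below shows that mapping and how an IPv4 UDP socket could be marked with it; it is not taken from thrulay, the function names are hypothetical, and `socket.IP_TOS` is assumed to be available (it is on Linux and macOS, but not on every platform).

```python
import socket


def dscp_to_tos(dscp):
    """Place a 6-bit DSCP value in the upper six bits of the TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_marked_udp_socket(dscp):
    """Create an IPv4 UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

For instance, DSCP 46 (Expedited Forwarding) corresponds to a TOS byte of 184 (0xB8).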

NDT: Network Diagnostic Tool

• Web100 enhanced server handles testing and diagnostic services
• Java based and command line clients allow testing from any client (local or remote)
• Performance and configuration faults reported back to client
• Drill-down functions provide more details & error reporting capabilities
• Grant from NIH/NLM to explore duplex mismatch detection

NDT Flow Diagram






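[Diagram: the client sends a web request to the well-known NDT server, receives a redirect message, and requests the web page from the NDT server. After the web page response, the client sends a test request; the server spawns a child test engine, which uses a control channel and specific test channels.]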

Bulk Transport

• Build a library / tool for bulk transport that does not require kernel level modifications yet achieves the performance of such
• VFER library
  – Congestion control hooks
  – Implements loss-based congestion control
  – Working on delay-based version
• File transfer utility
  – An initial version demoed
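The slides do not say which loss-based scheme VFER implements. As a generic illustration of loss-based congestion control, the sketch below uses classic additive-increase/multiplicative-decrease (AIMD): the window grows steadily while no loss is seen and is cut sharply when loss is detected. The function name and the default constants are assumptions.

```python
def aimd_update(cwnd, loss_detected, increase=1.0, decrease=0.5, min_cwnd=1.0):
    """Return the next congestion window (in packets) after one round trip.

    Without loss the window grows by `increase`; on loss it is multiplied
    by `decrease`, but never drops below `min_cwnd`.
    """
    if loss_detected:
        return max(min_cwnd, cwnd * decrease)
    return cwnd + increase
```

A delay-based variant, like the one the slide says is in progress, would react to rising round-trip times instead of waiting for packets to be dropped.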

Everything we work on is available

• Tools are open source, supported, well-documented
• BWCTL/Iperf, OWAMP, NDT are deployed across Abilene backbone and at many partners
• You can:
  – See ongoing measurement results at the Abilene Observatory
  – Test to/from the Abilene backbone

Network Performance Measurement Workshops

Example Course Materials:
• http://e2epi.internet2.edu/npw/presentations.html

Goals:
– Grow installed base of BWCTL/Iperf, OWAMP, and NDT at GigaPoP and regional campuses.
  • http://e2epi.internet2.edu/pipes/pmp/pmp-dir.html
– Begin integration into IT support processes.
– Create an installed base for perfSONAR deployment.
– Give each participant tool-specific cookbooks.

Network Performance Measurement Workshop Locations and Dates

• Completed
  – SOX / GaTech (03/05)
  – CENIC / UCLA (06/05)
  – JT – Vancouver (07/05)
  – OARNet / OSU (09/05)
  – MAGPI / FMM (09/05)
  – MAX / College Park (12/05)
  – APAN (01/06)
  – JT - Albuquerque (02/06)
  – MERIT (02/06)
  – Columbia / NYSERNet (04/06)
  – University of Virginia (04/06)
• Planned
  – Wisconsin (07/06)
• Under Consideration
  – Alaska, …