Mark Leese (Daresbury Laboratory), Paul Mealor (University College London)
1st EGEE Conference, Cork, April 2004
Network Monitoring: The GGF Perspective
Contents
Simple really:
Use cases: why this is important
What GGF is doing
The Grid?
Basic Grid principle:
User applications (Grid apps) submit their work to the middleware, which selects the “best” resources available to run the job.
Network performance information is essential...because…
[Diagram: Grid Apps, Middleware, Resource (SE), Resource (CE) and Network; sites labelled Uzbekistan and CERN]
Use Case 1: Resource Selection
Resource Brokers (RBs) are responsible for finding the best resource (Computing Element, CE) to be used for a job, e.g.:
Run job at B, using copy of data from A, then store results at C
All other things being equal, take into account the data access requirements of the job
Out of the list of CEs capable of running the job, use a network cost function to identify the CE with the “best” data access:
[Diagram: Network Cost Function; inputs: file source & destination, file size; output: estimated transfer time]
Consider the “best” combination of data sources and sinks, e.g. IF source data = 10 GB AND resulting data will = 100 GB THEN pick the CE based on performance to the result-storing SE (Storage Element).
European Data Grid does something along these lines (Please, no one tell me that this is wrong)
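The network cost function in Use Case 1 can be sketched roughly as follows. This is only an illustration under stated assumptions, not how any real Resource Broker works: the names `Transfer`, `estimated_transfer_time` and `pick_ce`, the site labels, and the throughput figures are all made up, and transfer time is modelled simply as file size divided by an assumed achievable throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One file movement a job needs (input fetch or result store)."""
    source: str
    destination: str
    size_gb: float


def estimated_transfer_time(transfer, throughput_gbps):
    """Network cost: estimated seconds to move the file.

    throughput_gbps maps (source, destination) to an assumed achievable
    throughput in Gbit/s (hypothetical monitoring data).
    """
    rate = throughput_gbps[(transfer.source, transfer.destination)]
    return transfer.size_gb * 8 / rate  # gigabytes -> gigabits, then / rate


def pick_ce(candidate_ces, input_se, output_se, input_gb, output_gb, throughput_gbps):
    """Pick the CE with the lowest combined fetch + store time."""
    def total_cost(ce):
        fetch = Transfer(input_se, ce, input_gb)
        store = Transfer(ce, output_se, output_gb)
        return (estimated_transfer_time(fetch, throughput_gbps)
                + estimated_transfer_time(store, throughput_gbps))
    return min(candidate_ces, key=total_cost)


# Made-up throughput figures (Gbit/s), purely for illustration.
throughput = {
    ("SE-A", "CE-1"): 1.0, ("CE-1", "SE-C"): 0.1,
    ("SE-A", "CE-2"): 0.2, ("CE-2", "SE-C"): 1.0,
}

# 10 GB of source data, 100 GB of results: the path to the result SE dominates.
best = pick_ce(["CE-1", "CE-2"], "SE-A", "SE-C", 10, 100, throughput)
print(best)  # CE-2
```

With these numbers CE-1 is closer to the source data but far from the result store, so CE-2 wins, which matches the 10 GB in / 100 GB out reasoning on the slide.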
Use Case 2: Replica Selection
File replication = proven technique for improving data access
Spread multiple copies of same file across the Grid
Do you really want to get everything from CERN, every time?
Do you really want to get everything from your geographically nearest site every time?
A file has a Logical File Name (LFN) which maps to 1 or more PFNs (Physical File Names)
Replica Manager should include a Replica Selection Service which uses network performance data (from somewhere) to find the “best” replica.
[Diagram: Grid App, Replica Catalogue, Replica Selection, Net Mon Service]
1. LFN
2. Multiple locations (PFNs)
3. Get performance data/predictions
4. Selected replica (PFN)
5. GridFTP commands
GGF looking at formally defining these (and other) use cases
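A minimal sketch of the replica selection steps in Use Case 2, under assumptions: the catalogue is a plain dictionary, the monitoring service is replaced by a table of predicted throughput per storage host, and `select_replica`, the host names and the figures are hypothetical.

```python
from urllib.parse import urlparse


def select_replica(lfn, replica_catalogue, predicted_mbps):
    """Return the PFN whose storage host has the best predicted throughput.

    replica_catalogue: maps an LFN to its list of PFNs (steps 1-2).
    predicted_mbps: maps a storage host name to predicted Mbit/s towards
    the client (stand-in for step 3, the monitoring service lookup).
    """
    pfns = replica_catalogue[lfn]
    # Step 4: choose the replica on the best-performing host.
    return max(pfns, key=lambda pfn: predicted_mbps[urlparse(pfn).hostname])


# Made-up catalogue and predictions, for illustration only.
catalogue = {
    "lfn:run42.dat": [
        "gsiftp://se.cern.example/data/run42.dat",
        "gsiftp://se.ral.example/data/run42.dat",
    ],
}
predictions = {"se.cern.example": 40.0, "se.ral.example": 320.0}

chosen = select_replica("lfn:run42.dat", catalogue, predictions)
print(chosen)  # the PFN that step 5 (GridFTP) would then fetch
```

The point of the sketch is only that "best" is decided by measured or predicted performance, not by a fixed rule such as "always CERN" or "always the nearest site".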
How are GGF addressing the problem?
Patience ;-) First we must look at web services.
Essentially, an online application accessed using XML...
…which makes it easier for other apps to use yours…
…which allows the Grid middleware to access our data
[Diagram: web service provider (WSP), client and UDDI registry]
1. WSP registers service with registry
2. Client locates suitable service using registry
3. Client requests WSDL doc
4. WSDL tells client how to interact
5. Service and client interact using XML messages, sent via SOAP
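To make step 5 concrete, here is a small sketch of wrapping an XML request in a SOAP 1.1 envelope using only the Python standard library. The envelope namespace is the standard SOAP 1.1 one; the application namespace `urn:example:netmon` and the `getMeasurement` and `metric` elements are invented for illustration and are not part of any GGF schema.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:netmon"  # hypothetical application namespace


def build_envelope(payload):
    """Wrap an application XML element in a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return envelope


# Hypothetical request payload; element names are assumptions.
request = ET.Element(f"{{{APP_NS}}}getMeasurement")
ET.SubElement(request, f"{{{APP_NS}}}metric").text = "path.delay.oneWay"

xml_text = ET.tostring(build_envelope(request), encoding="unicode")
print(xml_text)
```

In a real deployment the message layout would come from the service's WSDL document (steps 3 and 4) rather than being hand-built like this.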
How are GGF addressing the problem?
By producing standards relating to network monitoring services.
First with the Network Measurements Working Group (NM-WG):
Defining XML schemas for requesting tests and historic data, and publishing network measurements
Aims: to standardise communication, and…
…use XML, for web services and OGSI model
Simple use case…
[Diagram: test request (request schema) goes to the Network Monitoring Service; test results (publication schema) come back]
All request & result messages can be formatted using standardised schemas = truly powerful combination
DANTE, Internet2, SLAC etc. already using NM-WG work.
Standard measurements?
Schemas based on NM-WG proposed measurement classification system:
describes a set of network characteristics and their classification hierarchy
used for creating common schemata for describing network monitoring data
using a standard classification maximises data portability
[Diagram: description + hierarchy]
So what can you ask for 1?
Initial schema requirements set. Four sections: what, where, when, how
What:
Use GGF metric names, e.g. path.delay.oneWay
Can request statistical data, with a specified sample interval, e.g. daily averages for one-way delay over the last month
After some “discussion”, multiple statistics in same request
Can limit number of returned results to avoid overload
Where:
Source and destination
Flexible: IPv4|6, hostnames, or textual names such as “core router” and “edge router” (e.g. for security)
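As an illustration of the "statistical data with a specified sample interval" idea, the sketch below computes daily averages from raw one-way delay samples. It shows the kind of aggregation a monitoring service might perform when answering such a request; the function name `daily_averages`, the sample layout and the values are assumptions, not anything defined by the NM-WG schemas.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_averages(samples):
    """Group (timestamp, delay_ms) samples by calendar day and average them."""
    by_day = defaultdict(list)
    for when, delay_ms in samples:
        by_day[when.date()].append(delay_ms)
    # Return days in chronological order.
    return {day: mean(values) for day, values in sorted(by_day.items())}


# Made-up one-way delay samples (milliseconds) for path.delay.oneWay.
samples = [
    (datetime(2004, 4, 1, 9, 0), 10.0),
    (datetime(2004, 4, 1, 15, 0), 14.0),
    (datetime(2004, 4, 2, 9, 0), 20.0),
]

averages = daily_averages(samples)
for day, value in averages.items():
    print(day, value)
```

Returning one value per day rather than every raw sample is also one simple way to keep the number of returned results under control.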
So what can you ask for 2?
When:
The primary means of specifying the time period we are interested in (for tests or data retrieval) is:
target time (an absolute time or “now”)
relative +ve and -ve time tolerances…
-ve time tolerance = 600 secs
target_time = 14:00
+ve time tolerance = 600 secs
= 13:50-14:10
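The arithmetic behind the 13:50-14:10 example can be checked with a few lines of Python. The function name `time_window` and the calendar date are arbitrary choices for illustration; only the target time and the two 600-second tolerances come from the slide.

```python
from datetime import datetime, timedelta


def time_window(target_time, minus_tolerance_s, plus_tolerance_s):
    """Turn a target time and -ve/+ve tolerances (seconds) into (start, end)."""
    start = target_time - timedelta(seconds=minus_tolerance_s)
    end = target_time + timedelta(seconds=plus_tolerance_s)
    return start, end


# The date is arbitrary; only the time of day matters for the example.
start, end = time_window(datetime(2004, 4, 19, 14, 0), 600, 600)
print(start.strftime("%H:%M"), end.strftime("%H:%M"))  # 13:50 14:10
```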
So what can you ask for 3?
Setting limit on number of results controls possibilities:
when number of results = “all”: supply all matching measurements in given time period
when number of results = 1: time data defines the period for which a measurement is considered to be acceptable, e.g. 14:00 +/- 10 minutes
Can also give start & end time if you wish, but values are mapped to target_time & number of results will = all
“testing interval” controls how often tests are run
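The two "number of results" behaviours can be sketched as a filter over stored measurements. This is a guess at reasonable semantics rather than the NM-WG definition: in particular, preferring the measurement closest to the target time when fewer than "all" results are requested is an assumption, and `select_measurements` and the data are hypothetical.

```python
from datetime import datetime, timedelta


def select_measurements(measurements, target, minus_s, plus_s, number_of_results="all"):
    """Filter (timestamp, value) pairs to the acceptable window.

    "all": every measurement inside the window, in the original order.
    An integer N: the N measurements closest to the target time
    (closest-first ordering is an assumption of this sketch).
    """
    start = target - timedelta(seconds=minus_s)
    end = target + timedelta(seconds=plus_s)
    in_window = [m for m in measurements if start <= m[0] <= end]
    if number_of_results == "all":
        return in_window
    in_window.sort(key=lambda m: abs(m[0] - target))
    return in_window[:number_of_results]


target = datetime(2004, 4, 19, 14, 0)  # arbitrary date, 14:00 as on the slide
data = [
    (datetime(2004, 4, 19, 13, 40), 31.0),  # outside 14:00 +/- 10 minutes
    (datetime(2004, 4, 19, 13, 55), 29.5),
    (datetime(2004, 4, 19, 14, 8), 30.2),
    (datetime(2004, 4, 19, 14, 30), 33.1),  # outside the window
]

everything = select_measurements(data, target, 600, 600, "all")
single = select_measurements(data, target, 600, 600, 1)
print(len(everything), single)
```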
So what can you ask for 4096?
How:
Can supply values to act as parameters for tests, or filters for querying past data, including tool name.
Uses param-specific tags or list of parameters:
<remoteParamList>-a -b 10 -c</remoteParamList>
Possible to set ranges for parameters…
<tcpBufferSize range="max">4194304</tcpBufferSize>
<tcpBufferSize range="min">1048576</tcpBufferSize>
…and orders of preference.
<tcpBufferSize>4194304</tcpBufferSize>
<tcpBufferSize>1048576</tcpBufferSize>
Unspecified params use receiving system’s defaults
Can request reporting of actual param values used
Can control whether a test is ever run
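One way a receiving system might interpret the ranged `tcpBufferSize` tags is sketched below. The `<parameters>` wrapper element and the clamping policy (use the local default if it falls inside the range, otherwise the nearest bound) are assumptions made for this example; the slides do not specify either.

```python
import xml.etree.ElementTree as ET


def buffer_size_range(params_xml):
    """Read the min/max tcpBufferSize bounds from a parameter fragment."""
    root = ET.fromstring(params_xml)
    bounds = {el.get("range"): int(el.text) for el in root.findall("tcpBufferSize")}
    return bounds["min"], bounds["max"]


def choose_buffer_size(local_default, params_xml):
    """Clamp the local default into the requested range (assumed policy)."""
    low, high = buffer_size_range(params_xml)
    return min(max(local_default, low), high)


# <parameters> is a hypothetical wrapper so the fragment parses as one document.
fragment = """<parameters>
  <tcpBufferSize range="max">4194304</tcpBufferSize>
  <tcpBufferSize range="min">1048576</tcpBufferSize>
</parameters>"""

print(choose_buffer_size(65536, fragment))    # 1048576 (raised to the minimum)
print(choose_buffer_size(2097152, fragment))  # 2097152 (already within range)
```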
Is that all GGF is doing?
No, GGF Grid High Performance Networking Research Group (GHPN) also hard at work, modelling the network as a Grid resource so they can perform “advance reservation” etc.
Computing, storage and interconnecting network are all resources:
Easier to manage
All can be reserved
Capability discovery
Exploit commonalities
Forms integrated stack
[Diagram: integrated stack of Grid applications, middleware, and computing, network and storage resources, labelled “advance reservation”]
The network as a resource
To be achieved with a set of network sub-services forming a holistic network service.
Can't say more as this is probably going to change quite a lot.
Want to know more? Then get involved!
Network monitoring service
Historic measurement data
Predictions
Allow clients to run scheduled tests
On-demand (real-time) tests
Provide less-frequently monitored information (network route, topology…)
Event notifications, for all of the above
Across multiple administrative domains for all of the above
[Diagram: a Network Monitoring Service in each of network domains X, Y and Z; surrounding boxes: Grid Middleware, Grid Applications, Automated Test Systems, GOC/NOC Admin Software, Grid/Net Operations, Other Network Services]
Diagram shows potential clients: numerous and varied
Will this be easy?
Probably not, but like all good car salespeople, I won’t tell you about the problems.
But the potential benefits are worth the effort!
Conclusion
Grid network monitoring crucial to the Grid
But you all know that already!
GHPN: looking at network services, inc. monitoring service
NM-WG: looking at how to interface to network monitoring services
Ambitious, but potential benefits justify efforts!
JRA4 SHOULD be involved!
? ? ? ?
GET INVOLVED!
http://www-didc.lbl.gov/NMWG/
http://forge.gridforum.org/projects/ghpn-rg
The End