Multicast troubleshooting with ssmping and asmping Stig Venaas [email protected].

Multicast troubleshooting

with ssmping and asmping

Stig Venaas

[email protected]

Introduction

There are very few tools available for end users to verify that multicast works

Also there are few tools to aid network administrators in debugging multicast connectivity problems

We will present two tools called ssmping and asmping that can be of use to both end users and network administrators

We will explain how these tools can be used to test multicast connectivity, and provide much more detailed information that might aid in discovering, debugging and fixing multicast problems

We will first look at ssmping for testing SSM (Source-Specific Multicast)

And next look at asmping for testing ASM (Any-Source Multicast, the traditional multicast model)

ssmping

A tool for testing multicast connectivity and more

Behaviour is a bit like the common ping tool

Implemented at application layer using UDP

No additional requirements on the operating system

The operating system and network must support SSM

A server must run ssmpingd

A client pings a server by sending a unicast ssmping query

The server replies with both unicast and multicast ssmping replies

In this way a client can check that it receives SSM from the server

And also parameters like delay, number of router hops etc.

How it works

User runs ssmping <S> on the client

Client joins (S,G)

Client sends unicast query to S

Server receives unicast ssmping query

Server responds with ssmping unicast reply and multicast reply to G

Client receives replies and prints RTT and hops from server

Client sends a new query every second
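The unicast half of this exchange can be sketched in a few lines of Python. This is only an illustration run over the loopback interface: the payload layout, the function names and the single-reply server are assumptions made for the sketch, not the real ssmping wire format, and the (S,G) join and the multicast reply are left out.

```python
import socket
import struct
import threading
import time

# Hypothetical payload for this sketch only (NOT the real ssmping format):
# 4-byte sequence number followed by an 8-byte send timestamp.
QUERY = struct.Struct("!Id")


def serve_once(sock: socket.socket) -> None:
    """Answer one query by echoing the payload back over unicast.

    A real server would also send a copy of the reply to the group G.
    """
    data, client = sock.recvfrom(1024)
    sock.sendto(data, client)


def ping_once(server_addr, seq: int, timeout: float = 2.0):
    """Send one unicast query and return (seq, rtt_ms) for the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(QUERY.pack(seq, time.monotonic()), server_addr)
        data, _ = sock.recvfrom(1024)
        reply_seq, sent_at = QUERY.unpack(data)
        return reply_seq, (time.monotonic() - sent_at) * 1000.0


def demo():
    """Run one query/reply exchange over the loopback interface."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    worker = threading.Thread(target=serve_once, args=(server,), daemon=True)
    worker.start()
    try:
        return ping_once(server.getsockname(), seq=1)
    finally:
        worker.join(timeout=2.0)
        server.close()
```

Calling `demo()` returns the echoed sequence number and a round-trip time in milliseconds. Because the reply carries the original timestamp, the client needs no per-query state to compute the RTT.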

Example output

$ ssmping -c 5 -4 flo.nrc.ca
ssmping joined (S,G) = (132.246.2.20,232.43.211.234)
pinging S from 158.38.63.20
  unicast from 132.246.2.20, seq=1 dist=13 time=122.098 ms
  unicast from 132.246.2.20, seq=2 dist=13 time=122.314 ms
multicast from 132.246.2.20, seq=2 dist=13 time=125.061 ms
  unicast from 132.246.2.20, seq=3 dist=13 time=122.327 ms
multicast from 132.246.2.20, seq=3 dist=13 time=122.345 ms
  unicast from 132.246.2.20, seq=4 dist=13 time=122.334 ms
multicast from 132.246.2.20, seq=4 dist=13 time=122.371 ms
  unicast from 132.246.2.20, seq=5 dist=13 time=122.360 ms
multicast from 132.246.2.20, seq=5 dist=13 time=122.384 ms

--- 132.246.2.20 ssmping statistics ---
5 packets transmitted, time 5003 ms
unicast:
   5 packets received, 0% packet loss
   rtt min/avg/max/std-dev = 122.098/122.286/122.360/0.394 ms
multicast:
   4 packets received, 0% packet loss since first mc packet (seq 2) recvd
   rtt min/avg/max/std-dev = 122.345/123.040/125.061/1.192 ms
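For scripted checks, the reply lines can be parsed and the packet counts and min/avg/max figures recomputed. The following Python sketch assumes the line format shown in the example output and that sequence numbers start at 1; the function names are illustrative and not part of ssmping. The standard deviation is left out because the tool's exact formula is not given here.

```python
import re

# Matches reply lines such as
# "multicast from 132.246.2.20, seq=2 dist=13 time=125.061 ms"
REPLY_RE = re.compile(
    r"(unicast|multicast) from (\S+), seq=(\d+) dist=(\d+) time=([\d.]+) ms"
)


def parse_replies(output: str):
    """Return a list of (kind, source, seq, dist, rtt_ms) tuples."""
    return [
        (kind, src, int(seq), int(dist), float(rtt))
        for kind, src, seq, dist, rtt in REPLY_RE.findall(output)
    ]


def summarize(replies, kind: str, transmitted: int):
    """Compute received count, loss and min/avg/max RTT for one reply kind.

    Multicast loss is counted from the first multicast reply received,
    following the "since first mc packet" wording of the statistics.
    """
    # Keep one RTT per sequence number so duplicates are not counted twice.
    rtt_by_seq = {}
    for k, _src, seq, _dist, rtt in replies:
        if k == kind:
            rtt_by_seq.setdefault(seq, rtt)
    if not rtt_by_seq:
        return None
    first = min(rtt_by_seq) if kind == "multicast" else 1
    expected = transmitted - first + 1
    rtts = list(rtt_by_seq.values())
    return {
        "received": len(rtts),
        "loss_pct": 100.0 * (expected - len(rtts)) / expected,
        "min": min(rtts),
        "avg": sum(rtts) / len(rtts),
        "max": max(rtts),
    }


# Reply lines copied from the example output.
SAMPLE = """\
unicast from 132.246.2.20, seq=1 dist=13 time=122.098 ms
unicast from 132.246.2.20, seq=2 dist=13 time=122.314 ms
multicast from 132.246.2.20, seq=2 dist=13 time=125.061 ms
unicast from 132.246.2.20, seq=3 dist=13 time=122.327 ms
multicast from 132.246.2.20, seq=3 dist=13 time=122.345 ms
unicast from 132.246.2.20, seq=4 dist=13 time=122.334 ms
multicast from 132.246.2.20, seq=4 dist=13 time=122.371 ms
unicast from 132.246.2.20, seq=5 dist=13 time=122.360 ms
multicast from 132.246.2.20, seq=5 dist=13 time=122.384 ms
"""
```

With the sample above, `summarize(parse_replies(SAMPLE), "multicast", 5)` reports 4 packets received and 0% loss counted from seq 2, matching the statistics block.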

What does the output tell us?

13 unicast hops from source, also 13 for multicast

Multicast is likely to follow the same path as unicast

Note that with SSM we immediately join the shortest path tree

Multicast RTTs are slightly larger and vary more

The difference between unicast and multicast RTT shows the one-way difference between the unicast and multicast replies, since they are replies to the same request packet

The multicast tree is not ready for first multicast reply, ok for 2nd

No unicast loss, no multicast loss after tree established
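The one-way difference can be computed per sequence number by subtracting the unicast RTT from the multicast RTT of the same query. A small Python sketch using the RTTs from the example output; the function name is illustrative only.

```python
def reply_path_difference(unicast_rtt, multicast_rtt):
    """Return {seq: multicast RTT - unicast RTT} in ms for shared seqs."""
    return {
        seq: round(multicast_rtt[seq] - unicast_rtt[seq], 3)
        for seq in sorted(multicast_rtt)
        if seq in unicast_rtt
    }


# RTTs in ms keyed by sequence number, copied from the example output.
unicast = {1: 122.098, 2: 122.314, 3: 122.327, 4: 122.334, 5: 122.360}
multicast = {2: 125.061, 3: 122.345, 4: 122.371, 5: 122.384}

diffs = reply_path_difference(unicast, multicast)
# seq 2 differs by about 2.7 ms; later replies differ by well under 0.1 ms.
```

The larger value for seq 2 is the first multicast reply received, right after the tree was set up.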

Is it useful?

We believe it complements multicast beacons

Useful for “end users” or others that want to perform a “one-shot” test rather than continuously running a beacon

Beacons don’t show how long it takes to establish the multicast tree, they show only the “steady state”

We’ve seen cases where it takes much longer than expected

Neither do they compare unicast and multicast

Are there other data than RTT and hops that should be measured?

Hops are measured by always using a ttl/hop count of 64 when sending replies
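Since every reply leaves the server with a TTL (or IPv6 hop limit) of 64, the hop count is the difference between 64 and the value seen on arrival. A minimal Python sketch of that arithmetic; how the client reads the received TTL from the socket is not shown, and the function name is an assumption.

```python
INITIAL_TTL = 64  # TTL / hop limit the server sets on every reply


def hops_from_ttl(received_ttl: int, initial_ttl: int = INITIAL_TTL) -> int:
    """Return the number of routers that decremented the TTL on the way."""
    if not 0 < received_ttl <= initial_ttl:
        raise ValueError(f"unexpected TTL {received_ttl}")
    return initial_ttl - received_ttl
```

For example, a reply arriving with TTL 51 has crossed 13 routers, which corresponds to `dist=13` in the example output.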

History

Based on an idea by Pavan Namburi and Kamil Sarac (University of Texas) and Kevin C. Almeroth (UCSB)

http://www.utdallas.edu/~ksarac/research/publications/draft-sarac-mping-00.txt

http://www.utdallas.edu/~ksarac/research/publications/CIIT04-1.pdf

Their idea is to extend IGMP/MLD

Not much interest in IETF so far

This does some of the same, but doesn’t require network support

Only uses UDP

But it requires the server to run ssmpingd

Summary

Tool and further documentation available from http://www.venaas.no/multicast/ssmping/

You can deploy your own server, or check that you can receive from the public servers listed at the above URL

Supports both IPv4 and IPv6

Currently it works for Linux, Solaris and some BSD systems

Note that the ssmping client requires SSM support

BSD systems need to be patched to support SSM

No SSM support yet on Mac OS X

Version with some reduced functionality available for Windows XP

IPv4 only, no IPv6 SSM available on XP

Does not show number of hops

There is also asmping

asmping is very similar to ssmping

asmping is the ASM version of ssmping

A tool for testing multicast connectivity

Behaviour is a bit like normal ping

A server must run ssmpingd (latest version supports asmping)

A client pings a server by sending a unicast asmping query

The server replies with both unicast and multicast asmping replies

In this way a client can check that it can receive ASM from the server

And also parameters like delay, number of router hops etc.

How it works

User runs asmping <G> <S> on the client

Client joins G

Client sends unicast query to S

Server receives unicast asmping query

Server responds with asmping unicast reply and multicast reply to G

Client receives replies and prints RTT and hops from server

Client sends a new query every second

Example output

$ asmping -c 5 224.1.2.234 ssmping.net.switch.ch
ssmping joined (S,G) = (130.59.35.130,224.1.2.234)
pinging S from 158.38.63.22
  unicast from 130.59.35.130, seq=1 dist=11 time=66.203 ms
  unicast from 130.59.35.130, seq=2 dist=11 time=66.042 ms
multicast from 130.59.35.130, seq=2 dist=11 time=66.492 ms
  unicast from 130.59.35.130, seq=3 dist=11 time=66.515 ms
multicast from 130.59.35.130, seq=3 dist=11 time=66.520 ms
  unicast from 130.59.35.130, seq=4 dist=11 time=66.316 ms
multicast from 130.59.35.130, seq=4 dist=11 time=66.321 ms
  unicast from 130.59.35.130, seq=5 dist=11 time=66.407 ms
multicast from 130.59.35.130, seq=5 dist=11 time=66.956 ms

--- 130.59.35.130 ssmping statistics ---
5 packets transmitted, time 5000 ms
unicast:
   5 packets received, 0% packet loss
   rtt min/avg/max/std-dev = 66.042/66.296/66.515/0.326 ms
multicast:
   4 packets received, 0% packet loss since first mc packet (seq 2) recvd
   rtt min/avg/max/std-dev = 66.321/66.572/66.956/0.296 ms

What may the output tell us?

The output on the previous slide pretty much tells us the same as in the ssmping example

It appears we got all the multicast packets on the shortest path tree

At least all multicast packets have travelled the same number of hops as the unicast packets

However, there are several cases where packets may not be forwarded on the optimal path, and asmping might reveal this

We will now look at the possible ASM forwarding paths when using PIM-SM which is the most common multicast routing protocol

Forwarding via PIM registers

[Figure: example network with routers A, B, C, D, E and F, source S, receiver R and the RP]

Assume we have the above network with receiver R and source S

R might be running asmping client, and S a server

Initially the multicast packets are encapsulated into PIM register messages and sent to the RP (B)

The PIM register is a unicast packet sent from the source’s DR to the RP

From the RP the packets will be forwarded on the so-called shared tree to R

The hop count seen by R will be 5: routers D, B, C, E and F

Note that when doing inter-domain IPv4 multicast we also use MSDP

We can think of MSDP as a way of doing PIM registers from source to all RPs on the internet. MSDP may do encapsulation just like PIM registers

Native forwarding on shared tree

[Figure: the same example network, showing native forwarding from S via the RP to R]

When the RP receives packets via PIM registers and knows that there are members joined to the group, the RP will join towards the source and we will get native forwarding

Here the hop count seen by R will be 6: routers D, A, B, C, E and F

Shortest Path Tree

[Figure: the same example network, showing the shortest path tree from S to R]

When F receives the first multicast packet, it will usually join the Shortest Path Tree to get optimal forwarding

Note that behaviour might depend on configuration and vendor

We then get the above optimal path

Here the hop count seen by R will be 3: routers D, E and F

So with this topology, the hop count displayed by asmping will tell us which path we are following and when we switch between them

5, 6 and 3 hops for the respective paths
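For this particular topology, the hop count maps directly onto the forwarding path, so a series of hop counts from asmping shows when the path changes. A Python sketch of that lookup follows; the table applies only to the example network on these slides, and the names are illustrative.

```python
# Hop counts seen by R in the example topology (routers A-F) only.
PATH_BY_HOPS = {
    5: "PIM register to RP, then shared tree (D, B, C, E, F)",
    6: "native forwarding via RP on shared tree (D, A, B, C, E, F)",
    3: "shortest path tree (D, E, F)",
}


def classify(dist: int) -> str:
    """Name the forwarding path that matches a hop count."""
    return PATH_BY_HOPS.get(dist, "unknown path")


def path_switches(dists):
    """Return (index, old_path, new_path) each time the hop count changes."""
    switches = []
    for i in range(1, len(dists)):
        if dists[i] != dists[i - 1]:
            switches.append((i, classify(dists[i - 1]), classify(dists[i])))
    return switches
```

In another network the hop counts will differ, and two paths may happen to have the same length, in which case the hop count alone cannot tell them apart.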

More interesting example

sv@xiang /tmp $ asmping 224.3.4.234 ssmping.uninett.no
ssmping joined (S,G) = (158.38.63.22,224.3.4.234)
pinging S from 152.78.64.13
  unicast from 158.38.63.22, seq=1 dist=23 time=57.261 ms
  unicast from 158.38.63.22, seq=2 dist=23 time=56.032 ms
multicast from 158.38.63.22, seq=2 dist=7 time=207.876 ms
multicast from 158.38.63.22, seq=2 dist=7 time=208.567 ms (DUP!)
  unicast from 158.38.63.22, seq=3 dist=23 time=56.852 ms
multicast from 158.38.63.22, seq=3 dist=21 time=70.352 ms
multicast from 158.38.63.22, seq=4 dist=21 time=57.208 ms
  unicast from 158.38.63.22, seq=4 dist=23 time=57.910 ms
  unicast from 158.38.63.22, seq=5 dist=23 time=56.206 ms
multicast from 158.38.63.22, seq=5 dist=21 time=57.375 ms

What does output tell us?

23 unicast hops from source

For multicast we first have 7, later it stays at 21

We also get some info about loss and RTTs

Number of hops is perhaps the most interesting

In the stable situation with 21 there is probably native forwarding all the way

Forwarding is probably on the shortest path tree from source to receiver

In theory we might also detect the switch from RPT to SPT if the number of hops is different

The number of hops on the RPT is then probably larger than the number of unicast hops

But why only 7 hops initially?

Why did we get the packet twice, and why do both have a very large RTT?
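The two oddities in this output, the duplicate and the jump in hop count, are easy to pick out programmatically. A short Python sketch over the multicast replies from the output above; the helper names are illustrative.

```python
from collections import Counter

# (seq, dist, rtt_ms) of the multicast replies in the example output.
MULTICAST = [
    (2, 7, 207.876),
    (2, 7, 208.567),  # printed with "(DUP!)"
    (3, 21, 70.352),
    (4, 21, 57.208),
    (5, 21, 57.375),
]


def duplicate_seqs(replies):
    """Return the sequence numbers that were received more than once."""
    counts = Counter(seq for seq, _dist, _rtt in replies)
    return sorted(seq for seq, n in counts.items() if n > 1)


def hop_changes(replies):
    """Return (seq, old_dist, new_dist) whenever the hop count changes."""
    changes = []
    previous = None
    for seq, dist, _rtt in replies:
        if previous is not None and dist != previous:
            changes.append((seq, previous, dist))
        previous = dist
    return changes
```

For this data the only duplicate is seq 2, and the hop count changes once, from 7 to 21 at seq 3.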

The mystery of the 7 hops

So why only 7, and why large RTT and duplicate?

The reason is MSDP and PIM registers

In this case we know for sure that packets must have been encapsulated in MSDP

We are about 5 hops from the local RP

There should not be any other tunnels, so no other way we can have only 7 hops (23 unicast hops)

Assuming there are no other listeners, the packet was first encapsulated in a PIM register going to the source’s local RP. Then it was encapsulated in MSDP all the way to our local RP

So that explains the number of hops, but why large RTT and duplicate?

Large RTT and duplicate

So why the large RTT and duplicate?

The reason is again MSDP

At least that’s the theory

The register packets are in this case passed with MSDP to the RP, and probably cached there

When our (*,G)-joins reach our RP, we receive the packet from the cache, which by now is 200ms old

Next, our last-hop router switches to SPT, and what we think happens is that the (S,G)-join also reaches the RP (RP is on the SPT path), and the RP again forwards its copy when this happens

This also gives us a measure of how long it took before the RPT to SPT switch took place

Troubleshooting 1/2

If it takes a long time from when we join until we receive data, or we stop receiving after a while, then something is wrong

Initial delay might be due to join packets getting lost, or join and MSDP propagation delays at routers

We will now give further examples of how the tools have been used for troubleshooting

Case 1, multicast working initially, then stopping after about 180s

This was an IGMP issue

Host sends an IGMP report when it starts pinging

If the host doesn’t receive the periodic IGMP queries from the router, it sends no further reports and the router state will time out
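Because the client sends one query per second, the sequence number of the last multicast reply gives the time at which multicast stopped. A Python sketch of that calculation, assuming sequence numbers start at 1; the `min_gap` threshold and the function name are assumptions made for this example.

```python
def seconds_until_stop(multicast_seqs, last_seq_sent, interval_s=1.0, min_gap=5):
    """Return seconds from the first query to the last multicast reply.

    Returns None if multicast was still arriving at the end of the run,
    or if fewer than min_gap trailing replies are missing (which could
    be ordinary packet loss rather than a timeout).
    """
    seen = set(multicast_seqs)
    if not seen:
        return None
    last_seen = max(seen)
    if last_seq_sent - last_seen < min_gap:
        return None
    # Query number 1 is sent at t = 0, so query n is sent at (n - 1) * interval.
    return (last_seen - 1) * interval_s
```

A result that is nearly the same from run to run, such as the roughly 180s seen here, points towards a protocol timer rather than random loss.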

Troubleshooting 2/2

Case 2, multicast working initially, then stopping after about 210s

We observed this for several different groups and sources

We then found from the PIM-SM specification that this is when the initial join state times out if the periodic PIM joins don’t refresh it (the default join holdtime is 3.5 times the 60s join period, which gives 210s)

Knowing this, we tracked down the problem to an MTU mismatch

When PIM routers send periodic joins they may aggregate many joins, resulting in large PIM packets

For the initial join there is usually no aggregation, so the packet is much smaller than the MTU
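A rough size estimate shows why only the aggregated periodic joins are affected. The Python sketch below uses field sizes taken from my reading of the IPv4 Join/Prune message layout in the PIM-SM specification; the MTU values of 1500 and 9000 bytes and the figure of 100 aggregated groups are hypothetical and not from the case described here.

```python
IPV4_HEADER = 20      # bytes, no IP options
PIM_HEADER = 4        # version/type, reserved, checksum
ENCODED_UNICAST = 6   # upstream neighbour: family, encoding, IPv4 address
JP_FIXED = 4          # reserved, number of groups, holdtime
ENCODED_GROUP = 8     # family, encoding, flags, mask length, IPv4 group
SOURCE_COUNTS = 4     # number of joined sources + number of pruned sources
ENCODED_SOURCE = 8    # family, encoding, flags, mask length, IPv4 source


def pim_join_size_ipv4(sources_per_group):
    """Estimate the IPv4 packet size of one PIM Join/Prune message.

    sources_per_group lists, for each group in the message, how many
    joined or pruned source entries it carries.
    """
    size = IPV4_HEADER + PIM_HEADER + ENCODED_UNICAST + JP_FIXED
    for sources in sources_per_group:
        size += ENCODED_GROUP + SOURCE_COUNTS + ENCODED_SOURCE * sources
    return size


def fits(size: int, mtu: int) -> bool:
    """True if a packet of this size can be sent without exceeding the MTU."""
    return size <= mtu


initial_join = pim_join_size_ipv4([1])         # one group, one source entry
periodic_join = pim_join_size_ipv4([1] * 100)  # 100 groups aggregated
```

Under these assumptions the initial join is 54 bytes and passes any link, while the aggregated join is 2034 bytes: it fits a sender configured for 9000 bytes but not a neighbour or link limited to 1500 bytes, so the refresh never arrives and the state expires.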

Historical data

Started work on system for collecting results from periodic measurements

Summary

The tools might be used to simply check connectivity, but also get info like RTT and hops, join latency etc

With asmping we might also be able to derive some more interesting information due to the complexities of PIM register, MSDP and switching of forwarding paths

asmping works on most operating systems

Both tools support both IPv4 and IPv6

See http://www.venaas.no/multicast/ssmping/ for more info