Supercomputing Center
Measurement and Performance Analysis of Supercomputing Traffic by FlowScan+ 2.0
Supercomputing Center of KISTI
Kookhan Kim
August 28, 2003
Contents
• Introduction
• FlowScan
• FlowScan+ 2.0
• Traffic Measurement & Analysis
• Others
Introduction
• We have various types of supercomputers
  – NEC, IBM, Compaq, PC cluster
• Supercomputing traffic
  – All traffic generated between the supercomputers and their users for computing many kinds of data
  – Users must have an authenticated and authorized ID
• Until now, we had not tried to measure and analyze supercomputing traffic
• We want to know the characteristics of supercomputing traffic
  – Who uses it?
  – What applications & protocols are used?
  – How much traffic is generated?
• To meet these demands, we improved FlowScan
What is FlowScan?
• FlowScan is a passive measurement tool that draws traffic graphs by analyzing network flows exported by routers and switches
  – NetFlow is exported by Cisco routers and switches
• It was developed by Dave Plonka and managed by CAIDA (http://www.caida.org)
• Main modules (Perl scripts)
  – cflowd (a flow collection engine)
  – flowscan (the central process in the system)
    • Our improvement focuses on this module
  – RRDtool (a visualization tool)
• Definition: Flow
  – An IP flow is a unidirectional series of IP packets of a given protocol, travelling between a source and destination within a certain period of time.
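To make the flow definition concrete, here is a minimal Python sketch that groups packets into unidirectional flows keyed by (source IP, destination IP, source port, destination port, protocol). It models "a certain period of time" with a single idle timeout only; the packet tuple layout, the `Flow` class, and the 15-second value are assumptions for illustration, and real NetFlow exporters apply additional expiry rules.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    key: tuple      # (src, dst, sport, dport, proto); direction matters
    first: float    # timestamp of the first packet
    last: float     # timestamp of the most recent packet
    pkts: int = 0
    nbytes: int = 0


def build_flows(packets, idle_timeout=15.0):
    """Group time-ordered packets into unidirectional flows.

    Each packet is (ts, src, dst, sport, dport, proto, length). A key that has
    been idle longer than idle_timeout seconds starts a new flow.
    """
    active = {}
    finished = []
    for ts, src, dst, sport, dport, proto, length in packets:
        key = (src, dst, sport, dport, proto)  # (a -> b) and (b -> a) differ
        flow = active.get(key)
        if flow is not None and ts - flow.last > idle_timeout:
            finished.append(flow)  # expire the idle flow
            flow = None
        if flow is None:
            flow = Flow(key, ts, ts)
            active[key] = flow
        flow.last = ts
        flow.pkts += 1
        flow.nbytes += length
    finished.extend(active.values())  # flush flows still open at the end
    return finished
```

A reply from b to a lands in its own flow, which is what "unidirectional" means here.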
Enhanced FlowScan+
• The goal
  – Make a good passive measurement tool
• The motivations
  – Lack of a traffic measurement tool that supports real-time visualization and detailed traffic analysis on demand
  – Make a user-friendly tool that anyone can use easily
• Why FlowScan?
  – An open source program
  – It has a good web-based graphing function
  – But it does not yet support a query interface
• Who is involved?
  – Supercomputing Center of KISTI
  – System Architecture Lab., Dept. of Computer Science, KAIST
FlowScan+ 2.0
[Figure: FlowScan+ 2.0 architecture. Input: NetFlow v7. FlowScan Original Module: Flowscan, Flow-Tools, RRD, static graph. Analysis Module (FlowScan+ 1.0): DB, aggregation (15 min), query. Visualization Module (FlowScan+ 2.0): parsed data, dynamic graph, link.]
FlowScan+ Main Point
• FlowScan+ 1.0
  – Uses MySQL
    • Stores NetFlow information in the DB
      – Raw flows
      – Aggregated data
  – Query interface
    • Access to the DB
    • Via the Web
    • Easy to use
• FlowScan+ 2.0
  – Flow-tools
    • Solves the NetFlow version problem
  – User group editing
    • Small groups, large groups
    • Divided by IP class
  – Visualization of DB query results
    • Java Servlet, JFreeChart
FlowScan+ 2.0 : NetFlow Versions
NetFlow Version | Comments
1 | Original
5 | Standard and most common
7 | Specific to Cisco Catalyst 6500 and 7600 Series Switches; similar to Version 5, but does not include AS, interface, TCP flag & TOS information
8 | Choice of eleven aggregation schemes; reduces resource usage
9 | Flexible, extensible file export format to enable easier support of additional fields & technologies; coming out now: MPLS, Multicast, & BGP Next Hop
FlowScan+ 2.0 : Flow-tools
• NetFlow v5 & v7 have different PDU formats and do not carry the same information
• cflowd, the main NetFlow collection module in FlowScan, cannot collect NetFlow v7
• So we had to change the NetFlow capture module
• Flow-tools replaces cflowd as the NetFlow v7 collection module
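One way to see why a collector built for v5 cannot simply read v7 exports: the version number sits in the first two bytes of every export packet, and the record layout that follows depends on it. The sketch below assumes the commonly documented sizes (a 24-byte header for both versions, 48-byte v5 records, 52-byte v7 records) and only checks that a packet's length matches its declared version and record count; it is not how cflowd or Flow-tools are implemented.

```python
import struct

HEADER_LEN = 24                 # assumed: v5 and v7 export headers are both 24 bytes
RECORD_LEN = {5: 48, 7: 52}     # assumed per-flow record size by export version


def inspect_pdu(pdu):
    """Return (version, count) if the packet length fits that version's layout."""
    if len(pdu) < HEADER_LEN:
        raise ValueError("truncated header")
    version, count = struct.unpack("!HH", pdu[:4])  # network byte order
    if version not in RECORD_LEN:
        raise ValueError(f"unsupported NetFlow version {version}")
    expected = HEADER_LEN + count * RECORD_LEN[version]
    if len(pdu) != expected:
        raise ValueError(
            f"v{version} packet with {count} records should be {expected} bytes, got {len(pdu)}"
        )
    return version, count
```

A parser that assumed 48-byte records everywhere would drift out of alignment after the first v7 record, which is the kind of mismatch the slide describes.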
NetFlow v5 sample record:
FLOW index: 0xc7ffff router: 134.75.20.70 src IP: 128.253.253.59 dst IP: 210.98.25.11 input ifIndex: 60 output ifIndex: 14 src port: 445 dst port: 2979 pkts: 6 bytes: 744 IP nexthop: 134.75.20.3 start time: Thu May 15 15:10:47 2003 end time: Thu May 15 15:10:51 2003 protocol: 6 tos: 0x0 src AS: 17579 dst AS: 17579 src masklen: 16 dst masklen: 19 TCP flags: 0x1b (PUSH|SYN|FIN|ACK) engine type: 1 engine id: 10
NetFlow v7 sample record:
FLOW index: 0xc7ffff router: 150.183.5.251 src IP: 150.183.5.194 dst IP: 150.183.138.216 input ifIndex: 0 output ifIndex: 0 src port: 80 dst port: 3215 pkts: 6 bytes: 497 IP nexthop: 0.0.0.0 start time: Mon May 12 18:41:34 2003 end time: Mon May 12 18:41:34 2003 protocol: 6 tos: 0x0 src AS: 0 dst AS: 0 src masklen: 0 dst masklen: 0 TCP flags: 0x0 engine type: 0 engine id: 0
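The two sample records make the difference visible: the v7 record reports 0 for the interface indexes, AS numbers, and TCP flags. The Python sketch below, with field labels copied from the samples, splits a printed record into fields and lists which of those fields are zero. Treating zero as "not carried by v7" is an assumption that matches these samples, not a rule of the export format; TOS is left out because it is 0x0 in both records.

```python
import re

# Field labels exactly as they appear in the two sample records.
FIELDS = [
    "FLOW index", "router", "src IP", "dst IP", "input ifIndex",
    "output ifIndex", "src port", "dst port", "pkts", "bytes",
    "IP nexthop", "start time", "end time", "protocol", "tos",
    "src AS", "dst AS", "src masklen", "dst masklen", "TCP flags",
    "engine type", "engine id",
]

# One alternation of all labels; the capturing group makes re.split keep them.
LABEL_RE = re.compile("(" + "|".join(re.escape(f) for f in FIELDS) + "): ")

# Fields the version table says v7 does not include (interface, AS, TCP flags).
V7_ABSENT = ["input ifIndex", "output ifIndex", "src AS", "dst AS", "TCP flags"]


def parse_record(line):
    """Split one printed flow record into a {label: value} dict."""
    parts = LABEL_RE.split(line.strip())
    # parts looks like ['', label1, value1, label2, value2, ...]
    return {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}


def zeroed_fields(record):
    """Return the V7_ABSENT fields whose value is zero (or missing) in this record."""
    zeroed = []
    for field in V7_ABSENT:
        tokens = record.get(field, "0").split()
        if not tokens or tokens[0] in ("0", "0x0"):
            zeroed.append(field)
    return zeroed
```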
FlowScan+ 2.0 : User Grouping
• There is no way to verify the user (ID) of the supercomputer directly
  – The only user-related information in NetFlow is the IP address
  – From this information, we can infer "who is generating the traffic"
• If users always connect to the supercomputer from the same system, they have the same source/destination IP: this is no problem
• But they can log in from other systems in the same office or the same building
  – So we adopt a user grouping concept
  – If a user logs in from a completely different place, it is impossible to identify the user (ID) from NetFlow
• Except in this situation, we can identify supercomputing users by the network IP in NetFlow
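A minimal sketch of the grouping idea, assuming each group is registered as a class C (/24) network: a flow is attributed to a group by masking each endpoint to its /24 and looking it up. The table contents, group names, and addresses (taken from documentation ranges) are hypothetical. As the slide notes, a user logging in from an unregistered network still cannot be identified, so the sketch returns None in that case.

```python
import ipaddress

# Hypothetical group table keyed by class C (/24) network.
GROUPS = {
    ipaddress.ip_network("192.0.2.0/24"): "lab-A",
    ipaddress.ip_network("198.51.100.0/24"): "lab-B",
}


def class_c_network(ip):
    """Mask an IPv4 address to its /24 (class C sized) network."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)


def flow_group(src_ip, dst_ip):
    """Return the group of whichever endpoint is registered, else None."""
    for ip in (src_ip, dst_ip):  # the user may be on either side of the flow
        group = GROUPS.get(class_c_network(ip))
        if group is not None:
            return group
    return None
```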
FlowScan+ 2.0 : User Grouping
[Figure: user group editing screen, annotated with group name, group number, group ID, and user ID or related information]
• We have classified only class C IP networks
- Large grouping
  - When one user has many user IDs, or when we compare the traffic of several institutes with each other
  - We should aggregate their total traffic
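Large grouping can be sketched as a second mapping from small groups to a larger unit such as an institute, with bytes summed per large group. The `LARGE_GROUP` table and all names below are hypothetical; small groups without an entry are kept under their own name.

```python
from collections import defaultdict

# Hypothetical small-group -> large-group (e.g. institute) mapping.
LARGE_GROUP = {"lab-A": "Institute-1", "lab-B": "Institute-1", "lab-C": "Institute-2"}


def aggregate_large(small_group_bytes):
    """Sum per-small-group byte counts into per-large-group totals."""
    totals = defaultdict(int)
    for small, nbytes in small_group_bytes.items():
        totals[LARGE_GROUP.get(small, small)] += nbytes  # unmapped groups stay as-is
    return dict(totals)
```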
FlowScan+ 2.0 : Visualization
• FlowScan+ was improved by adding MySQL, a free DBMS, which provides a query interface for retrieving flow information
• But query results are text-based information
  – Difficult to understand intuitively
  – Results cannot be plotted as a time series
• To support this, FlowScan+ 2.0 adds a visualization servlet
FlowScan+ 2.0 : Visualization
Visualization process & graph
- Until now, the text result was the only way to see the output of the query interface
- To see the result as a graphical plot over time, FlowScan+ 2.0 issues one more query to the DB
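The extra query can be pictured as bucketing flow records into fixed intervals and summing bytes per bucket, which yields one point per interval for a time-series plot. The sketch uses Python's built-in sqlite3 as a stand-in for MySQL; the `rawflows` table, its columns, and the 900-second (15-minute) default step are assumptions, since the FlowScan+ schema is not shown here.

```python
import sqlite3

# Hypothetical schema: one row per raw flow, start_time in epoch seconds.
SCHEMA = "CREATE TABLE rawflows (start_time INTEGER, src_ip TEXT, dst_ip TEXT, bytes INTEGER)"

# Integer division by :step floors each start time to the start of its bucket.
SERIES_SQL = """
SELECT (start_time / :step) * :step AS bucket, SUM(bytes) AS total_bytes
FROM rawflows
WHERE src_ip = :ip OR dst_ip = :ip
GROUP BY bucket
ORDER BY bucket
"""


def traffic_series(conn, ip, step=900):
    """Return [(bucket_start, bytes)] for one host, one row per interval."""
    return conn.execute(SERIES_SQL, {"ip": ip, "step": step}).fetchall()
```

Intervals with no traffic simply produce no row, so a plotting layer would need to fill those gaps with zeros.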
Traffic Measurement topology
[Figure: traffic measurement topology. Labels: Ruby-8/80, Catalyst 6506 (C6506), Cisco 7513, Baram, Tiger, Kordic, Kfddi2, Lion, H-NFS, H-Opal, H-Ruby, supercomputers (IBM, NEC, COMPAQ, PC Cluster), FlowScan+ 2.0, NetFlow v7 export]
• Our supercomputers are linked in a mesh topology with two Catalyst 6500 series switches
• NetFlow v7 export
• Drawing graphs every 5 min.
• Storing aggregated data & raw flows into the DB every 15 min.
Top users (by institute)
Institute | Bytes (MB) | %
KMA | 48,135 | 47.22%
Seoul National Univ. | 11,433 | 11.22%
KISTI | 10,319 | 10.12%
Air Force | 9,609 | 9.43%
KAIST | 3,713 | 3.64%
Yonsei Univ. | 2,912 | 2.86%
ETS soft | 1,063 | 1.04%
Kyung Hee Univ. | 451 | 0.44%
Choongnam Univ. | 416 | 0.41%
Pusan National Univ. | 415 | 0.41%
FlowScan+ 2.0 – traffic analysis
(2003/July/21 14:00 ~ /28 14:00)
- One week of measured traffic
- Analyzed by large group
- The pie chart was redrawn from the Excel sheet
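A sketch of how a top-N table like the one above can be derived from per-group totals, with each share computed against all measured traffic rather than only the listed rows. That reading is an inference from the table itself: its ten shares add up to 86.79%, so they cannot be shares of the top ten alone. The function name and inputs are hypothetical.

```python
def top_n_with_share(totals, n=10):
    """Return [(name, bytes, percent)] for the n largest groups.

    Percentages are relative to the sum over all groups, so the
    listed shares need not add up to 100%.
    """
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [(name, nbytes, round(100.0 * nbytes / grand_total, 2)) for name, nbytes in ranked]
```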
Application
Service | Bytes (MB) | %
http | 547,902 | 47.34%
ftp | 491,691 | 42.48%
unknown | 115,319 | 9.96%
telnet | 2,216 | 0.19%
domain | 273 | 0.02%
FlowScan+ 2.0 – traffic analysis
(2003/July/21 14:00 ~ /28 14:00)
• It shows a strange result that we did not expect
• We wanted to know the portion occupied by various applications
  – Involved in bio, physics, aerospace, chemistry, and so on
• But those applications run on the supercomputers themselves
  – They are installed on the supercomputers
  – Users access the supercomputers by telnet and ftp
  – They transfer their data & operate applications from remote sites
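The service breakdown above is the kind of result a port-to-service lookup produces: a flow is named after the first of its two ports found in a table, and everything else counts as unknown. The minimal sketch below assumes a small hypothetical table (http 80, ftp 20/21, telnet 23, domain 53); FlowScan's own configuration may differ.

```python
# Hypothetical well-known-port table; extending it shrinks the "unknown" share.
SERVICES = {80: "http", 20: "ftp", 21: "ftp", 23: "telnet", 53: "domain"}


def classify(src_port, dst_port, table=SERVICES):
    """Name a flow after the first of its ports found in the table."""
    for port in (src_port, dst_port):
        if port in table:
            return table[port]
    return "unknown"


def unknown_share(flows, table=SERVICES):
    """flows: list of (src_port, dst_port, bytes). Percent of bytes left unclassified."""
    total = sum(b for _, _, b in flows)
    unknown = sum(b for s, d, b in flows if classify(s, d, table) == "unknown")
    return 100.0 * unknown / total if total else 0.0
```

Matching on either port is a simplification: a client's ephemeral port that happens to equal a listed service port would be misclassified. Adding an entry such as Gnutella's 6346 for P2P traffic is one way the unknown share goes down.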
Other usage of FlowScan+ 2.0
• Detection of network abnormalities
  – Port scanning
  – Code Red virus
  – NIMDA virus
    • Mass-mailing worm component
  – DDoS attack
  – Typical relation between flow count and traffic volume
    • Bytes: normal traffic volume
    • Flows: explosive increase
• Detection of newly emerging applications
  – GRID applications, P2P applications, and so on
  – Matching newly emerging applications with their defined port numbers
    • Decreases the unknown traffic portion
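The flow-versus-byte pattern described for network abnormalities suggests a simple check: flag an interval whose flow count jumps far above normal while its byte count stays near normal, as scanning and worm traffic tend to produce many tiny flows. This is a hypothetical heuristic, not the method FlowScan+ uses; the median baseline and the factor of 5 are arbitrary choices for illustration.

```python
from statistics import median


def flow_surges(intervals, factor=5.0):
    """Return indexes of intervals with a flow-count surge but normal bytes.

    intervals: list of (flow_count, byte_count), one per measurement interval.
    """
    base_flows = median([f for f, _ in intervals])
    base_bytes = median([b for _, b in intervals])
    return [
        i
        for i, (flows, nbytes) in enumerate(intervals)
        if flows > factor * base_flows and nbytes <= factor * base_bytes
    ]
```

A large file transfer raises both counts and is therefore not flagged, which is the distinction the slide draws between bytes and flows.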
FlowScan+ of KISTI
Conclusions
• FlowScan+ was developed by KISTI & KAIST
• Characteristics of FlowScan+ 2.0
  – Flow-tools
    • Solves the NetFlow version problem
  – Group editing
    • Traffic can be measured & analyzed per user
  – Visualization of results
    • Graphical plots as time series
• Future work
  – DB optimization for speed
  – Installation packaging
  – More stability of flowscan
  – Combining the merits of each version
Thank you for your attention
Questions ?