1
The Research on Analyzing Time-Series Data and Anomaly Detection in Internet Flow
Yoshiaki HARADA
Graduate School of Information Science and Electrical Engineering (ISEE)
Kyushu University
2
Contents
Background
Purpose
Background Knowledge
  AS and Internet routing
  Property of Internet Flow
Analysis method
Progress of this research
Conclusion and Future Work
3
Background
The Internet is growing as a global information infrastructure
  Always-on connections from laptop PCs, cellular phones, etc.
  Many services, such as music and video delivery, telemedicine, and distance learning
A reliable Internet system is required
We should grasp the tendencies of Internet flows in order to manage a reliable Internet infrastructure
4
Background
It is difficult to grasp the tendencies of Internet flows
  The amount of flow is increasing with the development of the Internet
  A lot of garbage traffic, such as DDoS attacks and illegal accesses, flows through the Internet
  Physical hazards occur, such as electrical power failures and router failures
Expert engineers are required to manage the Internet system
  It takes a great deal of time and effort
5
Purpose
A method is required that automatically detects anomalies and tendencies in Internet flow
  There is much research on macro analysis of Internet flow
  It is difficult to grasp detailed biases and anomalies because Internet flows are complicated
I propose a micro analysis method that segments network flows by port number, AS number, area information, country, etc.
  I can analyze Flow Data in detail
  Fewer false alarms can reduce management cost
I propose detecting anomalies in network traffic and visualizing them
6
Background knowledge
AS (Autonomous System)
A collection of IP networks and routers under the control of one entity (or sometimes more) that presents a common routing policy to the Internet.
For example:
  An Internet Service Provider (ISP)
  A very large organization
AS numbers are currently 16-bit integers, which allow for a maximum of 65536 assignments.
[Figure: autonomous systems AS:1, AS:2, AS:3, and AS:4, and a router]
7
BGP table
BGP (Border Gateway Protocol)
  BGP is the core routing protocol of the Internet
  It works by maintaining a table of IP networks, or 'prefixes', which designate network reachability among autonomous systems (AS).
We find the destination AS number by looking up the prefix
   Network          Next Hop        Metric LocPrf Weight Path
*>i3.0.0.0          210.138.15.145            300      0 2497 2497 701 703 80 i
*>i4.0.0.0          210.138.15.145            300      0 2497 2497 3356 i
*>i4.23.112.0/22    210.138.15.145            300      0 2497 2497 174 21889 i
*>i4.23.180.0/24    210.138.15.145            300      0 2497 2497 701 6128 30576 i
Network column: the reachable prefix (IP address)
Last AS number in the Path column: the destination AS number
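The lookup described on this slide can be sketched as a longest-prefix match. This is a minimal illustration, not the author's implementation: the dictionary `BGP_TABLE` and the function `destination_as` are hypothetical names, the table holds only the four rows from the example, and the two entries printed without a mask are assumed to use the classful /8 mask.

```python
import ipaddress

# Hypothetical, simplified BGP table built from the example rows.
# Key: prefix, value: AS path (the last AS in the path is the destination AS).
# Assumption: "3.0.0.0" and "4.0.0.0" carry the classful /8 mask.
BGP_TABLE = {
    "3.0.0.0/8": [2497, 2497, 701, 703, 80],
    "4.0.0.0/8": [2497, 2497, 3356],
    "4.23.112.0/22": [2497, 2497, 174, 21889],
    "4.23.180.0/24": [2497, 2497, 701, 6128, 30576],
}


def destination_as(ip, table=BGP_TABLE):
    """Return the destination AS for an IP address, or None if no prefix matches.

    The most specific (longest) matching prefix wins.
    """
    addr = ipaddress.ip_address(ip)
    best_net = None
    best_path = None
    for prefix, path in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_path = net, path
    return best_path[-1] if best_path else None
```

A linear scan is enough for a sketch; a full BGP table has hundreds of thousands of prefixes, so a real tool would more likely use a trie or a similar prefix index.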
8
Flow Data
Flow Data
  is a collection of unidirectional packets used by the same application
  is exported by routers
  includes information such as source and destination IP addresses, port numbers, number of packets, etc.
  is enormous in quantity, so we use sampled data
Example of Flow Data (from Kyushu University)
9
Analysis method
We propose building the database hierarchically to enhance scalability
I export the Flow Data and the BGP routing information maintained on the server, and calculate AS numbers from the Flow Data. I build a database that includes the necessary data (AS number, port number, number of packets, etc.).
I categorize the database by country, area, and port number. I sort the database and calculate correlations for the data whose tendencies we want to see.
I refer to the categorized database and visualize it. I run calculations on the database and detect anomalies.
[Figure: processing stages: analyzing traffic, categorize, visualize, anomaly detection]
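The categorization stage can be sketched as a set of counters keyed by country, area, and port number. Everything named here is an assumption made for illustration: the record fields (`dst_as`, `dst_port`, `packets`), the lookup tables `AS_TO_COUNTRY` and `COUNTRY_TO_AREA`, and the function `categorize` do not come from the slides, and the table entries are sample values only.

```python
from collections import Counter

# Hypothetical lookup tables; a real tool would load these from registry data.
AS_TO_COUNTRY = {2497: "JP", 4134: "CN", 3356: "US"}
COUNTRY_TO_AREA = {"JP": "Asia", "CN": "Asia", "US": "North America"}


def categorize(flows):
    """Count flows per country, area, and destination port.

    flows: iterable of dicts with keys 'dst_as', 'dst_port', 'packets'
    (an assumed record layout, not the exported Flow Data format).
    """
    result = {
        "country": Counter(),
        "area": Counter(),
        "port": Counter(),
        "packets_by_country": Counter(),
    }
    for flow in flows:
        country = AS_TO_COUNTRY.get(flow["dst_as"], "unknown")
        area = COUNTRY_TO_AREA.get(country, "unknown")
        result["country"][country] += 1
        result["area"][area] += 1
        result["port"][flow["dst_port"]] += 1
        result["packets_by_country"][country] += flow["packets"]
    return result
```

Keeping one small summary table per category, rather than querying the raw flows each time, is one way to read "hierarchically building the database": later steps only touch the aggregated counts.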
10
Analysis method – BGP table and Flow Data
I use the BGP table collected from QGPOP and the Flow Data collected from Kyushu University
Flow Data
  For each sampled day, I analyze the data collected during minutes 0-5 of every hour
  The sampling rate is 10%
[Figure: networks around Kyushu University. Labels: KOREN (Korea Advanced Research Network); SINET (information communication network dedicated to academic research); QGPOP (BGP table); IIJ (Internet Initiative Japan); Kyushu University (Flow Data); universities and research institutes]
11
Analysis method 1
Detailed analysis and categorization
  I assign an AS number to each IP address by referring to the BGP table and the Flow Data.
  I categorize the Flow Data by port number (purpose of communication), country, and area information (Asia, Europe, etc.).
I analyze the distribution of port numbers in each country.
  The distribution of port numbers may be unbiased in countries that frequently make illegal accesses
  Illegal accesses use various (random) port numbers.
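One way to put a number on how "unbiased" a port distribution is would be its Shannon entropy. The slides do not name a measure, so this is a swapped-in technique offered only as a sketch; the function name `port_entropy` is hypothetical.

```python
import math
from collections import Counter


def port_entropy(ports):
    """Shannon entropy (in bits) of a list of observed port numbers.

    0.0 means every flow used the same port (fully biased); higher values
    mean the flows are spread more evenly over many ports.
    """
    counts = Counter(ports)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy
```

Under this reading, a country whose flows concentrate on a few service ports scores low, while a country whose flows scan many random ports scores high.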
12
Time change of number of flows in Asia
Almost all traffic was exchanged with Japan, and the number of flows for Japan increased over the year.
This figure shows the change over time in the number of flows for the top 5 countries, in decreasing order of amount.
13
Time change of number of flows in Asia
This figure shows the change over time in the number of flows for the top 4 countries, in decreasing order of amount, excluding Japan.
The number of flows for China increased over the year.
14
Analyzing distribution of port number
I analyze the distribution of port numbers used together with port 53 flows.
  I analyze the destination port numbers accessed by hosts that accessed the DNS server
  The host is identified by its IP address in the Flow Data
[Figure: a host accesses the DNS server on port 53 and other destinations on other ports; the destination ports are recorded in a database whose columns are port numbers, grouped as well-known (20, 22, 25, 53, 80, 443), registered, and private and dynamic. Example row for 2007/01/04: 504 76 21757 27179 25066 1294 51077 15011 3519]
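The selection of hosts and ports on this slide can be sketched as follows. The input layout (pairs of source IP and destination port) and the function names `ports_used_by_dns_clients` and `port_class` are assumptions for illustration; the three port ranges follow the IANA assignment (0-1023, 1024-49151, 49152-65535).

```python
from collections import Counter

DNS_PORT = 53


def ports_used_by_dns_clients(flows):
    """Count destination ports used by hosts that also sent a port 53 flow.

    flows: iterable of (src_ip, dst_port) pairs (an assumed, simplified layout).
    """
    flows = list(flows)
    # Hosts are identified by source IP address, as on the slide.
    dns_hosts = {src for src, port in flows if port == DNS_PORT}
    return Counter(port for src, port in flows if src in dns_hosts)


def port_class(port):
    """Classify a port number into the three IANA ranges."""
    if port < 1024:
        return "well-known"
    if port < 49152:
        return "registered"
    return "private and dynamic"
```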
15
The distribution of port 53 flows and port 25 flows
2007/01/04 to 02/22, every Wednesday's Flow Data (hourly)
The horizontal axis shows the number of port 25 flows
The vertical axis shows the number of port 53 flows
The number of port 53 flows increases with the number of port 25 flows (positive correlation)
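A positive correlation of this kind can be quantified with the Pearson correlation coefficient. The slides do not say which coefficient was used, so this is an assumption; the function name `pearson` is hypothetical and the inputs would be the hourly flow counts for the two ports.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

A value near +1 corresponds to the pattern on the slide, where hours with many port 25 flows also have many port 53 flows.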
16
Analysis method 2
Anomaly detection
  We work with the database compiled from the Flow Data
  We smooth the data with an exponential smoothing method to make visualization easier
  Flow Data has periodicity (daily or weekly), so we use the Holt-Winters method
17
Anomaly detection
Data smoothing
  For long-term analysis of Flow Data, I use the Exponentially Weighted Moving Average (EWMA) method.
    It applies weighting factors that decrease exponentially
    The weight of each older data point decreases exponentially
$$\hat{Y}_i = \alpha\, Y_{i-1} + (1 - \alpha)\, \hat{Y}_{i-1}$$
  Flow Data is periodic, so we adopt the Holt-Winters method for short-term analysis.
    The Holt-Winters method extends the EWMA method to periodic data
$$\hat{Y}_{t+1} = a_t + b_t + c_{t+1-m}$$
$$a_t = \alpha\,(Y_t - c_{t-m}) + (1 - \alpha)(a_{t-1} + b_{t-1})$$
$$b_t = \beta\,(a_t - a_{t-1}) + (1 - \beta)\, b_{t-1}$$
$$c_t = \gamma\,(Y_t - a_t) + (1 - \gamma)\, c_{t-m}$$
Here $Y_t$ is the observed value, $\hat{Y}_t$ the predicted value, $a_t$ the level, $b_t$ the trend, $c_t$ the seasonal component, and $m$ the length of one period.
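The two smoothing methods can be sketched in a few lines. This is an illustration under stated assumptions, not the author's code: it uses the standard additive Holt-Winters form, in which the seasonal term is subtracted in the level update and the trend update uses the difference between consecutive levels. The function names and the initialization (EWMA seeded with the first observation; Holt-Winters seeded from the first two cycles) are choices made here.

```python
def ewma_forecast(series, alpha):
    """One-step-ahead EWMA forecasts; forecasts[i] predicts series[i].

    Assumption: the first forecast is seeded with the first observation.
    """
    forecasts = [series[0]]
    for y in series[:-1]:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts


def holt_winters_forecast(series, m, alpha, beta, gamma):
    """One-step-ahead additive Holt-Winters forecasts with period m.

    forecasts[t] predicts series[t]; the first m entries are None because
    the first cycle is used only for initialization.
    """
    if len(series) < 2 * m:
        raise ValueError("need at least two full cycles")
    first, second = series[:m], series[m:2 * m]
    # Assumed initialization: level = mean of the first cycle, trend = mean
    # per-step change between the first two cycles, seasonal = deviation of
    # each point of the first cycle from that mean.
    a = sum(first) / m
    b = (sum(second) - sum(first)) / (m * m)
    c = [y - a for y in first]  # c[t % m] holds c_{t-m} until it is updated

    forecasts = [None] * m
    for t in range(m, len(series)):
        season = c[t % m]
        forecasts.append(a + b + season)  # forecast = level + trend + seasonal
        y = series[t]
        a_prev = a
        a = alpha * (y - season) + (1 - alpha) * (a_prev + b)  # level
        b = beta * (a - a_prev) + (1 - beta) * b               # trend
        c[t % m] = gamma * (y - a) + (1 - gamma) * season      # seasonal
    return forecasts
```

For hourly flow counts with a daily cycle, `m` would be 24. The smoothing parameters `alpha`, `beta`, and `gamma` lie between 0 and 1 and have to be tuned to the data.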
18
Anomaly detection
I smooth the Flow Data using the EWMA or Holt-Winters method and calculate a threshold.
When a value exceeds the threshold, I consider that point an anomaly
[Figure: number of flows over time for one cycle (one day), with a threshold area bounded by a high threshold level and a low threshold level; a point outside the threshold area is marked as an anomaly]
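The slides do not state how the threshold levels are calculated. As one possible sketch, the band below is placed at `k` standard deviations of the residuals around the smoothed value; the function name `detect_anomalies`, the parameter `k`, and this choice of band are all assumptions.

```python
import statistics


def detect_anomalies(observed, smoothed, k=3.0):
    """Return indices where the observed value leaves the threshold area.

    Assumed band: smoothed value +/- k * (population standard deviation of
    the residuals observed - smoothed). Both lists must be the same length
    and contain only numbers.
    """
    residuals = [y - s for y, s in zip(observed, smoothed)]
    sigma = statistics.pstdev(residuals)
    anomalies = []
    for i, (y, s) in enumerate(zip(observed, smoothed)):
        low = s - k * sigma   # low threshold level
        high = s + k * sigma  # high threshold level
        if y < low or y > high:
            anomalies.append(i)
    return anomalies
```

Because the anomalous points themselves inflate the standard deviation, a real tool might estimate the band from a clean training period or update the deviation with exponential smoothing as well.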
19
Visualization
I am developing a tool that detects anomalies and visualizes them
  The tool should be able to analyze only the specific Flow Data selected by the user (port number, country, etc.)
  Internet traffic contains communication with a large number of packets, such as port 8000 (DVTS)
We want to grasp the tendencies not only of all Flow Data but also of Flow Data restricted to a certain country, AS, or port number. It should be a versatile tool.
20
Conclusion and future work
Implementation of Flow Data analysis
  The program that categorizes Flow Data by country, AS number, and port number is complete
  I will develop a program to find the correlation between port numbers.
Anomaly detection and visualization
  I will smooth the database produced by the analysis program, calculate the threshold, and detect anomalies in the Flow Data
  I will develop a tool that visualizes not only all data and anomalies but also the data selected by the user.
  I will conduct a verification experiment on Flow Data that includes an electrical power failure.