Metadata Management of Terabyte Datasets from an IP Backbone Network: Experience and Challenges

Sue B. Moon and Timothy Roscoe

5/25/2001 NRDM 2001 2

Overview

• Sprint IP Monitoring Project
• Types of Data
• Types of Analysis
• Experience and Challenges
• Metadata Abstractions and Model
• Design and Implementation

Sprint IP Monitoring Project

• Design goal: acquire data without sampling and without loss of accuracy.

• System components:
  – Linux PC with 3 PCI buses and 100GB
  – DAG card with OC3 to OC48 support and GPS
  – SAN-based analysis platform
  – Data repository

Configuration at Monitored PoP

[Figure: configuration at a monitored PoP; only the label "customer" survives from the diagram]

Analysis Platform and Data Repository at Sprint ATL

Types of Collected Data

• Packet trace of 50 to 100GB
  – 44 byte packet header + 12 byte framing info per packet
• BGP routing tables
• IS-IS tables
• PoP configuration (topology)
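The per-packet sizes above imply a fixed 56-byte record, which makes it easy to estimate how many packets a trace holds. The sketch below is illustrative only: it assumes 1 GB = 10^9 bytes, no file-level header, and that the 12 framing bytes precede the 44 header bytes in each record. None of these details are stated on the slide, and the function names are hypothetical.

```python
HEADER_BYTES = 44    # packet header kept per packet (from the slide)
FRAMING_BYTES = 12   # framing info kept per packet (from the slide)
RECORD_BYTES = HEADER_BYTES + FRAMING_BYTES  # 56 bytes per captured packet


def records_in_trace(trace_bytes: int) -> int:
    """Number of whole 56-byte records that fit in a trace of the given size."""
    return trace_bytes // RECORD_BYTES


def split_records(buf: bytes):
    """Yield (framing, header) pairs from a buffer of fixed-size records.

    Assumption: framing bytes come first in each record. A trailing partial
    record, if any, is ignored.
    """
    usable = len(buf) - len(buf) % RECORD_BYTES
    for offset in range(0, usable, RECORD_BYTES):
        record = buf[offset:offset + RECORD_BYTES]
        yield record[:FRAMING_BYTES], record[FRAMING_BYTES:]
```

Under these assumptions a 50GB trace holds roughly 0.9 billion packet records and a 100GB trace roughly 1.8 billion.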

Types of Analysis

• Simple statistics gathering
• Isolation of TCP flows
• Trace correlation
• Generation of traffic matrices

Challenges

• Total amount of data > 10 TB
  – What to keep on-line and off-line
• Sharing data and results
  – What has been computed/generated
• Correlating different types of data
  – E.g. packet traces with routing tables
• Determining s/w dependency
• Reproducibility of results

Task Abstraction

• Storage of data
  – Ad-hoc solution: disk arrays, SAN, tape library
• Source code maintenance
  – CVS
• Metadata management
  – Our focus in this work

Metadata Abstraction

• Raw input data sets
• Result data sets
• Analysis programs
  – Versions of s/w
• Analysis operations
  – between data sets and programs
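One way to picture the four abstractions is as a small in-memory model in which each analysis operation links input data sets and a versioned program to one output data set. This is a minimal sketch, not the system's actual data model; the class names, fields, and example data sets are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    kind: str  # "raw" (collected input) or "result" (produced by an analysis)


@dataclass(frozen=True)
class Program:
    name: str
    version: str  # version of the analysis s/w


@dataclass(frozen=True)
class Operation:
    program: Program
    inputs: tuple     # tuple of DataSet
    output: DataSet


def raw_ancestors(target, operations):
    """Return the names of the raw data sets that `target` was derived from."""
    producer = {op.output: op for op in operations}
    found, seen, stack = set(), set(), [target]
    while stack:
        ds = stack.pop()
        if ds in seen:
            continue
        seen.add(ds)
        if ds.kind == "raw":
            found.add(ds.name)
        elif ds in producer:
            stack.extend(producer[ds].inputs)
    return found


# Hypothetical example: a traffic matrix computed from TCP flows and a BGP table.
trace = DataSet("trace-a", "raw")
bgp = DataSet("bgp-table", "raw")
flows = DataSet("tcp-flows", "result")
matrix = DataSet("traffic-matrix", "result")
operations = [
    Operation(Program("flow-isolate", "1.0"), (trace,), flows),
    Operation(Program("matrix-gen", "2.1"), (flows, bgp), matrix),
]
```

Walking the operations backwards from a result answers the reproducibility question directly: which raw inputs, and which program versions, are needed to regenerate it.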

Design and Implementation

• Dependency graph in relational database schema => RDBMS
• Interaction with version control
  – S/W major release
• Linkage to data storage system
  – Make raw data set self-describing
  – Metadata independent of data location
• User interface
  – Browsing DB thru GUI and capturing analysis operations by simple command scripts.
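The dependency graph can be sketched as a handful of relational tables, with data locations kept in their own table so that moving a data set never touches its metadata. The example below uses SQLite through Python's standard `sqlite3` module purely for illustration; the slides do not say which RDBMS or schema was used, and every table, column, and data set name here is an assumption.

```python
import sqlite3

SCHEMA = """
CREATE TABLE data_set (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    kind TEXT NOT NULL CHECK (kind IN ('raw', 'result'))
);
-- Locations live apart from the metadata, so data can move freely.
CREATE TABLE location (
    data_set_id INTEGER NOT NULL REFERENCES data_set(id),
    uri         TEXT NOT NULL
);
CREATE TABLE program (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    release_tag TEXT NOT NULL,  -- major release in version control
    UNIQUE (name, release_tag)
);
CREATE TABLE operation (
    id         INTEGER PRIMARY KEY,
    program_id INTEGER NOT NULL REFERENCES program(id),
    output_id  INTEGER NOT NULL REFERENCES data_set(id)
);
CREATE TABLE operation_input (
    operation_id INTEGER NOT NULL REFERENCES operation(id),
    input_id     INTEGER NOT NULL REFERENCES data_set(id)
);
"""

# Every data set that a given data set depends on, directly or indirectly.
ANCESTORS = """
WITH RECURSIVE ancestor(id) AS (
    SELECT oi.input_id
      FROM operation o
      JOIN operation_input oi ON oi.operation_id = o.id
     WHERE o.output_id = ?
    UNION
    SELECT oi.input_id
      FROM ancestor a
      JOIN operation o ON o.output_id = a.id
      JOIN operation_input oi ON oi.operation_id = o.id
)
SELECT d.name FROM ancestor a JOIN data_set d ON d.id = a.id ORDER BY d.name
"""


def build_example():
    """Create an in-memory database holding a small hypothetical graph."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO data_set VALUES (?, ?, ?)", [
        (1, "trace-a", "raw"), (2, "bgp-table", "raw"),
        (3, "tcp-flows", "result"), (4, "traffic-matrix", "result"),
    ])
    db.executemany("INSERT INTO program VALUES (?, ?, ?)",
                   [(1, "flow-isolate", "1.0"), (2, "matrix-gen", "2.1")])
    db.executemany("INSERT INTO operation VALUES (?, ?, ?)",
                   [(1, 1, 3), (2, 2, 4)])
    db.executemany("INSERT INTO operation_input VALUES (?, ?)",
                   [(1, 1), (2, 3), (2, 2)])
    db.execute("INSERT INTO location VALUES (?, ?)",
               (1, "tape://library/slot-17"))
    return db


def ancestors(db, data_set_id):
    """Names of all data sets that the given data set was derived from."""
    return [row[0] for row in db.execute(ANCESTORS, (data_set_id,))]
```

Because `operation` and `operation_input` refer to data sets only by id, updating a row in `location` (for example when a trace migrates from tape to the SAN) leaves the dependency graph unchanged.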

Conclusion and Future Work

• Flexible and minimally intrusive
• Extensions:
  – Automatic storage management
  – Result caching
  – Job scheduling
  – Automation of analysis
• Will results be easily reproducible?
• Will users adapt to the new discipline?
