Metadata Management of Terabyte Datasets from an IP
Backbone Network: Experience and Challenges
Sue B. Moon and Timothy Roscoe
5/25/2001 NRDM 2001 2
Overview
• Sprint IP Monitoring Project
• Types of Data
• Types of Analysis
• Experience and Challenges
• Metadata Abstractions and Model
• Design and Implementation
Sprint IP Monitoring Project
• Design Goal: to acquire data without sampling and with sufficient accuracy.
• System Components:
– Linux PC with 3 PCI buses and 100GB
– DAG card with OC3 to OC48 support and GPS
– SAN-based analysis platform
– Data repository
Configuration at Monitored PoP
[Figure: configuration at a monitored PoP; only the label "customer" survives]
Analysis Platform and Data Repository at Sprint ATL
Types of Collected Data
• Packet trace of 50 to 100GB
– 44 byte packet header + 12 byte framing info per packet
• BGP routing tables
• IS-IS tables
• PoP configuration (topology)
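The per-packet sizes above imply a fixed 56-byte record, which gives a rough packet count per trace. The sketch below works through that arithmetic and splits a buffer into records. It assumes decimal gigabytes and that the framing info precedes the header within each record; the slides state neither, and the function names are illustrative.

```python
HEADER_BYTES = 44    # packet header kept per packet (from the slide)
FRAMING_BYTES = 12   # framing info kept per packet (from the slide)
RECORD_BYTES = HEADER_BYTES + FRAMING_BYTES  # 56 bytes per captured packet

GB = 10**9  # assumption: decimal gigabytes


def packets_in_trace(trace_bytes: int) -> int:
    """Number of whole fixed-size records that fit in a trace of this size."""
    return trace_bytes // RECORD_BYTES


def split_records(buf: bytes):
    """Yield (framing, header) pairs from back-to-back fixed-size records.

    Assumption: framing comes first in each record. A trailing partial
    record is ignored.
    """
    for off in range(0, len(buf) - RECORD_BYTES + 1, RECORD_BYTES):
        rec = buf[off:off + RECORD_BYTES]
        yield rec[:FRAMING_BYTES], rec[FRAMING_BYTES:]


if __name__ == "__main__":
    for size_gb in (50, 100):
        print(size_gb, "GB ->", packets_in_trace(size_gb * GB), "packets")
```

Under these assumptions a 50GB trace holds roughly 0.9 billion packet records and a 100GB trace roughly 1.8 billion.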
Types of Analysis
• Simple statistics gathering
• Isolation of TCP flows
• Trace correlation
• Generation of traffic matrices
Challenges
• Total amount of data > 10 TB
– What to keep on-line and off-line
• Sharing data and results
– What has been computed/generated
• Correlating different types of data
– E.g. packet traces with routing tables
• Determining s/w dependency
• Reproducibility of results
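One common way to correlate a packet trace with a routing table is a longest-prefix match on each packet's destination address. The slides do not say how the project does this, so the sketch below is only an illustration. The prefixes and next-hop labels are made up, and the linear scan would need to become a trie or similar structure at backbone scale.

```python
import ipaddress
from typing import Iterable, Optional, Tuple

# (prefix, next-hop label); the labels are hypothetical placeholders
Route = Tuple[ipaddress.IPv4Network, str]


def longest_prefix_match(dst: str, routes: Iterable[Route]) -> Optional[Route]:
    """Return the most specific route covering dst, or None if none does."""
    addr = ipaddress.ip_address(dst)
    best: Optional[Route] = None
    for net, hop in routes:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    return best


# Illustrative table built from documentation address ranges
ROUTES = [
    (ipaddress.ip_network("192.0.2.0/24"), "hop-a"),
    (ipaddress.ip_network("192.0.2.128/25"), "hop-b"),
    (ipaddress.ip_network("198.51.100.0/24"), "hop-c"),
]

if __name__ == "__main__":
    for dst in ("192.0.2.200", "192.0.2.5", "203.0.113.9"):
        print(dst, "->", longest_prefix_match(dst, ROUTES))
```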
Task Abstraction
• Storage of data
– Ad-hoc solution: disk arrays, SAN, tape library
• Source code maintenance
– CVS
• Metadata management
– Our focus in this work
Metadata Abstraction
• Raw input data sets
• Result data sets
• Analysis programs
– Versions of s/w
• Analysis operations
– between data sets and programs
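The four abstractions can be pictured as a small object model in which an operation links input data sets, a versioned program, and output data sets. The sketch below is an assumption about how such a model might look, not the authors' design; all class, field, and data set names are hypothetical. It also shows how recorded operations yield the steps needed to regenerate a result.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class DataSet:
    name: str
    kind: str  # "raw" or "result"


@dataclass(frozen=True)
class Program:
    name: str
    version: str  # version of the analysis s/w


@dataclass
class Operation:
    program: Program
    inputs: List[DataSet]
    outputs: List[DataSet]


def provenance(target: DataSet, ops: List[Operation]) -> List[Operation]:
    """Operations needed to regenerate target, upstream steps first."""
    produced_by: Dict[DataSet, Operation] = {
        out: op for op in ops for out in op.outputs
    }
    ordered: List[Operation] = []
    seen: Set[int] = set()

    def visit(ds: DataSet) -> None:
        op = produced_by.get(ds)
        if op is None or id(op) in seen:  # raw data or already scheduled
            return
        seen.add(id(op))
        for parent in op.inputs:
            visit(parent)
        ordered.append(op)

    visit(target)
    return ordered


# Hypothetical example: raw trace -> TCP flows -> traffic matrix
trace = DataSet("trace-raw", "raw")
flows = DataSet("flows", "result")
matrix = DataSet("traffic-matrix", "result")
op_flows = Operation(Program("flow_isolate", "1.0"), [trace], [flows])
op_matrix = Operation(Program("matrix_gen", "2.0"), [flows], [matrix])

if __name__ == "__main__":
    for op in provenance(matrix, [op_matrix, op_flows]):
        print(op.program.name, op.program.version)
```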
Design and Implementation
• Dependency graph in relational database schema => RDBMS
• Interaction with version control
– S/W major release
• Linkage to data storage system
– Make raw data set self-describing
– Metadata independent of data location
• User interface
– Browsing DB through GUI and capturing analysis operations by simple command scripts.
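A dependency graph of this kind can be stored in a handful of relational tables and walked with a recursive query. The sketch below uses SQLite through Python's `sqlite3` module purely for illustration. The slides do not name the RDBMS or give the schema, so every table, column, and data set name here is an assumption. Storage locations sit in their own table so that the dependency metadata does not change when data moves.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dataset (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    kind TEXT NOT NULL CHECK (kind IN ('raw', 'result'))
);
-- Locations are kept apart so moving data leaves the graph untouched.
CREATE TABLE location (
    dataset_id INTEGER NOT NULL REFERENCES dataset(id),
    uri        TEXT NOT NULL
);
CREATE TABLE program (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    version TEXT NOT NULL,          -- e.g. a major release tag
    UNIQUE (name, version)
);
CREATE TABLE operation (
    id         INTEGER PRIMARY KEY,
    program_id INTEGER NOT NULL REFERENCES program(id)
);
CREATE TABLE op_input (
    operation_id INTEGER NOT NULL REFERENCES operation(id),
    dataset_id   INTEGER NOT NULL REFERENCES dataset(id)
);
CREATE TABLE op_output (
    operation_id INTEGER NOT NULL REFERENCES operation(id),
    dataset_id   INTEGER NOT NULL REFERENCES dataset(id)
);
"""

# All data sets that a given data set depends on, directly or indirectly.
UPSTREAM = """
WITH RECURSIVE upstream(id) AS (
    SELECT i.dataset_id
    FROM op_output o JOIN op_input i ON i.operation_id = o.operation_id
    WHERE o.dataset_id = (SELECT id FROM dataset WHERE name = ?)
    UNION
    SELECT i.dataset_id
    FROM upstream u
    JOIN op_output o ON o.dataset_id = u.id
    JOIN op_input i ON i.operation_id = o.operation_id
)
SELECT name FROM dataset WHERE id IN (SELECT id FROM upstream) ORDER BY name;
"""


def build_demo() -> sqlite3.Connection:
    """In-memory database with a small, made-up dependency graph."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO dataset (id, name, kind) VALUES (?, ?, ?)", [
        (1, "trace-raw", "raw"), (2, "bgp-table", "raw"),
        (3, "flows", "result"), (4, "traffic-matrix", "result"),
    ])
    con.executemany("INSERT INTO program (id, name, version) VALUES (?, ?, ?)",
                    [(1, "flow_isolate", "1.0"), (2, "matrix_gen", "2.0")])
    con.executemany("INSERT INTO operation (id, program_id) VALUES (?, ?)",
                    [(1, 1), (2, 2)])
    con.executemany("INSERT INTO op_input VALUES (?, ?)",
                    [(1, 1), (2, 3), (2, 2)])
    con.executemany("INSERT INTO op_output VALUES (?, ?)", [(1, 3), (2, 4)])
    return con


def upstream_of(con: sqlite3.Connection, name: str) -> list:
    return [row[0] for row in con.execute(UPSTREAM, (name,))]


if __name__ == "__main__":
    print(upstream_of(build_demo(), "traffic-matrix"))
```

The same query, reversed to follow outputs instead of inputs, would list every result that must be recomputed when a raw data set or program version changes.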
Conclusion and Future Work
• Flexible and minimally intrusive
• Extensions:
– Automatic storage management
– Result caching
– Job scheduling
– Automation of analysis
• Will results be easily reproducible?
• Will users adapt to the new discipline?