3 April 2010 The RIPE NCC Internet Measurement Data Repository Shane Alcock.
© THE UNIVERSITY OF WAIKATO • TE WHARE WANANGA O WAIKATO 2
Introductions
• Research Programmer with WAND
• NOT affiliated with RIPE NCC, just speaking on their behalf
• Passive measurement
• Organise packet trace captures
• Maintainer of the WITS website
• Experienced in dealing with measurement data sets
Outline
• Sharing Internet datasets
• Challenges
• Case studies
• The RIPE NCC repository
• Available datasets
• Other RIPE datasets that may be added
Sharing Measurement Data
• Internet measurement research requires data
• Often it is difficult to collect suitable data
• Privacy
• Security
• Cost of infrastructure
• Selecting appropriate times and locations
Sharing Measurement Data
• Sharing data with the community is an awesome idea
• Saves time and effort
• Promotes collaboration
• Enables validation of previous results
• Encourages others to share their data as well
Sharing Measurement Data
• WITS – Waikato Internet Traffic Storage
• http://www.wand.net.nz/wits
• CAIDA
• http://www.caida.org/data/
• PREDICT
• https://www.predict.org/
• CRAWDAD
• http://crawdad.cs.dartmouth.edu/data.php
• NLANR
• No longer exists :(
Challenges
• Community awareness
• Datasets are scattered amongst multiple hosts
• Lack of publicity and detailed information about datasets
• Meta-data
• DatCat (CAIDA)
• http://www.datcat.org
• Catalogue of publicly available datasets
• Not an actual repository – data is hosted externally
• Not a comprehensive resource
Challenges
• Repositories often maintained by research groups
• Limited funding, therefore limited resources
• People
• Expertise
• Disk space
• Bandwidth
Case Study: WITS
• Maintenance is intermittent
• Maintainer has many other responsibilities
• Disk space is a huge limitation
• No room on the FTP server to put new data sets
• Adding new disks costs both money and time
• Sanitizing datasets requires even more space as we must retain the original version as well
• Bandwidth
• Cost of commercial bandwidth hinders availability of data
• Enable access via KAREN (NZ national research network) only
• Fortunately, KAREN peers with many international NRENs
Challenges
• Permanence
• Research groups typically depend on competitive funding
• Funding runs out – repository vanishes
• Loss of data is a major issue
• No longer able to replicate and validate previous studies
Case Study: NLANR
• Large public archive of measurement data
• Auckland, Abilene traces (PMA)
• AMP
• US government ceased funding
• Repository no longer maintained
• Domain eventually expired
• CAIDA and WAND salvaged the data
• Traces now available on WITS
• Without intervention, the data could easily have been lost permanently
Challenges
• Avoiding inappropriate disclosure
• Anonymisation of sensitive information, e.g. IP addresses
• Developing policy to cover user access and agreements
• Many datasets have unique restrictions or policies
• Policy that is appropriate for one dataset is not for another
• Personal contact information
• IP addresses
• User payload in packet traces
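As a rough illustration of IP address anonymisation, the sketch below replaces each IPv4 address with a pseudonym derived from a keyed hash (HMAC-SHA256). This is a simple stand-in technique chosen for the example, not necessarily what WITS or RIPE NCC use. It is not prefix-preserving, and distinct addresses could in principle collide because the digest is truncated to 32 bits. `SECRET_KEY` and `anonymise_ipv4` are hypothetical names.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical secret; in practice it would be random and kept private.
SECRET_KEY = b"replace-with-a-random-secret"


def anonymise_ipv4(addr: str, key: bytes = SECRET_KEY) -> str:
    """Map an IPv4 address to a pseudonymous IPv4 address using a keyed hash."""
    packed = ipaddress.IPv4Address(addr).packed
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    # Use the first 4 bytes of the digest as the replacement address.
    return str(ipaddress.IPv4Address(digest[:4]))


if __name__ == "__main__":
    # The same input always maps to the same pseudonym under one key.
    for ip in ["192.0.2.1", "192.0.2.1", "198.51.100.7"]:
        print(ip, "->", anonymise_ipv4(ip))
```

Because the mapping is consistent, flows between the same hosts can still be matched after anonymisation, but subnet structure is lost; a prefix-preserving scheme would be needed to keep it.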
Challenges
• Communication with users
• Data sharing is often not a top priority for collectors
• Collection designed to suit their purposes
• Small changes to the collection process can often make the data more useful to a wider audience
• Encourage users to engage with collectors
Challenges
• Support
• Measurement data is complicated to deal with
• Steep learning curve
• Formats, e.g. PCAP vs ERF vs legacy DAG formats for traces
• Tools / Processing libraries
• Timezones
• Documentation of shared datasets is often poor
• User support is intermittent, due to lack of resources again
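To give a flavour of the format issue, here is a minimal sketch that recognises a classic libpcap file by parsing its 24-byte global header. It only handles the classic microsecond-resolution pcap magic number; ERF and other formats would be rejected. The function name `parse_pcap_global_header` is made up for this example.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic libpcap magic (microsecond timestamps)


def parse_pcap_global_header(data: bytes) -> dict:
    """Parse the 24-byte global header of a classic libpcap file."""
    if len(data) < 24:
        raise ValueError("need at least 24 bytes")
    # The file may have been written on a little- or big-endian machine.
    for endian in ("<", ">"):
        magic, major, minor, _zone, _sigfigs, snaplen, linktype = struct.unpack(
            endian + "IHHiIII", data[:24]
        )
        if magic == PCAP_MAGIC:
            return {
                "byte_order": "little" if endian == "<" else "big",
                "version": (major, minor),
                "snaplen": snaplen,
                "linktype": linktype,
            }
    raise ValueError("not a classic pcap file (perhaps ERF or another format)")
```

A typical use would be `parse_pcap_global_header(open(path, "rb").read(24))`, where `path` is whatever trace file is at hand.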
Challenges
• Size
• Internet measurement datasets are huge
• Push modern storage technologies to the limit
• Server hosting and maintenance
The RIPE NCC Repository
• RIPE NCC collects a lot of measurement data already
• They want to share this data with the community
• Most is already available through various repositories
• Develop a single common and consistent platform
• Hosting
• Browsing
• Accessing and downloading data
• Open to other collectors who wish to share data
• Still under development
Hardware
• 2 servers – Master and back-up
• Size: 9U
• Disk: 48x 2TB on 2 controllers – 2 cold spares
• CPU: 2x Quad core Xeon L5420 2.5GHz
• Memory: 32GB
• Chassis: Chenbro RM91250
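For a sense of scale, the raw capacity implied by the disk figures can be worked out as below. This is back-of-envelope arithmetic only: it assumes decimal terabytes and that all 48 disks hold data. The slide does not say whether the cold spares are counted among the 48, and usable space after RAID and filesystem overhead would be lower.

```python
DISKS_PER_SERVER = 48
DISK_SIZE_TB = 2  # assumed decimal terabytes (10**12 bytes)

raw_tb = DISKS_PER_SERVER * DISK_SIZE_TB   # raw terabytes per server
raw_tib = raw_tb * 10**12 / 2**40          # the same capacity in binary TiB

print(f"raw capacity per server: {raw_tb} TB (about {raw_tib:.1f} TiB)")
```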
Features of the RIPE NCC Repository
• Longevity
• RIPE NCC does not depend on competitive research funding
• Generating and keeping Internet measurement data for ~20 years
• Long time-series data
• Much less likely that the repository will disappear
• Emphasis on mirroring rather than replacing other repositories
• Host anonymized versions of data
Features of the RIPE NCC Repository
• Resources
• RIPE NCC manages servers, infrastructure
• Larger repository can justify a dedicated support staff
• Experience and expertise are important
• Diversity
• Variety of datasets from different collectors
• Increased awareness of new datasets
• One user account can access many different datasets
• Self sign-up for “basic access”
Features of the RIPE NCC Repository
• Communication
• Bridge the gap between data collectors and users
• Raise awareness of existing data
• Gather feedback from the user community
• Develop relationships with other data collectors
• Links to useful tools and libraries for processing data
• Share expertise as well as data
Available Datasets
• Data collected by RIPE NCC
• RIS routing database
• Reverse DNS delegations made by RIRs
• Data from external sources
• WITS
• Ex-NLANR data
Routing Information Server (RIS)
• 16 route collectors peering with 600 BGP routers
• Mostly within the RIPE region
• ~100 peers provide complete routing tables
• Routes are collected and published in MRT format
• Updates every 5 minutes
• Full table dump every 8 hours
• All data collected since 2000 has been retained
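As a small taste of the MRT format, the sketch below decodes the 12-byte common header that precedes every MRT record (timestamp, type, subtype, length, all in network byte order). It does not decode the BGP payload, and the slides do not say which MRT record types the RIS files contain; the type names in the table are the standard ones. `parse_mrt_header` is a hypothetical helper name.

```python
import struct
from datetime import datetime, timezone

# A few standard MRT type codes; anything else is reported as unknown.
MRT_TYPES = {12: "TABLE_DUMP", 13: "TABLE_DUMP_V2", 16: "BGP4MP", 17: "BGP4MP_ET"}


def parse_mrt_header(data: bytes) -> dict:
    """Decode the 12-byte MRT common header at the start of a record."""
    if len(data) < 12:
        raise ValueError("need at least 12 bytes")
    ts, mrt_type, subtype, length = struct.unpack(">IHHI", data[:12])
    return {
        "time": datetime.fromtimestamp(ts, tz=timezone.utc),
        "type": MRT_TYPES.get(mrt_type, f"UNKNOWN({mrt_type})"),
        "subtype": subtype,
        "length": length,  # number of payload bytes following the header
    }
```

Walking a whole (decompressed) file would then be a loop: read 12 bytes, parse, skip `length` bytes, repeat.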
Routing Information Server (RIS)
• Other methods of access
• Last 3 months of data exported to MySQL database
• Weekly statistical reports
• Looking Glass queries
• Tools to query and visualise RIS data
Reverse DNS Zones
• (Partial) Reverse DNS delegations made by RIRs
• Generated using RIPE DB reverse DNS objects
• ~410,000 reverse DNS objects
Auckland
• Passive traces taken at the University of Auckland
• Auckland II – VII were previously available through NLANR
• Frequently feature in measurement literature
• Currently available from WITS archive
Waikato
• Passive traces taken at the University of Waikato
• Long duration continuous traces
• Waikato I is available
• Other Waikato sets will be included at a later date
NLANR
• Other NLANR datasets that were preserved by WAND
• IPLS (also known as Abilene)
• Leipzig
• Active Measurement Project (AMP)
• Much of this data is also currently available from WITS
Other Datasets
• Collected by RIPE NCC
• Not currently in the repository but may be added later
• K-root and reverse DNS server statistics and traces
• Hostcount
• TTM
• DNSMON
• AS112
• Other parts of RIPE DB
• These are covered in more detail in the paper
K-root
• Internet root name service operated by RIPE NCC
• PCAP traces of incoming port 53 traffic (DNS queries)
• 50 hours of traces included in CAIDA's DITL project
• DNS Statistics Collector (DSC)
• Summarises DNS traffic into 1 minute bins
• Generate graphs shown on the K-root website
• Raw data exported to DNS-OARC
• SNMP statistics
• Originate from RIPE NCC in Amsterdam
• Summarised and exported to an RRD
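The idea of summarising traffic into 1 minute bins can be sketched as follows. This is only an illustration of time binning, not DSC's actual implementation; `bin_per_minute` is a made-up name and the input is assumed to be query timestamps in epoch seconds.

```python
from collections import Counter


def bin_per_minute(timestamps):
    """Count events per 1-minute bin; keys are bin start times in epoch seconds."""
    counts = Counter()
    for ts in timestamps:
        counts[int(ts // 60) * 60] += 1
    return dict(counts)


if __name__ == "__main__":
    # Five query timestamps falling into three consecutive minutes.
    print(bin_per_minute([0, 59.9, 60, 61, 125]))
```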
Reverse DNS
• 4 reverse DNS servers operated by RIPE NCC
• 50,000 queries per second (3x load of K-root)
• High query rate means regular trace collection is infeasible
• DSC used on each of the rDNS servers
• Raw data and graphs only available within RIPE NCC
• Could be made available if there was a need
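To see why continuous trace collection is impractical at this rate, a rough estimate helps. The 50,000 queries per second figure is from the slide (which does not say whether it is per server or aggregate); the 100 bytes captured per query is purely an assumption for illustration.

```python
QUERIES_PER_SECOND = 50_000   # from the slide
SECONDS_PER_DAY = 86_400
BYTES_PER_QUERY = 100         # assumed average captured size, illustrative only

queries_per_day = QUERIES_PER_SECOND * SECONDS_PER_DAY
bytes_per_day = queries_per_day * BYTES_PER_QUERY

print(f"{queries_per_day:,} queries/day")
print(f"about {bytes_per_day / 10**9:.0f} GB/day of query traces")
```

Under these assumptions a single day of query-only capture already runs to hundreds of gigabytes, before responses are counted.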
AS112
• AS number used to serve reverse DNS for RFC 1918 private address space
• http://public.as112.net/
• Dynamic DNS update and rDNS server for AS112
• Hosted by RIPE NCC
• Goal is to measure and analyse DNS updates for invalid addresses
• PCAP trace collected annually and contributed to DITL
• More frequent captures could be scheduled if needed
• DSC data also collected
• Graphs publicly available from RIPE NCC AS112 site
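The kind of query that ends up at AS112 can be illustrated with Python's standard `ipaddress` module: the sketch checks whether an address falls in RFC 1918 space and builds the reverse DNS name a resolver would look up for it. The helper names `is_rfc1918` and `reverse_name` are invented for this example.

```python
import ipaddress

# The three private IPv4 blocks defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(addr: str) -> bool:
    """Return True if the IPv4 address lies in RFC 1918 private space."""
    ip = ipaddress.IPv4Address(addr)
    return any(ip in net for net in RFC1918_NETWORKS)


def reverse_name(addr: str) -> str:
    """Return the in-addr.arpa name used for a reverse (PTR) lookup."""
    return ipaddress.IPv4Address(addr).reverse_pointer


if __name__ == "__main__":
    for a in ("10.1.2.3", "192.0.2.1"):
        print(a, is_rfc1918(a), reverse_name(a))
```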
Hostcount
• Monthly DNS scan of ~100 TLDs within the RIPE region
• Count A and PTR records for both forward and reverse IPv4
• Also count forward AAAA for IPv6 addresses
• Not exhaustive, due to public zone transfers being disabled
• Statistics published via Hostcount website
• Raw data from 1990-2007 is archived off-line
• Current policy is to discard raw data once statistics have been extracted
• But this could be reversed if there is a need
Test Traffic Measurements (TTM)
• Active measurement system of ~100 probes
• Most probes located at ISPs and universities within Europe
• Not all are included in public measurements
• Regular series of active tests
• UDP one-way delay, traceroute, DNSMON, IPv6 PMTU
• Also supports ad-hoc measurements by authorised users
• Ping, HTTP page fetch
• Can also develop and run arbitrary tests
• Results not released outside of RIPE NCC
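A one-way delay measurement boils down to subtracting a send timestamp from a receive timestamp taken on a different host, which only makes sense if the two clocks are synchronised. The sketch below assumes that, and assumes a hypothetical input of `(send_ts, recv_ts)` pairs with `None` for lost packets; it is not TTM's actual data format or code.

```python
from statistics import median


def one_way_delays(samples):
    """Return delays (seconds) for packets that arrived; lost ones are skipped."""
    return [recv - send for send, recv in samples if recv is not None]


def summarise(samples):
    """Report the median one-way delay and the fraction of packets lost."""
    samples = list(samples)
    delays = one_way_delays(samples)
    loss = 1 - len(delays) / len(samples) if samples else 0.0
    return {"median_delay": median(delays) if delays else None, "loss": loss}


if __name__ == "__main__":
    # Made-up probe results: the third packet never arrived.
    probes = [(0.0, 0.5), (1.0, 1.25), (2.0, None), (3.0, 3.75)]
    print(summarise(probes))
```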
Test Traffic Measurements (TTM)
• Bulk data published using CERN ROOT
• Performance graphs on the TTM website
DNSMON
• Measures the reachability and latency of DNS
• Collected using 60 TTM probes
• Root domain, .com, .net, .org, e164.arpa, 24 ccTLDs measured
• IPv4 and IPv6 performance measured
• Summary statistics and graphs are publicly available
• Only paying subscribers can access most recent graphs
• Raw data also available upon request
RIPE DB
• Internet number registration objects for the RIPE region
• IP addresses and AS numbers
• Reverse DNS objects
• Used to create zone files for the reverse DNS service
• Route registry objects
• Used to provide an Internet Routing Registry
• Conforms to RPSL and RFC 2650
RIPE DB
• Public queries supported via command-line and web
• Daily limit imposed on queries that include personal info
• Bulk data is available via FTP
• Personal details are not included
• Can subscribe to a near real-time mirror of the database
• Restrictions on personal data are very broad
• Can result in inappropriate limitations
• Better access policies and mechanisms should resolve this
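A command-line style query of the RIPE DB can be approximated in a few lines, since the whois protocol is just a text query sent over TCP port 43. The sketch below is a minimal client and parser under assumptions: `whois_query` and `parse_objects` are hypothetical helpers, no query flags are used, and the parser treats the response simply as blank-line-separated blocks of `attribute: value` lines with `%` comment lines. Real responses may need more careful handling, and the daily limits on personal data mentioned above still apply.

```python
import socket


def whois_query(query: str, server: str = "whois.ripe.net",
                port: int = 43, timeout: float = 10.0) -> str:
    """Send one whois query and return the raw text response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_objects(text: str) -> list:
    """Split a response into objects, each a list of (attribute, value) pairs."""
    objects, current = [], []
    for line in text.splitlines():
        if not line.strip():          # a blank line ends the current object
            if current:
                objects.append(current)
                current = []
            continue
        if line.startswith("%"):      # server comment line
            continue
        if line[0] in " \t+" and current:   # continuation of the previous value
            attr, value = current[-1]
            current[-1] = (attr, (value + " " + line[1:].strip()).strip())
            continue
        attr, sep, value = line.partition(":")
        if sep:
            current.append((attr.strip(), value.strip()))
    if current:
        objects.append(current)
    return objects
```

Usage would look like `parse_objects(whois_query("193.0.0.0"))`; the query string here is only an example input.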
Links
RIS http://www.ripe.net/ris
RIPE DB http://www.ripe.net/db
K-root http://k.root-servers.org
TTM http://www.ripe.net/ttm
Hostcount http://www.ripe.net/is/hostcount/stats
DNSMON http://dnsmon.ripe.net/dns-servmon
AS112 http://www.ripe.net/as112
WITS http://www.wand.net.nz/wits
Conclusion
• Repository is in 'beta'
• Server exists and some datasets are available for download
• Interested users can be given access
• Looking for feedback and ideas
• Development of policy, particularly for access
• Data collection
• Improving the RIPE datasets to be more useful to researchers
• Acquiring more external datasets
• Contributions of data, analysis tools
Contact
http://data-repository.ripe.net