A Longitudinal Characterization of Local and Global BitTorrent

53
March 14, 2012 A Longitudinal Characterization of Local and Global BitTorrent Workload Dynamics Niklas Carlsson Linköping University György Dan KTH Royal Institute of Technology Anirban Mahanti NICTA Martin Arlitt HP Labs and University of Calgary

Transcript of A Longitudinal Characterization of Local and Global BitTorrent

Page 1: A Longitudinal Characterization of Local and Global BitTorrent

March 14, 2012

A Longitudinal Characterization

of Local and Global BitTorrent

Workload Dynamics

Niklas Carlsson Linköping University

György Dan KTH Royal Institute of Technology

Anirban Mahanti NICTA

Martin Arlitt HP Labs and University of Calgary

Page 2: A Longitudinal Characterization of Local and Global BitTorrent

Motivation

Use of Internet for content delivery is massive

… and becoming more so

How to make scalable and efficient?

Server-based and peer-to-peer

Chunk-based approach proven scalable

Files split into smaller chunks

Clients can download from both servers and other

clients (peers)

How to best manage large-scale content

replication systems

E.g., where to place chunks?

Must first understand workload dynamics ...

Page 3: A Longitudinal Characterization of Local and Global BitTorrent

Background: BitTorrent Single file download File split into many smaller chunks

Downloaded from both seeds and downloaders

Distribution paths are dynamically determined Based on data availability

Arrivals

Departures

Downloader

Downloader

Downloader

Downloader

Seed

Seed

Download time

Seed residence

time

Torrent (downloaders and seeds)

Page 4: A Longitudinal Characterization of Local and Global BitTorrent

Background: BitTorrent Multi-tracked torrents

Torrent file “announce-list” URLs

Trackers Register torrent file

Maintain state information

Peers Obtain torrent file

Choose one tracker at random

Announce

Report status

Peer exchange (PEX)

Swarm

Swarm

Page 5: A Longitudinal Characterization of Local and Global BitTorrent

Background: BitTorrent Multi-tracked torrents

Torrent file “announce-list” URLs

Trackers Register torrent file

Maintain state information

Peers Obtain torrent file

Choose one tracker at random

Announce

Report status

Peer exchange (PEX)

Swarm

Swarm

Page 6: A Longitudinal Characterization of Local and Global BitTorrent

Contributions

Longitudinal multi-torrent analysis

48 weeks from two vantage points

Capturing differences in dynamics observed

locally and globally

University campus vs. global tracker-based

Example observations

Campus users download larger files

Campus users early adopters (except music)

High popularity churn

Most popular content peak later

Page 7: A Longitudinal Characterization of Local and Global BitTorrent

Measurement overview Active + passive measurements

Swarm

Longitudinal data

Two vantage points University campus

(ingress/egress)

Global trackers

Popularity dynamics

Page 8: A Longitudinal Characterization of Local and Global BitTorrent

University: tracker communication Passive measurements

Extract HTTP peer-to-tracker

traffic at campus ingress/egress

8

Page 9: A Longitudinal Characterization of Local and Global BitTorrent

University: tracker communication Passive measurements

Extract HTTP peer-to-tracker

traffic at campus ingress/egress

9

Page 10: A Longitudinal Characterization of Local and Global BitTorrent

Global: Tracker scrapes Active measurements

Periodically request the current

state as observed at a large set

of trackers

Page 11: A Longitudinal Characterization of Local and Global BitTorrent

Global: Tracker scrapes Active measurements

Periodically request the current

state as observed at a large set

of trackers

Page 12: A Longitudinal Characterization of Local and Global BitTorrent

Popularity dynamics

Measurement overview Active + passive measurements

Page 13: A Longitudinal Characterization of Local and Global BitTorrent

100

102

104

106

100

102

104

106

RankPopula

rity

Zipf(1e+007,1)

MZipf(1e+007,50,1)

GZipf(2e+005,0.02,1e-005,1)

Head

Trunk

Tail

E.g., Dan & Carlsson [IPTPS 2010]

Previous work Popularity distribution

Popularity distribution statistics Over lifetime

Over different time period

Different sampling methods

Page 14: A Longitudinal Characterization of Local and Global BitTorrent

Summary of datasets

Property University Global Mininova

Trackers Torrents

Downloads HTTP requests

2,371 56,963 1.73 M 249 M

721 11.2 M 37.0 B

--

1,690 911,687

-- --

Start date End date

Frequency

Sep. 15, 2008 Aug. 17, 2009 All requests

Sep. 15, 2008 Aug. 17, 2009 Weekly scrapes

Sep., 2008 Aug., 2009

Twice

Page 15: A Longitudinal Characterization of Local and Global BitTorrent

Summary of datasets

Property University Global Mininova

Trackers Torrents

Downloads HTTP requests

2,371 56,963 1.73 M 249 M

721 11.2 M 37.0 B

--

1,690 911,687

-- --

Start date End date

Frequency

Sep. 15, 2008 Aug. 17, 2009 All requests

Sep. 15, 2008 Aug. 17, 2009 Weekly scrapes

Sep., 2008 Aug., 2009

Twice

48 weeks of overlapping longitudinal data

Page 16: A Longitudinal Characterization of Local and Global BitTorrent

Summary of datasets

Property University Global Mininova

Trackers Torrents

Downloads HTTP requests

2,371 56,963 1.73 M 249 M

721 11.2 M 37.0 B

--

1,690 911,687

-- --

Start date End date

Frequency

Sep. 15, 2008 Aug. 17, 2009 All requests

Sep. 15, 2008 Aug. 17, 2009 Weekly scrapes

Sep., 2008 Aug., 2009

Twice

Many torrents (and downloads) …

Page 17: A Longitudinal Characterization of Local and Global BitTorrent

Dataset summary Torrents observed

Dataset summary Torrents observed

56,963

11.2 M

Page 18: A Longitudinal Characterization of Local and Global BitTorrent

Dataset summary Torrents observed

Dataset summary Torrents observed

56,963

11.2 M 90%

Page 19: A Longitudinal Characterization of Local and Global BitTorrent

Dataset summary Torrents observed

Most of the files observed locally are also observed

in the global dataset

Dataset summary Torrents observed

56,963

11.2 M 90%

Page 20: A Longitudinal Characterization of Local and Global BitTorrent

Dataset summary Torrents observed

11.2 M

56,963

Page 21: A Longitudinal Characterization of Local and Global BitTorrent

Dataset summary Torrents observed

11.2 M

56,963

911,687

Page 22: A Longitudinal Characterization of Local and Global BitTorrent

Dataset summary Torrents observed

11.2 M

56,963

911,687 Mininova screen scrapes also provide us

with size and category information for

some of these files

Page 23: A Longitudinal Characterization of Local and Global BitTorrent

Dataset summary Torrents observed

11.2 M

56,963

911,687 Mininova screen scrapes also provide us

with size and category information for

some of these files

Page 24: A Longitudinal Characterization of Local and Global BitTorrent

Dataset summary Torrents observed

11.2 M

56,963

911,687 Mininova screen scrapes also provide us

with size and category information for

some of these files

33%

Page 25: A Longitudinal Characterization of Local and Global BitTorrent

Content download characteristics File size distribution, per download

Campus users download larger files

Page 26: A Longitudinal Characterization of Local and Global BitTorrent

Content download characteristics File size distribution, per download

Campus users download larger files

Size difference

Page 27: A Longitudinal Characterization of Local and Global BitTorrent

Content download characteristics Breakdown per category

Campus users download More movies and TV shows

Less music

Page 28: A Longitudinal Characterization of Local and Global BitTorrent

Content download characteristics Breakdown per category

Campus users download More movies and TV shows

Less music

More

Page 29: A Longitudinal Characterization of Local and Global BitTorrent

Content download characteristics Breakdown per category

Campus users download More movies and TV shows

Less music

Less

Page 30: A Longitudinal Characterization of Local and Global BitTorrent

Content download characteristics Breakdown per category

Campus users download More movies and TV shows

Less music

Again, biased towards larger contents ...

Page 31: A Longitudinal Characterization of Local and Global BitTorrent

Early adopters Terminology

Time

Dow

nlo

ads

Page 32: A Longitudinal Characterization of Local and Global BitTorrent

Early adopters Terminology

Local peak

Time

Dow

nlo

ads

Page 33: A Longitudinal Characterization of Local and Global BitTorrent

Early adopters Terminology

Local peak

Time until peak

Time

Dow

nlo

ads

Page 34: A Longitudinal Characterization of Local and Global BitTorrent

Early adopters Terminology

Global peak

Local peak

Time

Dow

nlo

ads

Page 35: A Longitudinal Characterization of Local and Global BitTorrent

Early adopters Terminology

Global peak

Local peak

Time

Dow

nlo

ads

Difference in peak times

Page 36: A Longitudinal Characterization of Local and Global BitTorrent

Early adopters Terminology

Global peak

Time Time

Dow

nlo

ads

Local downloads

before global peak

Page 37: A Longitudinal Characterization of Local and Global BitTorrent

Early adopters Downloads relative to global peak

Campus users are generally early adopters of content 70% of downloads before global peak

40% of downloads at least 10 weeks before global peak

Page 38: A Longitudinal Characterization of Local and Global BitTorrent

Early adopters Downloads relative to global peak

Campus users are generally early adopters of content 70% of downloads before global peak

40% of downloads at least 10 weeks before global peak

Early

downloads

Page 39: A Longitudinal Characterization of Local and Global BitTorrent

Early adopters Downloads relative to global peak

Campus users are generally early adopters of content 70% of downloads before global peak

40% of downloads at least 10 weeks before global peak

40%

70%

Page 40: A Longitudinal Characterization of Local and Global BitTorrent

Early adopters Downloads relative to global peak

Campus users are generally early adopters of content Except for music

Perhaps campus users can be used to predict some future popularity ... And used for seeding such content

Page 41: A Longitudinal Characterization of Local and Global BitTorrent

Early adopters Downloads relative to global peak

Campus users are generally early adopters of content Except for music

Perhaps campus users can be used to predict some future popularity ... And used for seeding such content

Exception

Page 42: A Longitudinal Characterization of Local and Global BitTorrent

Better predictor the more popular the content becomes As well as for some niche content ...

Early adopters Downloads relative to global peak

Page 43: A Longitudinal Characterization of Local and Global BitTorrent

Better predictor the more popular the content becomes As well as for some niche content ...

Early adopters Downloads relative to global peak

Early local

peaks!!

Page 44: A Longitudinal Characterization of Local and Global BitTorrent

Time until peak Global popularity peaks ...

The global popularity often peak late for popular content Early flash crowds do not dominate the popularity

Perhaps a sign that rich-gets-richer a better model ...

Page 45: A Longitudinal Characterization of Local and Global BitTorrent

Time until peak Global popularity peaks ...

The global popularity often peak later for popular content Early flash crowds do not dominate the popularity

Perhaps a sign that rich-gets-richer a better model ...

Correlation

Page 46: A Longitudinal Characterization of Local and Global BitTorrent

Time until peak Global popularity peaks ...

The more popular the content The later it peaks ...

Page 47: A Longitudinal Characterization of Local and Global BitTorrent

Time until peak Global popularity peaks ...

The more popular the content The later it peaks ...

Later until

peak

Page 48: A Longitudinal Characterization of Local and Global BitTorrent

Time until peak Global popularity peaks ...

Rich-gets-richer Close to linear from week-to-week

Cumulative total downloads show weaker (sub-linear) rich-gets-richer behavior

Page 49: A Longitudinal Characterization of Local and Global BitTorrent

Time until peak Global popularity peaks ...

Rich-gets-richer Close to linear from week-to-week

Cumulative total downloads show weaker (sub-linear) rich-gets-richer behavior

Linear

Page 50: A Longitudinal Characterization of Local and Global BitTorrent

Hotset analysis Popularity churn

High popularity churn Roughly 50-60% new

videos each week

Some files reoccur

Some video reoccur in hotset

Page 51: A Longitudinal Characterization of Local and Global BitTorrent

Hotset analysis Popularity churn

High popularity churn Roughly 50-60% new

videos each week

Some files reoccur

Some video reoccur in hotset

Similarity between weeks

Similarity with week 20

Page 52: A Longitudinal Characterization of Local and Global BitTorrent

Conclusions Large-scale longitudinal multi-torrent analysis

University campus

Global trackers

Campus users download more large files (TV shows

and movies) and a smaller fraction of music

Campus users are “early adopters”

Except for music

High weekly churn in set of popular files

Most of the popular files peak well after their initial use

Signs of rich-gets-richer behavior

Page 53: A Longitudinal Characterization of Local and Global BitTorrent

Thank you!

Niklas Carlsson Linköping University

György Dan KTH Royal Institute

Anirban Mahanti NICTA

Martin Arlitt HP Labs and University of Calgary