End-to-end Publishing Using Bittorrent

Bittorrent

•Bittorrent is a widely used peer-to-peer network for distributing files, especially large ones
•It has a number of legal uses, which sets it apart from other P2P networks

Practical Applications

•Distributing large files
•Podcasting
•Vlogging
•Disk images
•Legal distribution of movies (see bittorrent.com)

Traditional vs. Bittorrent

•Traditional: one server serves many clients
•Bittorrent: many clients serve many clients

Terminology

•Swarm – the clients downloading or uploading a given file through Bittorrent
•Tracker – a centralized server that clients contact to request lists of other clients connected to the swarm
•Seed – a client that has a complete copy of the file
•Peer (Leecher) – a client that does not have a complete copy of the file

Problem

Torrents that are less popular may eventually “die” when there are no longer any complete copies of the file in the swarm

Everseed

Permanent seed running on the same server as the tracker

Guarantees that there will always be a complete copy of the file

Related Research

•The creator of Bittorrent wrote a description of how a file is downloaded using Bittorrent, available at http://www.bittorrent.org/protocol.html
•Maintainers of various Bittorrent clients wrote http://wiki.theory.org/BitTorrentSpecification, which resembles the official specification but goes into far more depth
•Osprey (http://osprey.ibiblio.org/) appears to have had a similar idea but has not made any visible progress

Explanation

•The .torrent metadata file tells the client where to find the tracker and describes the file being distributed
•The client connects to the tracker
•The tracker gives the client a list of other clients
•The client then downloads the file from those clients (not from a centralized server)
•The client periodically contacts the tracker again for an updated client list

Goals

•Complete internet publishing solution using Bittorrent
•Metadata file generator (.torrent)
•Tracker
•“Everseed”
•Web interface

.torrent File

•Official documentation on bittorrent.org
•Metadata on the file to be downloaded (tracker URL, filename, size, checksum hashes)
•Stored as “bencoded” strings, integers, lists, and dictionaries
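
The metadata listed above can be sketched as a plain Python dictionary. This is a minimal illustration for a single-file torrent, not the project's actual generator: the key names ("announce", "info", "name", "length", "piece length", "pieces") and the use of SHA-1 piece hashes follow the published Bittorrent specification, while the function name build_metainfo, the 256 KiB piece size, and the tracker URL are assumptions made for the example.

```python
import hashlib


def build_metainfo(data: bytes, name: str, announce_url: str,
                   piece_length: int = 262144) -> dict:
    """Build a metainfo dictionary for a single file (hypothetical helper)."""
    # The "checksum hashes" are the SHA-1 digests of each fixed-size piece,
    # concatenated into one byte string (20 bytes per piece).
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {
        "announce": announce_url,          # tracker URL
        "info": {
            "name": name,                  # suggested filename
            "length": len(data),           # file size in bytes
            "piece length": piece_length,  # bytes per piece
            "pieces": pieces,              # concatenated SHA-1 digests
        },
    }


# Example with made-up data and a placeholder tracker URL.
meta = build_metainfo(b"x" * 600_000, "example.bin",
                      "http://tracker.example.org/announce")
print(len(meta["info"]["pieces"]) // 20)  # number of pieces: 3
```

Bencoding this dictionary and writing the result to disk would yield the .torrent file itself.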

Bencoding

•Integer: 6 => “i6e”
•String: “hello” => “5:hello”
•List: [“hello”, “world”] => “l5:hello5:worlde”
•Dictionary: {“hello”: “world”} => “d5:hello5:worlde”
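
The four rules above can be sketched as one short recursive Python function. This is an illustrative encoder rather than the project's code; the function name bencode is an assumption, and the choice to sort dictionary keys follows the published specification.

```python
def bencode(value) -> bytes:
    """Encode an int, str/bytes, list, or dict using the four bencoding rules."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; bencoding has no boolean type.
        raise TypeError("booleans cannot be bencoded")
    if isinstance(value, int):
        return b"i%de" % value                      # 6 -> i6e
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)       # hello -> 5:hello
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and are written in sorted order.
        encoded = {
            (k.encode("utf-8") if isinstance(k, str) else k): v
            for k, v in value.items()
        }
        body = b"".join(bencode(k) + bencode(encoded[k]) for k in sorted(encoded))
        return b"d" + body + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


print(bencode(6))                    # b'i6e'
print(bencode("hello"))              # b'5:hello'
print(bencode(["hello", "world"]))   # b'l5:hello5:worlde'
print(bencode({"hello": "world"}))   # b'd5:hello5:worlde'
```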

Bencoding implementation

•Written in Python, which has good string manipulation
•A .torrent file is structured as a dictionary with string keys and integer, string, list, and dictionary values
•Encoding and decoding are done recursively
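
The recursive approach can be illustrated with a small decoder. This is a sketch under stated assumptions, not the project's implementation: the names bdecode and _decode are hypothetical, strings are returned as bytes because .torrent files carry binary hash data, and error handling is kept minimal.

```python
def bdecode(data: bytes):
    """Decode one complete bencoded value (hypothetical helper)."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data: bytes, i: int):
    """Decode the value starting at offset i; return (value, next offset)."""
    lead = data[i:i + 1]
    if lead == b"i":                              # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead == b"l":                              # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)            # recurse for each element
            items.append(item)
        return items, i + 1
    if lead == b"d":                              # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            value, i = _decode(data, i)           # values may nest arbitrarily
            result[key] = value
        return result, i + 1
    if lead.isdigit():                            # string: <length>:<bytes>
        colon = data.index(b":", i)
        length = int(data[i:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"unexpected byte at offset {i}")


print(bdecode(b"d5:hello5:worlde"))  # {b'hello': b'world'}
```

Each container case simply calls _decode again for its children, which is why nested lists and dictionaries need no special handling.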

Tracker

•Makes use of the bencoding algorithm
•Handles two types of requests: “announce” and “scrape”
•Stores data on peers and torrents in a SQLite database
•No performance issues
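
As a rough idea of what the SQLite storage might look like, the sketch below uses Python's built-in sqlite3 module. The table layout, column names, and helper functions are all assumptions made for illustration; the slides do not describe the actual schema.

```python
import sqlite3

# Hypothetical schema: one row per (torrent, peer) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS peers (
    info_hash  TEXT    NOT NULL,   -- hex digest identifying the torrent
    peer_id    TEXT    NOT NULL,
    ip         TEXT    NOT NULL,
    port       INTEGER NOT NULL,
    left_bytes INTEGER NOT NULL,   -- 0 means the peer is a seed
    PRIMARY KEY (info_hash, peer_id)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upsert_peer(conn, info_hash, peer_id, ip, port, left_bytes):
    """Insert a peer, or refresh its row if it has announced before."""
    conn.execute(
        "INSERT OR REPLACE INTO peers VALUES (?, ?, ?, ?, ?)",
        (info_hash, peer_id, ip, port, left_bytes),
    )
    conn.commit()


def count_seeds_and_leechers(conn, info_hash):
    """Return (seeds, leechers) for one torrent."""
    row = conn.execute(
        "SELECT COALESCE(SUM(left_bytes = 0), 0),"
        "       COALESCE(SUM(left_bytes > 0), 0)"
        "  FROM peers WHERE info_hash = ?",
        (info_hash,),
    ).fetchone()
    return row[0], row[1]


conn = open_db()
upsert_peer(conn, "abc123", "peer-1", "10.0.0.1", 6881, 0)
upsert_peer(conn, "abc123", "peer-2", "10.0.0.2", 6882, 4096)
print(count_seeds_and_leechers(conn, "abc123"))  # (1, 1)
```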

Network performance

[Chart: Peer List Size vs Time (seconds). Axis ranges: peer list size 0 to 2000, time 0 to 0.1 seconds.]

Database performance

[Chart: Number of Peers Inserted vs. Time (seconds). Axis ranges: peers inserted 0 to 2000, time 0 to 0.5 seconds.]

Announce requests

•Used by a client to announce its presence in a Bittorrent swarm
•The client sends an HTTP GET request to the announce URL in the .torrent file
•The tracker parses the request and URL-decodes the data about the peer
•The tracker stores the data in the database and sends the appropriate response as a bencoded string in a text/plain document
•The client bdecodes the string and connects to other clients
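
The request side of this exchange can be sketched with Python's urllib.parse. The query parameter names (info_hash, peer_id, port, uploaded, downloaded, left) come from the published Bittorrent specification; the helper names, the example peer ID, and the tracker URL are assumptions. Decoding with latin-1 is one way to recover the raw 20-byte values on the tracker side, since every byte maps to exactly one character.

```python
import hashlib
from urllib.parse import parse_qs, urlencode, urlsplit


def build_announce_url(announce: str, info_hash: bytes, peer_id: bytes,
                       port: int, left: int) -> str:
    """Client side: URL-encode the announce parameters (hypothetical helper)."""
    query = urlencode({
        "info_hash": info_hash,   # raw 20-byte SHA-1, percent-encoded
        "peer_id": peer_id,
        "port": port,
        "uploaded": 0,
        "downloaded": 0,
        "left": left,             # bytes still needed; 0 for a seed
    })
    return f"{announce}?{query}"


def parse_announce(url: str) -> dict:
    """Tracker side: URL-decode the query back into bytes and integers."""
    qs = parse_qs(urlsplit(url).query, encoding="latin-1")
    return {
        "info_hash": qs["info_hash"][0].encode("latin-1"),
        "peer_id": qs["peer_id"][0].encode("latin-1"),
        "port": int(qs["port"][0]),
        "left": int(qs["left"][0]),
    }


info_hash = hashlib.sha1(b"example info dictionary").digest()
url = build_announce_url("http://tracker.example.org/announce",
                         info_hash, b"-XX0001-000000000000", 6881, 1024)
print(parse_announce(url)["info_hash"] == info_hash)  # True
```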

Scrape requests

•Used by a client to obtain information about the torrents the tracker is tracking
•The client sends an HTTP GET request to the scrape URL, found by transforming the announce URL
•The tracker URL-decodes and parses the request
•The tracker fetches data about the torrent from the database
•The tracker returns a bencoded dictionary, which the client decodes
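
The transformation from announce URL to scrape URL is not spelled out in the slides. The sketch below assumes the convention described in the community specification: if the text after the last "/" begins with "announce", that word is replaced with "scrape"; otherwise the tracker is treated as not supporting scrape. The function name scrape_url is hypothetical.

```python
from typing import Optional


def scrape_url(announce_url: str) -> Optional[str]:
    """Derive the scrape URL, or return None if the convention does not apply."""
    head, sep, tail = announce_url.rpartition("/")
    if not sep or not tail.startswith("announce"):
        return None
    # Keep whatever follows "announce" (file extension, query string, ...).
    return head + "/scrape" + tail[len("announce"):]


print(scrape_url("http://tracker.example.org/announce"))      # .../scrape
print(scrape_url("http://tracker.example.org/announce.php"))  # .../scrape.php
print(scrape_url("http://tracker.example.org/a"))             # None
```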

Smart Peer List Response

•Seeds often disconnect from other seeds, since they have nothing to exchange
•The tracker can do the same kind of filtering to some extent
•Announce responses contain a list of random peers
•If a client is seeding, it doesn't need the IPs of other seeds
•Leaving seeds out of a seed's peer list increases overall swarm performance
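
One way to picture this filtering is the sketch below. It is illustrative only: the peer records are plain dictionaries with a "left" field (0 for a seed), and the function name select_peers and the default limit of 50 peers are assumptions rather than details taken from the project.

```python
import random


def select_peers(peers, requester_is_seed: bool, limit: int = 50, rng=random):
    """Pick up to `limit` random peers, skipping seeds when a seed is asking."""
    candidates = [
        p for p in peers
        if not (requester_is_seed and p["left"] == 0)
    ]
    if len(candidates) <= limit:
        return candidates
    return rng.sample(candidates, limit)


swarm = [
    {"ip": "10.0.0.1", "port": 6881, "left": 0},      # seed
    {"ip": "10.0.0.2", "port": 6881, "left": 0},      # seed
    {"ip": "10.0.0.3", "port": 6881, "left": 2048},   # leecher
]
print(len(select_peers(swarm, requester_is_seed=True)))   # 1
print(len(select_peers(swarm, requester_is_seed=False)))  # 3
```

A leecher still receives seeds and other leechers alike, so only requests from seeds are affected.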

Peer List Compression

•The peer list in the tracker's announce response is normally ASCII encoded
•The peer list can be compressed to 6 bytes per peer: 4 bytes for the IP address and 2 bytes for the port
•Huge bandwidth savings, ~80%
•Greatly enhanced tracker performance
•Reduced tracker hardware requirements
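
The 6-byte layout can be sketched with the standard socket and struct modules. This assumes IPv4 addresses and network (big-endian) byte order for the port, as in the commonly used compact peer format; the function names are hypothetical.

```python
import socket
import struct


def compact_peers(peers) -> bytes:
    """Pack (ip, port) pairs into 6 bytes each: 4 for the IP, 2 for the port."""
    return b"".join(
        socket.inet_aton(ip) + struct.pack("!H", port)
        for ip, port in peers
    )


def expand_peers(blob: bytes):
    """Reverse of compact_peers: split the blob into (ip, port) pairs."""
    if len(blob) % 6:
        raise ValueError("compact peer list length must be a multiple of 6")
    return [
        (socket.inet_ntoa(blob[i:i + 4]),
         struct.unpack("!H", blob[i + 4:i + 6])[0])
        for i in range(0, len(blob), 6)
    ]


packed = compact_peers([("127.0.0.1", 6881), ("192.168.1.100", 51413)])
print(len(packed))           # 12
print(expand_peers(packed))  # [('127.0.0.1', 6881), ('192.168.1.100', 51413)]
```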

Test Client

•A test Bittorrent client, written in Python, was developed alongside the tracker
•Can send both announce and scrape requests
•Request key-value pairs are easily configurable

Testing

•Generalized method of handling exceptions in the initialization methods
•Increased use of try/except statements to improve robustness
•Testing with incorrect or missing data
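
A sketch of what handling incorrect or missing data with try/except might look like on the tracker side is shown below. The function names, the list of required parameters, and the 1800-second interval are assumptions; the "failure reason" and "interval" response keys come from the published specification.

```python
REQUIRED = ("info_hash", "peer_id", "port", "left")


def validate_announce(params: dict) -> dict:
    """Check announce parameters; raise ValueError describing the first problem."""
    missing = [key for key in REQUIRED if key not in params]
    if missing:
        raise ValueError("missing parameter(s): " + ", ".join(missing))
    try:
        port = int(params["port"])
        left = int(params["left"])
    except (TypeError, ValueError):
        raise ValueError("port and left must be integers") from None
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    if left < 0:
        raise ValueError("left must not be negative")
    if len(params["info_hash"]) != 20:
        raise ValueError("info_hash must be 20 bytes")
    return {"info_hash": params["info_hash"], "peer_id": params["peer_id"],
            "port": port, "left": left}


def handle_announce(params: dict) -> dict:
    """Return the response dictionary that would then be bencoded."""
    try:
        validate_announce(params)
    except ValueError as exc:
        # Bad input produces an error response instead of crashing the tracker.
        return {"failure reason": str(exc)}
    return {"interval": 1800, "peers": []}


print(handle_announce({"port": "6881"}))
# {'failure reason': 'missing parameter(s): info_hash, peer_id, left'}
```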

Summary

•Python
•Benefits of P2P technology
•“Everseed” concept
•.torrent files and bencoding
•Tracker