CIS 6930.5: Federated Distributed Systems. Adriana Iamnitchi (Anda), anda@cse.usf.edu


Page 1: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

CIS 6930.5: Federated Distributed Systems

Adriana Iamnitchi (Anda)

anda@cse.usf.edu

Page 2: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

CIS 6930.5: Federated Distributed Systems (Fall 2006)

Contact Info

Email: anda@cse.usf.edu
Office: ENB 334
Office hours: by appointment (email me)
Course page: http://www.csee.usf.edu/~anda/CIS6930.5

Page 3: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

CIS 6930.5: Course Goals

Primary
  – Gain deep understanding of fundamental issues that affect design of large-scale federated distributed systems
  – Map primary contemporary research themes
  – Gain experience in network research
Secondary
  – By studying a set of outstanding papers, build knowledge of how to present research
  – Learn how to read papers & evaluate ideas

Page 4: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

What I’ll Assume You Know

Basic Internet architecture
  – IP, TCP, DNS, HTTP
Basic principles of distributed computing
  – Asynchrony (cannot distinguish between communication failures and latency)
  – Partial global state knowledge (cannot know everything correctly)
  – Failures happen. In very large systems, even rare failures happen often
If there are things that don’t make sense, ask!
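The remark that rare failures happen often at scale can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that nodes fail independently with the same probability during some time window; the function name and the sample numbers are not from the slides.

```python
def prob_any_failure(n_nodes: int, p_fail: float) -> float:
    """Probability that at least one of n_nodes fails in a given window.

    Assumes (for illustration only) independent failures with identical
    per-node probability p_fail.
    """
    return 1.0 - (1.0 - p_fail) ** n_nodes


if __name__ == "__main__":
    # A one-in-a-million event per node, at three system sizes.
    for n in (1, 1_000, 1_000_000):
        print(f"{n:>9} nodes: P(at least one failure) = {prob_any_failure(n, 1e-6):.6f}")
```

With a million nodes, a one-in-a-million per-node event is more likely than not to occur somewhere in the system.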

Page 5: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Examples of Distributed Systems

[Figure: four example systems: ATT web, Gnutella network, the Internet, a sensor network]

Page 6: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Definition (a version)

A distributed system is a collection of autonomous, programmable, failure-prone entities that are able to communicate through a communication medium that is unreliable.
  – Entity = a process on a device (PC, PDA, mote)
  – Communication medium = wired or wireless network
“Federated” – spanning multiple institutional or network (DNS) domains

Page 7: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Outline

Case study (and project ideas):
  – Volunteer computing: SETI@home and BOINC
  – Grid computing
  – P2P systems
Administrivia

Page 8: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.


Page 9: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

SETI@home Operations

[Diagram: SETI@home data flow. Components shown: data recorder, DLT tapes, splitters, WU storage, data server, screensavers, result queue, acct. queue, garbage collector, science DB, user DB, master DB, redundancy checking, RFI elimination, repeat detection, tape archive/delete, tape backup, web site, CGI program, web page generator]

Page 10: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

How does it work?

SETI@home: master-worker architecture
Fixed-rate data processing task
Low bandwidth/computation ratio
Independent parallelism
Error tolerance
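The master-worker pattern with error tolerance can be sketched in a few lines. This is a hypothetical toy, not SETI@home's actual code: the master hands every work unit to several workers and accepts a result only when enough of them agree. The names `run_master` and `validate`, the replication factor, and the quorum rule are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional


def validate(reports: Iterable[int], quorum: int) -> Optional[int]:
    """Return the value reported by at least `quorum` workers, else None."""
    counts = Counter(reports)
    if not counts:
        return None
    value, votes = counts.most_common(1)[0]
    return value if votes >= quorum else None


def run_master(work_units: List[int],
               workers: List[Callable[[int], int]],
               replicas: int = 3,
               quorum: int = 2) -> Dict[int, Optional[int]]:
    """Dispatch each independent work unit to `replicas` workers and validate.

    Workers are picked round-robin; this assumes replicas <= len(workers)
    so that the copies of one unit go to distinct workers.
    """
    accepted: Dict[int, Optional[int]] = {}
    for i, unit in enumerate(work_units):
        chosen = [workers[(i + j) % len(workers)] for j in range(replicas)]
        reports = [worker(unit) for worker in chosen]
        accepted[unit] = validate(reports, quorum)
    return accepted


if __name__ == "__main__":
    honest = lambda x: x * x   # a correct worker
    faulty = lambda x: -1      # a malfunctioning or cheating worker
    print(run_master([1, 2, 3, 4], [honest, honest, faulty, honest]))
```

Because the work units are independent, the master never has to coordinate workers with each other; the price of the redundancy is that each unit consumes several times the CPU time.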

Page 11: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

History and Statistics

Conceived 1995, launched April 1999
“scientific experiment that uses Internet-connected computers in the Search for Extraterrestrial Intelligence (SETI). You can participate by running a free program that downloads and analyzes radio telescope data.”
No ET signals yet, but other results

                              Total                    Last 24 hours (as of Wed Feb 23 07:04:51)
Users                         5,361,313                4,391
Results received              1,779 million            5 million
Total CPU time                2.2 million years        3610.717 years
Average CPU time/work unit    10 hr 58 min 14.0 sec    6 hr 19 min 30.1 sec
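As a quick sanity check on the table, the average CPU time per work unit should be roughly the total CPU time divided by the number of results. The sketch below redoes that division with the rounded figures from the slide; the 365-day year and the helper name are assumptions, so only approximate agreement with the reported averages should be expected.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a 365-day year


def avg_hours_per_work_unit(cpu_years: float, results: float) -> float:
    """Average CPU hours per result, from total CPU years and result count."""
    return cpu_years * HOURS_PER_YEAR / results


# Figures as reported (and rounded) on the slide.
total = avg_hours_per_work_unit(2.2e6, 1779e6)
last_24h = avg_hours_per_work_unit(3610.717, 5e6)

if __name__ == "__main__":
    print(f"all time:      {total:.2f} CPU hours per work unit")
    print(f"last 24 hours: {last_24h:.2f} CPU hours per work unit")
```

Both values land close to the averages in the last row of the table, which suggests the rows are mutually consistent.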

Page 12: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Volunteer computing

Also called “public-resource computing”
Utilizes idle computing cycles over Internet
Other systems:
  – Original: GIMPS, distributed.net
  – Commercial: United Devices, Entropia, Porivo, Popular Power
  – Academic, open-source
    > Cosm, folding@home

Page 13: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

None of the popularity of SETI!

How to get and retain users (from David Anderson, the leader of the SETI@home project)
  – Graphics are important (but monitors do burn in)
  – Teams: users recruit other users
  – Keep users informed
    > Science news
    > System management news
    > Periodic project emails
Reward users:
  – PDF certificates
  – Milestone pages and emails
  – Leader boards (overall, country, …)

Page 14: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Millions and millions of computers! (Problems)

Server scalability
Dealing with excess CPU time
Cheating
Bad behavior:
  – Team recruitment by spam
  – Sale of accounts on eBay
Malfunctions
Network bandwidth costs money

Page 15: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

SETI@home: Summary

Master-worker design
  – Centralized solution
    > Master = central point of control
    > Single point of failure
    > Performance bottleneck
Incentives for participation
  – Sometimes mean incentives for cheating
Massive (“embarrassing”) parallelism
Low bandwidth/computation ratio
Users do donate real resources: $1.5M / year consumed power
More information: http://setiathome.ssl.berkeley.edu

Page 16: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

BOINC

Berkeley Open Infrastructure for Network Computing
“Open-source software for volunteer computing and desktop grid computing.” http://boinc.berkeley.edu/
Project idea: install and configure BOINC on a set of machines at USF to run large embarrassingly parallel applications.
  – Two candidate applications from mechanical engineering and physics (code already exists)
  – Report experience. Think along the following idea: would it be beneficial to use the administrative desktops for scientific computations at USF?

Page 17: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Outline

Case study (and project ideas):
  – Volunteer computing: SETI@home and BOINC
  – Grid computing
  – P2P systems
Administrivia

Page 18: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Grid Computing: Current Status

The metaphor: power grid
Many deployed grids running in production mode
Scientists are the most traditional users
Users:
  – 100s of users, 10s of institutions
  – Well-established communities
Resources:
  – Computers, data, instruments, storage, applications
  – Owned/administered by institutions
Applications: data- and compute-intensive processing
Approach: common infrastructure

Page 19: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Why Don’t We Build a Huge Supercomputer?

Top500 supercomputer list over time: Zipf distribution: Perf(rank) ≈ rank^(-k)

[Figure: LinPack performance in GFLOPS (log scale) versus rank (log scale), one curve per year, 1995 to 2001]

[Figure: evolution of parameter 'k', 1995 to 2003; vertical axis from -0.84 to -0.68]
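A power law Perf(rank) ≈ rank^(-k) is a straight line on log-log axes, so k can be estimated as the negated slope of a least-squares line through (log rank, log performance). The sketch below shows one way to do that; it uses synthetic data generated from an exact power law, not the real Top500 numbers, and the function name is an assumption.

```python
import math
from typing import Sequence


def fit_zipf_exponent(perf: Sequence[float]) -> float:
    """Estimate k in perf(rank) ~ rank**(-k) by least squares on log-log data.

    perf[0] is the performance of rank 1, perf[1] of rank 2, and so on.
    """
    xs = [math.log(rank) for rank in range(1, len(perf) + 1)]
    ys = [math.log(p) for p in perf]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope  # the slope is -k


if __name__ == "__main__":
    # Synthetic list of 500 machines following an exact power law, k = 0.8.
    synthetic = [1000.0 * rank ** -0.8 for rank in range(1, 501)]
    print(f"estimated k = {fit_zipf_exponent(synthetic):.3f}")
```

On real data the fit would be noisy, and the top few ranks often deviate from the line, so the estimate depends on which ranks are included.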

Page 20: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Impact

Trend: it is increasingly interesting to aggregate the capabilities of the machines in the tail of this distribution.
  – A virtual machine that aggregates the last 10 in Top500 would rank 32nd in ’95 but 14th in ’03
Both Grid and P2P computing are results of this trend:
  – Grids: focus on assembling (a relatively small number of) resources to enable controlled, secure resource sharing
  – P2P: focus on scale, deployability.
Challenge: design services that offer the best of both worlds: complex, secure services that deliver controlled QoS, are scalable, and can be easily deployed.
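The claim about aggregating the tail can be explored with an idealized model. The sketch below assumes performance follows rank^(-k) exactly for all 500 machines and asks where a virtual machine combining the last 10 would rank. The two exponents tried are simply the extremes of the vertical axis of the 'k' plot, chosen for illustration; they are not claimed to be the fitted values for any particular year, and the function names are hypothetical.

```python
def zipf_perf(rank: int, k: float) -> float:
    """Idealized performance of the machine at a given rank."""
    return rank ** -k


def rank_of_aggregate(k: float, n: int = 500, tail: int = 10) -> int:
    """Rank that the combined last `tail` machines would hold among n machines."""
    combined = sum(zipf_perf(r, k) for r in range(n - tail + 1, n + 1))
    # The aggregate ranks just behind every machine that is strictly faster.
    faster = sum(1 for r in range(1, n + 1) if zipf_perf(r, k) > combined)
    return faster + 1


if __name__ == "__main__":
    for k in (0.84, 0.68):
        print(f"k = {k}: aggregate of the last 10 ranks #{rank_of_aggregate(k)}")
```

Under this idealized model the aggregate moves up the ranking as k shrinks, which is the direction of the trend the slide reports; exact agreement with the slide's figures should not be expected, since the real list is not a perfect power law.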

[Figure: evolution of parameter 'k', 1995 to 2003; vertical axis from -0.84 to -0.68]

Page 21: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Outline

Case study (and project ideas):
  – Volunteer computing: SETI@home and BOINC
  – Grid computing
  – P2P systems
Administrivia

Page 22: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Peer-to-Peer Systems

Revived (?) by music sharing
A variety of applications deployed today
Def 1: “A class of applications that take advantage of resources (e.g., storage, cycles, content) available at the edge of the Internet.”
  – Edges often turned off, without permanent IP addresses, etc.
Def 2: “A class of decentralized, self-organizing distributed systems, in which all or most communication is symmetric.”
Lots of other definitions that fit in between

Page 23: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

P2P Impact (1)

Widespread adoption
  – KaZaA: 170 million downloads (3.5M/week), one of the most popular applications ever!
leading to (almost) zero-cost data distribution
  … is forcing companies to change their business models
  … might impact copyright laws

Page 24: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

P2P Impact (2)

Killer application for broadband to consumers
  – P2P-generated traffic may be the single largest contributor to Internet traffic today

[Figure: Internet2 traffic statistics. Share of traffic (0% to 100%) by category (File sharing, Unidentified, Data transfers, Other), Feb. '02 to July '04. Source: www.internet2.edu]

Page 25: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Applications (1)

File sharing
  – The ‘killer’ application to date
  – Too many to list them all: Napster, FastTrack (KaZaA, iMesh), Gnutella (LimeWire, Morpheus, BearShare)
Streaming: the user ‘plays’ the data as it arrives
  – Client/server approach: [Figure: an overloaded source saying “Oh, I am exhausted!”]
  – P2P approach, a possible solution:
    > The first few users get the stream from the server
    > New users get the stream from the server or from users who are already receiving the stream
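The P2P streaming idea can be sketched as building a distribution tree: the source serves the first few users directly, and later arrivals attach to peers that already receive the stream. The toy below is one possible policy, not a description of any deployed system; the fixed per-node upload limit (`fanout`), the join-order policy, the absence of peers leaving, and all names are assumptions.

```python
from collections import deque
from typing import Dict, Iterable


def build_stream_tree(peers: Iterable[str],
                      fanout: int = 2,
                      source: str = "source") -> Dict[str, str]:
    """Map each peer to the node it receives the stream from.

    Each node (the source included) forwards the stream to at most `fanout`
    children. An arriving peer attaches to the earliest-joined node that
    still has a free upload slot. Assumes fanout >= 1 and no departures.
    """
    parent: Dict[str, str] = {}
    children = {source: 0}
    open_nodes = deque([source])  # nodes with spare upload capacity, oldest first
    for peer in peers:
        upstream = open_nodes[0]
        parent[peer] = upstream
        children[upstream] += 1
        if children[upstream] == fanout:
            open_nodes.popleft()  # upstream is now saturated
        children[peer] = 0
        open_nodes.append(peer)   # the new peer can serve later arrivals
    return parent


if __name__ == "__main__":
    print(build_stream_tree(["a", "b", "c", "d", "e", "f"]))
```

In this sketch the source's load is capped at `fanout` streams no matter how many users join, in contrast to the client/server approach where the source serves every user itself.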

Page 26: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Applications (2)

Performance benchmarking
  Problem:
  – Evaluate the performance of your Web site from the end-user perspective
    > Multiple views on your site performance
  – Generate Internet statistics
    > Connectivity statistics
    > Routing errors, routing maps
Backup storage (HiveNet, OceanStore)
Collaborative environments (Groove Networks)
Instant messaging (Yahoo, AOL)
Web serving communities (uServ)
Spam filtering
Anonymous email
Censorship-resistant publishing systems (Eternity, Freenet)

Page 27: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

P2P Networks: Current Status

Users:
  – Millions
  – Anonymous individuals
Resources:
  – Computing cycles XOR files
  – Owned/administered (?) by user
  – Intermittent participation:
    > Gnutella: 60 min. (‘01)
    > MojoNation: 1/6 users always connected (‘01)
    > Overnet: 50% nodes available 70% of time over a week (‘02)
Applications: file retrieval or parallel computations
Approach: vertically integrated solutions

Network          Users (www.slyck.com, 06/14/’06)
eDonkey2K        3,108,066
FastTrack        2,848,606
Gnutella         2,219,539
Overnet          645,120
DirectConnect    ???
MP2P             ???

Page 28: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Trend: Large, Dynamic, Self-Configuring Grids

[Diagram: Grids and P2P positioned on two axes: “Scale & volatility” and “Functionality & infrastructure”]

•Large scale
•Weaker trust assumptions
•Ease of integration
•No centralized authority
•Intermittent resource/user participation
•Diversity in:
  •Shared resources
  •Sharing characteristics
•Variable technical support
•Infrastructure (sharable services)
•Support for diverse applications

On Death, Taxes, and the Convergence of Grid and P2P Systems, Foster and Iamnitchi, IPTPS’03

Page 29: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Challenges in Distributed Systems

Scale
Real problems: spam, denial-of-service attacks (including distributed ones), security, fault tolerance, etc.
We’ll look at the latest solutions to such problems proposed in:
  – Top conferences in systems and networking: SIGCOMM, OSDI, NSDI
  – Top workshops (hot topics): IPTPS, HotOS
  – Other venues
(Digression: how do you tell when a conference is top?)

Page 30: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.


Course Organization/Syllabus/etc.

Page 31: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Administrivia: Grading

Reviewing: 30%
Discussion leading: 15%
Project: 55%
  – Aim high!
  – Have fun!

Page 32: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Administrivia: Paper Reviewing (1)

Goals:
  – Think about what you read
  – Get used to writing paper reviews
Reviews due by noon before class
Be professional in your writing
Keep an eye on the writing style:
  – Clarity
  – Beware of traps: learn to use them in writing and detect them in reading
  – Detect (and stay away from) trivial claims. E.g., 1st sentence in the Introduction: “The tremendous/unprecedented/phenomenal growth/scale/ubiquity of the Internet…”

Page 33: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Administrivia: Paper Reviewing (2)

Follow the form provided when relevant.
State the main contribution of the paper
Critique the main contribution:
  – Rate the significance of the paper on a scale of 5 (breakthrough), 4 (significant contribution), 3 (modest contribution), 2 (incremental contribution), 1 (no contribution or negative contribution). Explain your rating in a sentence or two.
  – Rate how convincing the methodology is. Do the claims and conclusions follow from the experiments? Are the assumptions realistic? Are the experiments well designed? Are there different experiments that would be more convincing? Are there other alternatives the authors should have considered? (And, of course, is the paper free of methodological errors?)

Page 34: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Administrivia: Paper Reviewing (3)

What is the most important limitation of the approach?
What are the three strongest and/or most interesting ideas in the paper?
What are the three most striking weaknesses in the paper?
Name three questions that you would like to ask the authors.
Detail an interesting extension to the work not mentioned in the future work section.
Optional comments on the paper that you’d like to see discussed in class.

Page 35: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Administrivia: Discussion leading

Come prepared!
  – Prepare discussion outline
  – Prepare questions:
    > “What if”s
    > Unclear aspects of the solution proposed
    > …
  – Similar ideas in different contexts
  – Initiate short brainstorming sessions
Leaders do NOT need to submit paper reviews
Main goals:
  – Keep discussion flowing
  – Keep discussion relevant
  – Engage everybody (I’ll keep an eye on this, too)

Page 36: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Administrivia: Projects

Combine with your research if relevant to the class
Get approval from all instructors if you overlap final projects:
  – Don’t sell the same piece of work twice
  – You can get more than twice as many results with less than twice as much work
Aim high!
  – Put in one extra month and get a publication out of it
  – It is doable (we have proofs)
Try ideas that you postponed out of fear: it’s just a class, not your PhD.

Page 37: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Administrivia: Project deadlines (tentative)

Sept. 14: 1-page project proposal
Oct. 10: 3-page literature survey
  – Know relevant work in your problem area
  – If implementation project, list tools, similar projects
Nov. 13: 5-page midterm project due
  – Have a clear image of what’s possible/doable
  – Report preliminary results
Last class(es): in-class project presentation
  – Demo, if appropriate
Dec. 15:
  – 10-page write-up

Page 38: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.

Next Class (Wed, August 30)

In-class discussion of papers:
  – “Automated Worm Fingerprinting”, OSDI ’04.
  – “Planet Scale Software Updates”, SIGCOMM ’06.
Discussion of some project ideas
Need discussion leader to team up with me for the class next week: Real systems (1): BitTorrent
  – Exploiting BitTorrent For Fun (IPTPS’06)
  – A Case for Efficient Execution of Data-Intense Applications with BitTorrent on Computational Desktop Grids ()

Page 39: CIS 6930.5: Federated Distributed Systems Adriana Iamnitchi (Anda) anda@cse.usf.edu.


Questions?