CS 311: Intro to Algorithms COMPSCI 311 Section 1: Introduction to … · 2019-01-25 · COMPSCI...

5
COMPSCI 311 Section 1: Introduction to Algorithms Dan Sheldon University of Massachusetts {Last Compiled: January 25, 2019} CS 311: Intro to Algorithms Instructor: Dan Sheldon Where: ILC S140 When: M/W 2:30-3:45 Discussion Sections: F 12:20–1:10, F 1:25–2:15, Engineering Lab 304. (Please stick to assigned section) TAs: Karine Tung, Jesse Lingeman, Raghavendra Addanki, Subhojyoti Mukherjee Office hours: posted soon. . . Run jointly with Section 2 (Prof. Marius Minea): same TAs, HW, quizzes, midterms, moodle, Piazza, Gradescope; different course websites, slides, finals What is Algorithm Design? How do you write a computer program to solve a complex problem? Computing similarity between DNA sequences Routing packets on the Internet Scheduling final exams at a college Assign medical residents to hospitals Find all occurrences of a phrase in a large collection of documents Finding the smallest number of coffee shops that can be built in the US such that everyone is within 20 minutes of a coffee shop. DNA sequence similarity Input: two n-bit strings s 1 and s 2 s 1 = AGGCTACC s 2 = CAGGCTAC Output: minimum number of insertions/deletions to transform s 1 into s 2 Algorithm: ???? Even if the objective is precisely defined, we are often not ready to start coding right away! What is Algorithm Design? Step 1: Formulate the problem precisely Step 2: Design an algorithm Step 3: Prove the algorithm is correct Step 4: Analyze its running time Important: this is an iterative process, e.g., sometimes you’ll even want to redesign the algorithm to make it easier to prove that it is correct. Course Goals Learn how to apply the algorithm design process. . . by practice! Learn specific algorithm design techniques Greedy Divide-and-conquer Dynamic Programming Network Flows Learn to communicate precisely about algorithms Proofs, reading, writing, discussion Prove when no exact efficient algorithm is possible Intractability and NP-completeness

Transcript of CS 311: Intro to Algorithms COMPSCI 311 Section 1: Introduction to … · 2019-01-25 · COMPSCI...

Page 1: CS 311: Intro to Algorithms COMPSCI 311 Section 1: Introduction to … · 2019-01-25 · COMPSCI 311 Section 1: Introduction to Algorithms Dan Sheldon University of Massachusetts

COMPSCI 311 Section 1: Introduction toAlgorithms

Dan Sheldon

University of Massachusetts

{Last Compiled: January 25, 2019}

CS 311: Intro to Algorithms

I Instructor: Dan SheldonI Where: ILC S140I When: M/W 2:30-3:45I Discussion Sections: F 12:20–1:10, F 1:25–2:15, Engineering

Lab 304. (Please stick to assigned section)I TAs: Karine Tung, Jesse Lingeman, Raghavendra Addanki,

Subhojyoti MukherjeeI Office hours: posted soon. . .

Run jointly with Section 2 (Prof. Marius Minea): same TAs, HW,quizzes, midterms, moodle, Piazza, Gradescope; different coursewebsites, slides, finals

What is Algorithm Design?

How do you write a computer program to solve a complex problem?I Computing similarity between DNA sequencesI Routing packets on the InternetI Scheduling final exams at a collegeI Assign medical residents to hospitalsI Find all occurrences of a phrase in a large collection of documentsI Finding the smallest number of coffee shops that can be built in

the US such that everyone is within 20 minutes of a coffee shop.

DNA sequence similarity

I Input: two n-bit strings s1 and s2

I s1 = AGGCTACCI s2 = CAGGCTAC

I Output: minimum number of insertions/deletions to transforms1 into s2

I Algorithm: ????

I Even if the objective is precisely defined, we are often not readyto start coding right away!

What is Algorithm Design?

I Step 1: Formulate the problem preciselyI Step 2: Design an algorithmI Step 3: Prove the algorithm is correctI Step 4: Analyze its running time

Important: this is an iterative process, e.g., sometimes you’ll evenwant to redesign the algorithm to make it easier to prove that it iscorrect.

Course Goals

I Learn how to apply the algorithm design process. . . by practice!

I Learn specific algorithm design techniquesI GreedyI Divide-and-conquerI Dynamic ProgrammingI Network Flows

I Learn to communicate precisely about algorithmsI Proofs, reading, writing, discussion

I Prove when no exact efficient algorithm is possibleI Intractability and NP-completeness

Page 2: CS 311: Intro to Algorithms COMPSCI 311 Section 1: Introduction to … · 2019-01-25 · COMPSCI 311 Section 1: Introduction to Algorithms Dan Sheldon University of Massachusetts

Prerequisites: CS 187 and 250

I Algorithms use data structuresI Familiarity

I at programming level (lists, stacks, queues, . . . )I with mathematical objects (sets, lists, relations, partial orders)

I Two key notions to revisit:I Recursion: many algorithm design and analysis patterns are

based on recursionI Proofs: correctness of algorithms. contradiction, induction, . . .

Grading Breakdown

I Participation (10%): Discussion section, in-class iClickerquestions

I Homework (30%): Homework (every two weeks, usually dueThursday) and online quiz (every weekend due Monday).

I Midterm 1 (20%): Focus on ~first third of lectures. Thursday,Feb 21, 7pm

I Midterm 2 (20%): Focus on ~second third of lectures.Thursday, Apr 11, 7pm

I Final (20%): Covers all lectures. Thursday, May 8, 3:30pm

Course Information

Course websites:

people.cs.umass.edu/~sheldon/teaching/cs311/

Slides, homework, courseinformation, pointers to allother pages

moodle.umass.edu Quizzes, solutions, gradespiazza.com Discussion forum, contacting

instructors and TA’sgradescope.com Submitting and returning

homework

Announcements: Check UMass email / Piazza regularly for courseannouncements.

Policies

I Online Quizzes: Quizzes must be submitted before 8pm Monday.No late quizzes allowed but we’ll ignore your lowest scoring quiz.

I Homework: Submit via Gradescope by 11:59pm on due date.I Late up to 24 hours: 50% penaltyI Late more than 24 hours: no creditI Each student is allowed to submit one homework up to 24

hours late without penalty.

Collaboration and Academic HonestyI Homework: Collaboration OK (and encouraged) on homework,

but read/attempt on your own first. The writeup and code mustbe your own. Looking at written solutions that are not your own(other students, web) is considered cheating. There will be formalaction if cheating is suspected. You must list your collaboratorsand any printed or online sources at the top of each assignment.

I Online Quizzes: Should be done entirely on your own althoughit’s fine to consult the book and slides as you do the quiz. Again,there’ll be formal action if cheating is suspected.

I Discussions: Groups for the discussion section exercises will beassigned randomly at the start of each session. You mustcomplete the discussion session exercise with your assigned group.

I Exams: Closed book and no electronics. Cheating will result in anF in the course.

I If in doubt whether something is allowed, ask!

Stable Matching Problem. . .

Page 3: CS 311: Intro to Algorithms COMPSCI 311 Section 1: Introduction to … · 2019-01-25 · COMPSCI 311 Section 1: Introduction to Algorithms Dan Sheldon University of Massachusetts

Lloyd Shapley. Stable matching theory and Gale–Shapley algorithm.

Alvin Roth. Applied Gale–Shapley to matching med-school students with

hospitals, students with schools, and organ donors with patients.

2012 Nobel Prize in Economics

31

Lloyd Shapley

original applications:college admissions and

opposite-sex marriage

Alvin Roth

slide credit: Kevin Wayne / Pearson

Stable Matching and College Admissions

I Suppose there are n colleges c1, c2, . . . , cn and n studentss1, s2, . . . , sn.

I Each college has a ranking of all the students and each studenthas a ranking of all the colleges. For simplicity, suppose eachcollege can only admit one student.

I Can we match students to colleges such that everyone is happy?I Not necessarily, e.g., if UMass was everyone’s top choice.

I Can we match students to colleges such that matching is stable?(economist’s view of “good”)I Stability: Don’t want to unmatched college-student pairs to be

incentivivzed to deviate from matching

Problem Formulation

I Input: preference lists for n colleges and n studentsI Output? need definitions first

I Matching: set M of college-student pairs, each college/studentparticipate in at most one pair.

I Perfect matching: each college/student in exactly one pairI Instability or unstable pair (with respect to matching M): a

pair (c, s) /∈ M such thatI (c, s′) ∈ M but c prefers s to s′

I (c′, s) ∈ M but s prefers c to c′

I Stable matching: perfect matching with no instabilities

I Output: a stable matching

Clicker Question 1

1st 2nd 3rd

Atlanta Xavier Yvette ZeusBoston Yvette Xavier ZeusChicago Xavier Yvette Zeus

1st 2nd 3rd

Xavier Boston Atlanta ChicagoYvette Atlanta Boston ChicagoZeus Atlanta Boston Chicago

Which pair is an unstable pair with respect to the matching {A - X,B - Z, C - Y}? (marked in bold above)

A: A - Y

B: B - X

C: B - Z

D: none of the above

Examples

Do stable matchings always exist? Are they unique? Let’s see. . .I Example 1: universal prefs

Colleges

a: 1 2b: 1 2

Students

1: a b2: a b

I M = {(a, 1), (b, 2)}? stableI M = {(a, 2), (b, 1)}? not stable

Examples

I Example 2: inconsistent prefs

Colleges

a: 1 2b: 2 1

Students

1: b a2: a b

Clicker Q2: You are given an arbitrary setof preferences. Does it have more than onestable matching?A. YesB. NoC. It depends on the preference lists

I M = {(a, 1), (b, 2)}?stable

I M = {(a, 2), (b, 1)}?stable

Page 4: CS 311: Intro to Algorithms COMPSCI 311 Section 1: Introduction to … · 2019-01-25 · COMPSCI 311 Section 1: Introduction to Algorithms Dan Sheldon University of Massachusetts

Toward an Algorithm

Let’s use a slightly bigger example to try to develop an algorithm.

Colleges

a: 1 2 3b: 2 1 3c: 1 3 2

Students

1: c a b2: a b c3: a b c

Idea: build M incrementally. What should colleges do? Whatshould students do?

Propose-and-Reject (Gale-Shapley) Algorithm

Initially all colleges and students are freewhile some college is free and hasn’t proposed to every studentdo

Choose such a college cLet s be the highest ranked student to whom c has not

proposedif s is free then

c and s become matchedelse if s is matched to c′ but prefers c to c′ then

c′ becomes unmatchedc and s become matched

else . s prefers c′

s rejects c and c remains freeend if

end while

Analyzing the Algorithm

I Some natural questions:I Can we guarantee the algorithm terminates?I Can we guarantee the every college and student gets a match?I Can we guarantee the resulting allocation is stable?

I Some initial observations:I (F1) Once matched, students stay matched and only“upgrade" during the algorithm.

I (F2) College propose to students in order of college’spreferences.

Can we guarantee the algorithm terminates?

I Yes! Proof. . .I Note that in every round, some college proposes to some

student that they haven’t already proposed to.I n colleges and n students =⇒ at most n2 proposalsI =⇒ at most n2 rounds of the algorithm

Can we guarantee all colleges and students get a match?

I Yes! Proof by contradiction. . .I Suppose not all colleges and students have matches. Then

there exists unmatched college c and unmatched student s.

I s was never matched during the algorithm (by F1)

I But c proposed to every student (by termination condition)

I When c proposed to s, she was unmatched and yet rejected c.Contradiction!

Clicker Question 3

Depending on the problem instance, which of the following canhappen during a run of the Gale-Shapley algorithm?

A: Each student accepts their first offer and never switches.

B: Some student switches their choice more than once during a run.

C: Both A and B can happen for the same problem instance.

D: Both A and B can happen, but only in different probleminstances.

Page 5: CS 311: Intro to Algorithms COMPSCI 311 Section 1: Introduction to … · 2019-01-25 · COMPSCI 311 Section 1: Introduction to Algorithms Dan Sheldon University of Massachusetts

Can we guarantee the resulting allocation is stable?

I Yes! Proof by contradictionI Suppose there is an instability (c, s)

I c is matched to s′ but prefers s to s′

I s is matched to c′ but prefers c to c′

I Did c offer to s? Yes, by (F2), since c offered to s′ who isranked lower

I Did s accept offer from c? Maybe initially, but s musteventually reject c for another college, and, by (F1), s prefersfinal college c′ to c

I Contradiction!

Content delivery networks. Distribute much of world’s content on web.

User. Preferences based on latency and packet loss.

Web server. Preferences based on costs of bandwidth and co-location.

Goal. Assign billions of users to servers, every 10 seconds.

A modern application

34

Algorithmic Nuggets in Content Delivery

Bruce M. Maggs Ramesh K. SitaramanDuke and Akamai UMass, Amherst and Akamai

[email protected] [email protected]

This article is an editorial note submitted to CCR. It has NOT been peer reviewed.The authors take full responsibility for this article’s technical content. Comments can be posted through CCR Online.

ABSTRACTThis paper “peeks under the covers” at the subsystems thatprovide the basic functionality of a leading content deliv-ery network. Based on our experiences in building one ofthe largest distributed systems in the world, we illustratehow sophisticated algorithmic research has been adapted tobalance the load between and within server clusters, man-age the caches on servers, select paths through an overlayrouting network, and elect leaders in various contexts. Ineach instance, we first explain the theory underlying thealgorithms, then introduce practical considerations not cap-tured by the theoretical models, and finally describe what isimplemented in practice. Through these examples, we high-light the role of algorithmic research in the design of com-plex networked systems. The paper also illustrates the closesynergy that exists between research and industry whereresearch ideas cross over into products and product require-ments drive future research.

1. INTRODUCTIONThe top-three objectives for the designers and operators

of a content delivery network (CDN) are high reliability,fast and consistent performance, and low operating cost.While many techniques must be employed to achieve theseobjectives, this paper focuses on technically interesting al-gorithms that are invoked at crucial junctures to provideprovable guarantees on solution quality, computation time,and robustness to failures. In particular, the paper walksthrough the steps that take place from the instant that abrowser or other application makes a request for contentuntil that content is delivered, stopping along the way toexamine some of the most important algorithms that areemployed by a leading CDN.

One of our aims, as we survey the various algorithms, isto demonstrate that algorithm design does not end whenthe last theorem is proved. Indeed, in order to develop fast,scalable, and cost-e↵ective implementations, significant in-tellectual creativity is often required to address practicalconcerns and messy details that are not easily captured bythe theoretical models or that were not anticipated by theoriginal algorithm designers. Hence, much of this paper fo-cuses on the translation of algorithms that are the fruits ofresearch into industrial practice. In several instances, wedemonstrate the benefits that these algorithms provide bydescribing experiments conducted on the CDN.

A typical request for content begins with a DNS queryissued by a client to its resolving name server (cf. Figure 1).The resolving name server then forwards the request to the

Edge%Server%

Client%

Origin%

Authorita4ve%Name%Server%%(Global%and%Local%Load%

Balancing)%

Overlay%Rou4ng%

Content%

DNS%

Figure 1: A CDN serves content in response to aclient’s request.

CDN’s authoritative name server. The authoritative nameserver examines the network address of the resolving nameserver, or, in some cases, the edns-client-subnet provided bythe resolving name server [9], and, based primarily on thisaddress, makes a decision about which of the CDN’s clustersto serve the content from. A variant of the stable marriagealgorithm makes this decision, with the aim of providinggood performance to clients while balancing load across allclusters and keeping costs low. This algorithm is describedin Section 2.

But DNS resolution does not end here. The task of indi-cating which particular web server or servers within the clus-ter will serve the content is delegated to a second set of nameservers. Within the cluster, load is managed using a consis-tent hashing algorithm, as described in Section 3. The webserver address or addresses are returned through the resolv-ing name server to the client so that the client’s application,such as a browser, can issue the request to the web server.The web servers that serve content to clients are called edgeservers as they are located proximal to clients at the “edges”of the Internet. As such, Akamai’s CDN currently has over170,000 edge servers located in over 1300 networks in 102countries and serves 15-30% of all Web tra�c.

When an edge server receives an HTTP request, it checksto see if the requested object is already present in the server’scache. If not, the server begins to query other servers in

ACM SIGCOMM Computer Communication Review 52 Volume 45, Number 3, July 2015

slide credit: Kevin Wayne / Pearson

For Next Time

I Think about:I Would it be better or worse for the students if we ran the

algorithm with the students proposing?I Can a student get an advantage by lying about their

preferences?

I Read: Chapter 1, course policies

I Enroll in Piazza, log into Moodle, and visit the course webpage.