CM40212: Internet Technology

Julian Padget
Tel: x6971, E-mail: [email protected]

Distributed Programming Patterns and Middleware

November 1, 2011

Julian Padget (CS/Bath) CM40212: Internet Technology November 1, 2011 1 / 58

Outline

1 Web programming
    Applets
    Servlets
    Cookies
    AJAX

2 Distributed programming
    MIMD
    SIMD/SPMD

3 Programming frameworks
    MPI
    Map Reduce
    Memcached
    HADOOP

4 Summary

Objectives

- Establish scope of programming task: not a single thread, not a single computer
- Several complementary frameworks that achieve similar/different goals... but just technology
- Better to understand programming models and their strengths and weaknesses:
    - Expressiveness
    - Security
    - Debuggability(?) and maintainability
    - Reliability!
- Task: to recognize the appropriate abstraction to use (or invent)

The coding space

[Diagram: browser (applet), web server (servlet), databases etc. Or? And? Laptop/desktop, Condor pool, National Grid Service, local cluster, compute cloud.]

Web programming

Places to put code

- In the browser: applets
- In the server: servlets
- In the back-end: Perl, Python, PHP, Ruby-on-Rails
- Back-end frameworks: Ajax (Backbase, Dojo, Spry)

[Diagram: browser (applet), web server (servlet), databases etc.]

Applets

Applets can use the browser API to:

- Receive notification of milestones
- Load data files (specified relative to the URL of the applet or the page in which it is running)
- Display status strings
- Make the browser display a document
- Find other applets running in the same page
- Play sounds
- Access parameters specified by the user in the <APPLET> tag

Constrained environment for security

The Applet Interface

There are four important Applet methods that enable a subclass to handle major events:

1 init: To initialize the applet each time it's loaded (or reloaded)
2 start: To start the applet's execution, such as when the applet is loaded or when the user revisits a page that contains the applet
3 stop: To stop the applet's execution, such as when the user leaves the applet's page or quits the browser
4 destroy: To perform a final cleanup in preparation for unloading

For example:

    import javax.swing.JApplet;

    public class Simple extends JApplet {
        . . .
        public void init() { . . . }
        public void start() { . . . }
        public void stop() { . . . }
        public void destroy() { . . . }
        . . .
    }

Security Restrictions

- Applets pose a security risk: downloaded code (from where?) is executed within the browser on the client machine
- Different browsers have different security policies and policies change, so cannot hard-code...
- Specifically:
    - Cannot load libraries or define native methods
    - Cannot (normally) read or write files on the client
    - Cannot make network connections except to the originating host
    - Cannot start any program on the client
    - Cannot read arbitrary system properties
    - Applet-created windows must look different
- SecurityManager object on browser implements security policies: throws a SecurityException when a violation is detected. Applet can catch the SecurityException and process it
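The last point, catching a SecurityException and carrying on, can be sketched in plain Java. This is only an illustration under stated assumptions: `PropertyProbe` and `readOrDefault` are hypothetical names, and the code is ordinary application code rather than an applet. On a JVM with no restrictive policy in force the read simply succeeds; the catch branch is what would run where a policy forbids reading the property.

```java
class PropertyProbe {
    // Returns the property value, or the fallback if the property is unset
    // or a security policy refuses the read.
    static String readOrDefault(String key, String fallback) {
        try {
            String value = System.getProperty(key);
            return value != null ? value : fallback;
        } catch (SecurityException e) {
            // Access was vetoed by the security policy: degrade gracefully.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(readOrDefault("user.home", "(not permitted)"));
    }
}
```

The design choice is to treat a refused read the same way as a missing value, so the caller never needs to know which policy is active.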

Applet Capabilities

The java.applet package API allows:

- Creation of network connections to originating host
- Display of HTML documents in the same browser
- Invocation of public methods of other applets on the same page

Locally-loaded applets are not subject to the same restrictions

Circumventing the restrictions:

- Use a server application on the applet's host
- Allows saving files (on the applet host rather than the client)
- Communicate over sockets

Servlets I

- Supersedes Common Gateway Interface (CGI) scripts
- A servlet is a Java class for extending a server's capabilities via a request-response programming model
- Servlets run in a container on the host: insulates servlets from one another and the host from them
- All servlets must implement the Servlet interface

The servlet life cycle:

1 An incoming request is mapped to a servlet. If there is no servlet instance:
    - Load the servlet class
    - Create an instance
    - Initialize instance via init method
2 Invoke the service method
3 Servlet removal achieved via destroy method (includes finalization)
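The life cycle above can be sketched with a toy container in plain Java. Everything here is an assumption made for illustration: `MiniServlet`, `HelloServlet` and `MiniContainer` are hypothetical names, not the real Servlet API, and a real container also handles threading, configuration and class loading.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for the servlet contract (not javax.servlet.Servlet).
interface MiniServlet {
    void init();
    String service(String request);
    void destroy();
}

// Records each life-cycle call in the supplied log so the order can be inspected.
class HelloServlet implements MiniServlet {
    private final List<String> log;
    HelloServlet(List<String> log) { this.log = log; }
    public void init() { log.add("init"); }
    public String service(String request) {
        log.add("service");
        return "Hello, " + request;
    }
    public void destroy() { log.add("destroy"); }
}

// Toy container: maps a path to a servlet and creates the instance on first use.
class MiniContainer {
    private final Map<String, Supplier<MiniServlet>> mappings = new HashMap<>();
    private final Map<String, MiniServlet> instances = new HashMap<>();

    void map(String path, Supplier<MiniServlet> factory) {
        mappings.put(path, factory);
    }

    String handle(String path, String request) {
        MiniServlet servlet = instances.get(path);
        if (servlet == null) {
            Supplier<MiniServlet> factory = mappings.get(path);
            if (factory == null) throw new IllegalArgumentException("no servlet for " + path);
            servlet = factory.get();   // step 1: load and create an instance
            servlet.init();            // step 1: initialize it
            instances.put(path, servlet);
        }
        return servlet.service(request);   // step 2: invoke service
    }

    void shutdown() {
        for (MiniServlet s : instances.values()) s.destroy();   // step 3: removal
        instances.clear();
    }
}

class LifecycleDemo {
    public static void main(String[] args) {
        List<String> log = new java.util.ArrayList<>();
        MiniContainer container = new MiniContainer();
        container.map("/hello", () -> new HelloServlet(log));
        System.out.println(container.handle("/hello", "first"));
        System.out.println(container.handle("/hello", "second"));
        container.shutdown();
        System.out.println(log);
    }
}
```

The second request reuses the existing instance, so `init` runs once however many requests arrive.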

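Servlets II

- Container typically creates a thread for each request: can ensure at most one thread at a time runs the service method by implementing the SingleThreadModel interface (deprecated since Servlet 2.4)
- Otherwise must use synchronized methods and/or synchronized statements:

    public void addName(String name) {
        // Guard only the updates to this object's own fields
        synchronized (this) {
            lastName = name;
            nameCount++;
        }
        // Called outside the lock: nameList must cope with concurrent access itself
        nameList.add(name);
    }

- Does not prevent concurrent access to static variables or external objects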
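To see the `addName` fragment in a complete class, here is a minimal sketch. The class names `NameRegistry` and `NameRegistryDemo`, the accessor methods, and the choice of `Collections.synchronizedList` for `nameList` are all assumptions added for illustration; the slide does not say how `nameList` is declared.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class NameRegistry {
    private String lastName;
    private int nameCount;
    // Assumption: the list is itself thread-safe, so add() can stay outside the lock.
    private final List<String> nameList = Collections.synchronizedList(new ArrayList<>());

    public void addName(String name) {
        synchronized (this) {
            lastName = name;
            nameCount++;
        }
        nameList.add(name);
    }

    public synchronized int getNameCount() { return nameCount; }
    public synchronized String getLastName() { return lastName; }
    public int listSize() { return nameList.size(); }
}

class NameRegistryDemo {
    // Starts the given number of threads, each adding perThread names, and waits for them.
    static NameRegistry fill(int threads, int perThread) {
        NameRegistry registry = new NameRegistry();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) registry.addName("user-" + id + "-" + i);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return registry;
    }

    public static void main(String[] args) {
        NameRegistry registry = fill(4, 1000);
        System.out.println(registry.getNameCount() + " names, list size " + registry.listSize());
    }
}
```

Without the synchronized statement the increment of `nameCount` could lose updates when several request threads call `addName` at once.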

Sessions

- Client state (because HTTP is stateless), e.g. for shopping carts:
    - Interacts with cookies (q.v.)
    - Session is obtained from the HttpServletRequest (getSession())
    - Provides simple key/value association mechanism
- HTTP client cannot signal session is ended, so use a time-to-live counter
- If cookies are disabled, session is encoded in URL
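The key/value association and the time-to-live idea can be sketched as a small in-memory store. This is a hypothetical illustration only: `SessionStore` and its methods are invented names and do not reflect how any real servlet container implements HttpSession. The current time is passed in explicitly so the expiry behaviour is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory session store with a time-to-live per session.
class SessionStore {
    private static final class Session {
        final Map<String, Object> attributes = new HashMap<>();
        long lastAccessMillis;
    }

    private final Map<String, Session> sessions = new HashMap<>();
    private final long ttlMillis;

    SessionStore(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Creates a session and returns its id (the value a cookie or rewritten URL would carry).
    synchronized String create(long nowMillis) {
        String id = UUID.randomUUID().toString();
        Session s = new Session();
        s.lastAccessMillis = nowMillis;
        sessions.put(id, s);
        return id;
    }

    // Stores a value; returns false if the session is unknown or has expired.
    synchronized boolean put(String id, String key, Object value, long nowMillis) {
        Session s = live(id, nowMillis);
        if (s == null) return false;
        s.attributes.put(key, value);
        return true;
    }

    // Returns the stored value, or null if the key or the session is gone.
    synchronized Object get(String id, String key, long nowMillis) {
        Session s = live(id, nowMillis);
        return s == null ? null : s.attributes.get(key);
    }

    // Returns the session if still alive and refreshes its timestamp; evicts it otherwise.
    private Session live(String id, long nowMillis) {
        Session s = sessions.get(id);
        if (s == null) return null;
        if (nowMillis - s.lastAccessMillis > ttlMillis) {
            sessions.remove(id);
            return null;
        }
        s.lastAccessMillis = nowMillis;
        return s;
    }
}

class SessionDemo {
    public static void main(String[] args) {
        SessionStore store = new SessionStore(30 * 60 * 1000L);   // 30 minute time-to-live
        long now = System.currentTimeMillis();
        String id = store.create(now);
        store.put(id, "basket", "Rocket_Launcher_0001", now);
        System.out.println(store.get(id, "basket", now));
    }
}
```

Because the client cannot announce that it has finished, the store only learns a session is dead when a later access arrives after the time-to-live has elapsed.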

Cookies I

- Invented to solve problem of persistent data
- RFC 2965, HTTP State Management Mechanism, D. Kristol, L. Montulli, October 2000, ftp://ftp.rfc-editor.org/in-notes/rfc2965.txt
- Defines three new headers, Cookie, Cookie2, and Set-Cookie2, to carry state information between participating origin servers and user agents:

1 User Agent → Server

    POST /acme/login HTTP/1.1
    [form data]

  User identifies self via a form.

2 Server → User Agent

    HTTP/1.1 200 OK
    Set-Cookie2: Customer="WILE_E_COYOTE"; Version="1"; Path="/acme"

  Cookie reflects user's identity.

Cookies II

3 User Agent → Server

    POST /acme/pickitem HTTP/1.1
    Cookie: $Version="1"; Customer="WILE_E_COYOTE"; $Path="/acme"
    [form data]

  User selects an item for “shopping basket”.

4 Server → User Agent

    HTTP/1.1 200 OK
    Set-Cookie2: Part_Number="Rocket_Launcher_0001"; Version="1"; Path="/acme"

  Shopping basket contains an item.

5 and so on...

Minimum implementation requirements:

  At least 300 cookies
  At least 4096 bytes per cookie
  At least 20 cookies per unique host or domain name

Applications should use as few and as small cookies as possible, and they should cope gracefully with the loss of a cookie.
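
As a rough illustration of steps 2 and 3 of the exchange, the sketch below parses the server's header value with Python's standard http.cookies module and rebuilds the Cookie header that the user agent sends back. The variable names are made up for the example. Note that Set-Cookie2 saw little deployment and RFC 2965 has since been obsoleted by RFC 6265, so treat this as a reading aid for the slide rather than current practice.

```python
from http.cookies import SimpleCookie

# Step 2: the value of the server's Set-Cookie2 header, as in the exchange above.
set_cookie2 = 'Customer="WILE_E_COYOTE"; Version="1"; Path="/acme"'

jar = SimpleCookie()
jar.load(set_cookie2)            # parses the name/value pair and its attributes
morsel = jar["Customer"]

# Step 3: the user agent echoes the cookie, prefixing the attributes with "$".
cookie_header = '$Version="{}"; {}="{}"; $Path="{}"'.format(
    morsel["version"], morsel.key, morsel.value, morsel["path"]
)
print("Cookie:", cookie_header)
```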

Web programming AJAX

Asynchronous JavaScript and XML

Main feature: allows web applications to retrieve data from the server asynchronously in the background without interfering with the display and behavior of the existing page.

Uses XMLHttpRequest object or Remote Scripting

Does not require JavaScript, XML or asynchronism

Pros:

  Reduction in client-server traffic: only send changes, not fresh pages
  Better (perceived) response time
  Reduction in server connections

Cons:

  Dynamically-created pages do not work well with the browser “back” button
  Difficulty in book-marking
  Difficulty with indexing
  Accessibility restrictions, e.g. JavaScript disabled
  Absence of a standard

Distributed programming

Content

1 Web programming

2 Distributed programming
    MIMD
    SIMD/SPMD

3 Programming frameworks

4 Summary

Places to put code

Distribute function across network

Distribute data across network

Control: synchronous or asynchronous

[Figure: places to put code: laptop/desktop, Condor pool, National Grid Service, local cluster, compute cloud]

Concurrency and Parallelism

Program components can appear to execute, or actually execute, at the same time, which creates new ways of making mistakes:

1 Synchronization:
    Deadlock: suspended waiting for resources
    Livelock: repeatedly checking for resources
2 Race conditions
3 Non-determinism

Flynn’s classification identifies four forms:

  SISD = Single Instruction, Single Datum
  SIMD = Single Instruction, Multiple Data
  MISD = Multiple Instruction, Single Datum
  MIMD = Multiple Instruction, Multiple Data

Basic concurrency concepts

What makes it different from sequential computation?

non-determinism:
    computations proceed at different rates
    communications vary in speed and reliability
communication: channels, messages, shared variables
interference: contention for resources
atomicity: several (interdependent) steps (must) appear as one and succeed, or fail and all be undone (cf. finally etc.)
cooperation vs. pre-emption: voluntary ceding of control (suspend) or time-slicing
coordination: mechanisms to balance cooperation or competition between tasks while limiting interference
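
The atomicity point can be made concrete with a small sketch. The Accounts class and its transfer method are hypothetical names invented for this example. A lock keeps other tasks from seeing a half-finished update, and a snapshot lets a failed update be undone as a whole.

```python
import threading


class Accounts:
    """Illustrative only: two balances updated as one step, or not at all."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self._lock = threading.Lock()       # mutual exclusion limits interference

    def transfer(self, src, dst, amount):
        with self._lock:                    # no other task sees a half-done transfer
            snapshot = dict(self.balances)  # remembered so the steps can be undone
            try:
                self.balances[src] -= amount
                if self.balances[src] < 0:
                    raise ValueError("insufficient funds")
                self.balances[dst] += amount
            except Exception:
                self.balances = snapshot    # fail: undo every step, then report
                raise


bank = Accounts({"alice": 100, "bob": 0})
bank.transfer("alice", "bob", 70)
try:
    bank.transfer("alice", "bob", 70)       # would overdraw, so it is rolled back
except ValueError:
    pass
final_balances = bank.balances
```

The `with` statement plays the role the slide attributes to `finally`: the lock is released whether the body succeeds or raises.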

Concurrency patterns

Concurrent programs frequently follow one or a mixture of a few well-known patterns:

  producer/consumer: one thread generates data for another – communication channel becomes source of contention
  pipeline: general case of producer/consumer
  master/slave: break task into independent parts and distribute, collect and combine results
  divide and conquer: general case of above, where slaves become masters of smaller parts

Essential coordination aspect is access to shared data:

  Aim: only one task can change data at one time
  But: many can access data – consistency problem
  Solution: mechanisms for mutual exclusion
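
A minimal producer/consumer sketch using only the Python standard library may help here. The names channel, DONE and shared are invented for the example, and the bounded queue stands in for the communication channel that the slide identifies as the point of contention.

```python
import queue
import threading

channel = queue.Queue(maxsize=4)   # bounded channel: producers and consumers contend here
DONE = object()                    # sentinel marking the end of the stream
shared = {"total": 0}              # shared data that several tasks want to change
shared_lock = threading.Lock()     # mechanism for mutual exclusion


def producer(n):
    for i in range(n):
        channel.put(i)             # blocks while the channel is full
    channel.put(DONE)


def consumer():
    while True:
        item = channel.get()       # blocks while the channel is empty
        if item is DONE:
            channel.put(DONE)      # pass the sentinel on to the other consumers
            return
        with shared_lock:          # only one task changes the data at a time
            shared["total"] += item


tasks = [threading.Thread(target=consumer) for _ in range(3)]
tasks.append(threading.Thread(target=producer, args=(100,)))
for t in tasks:
    t.start()
for t in tasks:
    t.join()
total = shared["total"]
```

Chaining several such stages, each consuming from one queue and producing into the next, gives the pipeline pattern.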

Distributed programming MIMD

Object spaces

Low-level synchronization mechanisms are (very) difficult to use safely

(A) Solution: Decouple sender and receiver

  temporally – need not co-exist
  spatially – need no information about each other

Origins:

  Linda [Carriero and Gelernter, 1989]
  Chemical Abstract Machine (CHAM) [Berry and Boudol, 1990]
  Unity [Chandy and Misra, 1988]

Conceptually:

[Figure: tasks T1, T2, ..., Tn−1, Tn around a shared message/tuple pool]

Object space operations

Operations in Linda (cf. JavaSpaces [Freeman et al., 1999]):

  in(pattern): where pattern = tag(elt1,...,eltn), tag is a literal, elt is a constant or a variable
    action: match pattern against tuples in pool, select one (if exists) and remove, otherwise block
  out(tagged tuple): constructs tuple and stores in pool
  rd(pattern): as in, but the matched tuple is left in the pool. The predicate forms inp and rdp fail instead of blocking if there is no match, which is ≡ testing a semaphore before waiting... equally useless (race condition).
  eval(expression): creates new task to evaluate expression, result is written to pool.

Conceptually attractive, easy to learn, easy to use, but can be problematic in practice.

JavaSpaces: http://java.sun.com/developer/Books/JavaSpaces
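
The operations can be tried out with a toy, single-process tuple pool. Everything here is an assumption made for illustration: the class TupleSpace, the use of None as the pattern variable, and the method name in_ (since in is a Python keyword). In this sketch rd blocks like in but leaves the tuple in the pool, following the usual Linda definition, and rdp is the non-blocking probe.

```python
import threading


class TupleSpace:
    """A toy tuple pool (illustrative only); None in a pattern matches anything."""

    def __init__(self):
        self._tuples = []
        self._changed = threading.Condition()

    def _find(self, pattern):
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == t for p, t in zip(pattern, tup)
            ):
                return tup
        return None

    def _wait_for(self, pattern):
        # Caller must hold self._changed; sleeps until some tuple matches.
        while True:
            tup = self._find(pattern)
            if tup is not None:
                return tup
            self._changed.wait()

    def out(self, *tup):
        """Store a tuple in the pool and wake any blocked tasks."""
        with self._changed:
            self._tuples.append(tup)
            self._changed.notify_all()

    def in_(self, *pattern):
        """Block until a tuple matches, then remove and return it."""
        with self._changed:
            tup = self._wait_for(pattern)
            self._tuples.remove(tup)
            return tup

    def rd(self, *pattern):
        """Block until a tuple matches and return it, leaving it in the pool."""
        with self._changed:
            return self._wait_for(pattern)

    def rdp(self, *pattern):
        """Non-blocking probe: a matching tuple, or None straight away."""
        with self._changed:
            return self._find(pattern)

    def eval(self, tag, fn, *args):
        """Run fn(*args) in a new task and write (tag, result) to the pool."""
        task = threading.Thread(target=lambda: self.out(tag, fn(*args)))
        task.start()
        return task


space = TupleSpace()
space.eval("square", lambda x: x * x, 7)    # the producer knows nothing of the consumer
result = space.in_("square", None)          # blocks until the tuple appears
space.out("config", "workers", 4)
peek = space.rd("config", "workers", None)  # non-destructive read
gone = space.rdp("square", None)            # already removed by in_, so None
```

The race the slide warns about shows up if a task calls rdp and then in_ as two separate steps: another task may remove the tuple in between.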

Distributed programming SIMD/SPMD

Data Parallelism

SIMD: single instruction multiple data; requires special hardware (or data abstraction)

History:

  vector processing – 1970s, 80s, 90s. Manufacturers: Control Data Corporation (CDC), Cray. Vector extensions to FORTRAN.
  array processors – 1980s, 90s. Manufacturers: ICL (Distributed Array Processor), Thinking Machines (Connection Machines), MasPar. Architecture: thousands of small processors (e.g. 1 bit ALU, 16K memory), local+global connection network. Array extensions to FORTRAN. Languages: C*, FORTRAN*, *Lisp

SPMD: single program multiple data; can be realized on a Condor pool, commodity hardware cluster or compute cloud
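
The SPMD idea can be sketched on one machine, with threads standing in for the nodes of a pool or cluster. The function name program and the rank/size parameters are assumptions chosen for the example. On real hardware each instance would be a separate process on its own node.

```python
from concurrent.futures import ThreadPoolExecutor


def program(rank, size, data):
    """The single program: every instance runs this on its own share of the data."""
    my_share = data[rank::size]            # cyclic partition selected by rank
    return sum(x * x for x in my_share)    # local partial result


data = list(range(1, 101))
size = 4                                   # number of program instances
with ThreadPoolExecutor(max_workers=size) as pool:
    partials = list(pool.map(lambda rank: program(rank, size, data), range(size)))
sum_of_squares = sum(partials)             # combine the partial results
```

The only thing that distinguishes one instance from another is its rank, which is what lets the same program text run unchanged on any number of nodes.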

Programming frameworks

Content

1 Web programming

2 Distributed programming

3 Programming frameworks
    MPI
    Map Reduce
    Memcached
    HADOOP

4 Summary

Programming frameworks MPI

MPI summary I

De facto standard for SPMD programming

Point-to-point and collective communications

MPI belongs in layers 5 and higher (session, presentation, application) of the OSI Reference Model

Implementations cover most layers of the reference model, with sockets and TCP being used in the transport layer

Functionality:

  Virtual topology
  Synchronization
  Communication

between a set of processes that have been mapped to nodes/servers/computer instances

API extends several widely-used languages

MPI summary II

Features:

Point-to-point rendezvous-type send/receive operations: synchronous, asynchronous, buffered, and ready forms

Choice of Cartesian or graph-like logical process topology

Exchange of data between process pairs (send/receive operations)

Combination of partial results of computations (gathering and reduction operations)

Synchronization of nodes (barrier operation)

Environmental enquiry operations (number of processes in session, processor identity, neighbouring processes, etc.)
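As a small illustration of the Cartesian process topology listed above, the following MPI-free sketch shows the kind of bookkeeping such a topology does: converting between a rank and (row, col) grid coordinates, and finding a neighbour. The function names and the non-periodic grid boundary are assumptions made for this example. In MPI itself this job is done by MPI_Cart_create, MPI_Cart_coords and MPI_Cart_shift, which also number processes in row-major order.

```c
/* Hypothetical helpers (not part of MPI): a rows x cols grid of processes,
   numbered in row-major order. */

/* Convert a rank into (row, col) coordinates. */
void rank_to_coords(int rank, int cols, int *row, int *col)
{
    *row = rank / cols;
    *col = rank % cols;
}

/* Convert (row, col) coordinates back into a rank. */
int coords_to_rank(int row, int col, int cols)
{
    return row * cols + col;
}

/* Rank of the neighbour displaced by (drow, dcol), or -1 if that position
   lies outside the (non-periodic) grid. */
int neighbour(int rank, int rows, int cols, int drow, int dcol)
{
    int row, col;
    rank_to_coords(rank, cols, &row, &col);
    row += drow;
    col += dcol;
    if (row < 0 || row >= rows || col < 0 || col >= cols)
        return -1;
    return coords_to_rank(row, col, cols);
}
```

For a 3 x 4 grid, rank 5 sits at row 1, column 1, and its neighbour one row up is rank 1; rank 0 has no neighbour to its left.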


MPI

Primary user is e-Science (teraflops, terabytes)

Distributed systems + small programs = slow

Message overheads > gains from parallelism

MPI supports SPMD + message-passing

Functionality:

library of messaging functions

works with (unmodified) C, Fortran, etc.

even Java, Python and others

MPI is a standard not an implementation

Example:


    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rc, myrank, nproc, namelen;
        char name[MPI_MAX_PROCESSOR_NAME];  /* large enough for any processor name */

        rc = MPI_Init(&argc, &argv);
        if (rc != MPI_SUCCESS) {
            printf("Error starting MPI program\n");
            MPI_Abort(MPI_COMM_WORLD, rc);
        }

        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);  /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);   /* number of processes */

        if (myrank == 0) {
            printf("main reports %d procs\n", nproc);
        }

        MPI_Get_processor_name(name, &namelen);  /* fills name, sets namelen */
        printf("hello world %d from '%s'\n", myrank, name);

        MPI_Finalize();
        return 0;
    }


Notes:

MPI_Init(&argc, &argv); – system set up; creates all the network connections

rc – check it worked

MPI_COMM_WORLD – system can be divided into subsets of processors called communicators:

The WORLD communicator is all processors

MPI_COMM_SELF refers to the calling processor

MPI_Comm_rank – each process in a communicator has a unique rank in the range [0, (size of the communicator − 1)]

For WORLD the rank is 0 to (# of processors − 1)

MPI_Comm_size – Size of the communicator

if (myrank == i) – All processors run the same code (SPMD); branching on the rank is how to get a kind of MIMD

MPI_Finalize – Tidy up. This is a collective call: every process must make it, after completing the communication it is involved in.
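Because every process runs the same program, the rank is typically also used to decide which part of the data a process works on. A minimal sketch, assuming n items are split into contiguous blocks as evenly as possible; block_range is a hypothetical helper written for this example, not an MPI call:

```c
/* Hypothetical helper (not part of MPI): compute the half-open range
   [*lo, *hi) of the n items that the process with the given rank handles,
   when nproc processes share the work as evenly as possible. */
void block_range(int rank, int nproc, int n, int *lo, int *hi)
{
    int base  = n / nproc;   /* every process gets at least this many */
    int extra = n % nproc;   /* the first `extra` ranks get one more  */

    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}
```

Each process would call block_range(myrank, nproc, n, &lo, &hi) once it has obtained myrank and nproc. With 10 items and 4 processes the ranges are [0, 3), [3, 6), [6, 8) and [8, 10).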


A basic problem is how to get data from one processor to another

Processor A sends data (integers, floats, strings, etc.) to B

A uses a send function, and B uses a receive function

    int n[5];
    ...
    if (myrank == 0) {
        /* A: send 5 ints to rank 1, tagged 99 */
        MPI_Send(n, 5, MPI_INT, 1, 99, MPI_COMM_WORLD);
    }
    else if (myrank == 1) {
        /* B: receive 5 ints from rank 0 with tag 99 */
        MPI_Status stat;
        MPI_Recv(n, 5, MPI_INT, 0, 99, MPI_COMM_WORLD, &stat);
    }

This supposes A has rank 0, B rank 1


A sends

n – A pointer to a memory location containing the data; can be a single variable or an array of values

5 – The number of items to send

MPI_INT – The type of the items

1 – The rank of the destination

99 – An integer tag to help identify a particular communication, for example in the message logs; the receiver can select messages by tag

MPI_COMM_WORLD – The communicator


B receives

n – A pointer to a memory location into which the received data is written

5 – The maximum number of items to receive

MPI_INT – The type of the items

0 – The rank of the source

99 – The tag that the receiver wants the message to carry; use MPI_ANY_TAG to accept any tag

MPI_COMM_WORLD – The communicator

stat – A structure containing the status of the transfer, in particular the source and tag, and the error type in case of an error


Types include:

MPI_CHAR, MPI_SHORT, MPI_INT, MPI_LONG, MPI_FLOAT, MPI_DOUBLE, MPI_BYTE

MPI_Send and MPI_Recv are blocking operations:

MPI_Send waits until the data has been sent

The buffer n in A can be safely reused when MPI_Send returns

But the data may not have reached or been read by B

Similarly, MPI_Recv waits until all the data is copied

This provides weak synchronisation between A and B

But asynchronism requires careful programming

Messages take a significant time to be transmitted

A send and a receive are not simultaneous
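To make the cost of messaging concrete, a common back-of-envelope model charges each message a fixed latency plus a per-byte cost, and adds the communication time to each process's share of the work. The sketch below uses that model; the function names and the numbers in the note that follows are illustrative assumptions, not measurements.

```c
/* Estimated time (seconds) to transmit one message of `bytes` bytes,
   given a fixed per-message latency (seconds) and a bandwidth (bytes/s). */
double message_time(double latency, double bandwidth, double bytes)
{
    return latency + bytes / bandwidth;
}

/* Estimated speed-up of a job that takes t_seq seconds on one process
   when it is split over p processes and each process additionally spends
   t_comm seconds communicating. */
double speedup(double t_seq, int p, double t_comm)
{
    return t_seq / (t_seq / p + t_comm);
}
```

With t_seq = 8 s, p = 4 and t_comm = 2 s the estimated speed-up is 2 rather than 4. For a small job with t_seq = 1 s and t_comm = 1 s it drops to 0.8, so the parallel version is slower than the sequential one.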


Other communication operators include:

MPI_Ssend – Waits until the destination has started to receive the message

MPI_Isend – Send, but do not wait; the buffer must not be (re-)used until MPI_Wait reports that the send has completed

MPI_Irecv – Indicate a buffer for data storage, but do not wait for it; the data is transferred asynchronously

MPI_Wait – Block until a given non-blocking send or recv has completed

And lots more


Simple synchronisation: MPI_Barrier(MPI_Comm comm);

This blocks until all the processes in the communicator have reached the barrier

Properties of messaging:

MPI messages are in order, but not fair:

A sends M1 then M2 to B
B receives M1, then M2

because messages from one source to the same destination do not overtake each other

However:

a prior (but unread) message from A to B may be overtaken by a later message from C to B
there is no guarantee of order on messages from different sources

“not fair” means “not guaranteed fair”

Normally events happen as expected, but not always
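The ordering rule can be illustrated with a small plain-C model. This is only a sketch: it does not use the MPI library, and `struct message` and `model_recv_from` are hypothetical names invented for the illustration. It assumes B holds its unread messages in arrival order and that messages from a single source arrive in the order they were sent, which is the guarantee described above.

```c
/* One message waiting in B's incoming queue (illustrative model, not MPI). */
struct message {
    char source;    /* sending process: 'A', 'C', ... */
    int  seq;       /* position in that sender's own send order */
    int  delivered; /* 0 while unread, 1 once received */
};

/* Receive the earliest unread message from `source`.
 * Returns its index in `pending`, or -1 if nothing from that source waits.
 * Messages from other sources are skipped, so they can be "overtaken". */
int model_recv_from(struct message *pending, int n, char source)
{
    for (int i = 0; i < n; i++) {
        if (!pending[i].delivered && pending[i].source == source) {
            pending[i].delivered = 1;
            return i;
        }
    }
    return -1;
}
```

With a queue holding A's M1, A's M2 and then a message from C, a receive naming C returns C's message first even though both of A's messages arrived earlier. Two receives naming A then always return M1 before M2.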


Send and receive are for point-to-point messages: one source and one destination

There are several others:

broadcast
scatter
gather
reduce
scan


MPI_Bcast(void* buffer, int count,
          MPI_Datatype datatype, int root,
          MPI_Comm comm);

Data sent from process with rank root to all other processes (in the communicator)


All processes, including the receivers, should call MPI_Bcast with the same value for root

int n[2];

if (myrank == 1) {   /* only the root (rank 1) fills in the data */
    n[0] = 23;
    n[1] = 42;
}

...

/* every rank makes the same call, naming rank 1 as the root */
MPI_Bcast(n, 2, MPI_INT, 1, MPI_COMM_WORLD);

All processes will now have the same values for their copies of n


MPI_Scatter(void* sendbuf, int sendcount,
            MPI_Datatype sendtype, void* recvbuf,
            int recvcount, MPI_Datatype recvtype,
            int root, MPI_Comm comm);

Takes the array sendbuf on the process with rank root and sends a different block of sendcount items to every process (including root itself), to be stored in recvbuf: the process of rank i receives the i-th block
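To make the data movement concrete, here is a plain-C model of what a scatter does to the buffers. It is a sketch only: no MPI library is involved, the "ranks" are simulated as entries of an array of pointers, and `model_scatter` is a hypothetical name chosen for the illustration.

```c
#include <string.h>

/* Model of scatter: the root's sendbuf holds nprocs blocks of sendcount
 * ints laid end to end; block number `rank` is copied into recvbufs[rank],
 * which stands in for that rank's private recvbuf. */
void model_scatter(const int *sendbuf, int sendcount,
                   int *recvbufs[], int nprocs)
{
    for (int rank = 0; rank < nprocs; rank++) {
        memcpy(recvbufs[rank],
               sendbuf + (size_t)rank * (size_t)sendcount,
               (size_t)sendcount * sizeof(int));
    }
}
```

For example, scattering the six values 10, 11, 20, 21, 30, 31 over three simulated ranks with sendcount 2 leaves rank 0 with 10, 11, rank 1 with 20, 21 and rank 2 with 30, 31.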


MPI_Gather(void* sendbuf, int sendcount,
           MPI_Datatype sendtype, void* recvbuf,
           int recvcount, MPI_Datatype recvtype, int root,
           MPI_Comm comm);

Takes sendcount elements of data sendbuf from each process and puts them, in rank order, in the array recvbuf on process root

MPI_Gather is the inverse of MPI_Scatter
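The buffer layout after a gather can be modelled in plain C. As before this is a sketch without the MPI library: each simulated rank's sendbuf is one entry of an array of pointers, and `model_gather` is a hypothetical name used only here.

```c
#include <string.h>

/* Model of gather: sendbufs[rank] stands in for the sendbuf of each rank;
 * the sendcount ints from each are placed end to end, in rank order,
 * in the root's recvbuf (which must hold nprocs * sendcount ints). */
void model_gather(int *sendbufs[], int sendcount,
                  int *recvbuf, int nprocs)
{
    for (int rank = 0; rank < nprocs; rank++) {
        memcpy(recvbuf + (size_t)rank * (size_t)sendcount,
               sendbufs[rank],
               (size_t)sendcount * sizeof(int));
    }
}
```

Gathering 10, 11 from rank 0, 20, 21 from rank 1 and 30, 31 from rank 2 yields 10, 11, 20, 21, 30, 31 on the root, which is exactly the array a scatter with the same sendcount would split back apart.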


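MPI_Reduce(void* sendbuf, void* recvbuf, int count,
           MPI_Datatype datatype, MPI_Op op, int root,
           MPI_Comm comm);

Applies a reduction operation op across the processes to each of the count values in sendbuf (element i is combined with element i of every other process), putting the result(s) into recvbuf on process root

Operations include: max, min, +, *, ∧, ∨ (MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_LOR)

Can also use programmer-defined reduction operators (created with MPI_Op_create)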
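A plain-C model shows which values a reduction combines. This is a sketch that does not call MPI: the simulated ranks are rows of an array of pointers, and `model_reduce`, `reduce_op`, `op_sum` and `op_max` are hypothetical names standing in for the real call and for operators such as MPI_SUM and MPI_MAX.

```c
/* A binary reduction operator, standing in for an MPI_Op. */
typedef int (*reduce_op)(int, int);

int op_sum(int a, int b) { return a + b; }
int op_max(int a, int b) { return a > b ? a : b; }

/* Model of reduce: for each index i, combine element i of every rank's
 * sendbuf with `op` and store the single result in recvbuf[i]. */
void model_reduce(int *sendbufs[], int *recvbuf,
                  int count, int nprocs, reduce_op op)
{
    for (int i = 0; i < count; i++) {
        int acc = sendbufs[0][i];
        for (int rank = 1; rank < nprocs; rank++)
            acc = op(acc, sendbufs[rank][i]);
        recvbuf[i] = acc;
    }
}
```

With three simulated ranks holding {1, 9}, {5, 2} and {3, 4}, a sum reduction gives {9, 15} and a max reduction gives {5, 9}. The result has count elements, not one value per rank.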


MPI_Scan(void* sendbuf, void* recvbuf,
         int count, MPI_Datatype datatype,
         MPI_Op op, MPI_Comm comm);

A prefix scan of the source sendbuf

The process of rank i gets, stored in its recvbuf, the reduction of the values from processes 0 ... i (inclusive)

Prefix scans are very useful in parallel algorithms
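The result of an inclusive prefix scan can be sketched in plain C. This model does not use MPI: it assumes one value per simulated rank and addition as the operator, and `model_scan_sum` is a hypothetical name for the illustration.

```c
/* Model of an inclusive prefix scan with addition:
 * values[rank] is the single value contributed by each rank, and
 * results[rank] receives values[0] + ... + values[rank]. */
void model_scan_sum(const int *values, int *results, int nprocs)
{
    int running = 0;
    for (int rank = 0; rank < nprocs; rank++) {
        running += values[rank];
        results[rank] = running;
    }
}
```

For contributions 3, 1, 4, 1, 5 the ranks receive 3, 4, 8, 9, 14. A typical use is computing offsets: if each rank contributes the number of items it holds, then its result minus its own contribution is the position at which its items start in a combined array.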


Need to think carefully about messages to get maximum efficiency

For example, reduce the effect of latency through non-blocking operations

Compute something else while a receive operation completes, then go back
Requires careful programming
Overlapping communication and computation is good, but delicate

Also:

messaging has a high overhead: needs (very) large programs
hard to program effectively: needs experience
lacks dynamism: the processor count is normally fixed at start-up and does not change during execution


MPI has succeeded for many reasons

An open standard, inviting several competing implementations

Thus implementations are optimised and efficient

MPI is simple in concept, so straightforward to program (not necessarily easy to program...)

MPI is flexible as it contains lots of kinds of communication

MPI is supported by many languages and environments

MPI scales well to very large problems

The MPI standard is still being developed and updated

Programming frameworks Map Reduce

The Google problem

Motivation: Large Scale Data Processing

Many tasks: Process lots of data to produce other data

Want to use hundreds or thousands of CPUs... but this needs to be easy

MapReduce provides:

  Automatic parallelization and distribution
  Fault-tolerance
  I/O scheduling
  Status and monitoring

Content adapted from http://labs.google.com/papers/mapreduce.html, retrieved 2008-10-26.

The MapReduce model I

Input & Output: each a set of key/value pairs

Programmer specifies two functions:

  map (in_key, in_value) -> list(out_key, intermediate_value)

    Processes input key/value pair
    Produces set of intermediate pairs

  reduce (out_key, list(intermediate_value)) -> list(out_value)

    Combines all intermediate values for a particular key
    Produces a set of merged output values (usually just one)

Inspired by similar primitives in LISP and other languages
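
A minimal single-process sketch of the two functions, using word counting as the example. It only illustrates the signatures above; the driver `map_reduce` is a hypothetical stand-in for the framework and does none of the distribution, scheduling or fault handling that the real system provides.

```python
from collections import defaultdict


def map_fn(in_key, in_value):
    """in_key: document name, in_value: document text -> list of (word, 1)."""
    return [(word.lower(), 1) for word in in_value.split()]


def reduce_fn(out_key, intermediate_values):
    """Combine all counts for one word into a single output value."""
    return [sum(intermediate_values)]


def map_reduce(inputs, map_fn, reduce_fn):
    """Toy driver: run map, group intermediate values by key, then run reduce."""
    groups = defaultdict(list)
    for in_key, in_value in inputs:
        for out_key, value in map_fn(in_key, in_value):
            groups[out_key].append(value)
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


documents = [("doc1", "the quick fox"), ("doc2", "the lazy dog the end")]
counts = map_reduce(documents, map_fn, reduce_fn)
print(counts)
```

The programmer writes only `map_fn` and `reduce_fn`; everything inside `map_reduce` is what the framework takes over.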

The MapReduce model II

Fine granularity tasks: many more map tasks than machines

  Minimizes time for fault recovery
  Can pipeline shuffling with map execution
  Better dynamic load balancing
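
The load-balancing point can be illustrated with a small simulation. This is a sketch under made-up assumptions: 40 records, of which the first 10 are four times as expensive as the rest, 4 workers, and a scheduler that hands the next task to whichever worker becomes idle first.

```python
import heapq


def makespan(task_costs, workers):
    """Greedy dynamic scheduling: each task goes to the worker that frees up first."""
    free_at = [0] * workers  # min-heap of the times at which each worker is idle
    heapq.heapify(free_at)
    for cost in task_costs:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + cost)
    return max(free_at)


def split(record_costs, n_tasks):
    """Cut the records into n_tasks equal-sized tasks and return each task's cost."""
    size = len(record_costs) // n_tasks
    return [sum(record_costs[i:i + size]) for i in range(0, len(record_costs), size)]


record_costs = [4] * 10 + [1] * 30           # hypothetical skewed workload
coarse = makespan(split(record_costs, 4), workers=4)    # one task per machine
fine = makespan(split(record_costs, 40), workers=4)     # ten tasks per machine
print(coarse, fine)  # 40 18
```

With one task per machine the job takes as long as the unluckiest machine (40 time units); with ten tasks per machine the expensive records are spread out and the job finishes in 18, close to the ideal 70 / 4 = 17.5.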

The MapReduce model III

Often use 200,000 map/5000 reduce tasks w/ 2000 machines

Fault tolerance + Redundancy

When processes break: re-execute

  Detect failure via periodic heartbeats
  Re-execute completed and in-progress map tasks
  Re-execute in-progress reduce tasks
  Task completion committed through master

Master failure: not considered

Robust: task completed even when 1600 out of 1800 machines failed

Slow workers significantly lengthen completion time

  Solution: Near end of phase, spawn backup copies of tasks; whichever one finishes first "wins"
  Effect: Dramatically shortens job completion time

Rewrote Google's production indexing system into 24 MapReduce operations

Statistics: #jobs = 29,423, avg. duration = 634 secs, total machine time = 79,186 days, input data = 3,288 TB
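
The re-execution rules can be sketched as master-side bookkeeping. This is an illustration only, not Google's implementation: the class names, the 10-second timeout and the explicit `now` argument are assumptions. The asymmetry between map and reduce follows the MapReduce paper: completed map output sits on the failed machine's local disk and is lost, while completed reduce output is already in the global file system.

```python
from dataclasses import dataclass
from typing import Optional

HEARTBEAT_TIMEOUT = 10.0  # seconds; hypothetical value


@dataclass
class Task:
    kind: str                     # "map" or "reduce"
    state: str = "idle"           # "idle", "in_progress" or "completed"
    worker: Optional[str] = None  # worker the task was last assigned to


class Master:
    def __init__(self, tasks):
        self.tasks = tasks
        self.last_heartbeat = {}

    def heartbeat(self, worker, now):
        self.last_heartbeat[worker] = now

    def assign(self, task, worker):
        task.state, task.worker = "in_progress", worker

    def complete(self, task):
        task.state = "completed"

    def check_workers(self, now):
        """Reset tasks held by workers whose heartbeat is overdue."""
        failed = {w for w, t in self.last_heartbeat.items()
                  if now - t > HEARTBEAT_TIMEOUT}
        for task in self.tasks:
            if task.worker not in failed:
                continue
            lost_map_output = task.kind == "map" and task.state == "completed"
            if task.state == "in_progress" or lost_map_output:
                task.state, task.worker = "idle", None  # eligible for rescheduling
        for worker in failed:
            del self.last_heartbeat[worker]
        return failed


tasks = [Task("map"), Task("map"), Task("reduce"), Task("reduce")]
master = Master(tasks)
master.heartbeat("w1", now=0.0)
master.heartbeat("w2", now=8.0)
master.assign(tasks[0], "w1"); master.complete(tasks[0])  # finished map on w1
master.assign(tasks[1], "w2")                              # running map on w2
master.assign(tasks[2], "w1"); master.complete(tasks[2])  # finished reduce on w1
master.assign(tasks[3], "w1")                              # running reduce on w1
failed = master.check_workers(now=12.0)
states = [t.state for t in tasks]
print(failed, states)
```

Only w1 has missed its heartbeat, so its finished map task and its running reduce task go back to idle, while its finished reduce task and the map task on w2 are left alone.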

Programming frameworks Memcached

memcached

High-performance, distributed memory object cache

Purpose: speed up dynamic web applications by reducing database load

Functionality: a giant hash table distributed across multiple machines

Behaviour: when the table is full, subsequent inserts cause older data to be purged in least recently used (LRU) order

Use case: developed by Danga Interactive for LiveJournal.com:

  Site statistics: 20 million+ dynamic page views per day for 1 million users
  Reduced database load close to 0; faster page load times; better resource utilization, and faster access to databases on a memcache miss.

Summary at http://en.wikipedia.org/wiki/Memcached
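
The behaviour described above can be imitated in one process. This sketch is not the memcached client API: `LRUNode`, `ShardedCache`, `FakeDatabase` and `fetch` are hypothetical names, the "machines" are plain objects, and a CRC32 of the key is assumed as the way to pick a node.

```python
import zlib
from collections import OrderedDict


class LRUNode:
    """One cache machine: a bounded table that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # purge the least recently used entry


class ShardedCache:
    """The 'giant hash table': each key lives on exactly one node."""

    def __init__(self, n_nodes, capacity_per_node):
        self.nodes = [LRUNode(capacity_per_node) for _ in range(n_nodes)]

    def _node_for(self, key):
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def get(self, key):
        return self._node_for(key).get(key)

    def set(self, key, value):
        self._node_for(key).set(key, value)


class FakeDatabase:
    """Stand-in for the real database; counts how often it is queried."""

    def __init__(self):
        self.reads = 0

    def load(self, key):
        self.reads += 1
        return f"row for {key}"


def fetch(cache, database, key):
    """Look in the cache first; go to the database only on a miss."""
    value = cache.get(key)
    if value is None:
        value = database.load(key)
        cache.set(key, value)
    return value


cache, database = ShardedCache(n_nodes=2, capacity_per_node=2), FakeDatabase()
first = fetch(cache, database, "user:1")
second = fetch(cache, database, "user:1")    # served from the cache
print(database.reads)                        # 1
```

The second `fetch` never reaches the database, which is where the reduction in database load comes from.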

Programming frameworks HADOOP

HADOOP

Distributed processing of large data sets

On clusters of computers

Simple programming model

Scalable

Reliability model in software: high-availability on unreliable hardware

Components:

  Avro: A data serialization system.
  Cassandra: A scalable multi-master database with no single points of failure.
  HBase: A scalable, distributed database that supports structured data storage for large tables.
  Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
  Mahout: machine learning and data mining library.
  Pig: A high-level data-flow language and execution framework for parallel computation.

HADOOP users

Facebook: We use Hadoop to store copies of internal log and dimension data sources and use it as a source for reporting/analytics and machine learning.

Currently we have 2 major clusters:

A 1100-machine cluster with 8800 cores and about 12 PB raw storage.

A 300-machine cluster with 2400 cores and about 3 PB raw storage.

Each (commodity) node has 8 cores and 12 TB of storage.

We are heavy users of both streaming as well as the Java APIs. We have built a higher level data warehousing framework using Hive.

http://wiki.apache.org/hadoop/PoweredBy
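The "streaming" API mentioned in the quote lets the mapper and reducer be any executables that read lines on standard input and write tab-separated key/value lines on standard output, with the reducer's input arriving sorted by key. The Python sketch below follows that convention to count log lines per severity level. The log format (severity as the first token) and the function names are assumptions made for illustration, not something taken from the Facebook deployment.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'level<TAB>1' per log line; the first token is assumed to be the level."""
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            yield f"{fields[0]}\t1"


def reducer(sorted_lines):
    """Sum the counts for each key; the input must already be sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(parsed, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"


def run(role, stream_in, stream_out):
    """Drive one phase over text streams, as a streaming job would via stdin/stdout."""
    step = mapper if role == "map" else reducer
    for line in step(stream_in):
        stream_out.write(line + "\n")
```

A script wrapping this sketch would call `run("map", sys.stdin, sys.stdout)` or `run("reduce", sys.stdin, sys.stdout)`. The same pair can be tried locally by sorting the mapper's output before handing it to the reducer.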

Julian Padget (CS/Bath) CM40212: Internet Technology November 1, 2011 53 / 58

Content

1 Web programming

2 Distributed programming

3 Programming frameworks

4 Summary

Julian Padget (CS/Bath) CM40212: Internet Technology November 1, 2011 54 / 58

Summary

Web programming is a special case of distributed programming

Large number of tools/frameworks: best learnt when needed

Basic principle: code in browser + code in server + communication

Distributed programming in general is hard to debug and hard to get right

Fortunately many problems are embarrassingly parallel and, with the right abstractions, can be mapped reliably to stock hardware
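To make the last point concrete, here is a minimal Python sketch of an embarrassingly parallel job: each item is processed independently, so the work can be handed to a pool of workers, and a failed task is simply run again. The names `run_with_retry` and `parallel_map` are hypothetical. A thread pool stands in for a cluster here, which is an assumption made to keep the example self-contained; threads in CPython mainly help with I/O-bound tasks, whereas a framework such as Hadoop spreads tasks across machines.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_retry(task, item, attempts=3):
    """Run task(item), retrying on failure, in the spirit of re-executing a lost task."""
    last_error = None
    for _ in range(attempts):
        try:
            return task(item)
        except Exception as error:  # any failure triggers another attempt
            last_error = error
    raise RuntimeError(f"task failed after {attempts} attempts") from last_error


def parallel_map(task, items, workers=4, attempts=3):
    """Apply task to every item independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda item: run_with_retry(task, item, attempts), items))
```

Because no task depends on another, retrying one item never disturbs the rest. That independence is what lets reliability be handled in software on top of unreliable hardware.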

Julian Padget (CS/Bath) CM40212: Internet Technology November 1, 2011 55 / 58

Resources I

“Never believe one source”

Java applets tutorial: http://java.sun.com/docs/books/tutorial/deployment/applet/appletsonlyindex.html

Java servlets tutorial: http://java.sun.com/j2ee/tutorial/1_3-fcs/doc/Servlets.html

Java threads tutorial: http://java.sun.com/docs/books/tutorial/essential/concurrency/index.html

Message-passing interface: http://www.open-mpi.org/

Tuple spaces: http://en.wikipedia.org/wiki/Tuple_space

Julian Padget (CS/Bath) CM40212: Internet Technology November 1, 2011 56 / 58

Resources II

Java spaces: http://java.sun.com/developer/Books/JavaSpaces

memcached: http://en.wikipedia.org/wiki/Memcached and http://www.danga.com/memcached/

RFC 2965, HTTP State Management Mechanism, D. Kristol, L. Montulli, October 2000, ftp://ftp.rfc-editor.org/in-notes/rfc2965.txt

Julian Padget (CS/Bath) CM40212: Internet Technology November 1, 2011 57 / 58

References

Berry, G. and Boudol, G. (1990). The chemical abstract machine. In POPL ’90: Proceedings of the 17th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 81–94, New York, NY, USA. ACM Press.

Carriero, N. and Gelernter, D. (1989). Linda in context. Commun. ACM, 32(4):444–458.

Chandy, K. and Misra, J. (1988). Parallel Program Design: A Foundation. Addison-Wesley. ISBN 0-201-05866-9.

Freeman, E., Hupfer, S., and Arnold, K. (1999). JavaSpaces Principles, Patterns, and Practice. Pearson Education. ISBN 0201309556, http://java.sun.com/developer/Books/JavaSpaces/.

Julian Padget (CS/Bath) CM40212: Internet Technology November 1, 2011 58 / 58