Posted on 04-Apr-2018
7/29/2019 Distributed Systems Lab 13
Distributed Systems Techs 13. Google, Amazon and Public WSs
January 11, 2010
Google is one of the most popular search engines around because it provides a superior number of hits.
The search engine is also good at providing valid information through the use of indexing and filtering, so long as you specify the search criteria clearly.
Given the number of ways that Google Advanced Search (http://www.google.com/advanced_search) helps you look for information, providing clear direction can be overwhelming to some. The flexibility provided by the interface is part of the reason many power users prefer Google.
Google Web Services
Google Web Services provide a means of accessing Google without going to the Web site and performing a search manually.
This Web service provides essential services by helping you automate the search process and presenting data in the form that you need, rather than in the form that Google thinks you need.
Clients request information based on any of a number of search criteria.
Google WS returns the information in a standardized format.
A Google WS application can make it easy to add a professional search service to your site, making the site a lot more attractive to anyone who visits.
Example of received data
12k
... some text highlight) more text ...
True
http://www.mwt.net/ DataCon Services
Request limitations of Google WS
According to the license agreement, you can't make more than 1,000 requests per day, at least not without special permission.
The request limits ensure that the Google servers won't become overloaded, but they also mean you must provide some type of monitoring in your application to prevent abuse of the licensing terms.
If you violate the licensing terms, Google WS simply denies your request.
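Client-side monitoring of this kind can be as simple as a per-day counter. The sketch below is a minimal, hypothetical illustration. The `DailyQuota` class and the `guarded_search` helper are not part of any Google API. It assumes the 1,000-requests-per-day figure from the license terms and only stops your own application from exceeding it, because Google does its own counting on the server.

```python
import datetime


class DailyQuota:
    """Client-side counter that refuses calls once a daily limit is reached (hypothetical)."""

    def __init__(self, limit=1000, today=datetime.date.today):
        self.limit = limit
        self._today = today      # injectable clock, so a new day can be simulated
        self._day = today()
        self._used = 0

    def _roll_over(self):
        # Reset the counter when the calendar day changes.
        current = self._today()
        if current != self._day:
            self._day = current
            self._used = 0

    def remaining(self):
        self._roll_over()
        return self.limit - self._used

    def acquire(self):
        """Record one request; return False if it would exceed the daily limit."""
        self._roll_over()
        if self._used >= self.limit:
            return False
        self._used += 1
        return True


def guarded_search(quota, query, send):
    # 'send' stands in for whatever function actually calls the search service.
    if not quota.acquire():
        raise RuntimeError("daily request quota exhausted; try again tomorrow")
    return send(query)
```

Every outgoing request goes through `guarded_search`, so the application stops itself before Google has to deny anything.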
Amazon allows businesses to "rent" computing power, data storage and bandwidth on its vast network platform.
Amazon Web Services (AWS), used to build web-scale business applications, includes:
1. Simple Storage Service (S3),
2. Elastic Compute Cloud (EC2),
3. Simple Queue Service (SQS),
4. Flexible Payments Service (FPS), and
5. SimpleDB.
AWS offers a new paradigm for IT infrastructure: use what you need, as you need it, and pay as you go.
Infrastructure in the Cloud
The Web is full of opportunities for companies both large and small, but the smaller companies face a difficult problem: infrastructure.
Web applications that are popular and have thousands of users require significant infrastructure to provide the high performance and smooth experience that users demand.
Industrial-strength infrastructure is very expensive to buy and maintain, so smaller companies with fewer users are often forced to do without.
Amazon offers a solution to this dilemma in the form of infrastructure WSs. These services let application developers avoid the burden of buying and maintaining physical infrastructure altogether by making it possible to rent virtual infrastructure instead.
Amazon Simple Storage Service (S3)
http://www.aws.amazon.com/s3
S3 offers secure online storage space for any kind of data, providing an alternative to building, maintaining, and backing up your own storage systems.
It makes your data accessible from anywhere on the Web to any other applications or individuals you allow.
There are no limits on how much data you can store in the service, how long you can store it, or how much bandwidth you can use to transfer or publish it.
S3 is a scalable, distributed system that stores your information reliably across multiple Amazon data centers, and it can serve that information quickly to massive audiences.
The S3 storage application programming interface (API) makes no assumptions about the nature of the data you are storing.
Amazon Elastic Compute Cloud (EC2)
http://www.aws.amazon.com/ec2
EC2 makes it possible to run multiple virtual Linux servers on demand. It provides as many computers as you need to process your data or run your web application, without your having to purchase or rent physical machines.
It gives you full control over each server, with root access to the OS, a configurable firewall to manage network access, and the freedom to install any software you please.
Once you have set up an EC2 server the way you like it, you can save it permanently as a server image. You can then launch new servers from this image to create virtual machines that are preconfigured and ready to do your bidding.
The service offers an API to start and stop server instances, apply access and networking permissions, and manage your server images.
You manage each individual server using standard Linux tools over a secure shell (SSH) session.
Amazon Simple Queue Service (SQS)
http://www.aws.amazon.com/sqs
SQS delivers short messages between any computers or systems with access to the Internet. This allows the components of your distributed web applications to communicate reliably without you having to build or maintain your own messaging system.
You can send an unlimited number of messages via an unlimited number of message queues, and you can configure the performance characteristics and access permissions for each queue.
The service uses a message locking and timeout mechanism that helps prevent messages from being delivered more than once, while still ensuring they will be delivered despite any component failures or network dropouts.
Your messages are stored redundantly across multiple servers and data centers.
The service's API allows you to send and receive messages, and to control their full life cycle.
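The locking and timeout idea can be sketched with an in-memory queue. A received message is hidden (locked) for a timeout. If the consumer does not delete it in time, it becomes visible again and is redelivered.

This is a simplified model under stated assumptions, not the SQS API; SQS calls this lock a visibility timeout. `LockingQueue` and its methods are hypothetical names.

```python
import itertools
import time


class LockingQueue:
    """In-memory model: a received message stays locked until a timeout expires."""

    def __init__(self, lock_timeout=30.0, clock=time.monotonic):
        self.lock_timeout = lock_timeout
        self._clock = clock               # injectable clock for testing
        self._ids = itertools.count(1)
        self._messages = {}               # message id -> body
        self._locked_until = {}           # message id -> time when its lock expires

    def send(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = body
        self._locked_until[msg_id] = 0.0  # visible immediately
        return msg_id

    def receive(self):
        """Return (id, body) of one unlocked message and lock it, or None."""
        now = self._clock()
        for msg_id, body in self._messages.items():
            if self._locked_until[msg_id] <= now:
                # Hide the message from other receivers until the lock expires.
                self._locked_until[msg_id] = now + self.lock_timeout
                return msg_id, body
        return None

    def delete(self, msg_id):
        """Called by a consumer once it has finished processing the message."""
        self._messages.pop(msg_id, None)
        self._locked_until.pop(msg_id, None)
```

If a consumer crashes before calling `delete`, the lock expires and another consumer receives the message. This is what keeps delivery reliable. It also means a message can occasionally be processed twice, so consumers should tolerate duplicates.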
Amazon Flexible Payments Service (FPS)
http://www.aws.amazon.com/fps
FPS transfers money between individuals or companies that have Amazon Payments accounts. This allows you to build applications that provide an online store or that implement a marketplace between customers and third-party vendors.
With FPS you can make payments from traditional sources, such as credit cards and bank accounts. You can also pay from sources internal to Amazon Payments accounts, which have lower fees and are designed to make micro-payment transactions feasible.
All transactions need to be authorized by everyone involved in the transaction. The parties involved can impose detailed constraints on transactions, such as how and when transactions can be performed, how much money can be transferred, and who can send and receive the funds.
Customers interact with your FPS application through an Amazon Payments gateway using their Amazon.com account.
Because the transactions are mediated by Amazon, your customers are not required to provide you with their personal banking information, and you do not have the burden of securely storing this highly sensitive information.
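The rule that every party must authorize a transaction, each under its own constraints, can be modeled as a check. The check passes only when all parties' constraints accept the transaction.

This is a hypothetical sketch of the concept only. `Constraint`, `Transaction` and `authorize` are illustrative names and do not reflect the actual FPS API. Only two kinds of constraint are modeled: a maximum amount and a set of allowed counterparties.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Constraint:
    """Limits one party places on transactions it takes part in (hypothetical model)."""
    max_amount: Decimal
    allowed_counterparties: set = field(default_factory=set)  # empty set = anyone


@dataclass
class Transaction:
    sender: str
    recipient: str
    amount: Decimal


def authorize(tx, constraints):
    """A transaction proceeds only if every party's constraints accept it."""
    parties = {tx.sender: tx.recipient, tx.recipient: tx.sender}
    for party, counterparty in parties.items():
        rule = constraints.get(party)
        if rule is None:
            return False  # a party that has authorized nothing blocks the transfer
        if tx.amount > rule.max_amount:
            return False
        if rule.allowed_counterparties and counterparty not in rule.allowed_counterparties:
            return False
    return True
```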
Amazon SimpleDB (SimpleDB)
http://www.aws.amazon.com/sdb
SimpleDB stores small pieces of textual information in a simple database structure that is easy to manage, modify and search.
If your application relies on a relatively simple database, this service can replace your traditional relational database (RDBMS) server, leaving you with one less piece of infrastructure to purchase and maintain.
SimpleDB is designed to minimize the complexity and administrative overhead involved in managing your data.
It does not require a predefined schema, so you can alter the structure and content of your database whenever you need to.
It indexes every piece of information you store, so all your queries run quickly.
It stores your data securely, redundantly and safely within Amazon's network of data centers.
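A toy in-memory store can illustrate what "no predefined schema" and "every piece of information is indexed" mean. Each item can carry any attributes, and every (attribute, value) pair is indexed on write.

This is a hypothetical model, not the SimpleDB API. The `Domain` class is invented for illustration and loosely borrows SimpleDB's terms domain, item and attribute. Like SimpleDB, it stores values as text and lets an attribute hold several values.

```python
from collections import defaultdict


class Domain:
    """Toy schemaless store: items hold arbitrary text attributes, all indexed."""

    def __init__(self):
        self._items = defaultdict(lambda: defaultdict(set))  # item -> attr -> values
        self._index = defaultdict(set)                       # (attr, value) -> item names

    def put(self, item, **attributes):
        # No schema: any item may carry any attribute; values are stored as text.
        for name, value in attributes.items():
            text = str(value)
            self._items[item][name].add(text)
            self._index[(name, text)].add(item)

    def get(self, item):
        return {name: sorted(values) for name, values in self._items.get(item, {}).items()}

    def query(self, name, value):
        """Exact-match lookup answered from the index, without scanning items."""
        return sorted(self._index.get((name, str(value)), set()))
```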
Characteristics of the five Amazon WSs
They are pay-as-you-go, meaning you pay predictable fees based on how much or how little you use the service.
There are no initial costs to join, no long-term subscription payments, and the usage fees are low.
The services are highly scalable, performing equally well in modest or massively demanding usage scenarios. This means that the applications built on them can be similarly scalable and can grow rapidly at short notice without hitting limits imposed by insufficient infrastructure.
All the services are designed to be highly reliable and fault-tolerant. The services and data resources are distributed across multiple servers and data centers within Amazon's infrastructure, and they are managed by a company with significant experience and investments in the operation of a global web business.
To use AWS you first need to register for an account and provide a credit card to be billed for your service usage.
APIs: REST for S3 and SQS
AWS infrastructure services are made available through three separate APIs: REST, Query, and SOAP.
The REST interfaces offered by AWS use only the standard components of HTTP request messages to represent the API action that is being performed.
These components include:
1. HTTP method: describes the action the request will perform
2. Uniform Resource Identifier (URI): path and query elements that indicate the resource on which the action will be performed
3. Request headers: pieces of metadata that provide more information about the request itself or the requester
4. Request body: the data on which the service will perform an action
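The four components can be seen together in a single REST request. The sketch below uses Python's standard `urllib.request.Request` to assemble, but not send, a request that stores an object. The bucket and key names are placeholders. The authentication headers that a real S3 request requires (Date and Authorization) are deliberately left out.

```python
import urllib.request


def build_put_object(bucket, key, data, content_type):
    """Assemble (but do not send) a REST request that stores an object in S3."""
    return urllib.request.Request(
        # 2. URI: identifies the resource the action applies to
        url=f"http://s3.amazonaws.com/{bucket}/{key}",
        # 4. Body: the data the service will act on
        data=data,
        # 3. Headers: metadata about the request (a real request also needs
        #    Date and Authorization headers, omitted here)
        headers={"Content-Type": content_type},
        # 1. Method: the action to perform (PUT = create or replace)
        method="PUT",
    )
```

Keys containing spaces or other special characters would also need URL quoting before being placed in the URI.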
APIs: Query interfaces for EC2, SQS, FPS and SimpleDB
Query interfaces also use the standard components of the HTTP protocol to represent API actions, but they use them in a different way.
Query requests rely on parameters (simple name and value pairs) to express both the action the service will perform and the data the action will be performed on.
When you are using a Query interface, the HTTP envelope serves merely as a way of delivering these parameters to the service.
To perform an operation with a Query interface, you can express the parameters in the URI of a GET request, or in the body of a POST request.
The method component of the HTTP request merely indicates where in the message the parameters are expressed, while the URI may or may not indicate a resource to act upon.
Query interfaces can be considered REST-like because, although they do things differently, they still use only standard HTTP message components to perform operations.
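The difference from REST becomes clear when the same Query parameters are placed either in a GET URI or in a POST body.

In this sketch the endpoint URL and parameter values are illustrative assumptions. `Action=SendMessage` with a `MessageBody` parameter follows the SQS Query style. The authentication parameters that a real request needs are omitted.

```python
import urllib.parse
import urllib.request


def query_request(endpoint, params, use_post=False):
    """Build a Query-style request: the action and its data are plain name/value pairs."""
    encoded = urllib.parse.urlencode(sorted(params.items()))
    if use_post:
        # POST: the parameters travel in the request body.
        return urllib.request.Request(
            endpoint,
            data=encoded.encode("ascii"),
            headers={"Content-Type": "application/x-www-form-urlencoded"},
            method="POST",
        )
    # GET: the same parameters travel in the URI's query string.
    return urllib.request.Request(f"{endpoint}?{encoded}", method="GET")
```

Either request carries exactly the same name/value pairs; only their location in the HTTP message differs.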
APIs: SOAP interfaces for all 5 WS
SOAP interfaces use XML documents to express the action that will be performed and the data that will be acted upon.
These SOAP XML documents are constructed as another layer on top of the underlying HTTP request. All the information about the operation is moved out of the HTTP message and encapsulated in the SOAP message instead.
For operations performed with a SOAP interface, the HTTP components of the request message are nearly irrelevant. All that matters is the XML document sent to the service as the body of the request.
The valid structure and content of SOAP messages are defined in a WSDL document. It describes the operations the service can perform and the structure of the input and output data documents the service understands.
To create a client program for a SOAP interface, you will typically use a third-party tool to interpret the WSDL document and generate the client stub code needed to interact with the service.
The approach used in the SOAP interfaces is very different from that used by the REST and Query interfaces. Operations expressed in SOAP messages are completely divorced from the underlying HTTP message used to transmit the request, and the HTTP message components, such as method and URI, reveal nothing about the operation being performed.
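Because the operation travels inside the XML body, two different SOAP operations can be sent to the same URI with the same HTTP method. The sketch below builds SOAP 1.1 envelopes with the standard library's `xml.etree.ElementTree`. It then recovers the operation name the way a server would, from the body alone.

The S3 namespace URI and the field names are assumptions for illustration. The authentication fields that real S3 SOAP calls carry (such as a timestamp and signature) are omitted.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def soap_envelope(service_ns, operation, fields):
    """Wrap an operation and its arguments in a SOAP 1.1 envelope (returned as bytes)."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The operation name lives inside the XML body, not in the HTTP method or URI.
    op = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in fields.items():
        ET.SubElement(op, f"{{{service_ns}}}{name}").text = value
    return ET.tostring(envelope)


def operation_name(document):
    """What a SOAP server does: read the operation from the first child of Body."""
    root = ET.fromstring(document)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    return body[0].tag.split("}", 1)[1]
```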
Example of XML doc returned by S3 WS using SOAP
A listing of our data storage buckets:
Owner ID: 1a2b3c4d5e6f1a2b3c4d5e6f1a2b3c4d5e6f1a2b3c4d5e6f1a2b3c4d5e6f1a2b
Owner display name: jamesmurty
Bucket: oreilly-aws (created 2007-09-14T08:20:49.000Z)
Bucket: my-bucket (created 2007-09-24T08:39:30.000Z)
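A client would normally read such a listing with an XML parser rather than by hand.

This sketch assumes the element names used by S3's REST bucket listing (`ListAllMyBucketsResult`, `Owner`, `Buckets`, `Bucket`, `Name`, `CreationDate`) and the `2006-03-01` namespace. A SOAP response wraps similar data in an envelope, so check the service's WSDL before relying on these names. The sample values are the ones shown above.

```python
import xml.etree.ElementTree as ET

S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

SAMPLE = """<ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Owner>
    <ID>1a2b3c4d5e6f1a2b3c4d5e6f1a2b3c4d5e6f1a2b3c4d5e6f1a2b3c4d5e6f1a2b</ID>
    <DisplayName>jamesmurty</DisplayName>
  </Owner>
  <Buckets>
    <Bucket><Name>oreilly-aws</Name><CreationDate>2007-09-14T08:20:49.000Z</CreationDate></Bucket>
    <Bucket><Name>my-bucket</Name><CreationDate>2007-09-24T08:39:30.000Z</CreationDate></Bucket>
  </Buckets>
</ListAllMyBucketsResult>"""


def parse_bucket_listing(xml_text):
    """Return (owner display name, [(bucket name, creation date), ...])."""
    root = ET.fromstring(xml_text)
    owner = root.findtext("s3:Owner/s3:DisplayName", namespaces=S3_NS)
    buckets = [
        (b.findtext("s3:Name", namespaces=S3_NS),
         b.findtext("s3:CreationDate", namespaces=S3_NS))
        for b in root.findall("s3:Buckets/s3:Bucket", S3_NS)
    ]
    return owner, buckets
```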
S3
The data model is very simple, comprising only two kinds of storage resource: objects and buckets. Objects store data and metadata, and buckets are containers that can hold an unlimited number of objects.
S3 provides access control mechanisms that allow you to keep your information private or make it public and accessible to anyone on the Internet.
Access control settings are configured using a list of rules that describe who will be granted access to a resource and the kinds of access that will be permitted.
Access control settings can be applied to both bucket and object resources.
Resources are identified using standard URIs, such as http://s3.amazonaws.com/bucket-name/object-name.
S3 also allows resources to be accessed using alternative domain names, e.g. http://www.mysite.com/object-name.
The data is stored redundantly within this architecture, spread across multiple physical servers and across multiple data centers in different locations.
Drawbacks:
S3 objects cannot be manipulated like standard files.
Changes take time to propagate.
S3 requests will fail occasionally.
S3's IP addresses may change over time.
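A common way to cope with occasional request failures is to retry with exponential backoff. The sketch below shows a generic technique rather than part of the S3 API. `TransientError` is a hypothetical stand-in for whatever error your HTTP client raises on a retryable failure, for example an HTTP 500 or 503 response. The delays are arbitrary defaults.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a failed S3 request that is worth retrying."""


def with_retries(operation, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on a transient failure wait base_delay * 2**n plus jitter, then retry."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            delay = base_delay * (2 ** attempt)
            # Random jitter spreads out retries from many clients.
            sleep(delay + random.uniform(0, delay))
```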
REST interface of S3
Acting on S3 resources with HTTP methods
Resource | GET | HEAD | PUT | DELETE | POST
S3 Service | List your buckets | - | - | - | -
Object | Retrieve the object's data and metadata | Retrieve the object's metadata | Create or replace the object | Delete the object | Create or replace the object
Bucket | List the bucket's objects | - | Create the bucket | Delete the bucket | -
Access Control List (ACL) for a Bucket or Object resource | Retrieve ACL settings | - | Apply new ACL settings | - | -
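The table can be encoded directly as a lookup, together with a helper that works out the request path for the service, a bucket, an object or an ACL.

This is a hypothetical sketch: `S3_REST_ACTIONS` and `s3_rest_target` are illustrative names. Paths follow the http://s3.amazonaws.com/bucket-name/object-name style shown earlier. The `?acl` suffix for addressing an ACL follows the S3 REST API convention, which you should verify against the current documentation.

```python
# Which (resource, HTTP method) pairs the table allows, and what each one does.
S3_REST_ACTIONS = {
    ("service", "GET"): "List your buckets",
    ("object", "GET"): "Retrieve the object's data and metadata",
    ("object", "HEAD"): "Retrieve the object's metadata",
    ("object", "PUT"): "Create or replace the object",
    ("object", "DELETE"): "Delete the object",
    ("object", "POST"): "Create or replace the object",
    ("bucket", "GET"): "List the bucket's objects",
    ("bucket", "PUT"): "Create the bucket",
    ("bucket", "DELETE"): "Delete the bucket",
    ("acl", "GET"): "Retrieve ACL settings",
    ("acl", "PUT"): "Apply new ACL settings",
}


def s3_rest_target(method, bucket=None, key=None, acl=False):
    """Return (resource kind, request path, action), or raise ValueError if not allowed."""
    if key is not None and bucket is None:
        raise ValueError("an object key needs a bucket")
    if bucket is None:
        kind, path = "service", "/"
    elif key is None:
        kind, path = "bucket", f"/{bucket}"
    else:
        kind, path = "object", f"/{bucket}/{key}"
    if acl:
        if kind == "service":
            raise ValueError("ACLs apply to buckets and objects only")
        kind, path = "acl", path + "?acl"
    action = S3_REST_ACTIONS.get((kind, method))
    if action is None:
        raise ValueError(f"{method} is not supported on a {kind} resource")
    return kind, path, action
```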
S3 Applications
S3 can be used as a basic online file repository for backing up files, for web site hosting, as the basis for a network-mounted filesystem, or as a distribution network.
Share large files. Use the service as a repository for sharing files that are too large to include in an email.
There are a number of online services already available to do this job, but many charge monthly subscription fees if you need to share very large files; with S3 you can do this yourself at little cost. To share a file, upload it to S3 and send a URI link to the S3 object in an email.
Because your files may contain private information, you generate a signed URI link to the object so that only the people who receive the link from you can access it. An advantage of using a signed URI is that you can choose how long the link will remain valid.
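Historically, a signed S3 link was built with query-string authentication. The request description and an expiry time are signed with HMAC-SHA1 using your secret key, and the result is appended to the URI.

The sketch below follows that legacy Signature Version 2 scheme for a simple GET. It assumes that no special headers are signed and that the key needs only ordinary URL quoting. The access key, secret key and names are placeholders. Current AWS deployments use Signature Version 4 presigned URLs instead, so treat this only as an illustration of the idea.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def signed_get_url(access_key, secret_key, bucket, key, lifetime_s, now=None):
    """Legacy S3 query-string authentication (Signature Version 2) for a GET link."""
    expires = int(now if now is not None else time.time()) + lifetime_s
    resource = f"/{bucket}/{urllib.parse.quote(key)}"
    # Verb, Content-MD5 (empty), Content-Type (empty), Expires, then the resource.
    string_to_sign = f"GET\n\n\n{expires}\n{resource}"
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    query = urllib.parse.urlencode(
        {"AWSAccessKeyId": access_key, "Expires": expires, "Signature": signature}
    )
    return f"http://s3.amazonaws.com{resource}?{query}"
```

The link stops working after `Expires`, which is how the lifetime of the shared link is controlled.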
S3 filesystem, e.g. with ElasticDrive. S3 offers an unlimited data store on top of which other filesystem interface abstractions can be built.
Some of these tools are designed to make S3 storage resources accessible to existing network-based tools that do not recognize S3, for example as an FTP or a Web-based Distributed Authoring and Versioning (WebDAV) service.
Others aim to make the storage space in S3 available as a lower-level filesystem resource.
Mediated access to S3, e.g. with JetS3t. S3 is an effective platform for sharing information when its simple access control mechanisms meet your needs. Some scenarios are difficult to achieve with ACL settings alone, such as making your S3 storage available to customers or colleagues who do not have their own AWS account. In such cases you may need to provide your own intermediate service to mediate access to your S3 storage.
The JetS3t Java library includes tools to mediate third-party access to your S3 storage. These tools include a client-side application for interacting with S3 to upload and download files, and a server-side Gatekeeper component that decides whether the client, or user, should be authorized to perform these operations.
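At its core, a gatekeeper makes a policy decision on behalf of users who have no AWS credentials of their own. The function below is a hypothetical illustration of such a decision. It is not JetS3t's Gatekeeper interface, which is a Java component. The folder-per-user policy and the `guest` account are invented examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user: str
    operation: str  # "upload" or "download"
    key: str        # object name inside the shared bucket


# Hypothetical policy: each user may act only inside their own folder,
# and guests may only download public files.
GUEST_USERS = {"guest"}


def gatekeeper_allows(req):
    """Decide whether a user without an AWS account may perform an S3 operation."""
    if req.operation not in {"upload", "download"}:
        return False
    if req.user in GUEST_USERS:
        return req.operation == "download" and req.key.startswith("public/")
    return req.key.startswith(f"users/{req.user}/")
```

In a real deployment, the gatekeeper would typically answer an approved request with a signed URI for the requested operation.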
EC2 key components
1. Instances: the VMs that run in the EC2 environment and perform computing tasks that would typically be done by physical servers. Instances are based on a Xen-compatible Linux kernel.
2. Environment: instances run in the EC2 environment, which provides configurable access control, contextual data, and other information that instances need to do their work.
3. Amazon Machine Images (AMIs): files that capture a complete snapshot of an EC2 instance at a point in time, including its software, configuration, and potentially even its data. AMIs serve as the boot disk for the instances you launch.
EC2 instance types
Resource | Small | Large | Extra Large
Platform | 32-bit x86 | 64-bit x86 | 64-bit x86
CPU rating | 1 ECU (1 virtual core) | 4 ECUs (2 virtual cores, 2 ECUs each) | 8 ECUs (4 virtual cores, 2 ECUs each)
Memory (RAM) | 1.7 GB | 7.5 GB | 15 GB
Storage (ephemeral) | 150 GB | 840 GB (two 420 GB partitions) | 1680 GB (four 420 GB partitions)
Storage (root partition) | 10 GB | 10 GB | 10 GB
I/O performance | Moderate | High | High
Instance type name | m1.small | m1.large | m1.xlarge
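The table can drive a simple sizing decision: pick the smallest instance type that meets your requirements. This sketch copies the historical values from the table above. `InstanceType` and `smallest_fit` are hypothetical names, and current EC2 instance families and specifications are different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    bits: int
    ecus: int
    memory_gb: float
    ephemeral_gb: int


# Values copied from the table above (historical m1 family), smallest first.
M1_TYPES = [
    InstanceType("m1.small", 32, 1, 1.7, 150),
    InstanceType("m1.large", 64, 4, 7.5, 840),
    InstanceType("m1.xlarge", 64, 8, 15.0, 1680),
]


def smallest_fit(min_ecus=0, min_memory_gb=0.0, min_disk_gb=0, need_64bit=False):
    """Return the name of the first (smallest) type that meets every requirement, or None."""
    for t in M1_TYPES:
        if need_64bit and t.bits != 64:
            continue
        if t.ecus >= min_ecus and t.memory_gb >= min_memory_gb and t.ephemeral_gb >= min_disk_gb:
            return t.name
    return None
```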
EC2 applications
You can use the virtual servers provided by EC2 to do most things a physical server can do, from hosting web sites or applications to creating clusters of servers for on-demand processing of large data sets.
Dynamic DNS. How can you make your instance accessible via a user-friendly domain name that your users can remember?
With standard servers, you purchase a domain name and configure the DNS settings for that domain to refer to your server's IP address. This approach is only really workable if your server has a static IP address that does not change over time.
EC2 does not allow network addresses to be statically assigned to instances. When you start an EC2 instance, the VM is assigned IP and DNS addresses that refer to the instance only for as long as it is running. Use a dynamic DNS service instead of standard DNS to associate your domain name with your EC2 instance.
Dynamic DNS services are designed for situations in which a server's address changes every so often, and they propagate address changes to the public much more quickly than standard DNS.
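On the instance itself, a small script can keep a dynamic DNS record current. It compares the instance's present public address with the last one registered.

In this sketch the metadata URL points to EC2's instance metadata service. The sketch assumes that the service can be queried without a session token; newer instances may require the IMDSv2 token flow. `update` is a hypothetical hook for whichever dynamic DNS provider you use, since each provider has its own update API.

```python
import urllib.request

# EC2 exposes details about the running instance through its metadata service.
METADATA_URL = "http://169.254.169.254/latest/meta-data/public-ipv4"


def fetch_public_ip():
    """Ask the EC2 metadata service for this instance's current public address."""
    with urllib.request.urlopen(METADATA_URL, timeout=2) as resp:
        return resp.read().decode().strip()


def sync_dns(hostname, last_known_ip, get_ip=fetch_public_ip, update=None):
    """Push the address to the dynamic DNS provider only when it has changed."""
    current = get_ip()
    if current != last_known_ip and update is not None:
        update(hostname, current)  # provider-specific call (hypothetical hook)
    return current
```

Running `sync_dns` at boot, and periodically afterwards, keeps the domain name pointing at whichever address the instance currently has.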
On-demand VPN server with OpenVPN. An advantage of EC2 is that you can start and stop server instances as you need them and pay only for the time each server is running. This capability is most often used to increase or decrease the number of servers you have running in response to changing demands on a web application.
How can you set up an EC2 instance to run a Virtual Private Network (VPN) server that secures your network traffic when you access the Internet over an untrusted network? It is becoming increasingly common for people to access the Internet through public access points, such as WiFi hotspots, wired networks provided by hotels, or the internal networks of companies they may be visiting. The best way to protect your data when using an untrusted network is to use a VPN to encrypt it.
Open-source VPN server OpenVPN (http://openvpn.net/): configure the server to use a secret key such that only you, the owner of the key, can connect to it. Once the instance is configured, it allows a client computer to connect over a secure channel and relays all network traffic to the public Internet on behalf of the client. Whenever you need to access the Internet over an untrusted network, you can fire up this instance and create your own personal VPN to protect your network traffic.
Publicly available WSs - 1
Blogging services: MSN Spaces, Akismet, TypePad, FeedBurner, FeedBlitz, Weblogs.com, Technorati etc.
Bookmark services: del.icio.us, Simpy, Blogmarks, Ma.gnolia etc.
Financial services: Blinksale, StrikeIron Historical Stock Quotes, Dun and Bradstreet Credit Check, Netaccounts etc.
Mapping services: Google Maps, Yahoo! Maps, ArcWeb, FeedMaps BlogMap, Microsoft MapPoint, MapQuest's OpenAPI, Map24 AJAX, Microsoft's Virtual Earth etc.
Music/Video services: SeeqPod, Rhapsody, Last.fm, YouTube, Dave.TV etc.
Publicly available WSs - 2
News/Weather services: NewsCloud, NewsIsFree, NewsGator, BBC, WeatherBug etc.
Photo services: Flickr, SmugMug, Pixagogo, Faces.com, Snipshot etc.
Reference services: RealIEDA Reverse Phone Lookup, ISBNdb, Urban Dictionary, SRCDemographics, StrikeIron US Census, StrikeIron Residential Lookup
Search services: Google AJAX Search API, Yahoo! Search, Windows Live Search etc.
Shopping services: Amazon, DataUnison eBay Research, UPC Database, eBay, CNET
Publicly available WSs - 3
English Standard Version Bible Lookup: read the Bible online
Amnesty International: freedom of expression
411Sync: keyword searches through mobile technologies
Windows Live Custom Domains: manage a user base
Sunlight Labs: clerical information (e.g. phones)
Food Candy: social networking system for gourmands
Facebook: social networking system for online contacts
etc.