Web Log Analysis › edu › lsci2012 › homework1 › groupB.pdf · Amazon CloudFront Amazon...

Web Log Analysis Using Amazon Web Services
Carol Alexandru, Fabian Christoffel, Louis-Marie Loe, Robert Sharp




Amazon EC2
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers.

Amazon EC2 Spot
Spot Instances allow you to name your own price for Amazon EC2 computing capacity. You simply bid on spare Amazon EC2 instances and run them whenever your bid exceeds the current Spot Price, which varies in real time based on supply and demand. The Spot Instance pricing model complements the On-Demand and Reserved Instance pricing models, providing potentially the most cost-effective option for obtaining compute capacity, depending on your application.

Source: http://aws.amazon.com/ec2/
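
The bidding rule quoted above can be illustrated with a small sketch. The hourly prices and the bid are made-up numbers, the function names are hypothetical, and the cost calculation assumes you are charged the current spot price (not your bid) for each hour the instance runs.

```python
def running_hours(spot_prices, bid):
    """Return the indices of the hours in which a spot instance would run.

    Follows the rule quoted on the slide: the instance runs whenever
    the bid exceeds the current spot price.
    """
    return [hour for hour, price in enumerate(spot_prices) if bid > price]


def total_cost(spot_prices, bid):
    """Sum the spot price over the hours in which the instance runs.

    Assumption: each running hour is charged at the current spot price,
    not at the bid.
    """
    return sum(spot_prices[hour] for hour in running_hours(spot_prices, bid))


# Hypothetical hourly spot prices in USD (illustrative only)
prices = [0.03, 0.05, 0.08, 0.04, 0.10, 0.02]
hours = running_hours(prices, bid=0.05)
cost = total_cost(prices, bid=0.05)
```

With these numbers the instance only runs in the hours where the price is strictly below the bid, so a low bid trades cost for fewer hours of processing.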


Amazon CloudFront
Amazon CloudFront is a web service for content delivery. It integrates with other Amazon Web Services to give developers and businesses an easy way to distribute content to end users with low latency, high data transfer speeds, and no commitments.

Amazon S3
Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, secure, fast, inexpensive infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers.

Source: http://aws.amazon.com/cloudfront/ http://aws.amazon.com/s3/

Amazon EMR
Amazon Elastic MapReduce (Amazon EMR) is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

Amazon RDS
Amazon Relational Database Service (Amazon RDS) is a web service that makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business.

Source: http://aws.amazon.com/elasticmapreduce/ http://aws.amazon.com/rds/
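
To make the MapReduce idea behind Amazon EMR concrete for web logs, here is a minimal in-process sketch of a map step and a reduce step that count hits per requested path. It is not the job used in this deck: the Common Log Format input, the regular expression, and all function names are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Assumed input: Common Log Format lines such as
# 1.2.3.4 - - [10/Oct/2012:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (\S+) [^"]*"')


def map_line(line):
    """Map step: emit (path, 1) for every line that contains a request."""
    match = REQUEST.search(line)
    if match:
        yield match.group(1), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per path."""
    totals = defaultdict(int)
    for path, count in pairs:
        totals[path] += count
    return dict(totals)


def count_hits(lines):
    """Run map and reduce in one process over an iterable of log lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(pairs)
```

On a Hadoop cluster the map and reduce steps would run on different nodes over log files stored in S3; the sketch only shows the shape of the computation.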


Pros and Cons of Amazon Web Log Setup

Pros:
● The user always pays a market-conforming price for computing power (your competitors cannot produce more cost-effectively than you as long as they also stick to Amazon services)
● The user does not have to worry about the scalability of the service, while costs grow smoothly with the volume of logs

Cons:
● Log analysis is not performed in real time
● If the user's purchasing power is low, the log analysis may never complete, because only a few spot instances are ever deployed for the job (on average, more logs are collected than are processed)
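
The last point is a simple rate argument: if more logs arrive per day than the deployed instances can process, the backlog grows without bound. The sketch below shows this with made-up daily rates; the function name and all numbers are hypothetical.

```python
def backlog_after(days, collected_per_day, processed_per_day, start=0):
    """Return the number of unprocessed logs at the end of each day.

    The backlog grows by the logs collected and shrinks by the logs the
    deployed instances manage to process; it never drops below zero.
    """
    backlog = start
    history = []
    for _ in range(days):
        backlog = max(0, backlog + collected_per_day - processed_per_day)
        history.append(backlog)
    return history


# Hypothetical rates: too few spot instances, so processing lags behind
falling_behind = backlog_after(3, collected_per_day=1000, processed_per_day=800)

# Hypothetical rates: enough capacity, so the backlog stays empty
keeping_up = backlog_after(3, collected_per_day=1000, processed_per_day=1200)
```

In the first case the backlog grows by 200 logs every day, so the job never catches up unless more instances are bought.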


Alternative Amazon Web Log Setup

No intermediate storage of raw logs but online processing

Pros:
● Real-time analysis
● No need to store raw logs, only the results of the analysis

Cons:
● If the number of collected logs increases, you have to buy additional (spot) instances immediately, which leads to higher costs than with the proposed setup


Solutions by other providers

Loggly
● syslog sends logs directly
  *.* @logs.loggly.com:[PORT]
● Or you send logs via HTTP
● And there's a JS drop-in available
● You pay for daily log volume & retention time

Source: https://app.loggly.com/pricing http://wiki.loggly.com/loggingconfiguration
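
The syslog rule above forwards every message to a remote host over UDP (a single "@" in rsyslog syntax). An application could do roughly the same from Python's standard library, as in the sketch below. The function name and logger name are hypothetical, the port is left as a parameter because the slide only gives [PORT], and whether the service expects a particular message format is not stated in the deck.

```python
import logging
import logging.handlers


def make_syslog_logger(host, port, name="weblog"):
    """Return a logger that forwards records to a remote syslog endpoint.

    SysLogHandler sends over UDP by default, which corresponds to the
    single "@" in the rsyslog rule on the slide.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(host, port))
    logger.addHandler(handler)
    return logger
```

A caller would pass the host from the slide and the port assigned by the provider, then log request lines with `logger.info(...)`.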


Solutions by other providers

Google Analytics (Urchin)
● opaque backend
● log collection via JS drop-in

Piwik
● open source clone of Google Analytics
● same concept

Build your own?
● Apache Hive + Hadoop
● Cloudbase + Hadoop

Source: http://www.google.com/analytics/ http://piwik.org/ http://www.devx.com/Java/Article/48100 http://www.saurabhnanda.com/2009/07/using-hive-for-weblog-analysis.html


Solutions by other providers compared with the Amazon method

● Most providers offer a complete service, including charts and prepared analyses
● Amazon provides a bare-bones model that you have to put to use yourself
  ○ It's just one way of using their versatile architecture
  ○ It's still "build your own", but without any hardware
  ○ You could use other VM providers for the same job