Cloud Computing, Big Data, & CDN Emerging Technologies
Uploaded by adrian-lopez
Transcript of Cloud Computing, Big Data, & CDN Emerging Technologies
-
8/17/2019 Cloud Computing, Big Data, & CDN Emerging Technologies
Cloud Introduction
Cloud Computing
-
Cloud Computing
What does Cloud Computing do?
• Provides online data storage
• Enables configuration and accessing of online applications
• Provides a variety of software usage
• Provides computing platform and computing infrastructure
-
Cloud Computing
Application Example
• Using Gmail on my smartphone to check e-mail
• I receive an e-mail with an MS PowerPoint file attached
• However, MS PowerPoint and the Windows OS are not installed on my smartphone!
• Google Docs, Sheets, and Slides (part of the Google Drive service) can be used to open the file
-
Cloud Computing
What is a Cloud?
• Cloud can provide services through a public or private network or the Internet, where the service hosting system is at a remote location
• Cloud can support various applications
• E-mail, Web Conferencing, Games, Database
Management, CRM (Customer Relationship Management),
etc.
-
Cloud Computing
Cloud Models
-
Cloud Models
• Public Cloud
˗ Enables public systems and service access
˗ Open architecture (e.g., e-mail)
˗ Could be less secure due to openness
• Private Cloud
˗ Enables service access within an organization
˗ Due to its private nature, it is more secure
Cloud Computing
-
• Community Cloud
˗ Cloud accessible by a group of organizations
• Hybrid Cloud
˗ Hybrid Cloud = Public Cloud + Private Cloud
˗ Private cloud supports critical activities
˗ Public cloud supports non-critical activities
Cloud Computing
Cloud Models
-
Cloud Computing
Cloud Service Models
• SaaS: Software as a Service
• PaaS: Platform as a Service
• IaaS: Infrastructure as a Service
Each lower service model supports the management, computing power, and security of the service model above it
-
Cloud Computing
Software as a Service (SaaS)
• Provides a variety of software applications as a service to end users
Platform as a Service (PaaS)
• Provides a platform on which applications, development tools, etc. can be executed
Infrastructure as a Service (IaaS)
• Provides the fundamental computing and security resources for the entire cloud
• Backup storage, computing power, VMs (Virtual Machines), etc.
-
Cloud Computing
Cloud Service Models
• There are many other service models
• XaaS = Anything as a Service
• NaaS: Network as a Service
• DaaS: Database as a Service
• BaaS: Business as a Service
• etc.
-
Cloud Computing
Cloud Benefits
-
Cloud Computing
Characteristics
-
REFERENCES
Cloud Computing
-
• K. Kumar and Y. H. Lu, “Cloud Computing for Mobile Users: Can Offloading
Computation Save Energy?,” Computer , vol. 43, no. 4, pp. 51–56, Apr. 2010.
• Wikipedia, http://www.wikipedia.org
• Apple, iCloud, https://www.icloud.com
• Google, Google Cloud, https://cloud.google.com/products [Accessed June 1, 2015]
• Virtualization, Cisco’s IaaS cloud,
http://www.virtualization.co.kr/data/file/01_2/1889266503_6f489654_1.jpg
[Accessed June 1, 2015]
• Tutorialspoint, Cloud computing,
http://www.tutorialspoint.com/cloud_computing/cloud_computing_tutorial.pdf
[Accessed June 1, 2015]
References
-
Image sources
• AWS Simple Icons Storage Amazon S3 Bucket with Objects, By Amazon Web
Services LLC [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via
Wikimedia Commons
• iCloud Logo, By EEIM (Own work) [Public domain], via Wikimedia Commons
• MobileMe Logo, By Apple Inc. [Public domain], via Wikimedia Commons
References
-
Cloud Service Models
Cloud Computing
-
Cloud Computing
Cloud Service Models
• SaaS: Software as a Service
• PaaS: Platform as a Service
• IaaS: Infrastructure as a Service
Each lower service model supports the management, computing power, and security of the service model above it
-
IaaS
IaaS (Infrastructure as a Service)
• Infrastructure support over the Internet
• Cloud’s Computing & Storage Resources
• Computing Power
• Storage Services
• Software Packages & Bundles
• VLAN (Virtual Local Area Network)
• VM (Virtual Machine) Features
-
IaaS
VM (Virtual Machine) Administration
• IaaS enables control of computing resources through
˗ Administrative access to VMs
˗ Server virtualization features
• Access to computing resources is enabled by administrative access to VMs
• VM administrative command examples
• Save data on cloud server
• Start web server
• Install new application
-
IaaS
IaaS Procedures
-
IaaS
IaaS Benefits
• Flexible and Efficient Renting of Computer & Server
Hardware
• Rentable Resources
• VM, Storage, Bandwidth, IP Addresses, Monitoring Services, Firewalls, etc.
• Rent Payment Basis
• Resource type
• Usage time
• Service packages
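The rent payment basis above (resource type, usage time, service packages) can be illustrated with a small calculation. The Python sketch below is only an illustration: the resource names, hourly rates, and the package discount are hypothetical values chosen for the example, not prices from any real provider.

```python
# Hypothetical hourly rates per resource type (illustrative values only).
HOURLY_RATES = {"vm": 0.10, "storage_per_gb": 0.0005, "firewall": 0.02}


def monthly_rent(usage_hours, package_discount=0.0):
    """Sum rate * hours over the rented resources, then apply a package discount."""
    subtotal = sum(HOURLY_RATES[name] * hours for name, hours in usage_hours.items())
    return round(subtotal * (1.0 - package_discount), 2)


# One VM and one firewall for 720 hours; 100 GB of storage kept for 720 hours.
usage = {"vm": 720, "firewall": 720, "storage_per_gb": 100 * 720}
print(monthly_rent(usage))                         # pay-as-you-go
print(monthly_rent(usage, package_discount=0.10))  # assumed 10% service package
```

The bill depends only on what was rented and for how long, which is the point of renting rather than buying hardware.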
-
IaaS
IaaS Benefits
• Portability & Interoperability with Legacy Applications
• Enables portability based on infrastructure resources that are used through Internet connections
• Enables a method to maintain interoperability with legacy applications and workloads between IaaS clouds
-
PaaS
PaaS (Platform as a Service)
• Provides development & deployment tools for application development
• Provides a runtime environment for apps
-
PaaS Types
[Figure: four PaaS types offered as cloud services: Stand-Alone Development Environment, Application Delivery-Only Environment, Open Platform as a Service, and Add-On Development Facilities]
-
PaaS
PaaS Types
• Application Delivery-Only Environment
• Provides on-demand scaling & application security
• Stand-Alone Development Environment
• Provides an independent platform for a specific function
• Open Platform as a Service
• Provides open source software to run applications for PaaS providers
• Add-On Development Facilities
• Enables customization to the existing SaaS platforms
-
PaaS
PaaS Benefits
-
PaaS
Benefits
• Lower Administrative Overhead
• User does not need to be involved in any
administration of the platform
• Lower Total Cost of Ownership
• User does not need to purchase any hardware,
memory, or server
-
PaaS
• Scalable Solutions
• Resources are scaled automatically based on the application's resource demand
• More Current System Software
• The cloud provider takes care of software upgrades & patch installations
Benefits
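The automatic resource scaling listed under Scalable Solutions can be sketched as a simple threshold rule. This is a minimal illustration, assuming CPU utilization is the demand signal; the thresholds, instance limits, and function name are hypothetical and do not describe any particular PaaS provider's policy.

```python
def desired_instances(current, cpu_utilization, low=0.30, high=0.70,
                      min_instances=1, max_instances=10):
    """Return the instance count a simple threshold rule would choose."""
    if cpu_utilization > high:
        target = current + 1      # demand is high: scale up by one instance
    elif cpu_utilization < low:
        target = current - 1      # demand is low: scale down by one instance
    else:
        target = current          # demand is within the accepted band
    # Never go below the minimum or above the maximum allowed instances.
    return max(min_instances, min(max_instances, target))


print(desired_instances(2, 0.90))   # busy application
print(desired_instances(2, 0.10))   # idle application
```

The user only sets the limits; the platform applies the rule, which is why the administrative overhead stays low.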
-
SaaS
SaaS (Software as a Service)
• Provides software applications as a service to the
user
• Software that is deployed on a cloud server which
is accessible through the Internet
-
SaaS
Characteristics
• On Demand Availability
• Cloud software is available anywhere that the cloud is reachable via the Internet
• Easy Maintenance
• No user software upgrade or maintenance needed; all supported by the cloud
• Flexible Scale Up or Scale Down
• Centralized Management & Data
-
SaaS
Characteristics
• Enables a Shared Data Model
• Multiple users can share a single data model and database
• Cost Effectiveness
• Pay based on usage
• No risk in buying the wrong software
• Multitenant Programming Solutions
• Multiple programmers are guaranteed to use the same software version, so there are no version mismatch problems
-
Software-as-a-service
Open SaaS
Applications
-
Cloud Services
Cloud Computing
-
Cloud Services
Google Cloud
• Google App Engine
˗ Released as a preview in April 2008
˗ PaaS (Platform as a Service) for web applications
˗ Provides automatic scaling based on resource demands and server load
• Google Cloud Storage
˗ Launched in May 2010
˗ Online file storage service
-
Cloud Services
Google Cloud
• Google BigQuery
˗ Released in April 2012
˗ Data analysis tool that uses SQL-like queries to process big datasets in seconds
• Google Compute Engine
˗ Released in June 2012
˗ IaaS (Infrastructure as a Service) support to enable on demand launching of VMs (Virtual Machines)
-
Cloud Services
Google Cloud
• Google Cloud Endpoints
˗ Released in November 2013
˗ Tool to create services inside App Engine
˗ Easily connects from Android, iOS, and JavaScript clients
• Google Cloud DNS (Domain Name System)
˗ DNS service supported by the Google Cloud
-
Cloud Services
• Google Cloud Datastore
˗ NoSQL (non-relational; often read as "Not only SQL") data storage
• Google Cloud SQL (Structured Query Language)
˗ Released in February 2014 as GA (General Availability)
˗ Fully managed MySQL database
Google Cloud
-
Cloud Services
Amazon S3 (Simple Storage Service)
• Online file storage web service offered by Amazon Web
Services
• Public web service released in the United States in March
2006 and in Europe in November 2007
• Provides storage through web services interfaces (REST, SOAP, and BitTorrent)
-
Cloud Services
Amazon Cloud Drive
• Amazon Cloud Drive was released in March 2011
• Web storage application from Amazon
• Storage Space Characteristics
˗ Can be accessed from up to eight specific devices (e.g., mobile devices & different computers) and by using different browsers on the same computer
-
Cloud Services
Amazon Cloud Drive
• Cloud Player (Originally bundled)
˗ Users can play music in their Cloud Drive from any
computer or Android device
˗ Music browsing based on song titles, albums, artists,
genres (website only), and playlists
-
Cloud Services
Amazon Cloud Drive Options
• Unlimited Photos
˗ Unlimited storage for photos & raw data files
˗ 5 gigabytes of video storage
• Unlimited Everything
˗ Unlimited storage for photos, videos, documents, and various file types
44
-
8/17/2019 Cloud Computing, Big Data, & CDN Emerging Technologies
45/212
Cloud Services
iCloud
• Developed by Apple, Inc.
• Public release in October 2011
• Cloud Storage & Cloud Computing
• Operating system
˗ OS X (10.7 Lion or later)
˗ Microsoft Windows 7 or later
˗ iOS 5 or later
-
Cloud Services
iCloud replaces MobileMe
• MobileMe was a subscription-based collection of Apple’s online services and software
• MobileMe was replaced by iCloud
• MobileMe ceased services in June 2012
• MobileMe users were allowed to transfer to iCloud until July 2012
-
Cloud Services
iCloud Features
• Email, Contacts, and Calendars
• Find My Friends
• Backup & Restore
˗ Backup feature for device settings & data
˗ iOS 5 or later required
• Find My iPhone
˗ Enables a user to track the location of an iOS device or Mac
˗ Formerly a feature of MobileMe
-
Cloud Services
• Find My iPhone can also be used to manage lost or stolen Apple devices
• Back to My Mac
˗ Enables remote log in to other computers that have Back to My Mac installed (using the same Apple ID)
• iWork for iCloud
˗ Apple's iWork suite (Pages, Numbers, and Keynote) made available on a web interface
iCloud Features
-
Cloud Services
iCloud Features
• Photo Stream
˗ Can store the most recent 1,000 photos
˗ Free storage for up to 30 days
• iCloud Photo Library
˗ Stores all photos at original resolution
˗ Stores photo metadata
• Storage (Introduced in 2011)
˗ 5 GB of free storage per account
-
Cloud Services
• iCloud Drive
˗ Can save photos, videos, documents, and apps
• iCloud Keychain
˗ Secure database for website and Wi-Fi passwords
˗ Secure credit card & debit card management for quick access and auto-fill
iCloud Features
-
Cloud Services
• iTunes Match
˗ Scans the iTunes music library and matches its tracks
˗ Serves tracks copied from CDs or other sources
iCloud Features
-
Big Data Examples
Big Data
-
Big Data
New FLU Virus Starts in the U.S.!
• H1N1 flu virus (which combined virus elements of the bird and swine (pig) flu) started to spread in the U.S. in 2009
• U.S. CDC (Centers for Disease Control and Prevention) was only collecting diagnostic data from Medical Doctors once a week
• Using the CDC information to find how the flu was spreading would have an approximately 2-week lag, which is far too slow compared to the speed of the virus spreading
-
Big Data
New FLU Virus Starts in the U.S.!
• What vaccine was needed?
• How much vaccine was needed?
• Where was the vaccine needed?
• Vaccine preparation and delivery plans could not be set up fast enough to safely prevent the virus from spreading out of control
-
Big Data
• Fortunately, Google published a paper about how they could predict the spread of the winter flu in the U.S. accurately down to specific regions and states
• This paper was published in the journal Nature a few weeks before the H1N1 virus made the headline news
New FLU Virus Starts in the U.S.!
-
Big Data
• Millions of the most common search terms and millions of different mathematical models were tested on Google’s database
• Google receives more than 3 billion search queries a day
• The analysis system was set to look for correlations between the frequency of certain search queries and the spread of the flu over time and space
New FLU Virus Starts in the U.S.!
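The idea of looking for correlation between search query frequency and flu spread can be sketched with the Pearson correlation coefficient. This is only an illustration under stated assumptions: the weekly counts and the two query strings are made-up data, and Pearson correlation is a stand-in for the much larger family of mathematical models the slides mention.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up weekly data: reported flu cases and counts of two search queries.
flu_cases = [10, 20, 30, 40, 50]
query_counts = {
    "flu symptoms": [105, 198, 310, 395, 502],
    "cheap flights": [300, 290, 310, 305, 295],
}

# Rank queries by how strongly their frequency tracks the flu cases.
ranked = sorted(query_counts,
                key=lambda q: pearson(query_counts[q], flu_cases),
                reverse=True)
print(ranked)
```

Queries that rank highest are candidates for a prediction model; queries with a correlation near zero carry little information about the flu.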
-
Big Data
• Google’s method of analysis did not use data provided from hospitals or Medical Doctors
• Google used Big Data analysis on the most common search terms people use
• Google’s system proved to be more accurate and faster than analyzing government statistics
New FLU Virus Starts in the U.S.!
-
Big Data
Wal-Mart
• Wal-Mart’s Data Warehouse
• Stores 4 petabytes (4 × 10^15 bytes) of data
• Records every single purchase
• Approximately 267 million transactions a day from 6,000 stores worldwide are recorded
-
Big Data
• Wal-Mart’s Data Analysis
• Focused on evaluating the effectiveness of pricing strategies and advertising campaigns
• Seeking improvements in inventory management and supply chains
Wal-Mart
-
Big Data
Recommendation System using Big Data
• Based on data analysis of simple elements
• What purchases users made in the past
• Which items they have in their virtual shopping cart
• Which items customers rated and liked
• What influence the ratings had on other customers' purchases
-
Big Data
Amazon.com
• Amazon.com’s Recommendation System
• Item-to-Item Collaborative Filtering Algorithm
• Personalization of the online store, customized to each customer
• Each customer’s store is based on the customer’s personal interest
• Example: For a new mother, the store will display baby supplies and toys
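The item-to-item idea can be sketched as follows: two items are similar when the same customers bought both, and recommendations are the items most similar to what a customer already has. This is a simplified illustration using cosine similarity over a tiny, made-up purchase history; the customer names, items, and function names are hypothetical, and the sketch is not Amazon.com's production algorithm.

```python
from math import sqrt

# Hypothetical purchase history: customer -> set of purchased items.
purchases = {
    "alice": {"diapers", "baby wipes", "toy rattle"},
    "bob": {"diapers", "baby wipes"},
    "carol": {"laptop", "mouse"},
    "dave": {"diapers", "toy rattle"},
}


def item_similarity(item_a, item_b):
    """Cosine similarity between the binary 'who bought it' vectors of two items."""
    buyers_a = {c for c, items in purchases.items() if item_a in items}
    buyers_b = {c for c, items in purchases.items() if item_b in items}
    if not buyers_a or not buyers_b:
        return 0.0
    return len(buyers_a & buyers_b) / sqrt(len(buyers_a) * len(buyers_b))


def recommend(item, top_n=2):
    """Return up to top_n other items, most similar first (ties broken by name)."""
    all_items = set().union(*purchases.values())
    scores = {other: item_similarity(item, other)
              for other in all_items if other != item}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, score in ranked[:top_n] if score > 0]


print(recommend("diapers"))
print(recommend("laptop"))
```

Because similarity is computed between items rather than between customers, the item table can be prepared offline and looked up quickly when a customer visits the store.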
-
Big Data
Citibank
• Bank operations in 100 countries
• Big Data analysis on the database of basic financial transactions can enable global insight on investments, market changes, trade patterns, and economic conditions
• Many companies (e.g., Zara, H&M, etc.) work with Citibank to locate new stores and factories
-
Big Data
Product Development & Sales
• For example, a Smartphone takes significant time and money to manufacture
• In addition, the duration of popularity for a new Smartphone is limited
• To maximize sales, a company needs to manufacture just the right amount of products and sell them in the right locations
-
Big Data
Product Development & Sales
• Too many will result in leftovers and a big waste for the company!
• Too few will result in a lost opportunity for company profit and growth!
• Big Data analysis can help find how many smartphones are needed and where the products could be popular, based on common search terms that people use
• This can also be used to estimate how many products could be sold in a certain location
• But why is this difficult?
-
REFERENCES
Big Data
-
• V. Mayer-Schönberger, and K. Cukier, Big data: A revolution that will transform how
we live, work, and think . Houghton Mifflin Harcourt, 2013.
• T. White, Hadoop: The Definitive Guide. O'Reilly Media, 2012.
• J. Venner, Pro Hadoop. Apress, 2009.
• S. LaValle, E. Lesser, R. Shockley, M. S. Hopkins, and N. Kruschwitz, “Big Data,
Analytics and the Path From Insights to Value,” MIT Sloan Management Review,
vol. 52, no. 2, Winter 2011.
• R. E. Bryant, R. H. Katz, and E. D. Lazowska, "Big-data Computing: Creating
revolutionary breakthroughs in commerce, science and society," Computing
Community Consortium, pp. 1-15, Dec. 2008.
• G. Linden, B. Smith, and J. York. "Amazon.com Recommendations: Item-to-Item
Collaborative Filtering," IEEE Internet Computing , vol. 7, no. 1, pp. 76-80, Jan/Feb.
2003.
References
-
• J. R. Galbraith, "Organizational Design Challenges Resulting From Big Data,"
Journal of Organization Design, vol. 3, no. 1, pp. 2-13, Apr. 2014.
• S. Sagiroglu and D. Sinanc, “Big data: A review,” Proc. IEEE International
Conference on Collaboration Technologies and Systems, pp. 42-47, May 2013.
• M. Chen, S. Mao, and Y. Liu, “Big Data: A Survey,” Mobile Networks and
Applications, vol. 19, no. 2, pp. 171-209, Jan. 2014.
• X. Wu, X. Zhu, G. Q. Wu, and W. Ding, “Data Mining with Big Data,” IEEE
Transactions on Knowledge and Data Engineering, vol. 26, no. 1, pp. 97–107, Jan. 2014.
• Z. Zheng, J. Zhu, and M. R. Lyu, ‘‘Service-Generated Big Data and Big Data-as-a-
Service: An Overview,’’ Proc. IEEE International Congress on Big Data, pp. 403–
410, Jun/Jul. 2013.
References
-
• I. Palit and C.K. Reddy, “Scalable and Parallel Boosting with MapReduce,” IEEE
Transactions on Knowledge and Data Engineering , vol. 24, no. 10, pp. 1904-1916,
2012.
• M.-Y Choi, E.-A. Cho, D.-H. Park, C.-J Moon, and D.-K. Baik, “A Database
Synchronization Algorithm for Mobile Devices,” IEEE Transactions on Consumer
Electronics, vol. 56, no. 2, pp. 392-398, May 2010.
• IBM, What is big data?, http://www.ibm.com/software/data/bigdata/what-is-big-data.html
[Accessed June 1, 2015]
• Hadoop Apache, http://hadoop.apache.org
• Wikipedia, http://www.wikipedia.org
Image sources
• Walmart Logo, By Walmart [Public domain], via Wikimedia Commons
• Amazon Logo, By Balajimuthazhagan (Own work) [CC BY-SA 3.0
(http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
References
-
Big Data's 4 Vs
Big Data
-
Big Data
Big Data’s 4 V Big Challenges
• Volume – Data Size
• Variety – Data Formats
• Velocity – Data Streaming Speeds
• Veracity – Data Trustworthiness
-
Big Data
Volume – Data Size
• 40 Zettabytes (10^21) of data is predicted to be created by 2020
• 2.5 Quintillion bytes (10^18) of data are created every day
• 6 Billion (10^9) people have mobile phones
• 100 Terabytes (10^12) of data (at least) is stored by most U.S. companies
• 966 Petabytes (10^15) was the approximate storage size of the American manufacturing industry in 2009
-
Big Data
Variety – Data Formats
• 150 Exabytes (10^18) was the estimated size of data for health care throughout the world in 2011
• More than 4 Billion (10^9) hours each month are used in watching YouTube
• 30 Billion pieces of content are exchanged every month on Facebook
• 200 Million monthly active users exchange 400 Million tweets every day
-
Big Data
Velocity – Data Streaming Speeds
• 1 Terabyte (10^12) of trade information is exchanged during every trading session at the New York Stock Exchange
• 100 sensors (approximately) are installed in modern cars to monitor fuel level, tire pressure, etc.
• 18.9 Billion network connections are predicted to exist by 2016
-
Big Data
Veracity – Data Trustworthiness
• 1 out of 3 business leaders have experienced trust issues with their data when trying to make a business decision
• $3.1 Trillion (10^12) a year is estimated to be wasted in the U.S. economy due to poor data quality
-
Big Data
New technology is needed to overcome these 4 V Big Data Challenges
• Volume – Data Size
• Variety – Data Formats
• Velocity – Data Streaming Speeds
• Veracity – Data Trustworthiness
-
HADOOP
Big Data
-
Hadoop
Data Storage, Access, and Analysis
• Hard drive storage capacity has tremendously increased
• But the data read and write speeds to and from the hard drives have not significantly improved yet
• Simultaneous parallel read and write of data with multiple hard disks requires advanced technology
-
Hadoop
• Challenge 1: Hardware Failure
˗ When using many computers for data storage and analysis, the probability that at least one computer will fail is very high
• Challenge 2: Cost
˗ To avoid losing data or computed analysis results, backup computers and memory are needed, which improves reliability but is very expensive
Data Storage, Access, and Analysis
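The hardware failure challenge can be made concrete with a short calculation. Assuming, for illustration only, that each computer fails independently with the same probability p during some period, the probability that at least one of n computers fails is 1 - (1 - p)^n. The value p = 0.01 below is a hypothetical figure, not a measured failure rate.

```python
def prob_any_failure(num_computers, p_single):
    """Probability that at least one of num_computers fails,
    assuming independent failures with probability p_single each."""
    return 1.0 - (1.0 - p_single) ** num_computers


# Assumed 1% failure probability per computer (illustrative value only).
for n in (1, 10, 100, 1000):
    print(n, round(prob_any_failure(n, 0.01), 3))
```

Even with a small per-computer failure probability, a failure somewhere in a large cluster becomes close to certain, which is why storage and analysis systems for Big Data must be built to tolerate failures.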
-
Hadoop
• Challenge 3: Combining Analyzed Data
˗ Combining the analyzed data is very difficult
˗ If one part of the analyzed data is not ready, then the overall combining process has to be delayed
˗ If one part has errors in its analysis, then the overall combined result may be unreliable and useless
Data Storage, Access, and Analysis
-
Hadoop
Hadoop
• Hadoop is a Reliable Shared Storage and Analysis System
• Hadoop = HDFS + MapReduce + α
˗ HDFS provides Data Storage
˗ HDFS: Hadoop Distributed FileSystem
˗ MapReduce provides Data Analysis
˗ MapReduce = Map Function + Reduce Function
-
Hadoop
HDFS: Hadoop Distributed FileSystem
• DFS (Distributed FileSystem) is designed for storage management of a network of computers
• HDFS is optimized to store huge files with streaming data access patterns
• HDFS is designed to run on clusters of general computers
-
Hadoop
HDFS: Hadoop Distributed FileSystem
• HDFS was designed to be optimal in performance for a WORM (Write Once, Read Many times) pattern, which is a very efficient data processing pattern
• HDFS was designed considering the time to read the whole dataset to be more important than the time required to read the first record
-
Hadoop
HDFS
• HDFS clusters use 2 types of nodes
• Namenode (master node)
• Datanode (worker node)
-
Hadoop
HDFS: Namenode
• Manages the filesystem namespace
• Maintains the filesystem tree and the metadata for all the files and directories in the tree
• Stores on the local disk using 2 file forms
• Namespace Image
• Edit Log
HDFS: Datanodes
• Workhorse of the filesystem
• Store and retrieve blocks when requested by the client or the namenode
• Report back to the namenode periodically with lists of the blocks that they are storing
MapReduce
• MapReduce is a programming model that abstracts the analysis problem from the stored data
• MapReduce transforms the analysis problem into a computation process that uses a set of keys and values
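The key-value formulation can be illustrated with a word count. This is a minimal single-process sketch, not Hadoop's API: the names map_fn, reduce_fn, and run_mapreduce are hypothetical, and the grouping step stands in for what a cluster would do across machines.

```python
from collections import defaultdict

def map_fn(_key, line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: combine all values that share the same key."""
    return word, sum(counts)

def run_mapreduce(lines):
    # Map phase: each input record is an (offset, line) key-value pair.
    intermediate = defaultdict(list)
    for offset, line in enumerate(lines):
        for key, value in map_fn(offset, line):
            intermediate[key].append(value)  # group values by key
    # Reduce phase: one call per distinct key, in sorted key order.
    return dict(reduce_fn(k, v) for k, v in sorted(intermediate.items()))

result = run_mapreduce(["big data big cluster", "data node"])
```

The analysis itself lives only in map_fn and reduce_fn; the surrounding driver does not depend on what is being counted.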
MapReduce System Architecture
• MapReduce was designed for tasks that take several minutes or hours on a set of dedicated, trusted computers connected by a high-speed broadband network and managed within a single data center
MapReduce Characteristics
• MapReduce uses a somewhat brute-force data analysis approach
• The entire dataset (or a big part of the dataset) is processed for every query
˗ Batch Query Processor model
MapReduce Characteristics
• MapReduce makes it possible to run an ad hoc query against the whole dataset within a reasonable time
• Many distributed systems combine data from multiple sources (which is very difficult), but MapReduce does this in a very effective and efficient way
Technical Terms used in MapReduce
• Seek Time is the delay in moving the disk head to the location on the disk where data will be read or written
• Transfer Rate is the rate at which data is streamed to or from the disk (i.e., the disk's bandwidth)
• Transfer Rate has improved much faster than Seek Time, so transfers are now fast while seeks are still relatively slow
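A short calculation shows why the gap between the two matters. The figures below (10 ms per seek, 100 MB/s transfer) are assumed round numbers for illustration, not measurements from the lecture.

```python
SEEK_TIME_S = 0.010          # assumed average seek time: 10 ms
TRANSFER_RATE_BPS = 100e6    # assumed transfer rate: 100 MB/s

def read_time_seconds(total_bytes, num_seeks):
    """Time to read total_bytes when the read needs num_seeks seeks."""
    return num_seeks * SEEK_TIME_S + total_bytes / TRANSFER_RATE_BPS

ONE_GB = 1e9
# Streaming read: one seek, then a sequential transfer of the whole gigabyte.
streaming = read_time_seconds(ONE_GB, num_seeks=1)
# Random access: the same gigabyte fetched as 10,000 separate 100 KB records.
random_access = read_time_seconds(ONE_GB, num_seeks=10_000)
```

Under these assumptions the streaming read takes about 10 seconds, while the seek-heavy read takes about 110 seconds for the same amount of data.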
MapReduce
• MapReduce gains performance enhancement through optimal balancing of Seeking and Transfer operations
˗ Reduce Seek operations
˗ Effectively use Transfer operations
• In the next lecture, we will compare MapReduce with a traditional RDBMS (Relational Database Management System)
MapReduce vs. RDBMS
Big Data
MapReduce vs. RDBMS
• RDBMS (Relational Database Management System) Characteristics
• RDBMS is good for updating a small proportion of a big database
• RDBMS uses a traditional B-Tree, whose performance is highly dependent on the time required to perform seek operations
MapReduce vs. RDBMS
• MapReduce Characteristics
• MapReduce is good for updating all (or a majority) of a big database
• MapReduce uses Sort and Merge to rebuild the database, which depends more on transfer operations
MapReduce vs. RDBMS
• RDBMS is good for applications that require the datasets of the database to be updated very frequently (e.g., point queries or small dataset updates)
• MapReduce is better for WORM (Write Once, Read Many times) based data applications
• MapReduce is a complementary system to RDBMS
MapReduce vs. RDBMS
                 RDBMS                      MapReduce
Data Size        Gigabytes (10^9 bytes)     Petabytes (10^15 bytes)
Access           Interactive & Batch        Batch
Updates          Read & Write Many Times    WORM (Write Once, Read Many Times)
Data Structure   Static Schema              Dynamic Schema
Integrity        High                       Low
Scalability      Nonlinear                  Linear
MapReduce vs. RDBMS: Data Types
• Structured Data: Data that has a formally defined structure (e.g., XML documents or database tables)
• Semi-Structured Data: Data that has a looser format where the data structure is used as a guide and may be ignored
• Unstructured Data: Data that does not have any formal structure (e.g., plain text or image data)
MapReduce vs. RDBMS: Data Types
• MapReduce is very effective on unstructured and semi-structured data
• Why?
• MapReduce interprets the data at processing time
• MapReduce does not use intrinsic properties of the data as input keys or input values. The parameters used are selected by the person analyzing the data
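The idea that keys and values are chosen at processing time can be sketched with two map functions over the same raw text. The log layout (host, path, status) and both function names are hypothetical, chosen only for illustration.

```python
def map_status_by_host(_offset, line):
    """Interpret a raw log line at processing time; the analyst picks host as the key."""
    fields = line.split()
    if len(fields) < 3:
        return []            # skip lines that do not fit the expected layout
    host, _path, status = fields[0], fields[1], fields[2]
    return [(host, int(status))]

def map_count_by_status(_offset, line):
    """Same raw data, different analysis: the status code becomes the key."""
    fields = line.split()
    if len(fields) < 3:
        return []
    return [(fields[2], 1)]

raw = ["10.0.0.1 /index.html 200", "malformed", "10.0.0.2 /missing 404"]
by_host = [pair for i, l in enumerate(raw) for pair in map_status_by_host(i, l)]
by_status = [pair for i, l in enumerate(raw) for pair in map_count_by_status(i, l)]
```

No schema is declared in advance; each analysis imposes its own interpretation, and lines that do not fit are simply skipped.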
MapReduce vs. RDBMS: Scalability
• MapReduce has a programming model that is linearly scalable
• MapReduce Functions: 2 types
˗ Map function
˗ Reduce function
• Both of these functions define a Key-Value pair mapping relation (e.g., Key-Value pair 1 → Key-Value pair 2)
Hadoop Release Series
Feature                          1.x                                 0.22   2.x
Secure authentication            Yes                                 No     Yes
Old configuration names          Yes
New configuration names          No                                  Yes    Yes
Old MapReduce API                Yes                                 Yes    Yes
New MapReduce API                Yes (with some missing libraries)   Yes    Yes
MapReduce 1 runtime (Classic)    Yes                                 Yes    No
MapReduce 2 runtime (YARN)       No                                  No     Yes
HDFS Federation                  No                                  No     Yes
HDFS High-Availability           No                                  No     Yes
Release 2.6.0 became available in Nov. 2014
Hadoop Release Series
• 2.x includes several major new features
• MapReduce 2 is the new MapReduce runtime implemented on a new system called YARN
• YARN
˗ Yet Another Resource Negotiator
˗ General resource management system for running distributed applications
Hadoop Release Series
• HDFS Federation partitions the HDFS namespace across multiple namenodes
˗ Enables improved support for clusters with very large numbers of files
• HDFS High-Availability feature uses standby namenodes for backup, and therefore, the namenode is no longer a potential SPOF (Single Point of Failure)
MapReduce
Big Data
Hadoop
• Hadoop is a Reliable Shared Storage and Analysis System
• Hadoop = HDFS + MapReduce + α
˗ HDFS provides Data Storage
˗ HDFS: Hadoop Distributed FileSystem
˗ MapReduce provides Data Analysis
˗ MapReduce = Map Function + Reduce Function
Scaling Out
• Scaling out is done by the DFS (Distributed FileSystem), where the data is divided and stored in distributed computers & servers
• Hadoop uses HDFS to move the MapReduce computation to several distributed computing machines, each of which processes its assigned part of the divided data
Jobs
• A MapReduce job is a unit of work that needs to be executed
• A job consists of: input data, MapReduce program, configuration information, etc.
• A job is executed by dividing it into tasks, each of which is one of two types
˗ Map Task
˗ Reduce Task
Node types for Job execution
• Job execution is controlled by 2 types of nodes
˗ Jobtracker
˗ Tasktracker
• Jobtracker coordinates all jobs
• Jobtracker schedules all tasks and assigns the tasks to tasktrackers
• Tasktracker will execute its assigned task
• Tasktracker will send progress reports to the Jobtracker
• Jobtracker will keep a record of the progress of all jobs executed
Data flow
• Hadoop divides the input into input splits (or splits) suitable for the MapReduce job
• Each split has a fixed size
• Split size is commonly matched to the size of an HDFS block (64 MB) for maximum processing efficiency
Data flow
• A Map Task is created for each split
• The Map Task executes the map function for all records within the split
• Hadoop commonly executes the Map Task on the node where the input data resides
Data flow
• Data-Local Map Task
• Data locality optimization does not need to use the cluster network
• Data-local flow process shows why the Optimal Split Size = 64 MB HDFS Block Size
Data flow
• Rack-Local Map Task
• A node hosting the HDFS block replicas for a map task’s input split could be running other map tasks
• Job Scheduler will look for a free map slot on a node in the same rack as one of the blocks
[Figure legend: Map Task, HDFS Block, Node, Rack, Data Center]
Data flow
• Off-Rack Map Task
• Needed when the Job Scheduler cannot perform data-local or rack-local map tasks
• Uses inter-rack network transfer
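The three placement cases (data-local, rack-local, off-rack) can be summarized as a preference order. The sketch below is a simplified illustration, not Hadoop's scheduler: pick_node, the node names, and the rack map are all hypothetical.

```python
def pick_node(replica_nodes, free_nodes, rack_of):
    """Return (node, locality) for a map task, preferring nodes close to the data.

    replica_nodes: nodes holding a replica of the split's HDFS block
    free_nodes:    nodes that currently have a free map slot
    rack_of:       dict mapping node name -> rack name
    """
    # 1. Data-local: a free node that already stores the block.
    for node in free_nodes:
        if node in replica_nodes:
            return node, "data-local"
    # 2. Rack-local: a free node in the same rack as one of the replicas.
    replica_racks = {rack_of[n] for n in replica_nodes}
    for node in free_nodes:
        if rack_of[node] in replica_racks:
            return node, "rack-local"
    # 3. Off-rack: any free node; the block crosses the inter-rack network.
    if free_nodes:
        return free_nodes[0], "off-rack"
    return None, "no free slot"

rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}
```

For example, with the block stored on n1, a free slot on n2 yields a rack-local task, while a free slot only on n3 forces an off-rack task.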
Map
• Map task will write its output to the local disk
• Map task output is not the final output; it is only the intermediate output
Reduce
• Map task output is processed by Reduce Tasks to produce the final output
• Reduce Task output is stored in HDFS
• For a completed job, the Map Task output can be discarded
Single Reduce Task
• Node includes Split, Map, Sort, and Output unit
• Light blue arrows show data transfers in a node
• Black arrows show data transfers between nodes
Single Reduce Task
• Number of reduce tasks is specified independently, and is not based on the size of the input
Combiner Function
• User-specified function that runs on the Map output; the combiner's output forms the input to the Reduce function
• Specifically designed to minimize the data transferred between Map Tasks and Reduce Tasks
• Mitigates the problem of limited network speed on the cluster and helps to reduce the time needed to complete MapReduce jobs
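A word-count sketch shows the effect of a combiner. The function names are hypothetical and everything runs in one process; the point is only that the combiner shrinks what would be sent over the network without changing the final answer. This works here because summation is associative and commutative.

```python
from collections import Counter

def map_words(line):
    """Map: emit (word, 1) for each word."""
    return [(w, 1) for w in line.split()]

def combine(pairs):
    """Combiner: pre-aggregate one map task's output before it leaves the node."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())

def reduce_counts(all_pairs):
    """Reduce: sum the (possibly pre-aggregated) counts per word."""
    totals = Counter()
    for word, count in all_pairs:
        totals[word] += count
    return dict(totals)

map_output = map_words("a b a a b a")
combined = combine(map_output)
```

Here the map task emits six pairs but only two leave the node after combining, and the reduce result is the same either way.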
Multiple Reducers
• Map tasks partition their output, each creating one partition for each reduce task
• Each partition may contain many keys and their associated values
• All records for a key are kept in a single partition
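One common way to keep all records for a key in a single partition is to hash the key and take the result modulo the number of reduce tasks. The sketch below assumes that approach; zlib.crc32 is used only as a deterministic stand-in hash, and the function names are hypothetical.

```python
import zlib

def partition(key, num_reducers):
    """Map a key to a reduce task; the same key always yields the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def partition_output(pairs, num_reducers):
    """Split one map task's output into one partition per reduce task."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition(key, num_reducers)].append((key, value))
    return partitions

pairs = [("apple", 1), ("pear", 1), ("apple", 1), ("fig", 1)]
parts = partition_output(pairs, num_reducers=2)
```

A partition can hold several different keys, but both "apple" records necessarily land in the same one.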
Multiple Reducers
• Shuffle process is used in the data flow between the Map tasks and Reduce tasks
Zero Reducer
• Zero reducer uses no shuffle process
• Applied when all of the processing can be carried out in parallel Map tasks
HDFS
Big Data
Hadoop
• Hadoop is a Reliable Shared Storage and Analysis System
• Hadoop = HDFS + MapReduce + α
˗ HDFS provides Data Storage
˗ HDFS: Hadoop Distributed FileSystem
˗ MapReduce provides Data Analysis
˗ MapReduce = Map Function + Reduce Function
HDFS: Hadoop Distributed FileSystem
• DFS (Distributed FileSystem) is designed for storage management of a network of computers
• HDFS is optimized to store large, terabyte-size files with streaming data access patterns
HDFS: Hadoop Distributed FileSystem
• HDFS was designed for optimal performance with a WORM (Write Once, Read Many times) pattern
• HDFS is designed to run on clusters of general-purpose computers & servers from multiple vendors
HDFS Characteristics
• HDFS is optimized for large-scale, high-throughput data processing
• HDFS does not perform well in supporting applications that require low-latency access (e.g., in the tens of milliseconds range)
Blocks
• Files in HDFS are divided into block-size chunks (64 Megabyte default block size)
• A block is the minimum amount of data that HDFS can read or write
• Blocks simplify the storage and replication process, providing fault tolerance & processing speed enhancement for larger files
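The division of a file into blocks is simple arithmetic, sketched below. The helper block_ranges is hypothetical and only computes byte ranges; it does not talk to HDFS.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB default block size, as stated above

def block_ranges(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) for each block of a file of file_size bytes."""
    ranges = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # last block may be shorter
        ranges.append((offset, length))
        offset += length
    return ranges

blocks = block_ranges(200 * 1024 * 1024)  # a 200 MB file
```

A 200 MB file becomes three full 64 MB blocks plus one 8 MB block, and each block can then be stored and replicated independently.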
HDFS
• HDFS clusters use 2 types of nodes
• Namenode (master node)
• Datanode (worker node)
Namenode
• Manages the filesystem namespace
˗ Namenode keeps track of the datanodes that have blocks of a distributed file assigned
• Maintains the filesystem tree and the metadata for all the files and directories in the tree
• Stores this information on the local disk using 2 file forms
˗ Namespace Image
˗ Edit Log
Namenode
• Namenode holds the filesystem metadata in its memory
• Namenode’s memory size determines the limit to the number of files in a filesystem
• But then, what is Metadata?
Metadata
• Follows the traditional concept of a library card catalog
• Categorizes and describes the contents and context of the data files
• Maximizes the usefulness of the original data file by making it easy to find and use
Metadata Types
• Structural Metadata
˗ Focuses on the data structure's design and specification
• Descriptive Metadata
˗ Focuses on the individual instances of application data or the data content
Datanodes
• Workhorse of the filesystem
• Store and retrieve blocks when requested by the client or the namenode
• Periodically report back to the namenode with lists of the blocks that they are storing
Client Access
• Client can access the filesystem (on behalf of the user) by communicating with the namenode and datanodes
• Client can use a filesystem interface (similar to POSIX (Portable Operating System Interface)) so the user code does not need to know about the namenode and datanodes to function properly
Namenode Failure
• Namenode keeps track of the datanodes that have blocks of a distributed file assigned, so without the namenode, the filesystem cannot be used
• If the computer running the namenode malfunctions, then reconstruction of the files (from the blocks on the datanodes) would not be possible, so the files on the filesystem would be lost
Namenode Failure Resilience
• Namenode failure prevention schemes
1. Namenode File Backup
2. Secondary Namenode
1. Namenode File Backup
• Back up the namenode files that form the persistent state of the filesystem’s metadata
• Configure the namenode to write its persistent state to multiple filesystems
˗ Synchronous and atomic backup
• Common backup configuration: copy to Local Disk and Remote FileSystem
2. Secondary Namenode
• Secondary namenode does not act the same way as the namenode
• Secondary namenode periodically merges the namespace image with the edit log to prevent the edit log from becoming too large
• Secondary namenode usually runs on a separate computer to perform the merge process because this requires significant processing capability and memory
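The periodic merge can be pictured as replaying the logged operations onto a copy of the namespace image and then starting a fresh, empty log. The sketch below is a toy model: the operation names (create, add_block, delete) and the dictionary layout are hypothetical and do not reflect HDFS's on-disk formats.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to the in-memory namespace (path -> metadata)."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = {"blocks": []}
    elif op == "add_block":
        namespace[path]["blocks"].append(edit[2])
    elif op == "delete":
        namespace.pop(path, None)
    return namespace

def checkpoint(namespace_image, edit_log):
    """Merge the edit log into the image; return (new_image, empty_edit_log)."""
    # Work on a copy so the previous image stays intact until the merge is done.
    new_image = {p: {"blocks": list(m["blocks"])} for p, m in namespace_image.items()}
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []

image = {"/a.txt": {"blocks": ["blk_1"]}}
edits = [("create", "/b.txt"), ("add_block", "/b.txt", "blk_2"), ("delete", "/a.txt")]
new_image, new_log = checkpoint(image, edits)
```

After the merge, the new image already reflects every logged change, so the log can restart empty instead of growing without bound.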
Hadoop 2.x Release Series HDFS Reliability Enhancements
• HDFS Federation
• HDFS HA (High-Availability)
HDFS Federation
• Allows a cluster to scale by adding namenodes
• Each namenode manages a namespace volume and a block pool
˗ Namespace volume is made up of the metadata for the namespace
˗ Block pool contains all the blocks for the files in the namespace
HDFS Federation
• Namespace volumes are all independent
• Namenodes do not communicate with each other
• Failure of a namenode is also independent of other namenodes
˗ A namenode failure does not influence the availability of another namenode’s namespace
HDFS High-Availability
• A pair of namenodes (Primary & Standby) is set up in an Active-Standby configuration
• Standby namenode stores the latest edit log entries and an up-to-date block mapping
• When the primary namenode fails, the standby namenode takes over serving client requests
HDFS High-Availability
• Although the standby namenode can take over operation quickly (e.g., within a few tens of seconds), to avoid unnecessary namenode switching, standby namenode activation is executed only after a sufficient observation period (e.g., approximately a minute or a few minutes)
CDN Introduction
CDN (Content Delivery Network)
Table of Contents
• CDN Motivation & Structure
• CDN Procedures
• Hierarchical Content Delivery Model
• CDN Market & Major Service Providers
• CDN Research & Development
CDN Motivation
• CDN is a network constructed from a group of strategically placed and geographically distributed caching servers
• CDN is one of the most efficient solutions for CPs (Content Providers) serving a large number of user devices, because it reduces both content download time and network traffic
CDN Motivation
• Network traffic generated by mobile users (e.g., smart devices) is rapidly increasing
• Mobile network performance is highly dependent on the content download of multimedia data and applications
• Several mobile network operators have suffered from service outages or performance deterioration due to the significant increase in the use of mobile devices
CDN Structure
• Using CDN, both content download time and network traffic are reduced
[Figure: content request and delivery routes between the User, Caching Server, and Content Provider, with and without CDN; the caching server stores popular contents in advance]
CDN in Mobile Networks
• Mobile communication networks have a stronger need for both reduced traffic load and reduced content delivery time than broadband backbone networks, where capacity is abundant and traffic load reduction may not be as critical an issue
CDN Structure
• CDN usually consists of the CP (Content Provider) and caching servers
• Caching servers are distributed in the network and hold copies of selected contents that the CP stores
• CP possesses all contents to serve
CDN Structure
• When a user requests content from its nearest caching server, the server can deliver the content if the requested content is in its cache
• Otherwise, the caching server redirects the user’s request to the remotely located CP
CDN Procedures
• When a user requests content from its nearest caching server, the server can deliver the content if the requested content is in its cache
CDN
CDN Procedures
• If the requested content is not in the local server’s cache, the content request is redirected to the remotely located CP
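The two CDN procedures above (serve from the cache on a hit, redirect to the CP on a miss) can be sketched in a few lines of Python. This is only an illustrative model: the class names `ContentProvider` and `CachingServer`, the `request` method, and the returned "cache"/"origin" labels are hypothetical and do not come from the slides.

```python
from typing import Dict, Tuple


class ContentProvider:
    """Hypothetical origin that holds every content item (the CP)."""

    def __init__(self, contents: Dict[str, bytes]) -> None:
        self.contents = contents

    def fetch(self, content_id: str) -> bytes:
        return self.contents[content_id]


class CachingServer:
    """Hypothetical edge server holding only selected copies of the CP's contents."""

    def __init__(self, origin: ContentProvider, preloaded: Dict[str, bytes]) -> None:
        self.origin = origin
        self.cache = dict(preloaded)  # popular contents stored in advance

    def request(self, content_id: str) -> Tuple[bytes, str]:
        """Return the content and where it was served from."""
        if content_id in self.cache:
            # Cache hit: deliver directly from the nearby caching server.
            return self.cache[content_id], "cache"
        # Cache miss: the request is redirected to the remotely located CP.
        return self.origin.fetch(content_id), "origin"


cp = ContentProvider({"movie": b"MOVIE", "news": b"NEWS"})
edge = CachingServer(cp, preloaded={"movie": b"MOVIE"})
hit = edge.request("movie")   # served by the caching server
miss = edge.request("news")   # redirected to the CP
```

Whether a miss also populates the cache is a design choice the slides leave open here, so the sketch leaves the cache unchanged on a miss.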
CDN
Content Aging Procedure
• Each content has a content update period, the TTL (Time to Live)
  • A few seconds for online trading
  • A few seconds for auction information
  • 24 hours or more for movies
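A minimal sketch of TTL-based content aging follows, assuming each cached item simply records an expiry time and is dropped once that time has passed. The class `TtlCache`, its method names, and the concrete value of 5 seconds (standing in for "a few seconds") are assumptions made for illustration. The current time is passed in explicitly so the behavior is easy to follow.

```python
from typing import Dict, Optional, Tuple


class TtlCache:
    """Hypothetical cache in which every content item ages out after its TTL."""

    def __init__(self) -> None:
        # content_id -> (data, expiry time in seconds)
        self._entries: Dict[str, Tuple[bytes, float]] = {}

    def put(self, content_id: str, data: bytes, ttl_seconds: float, now: float) -> None:
        self._entries[content_id] = (data, now + ttl_seconds)

    def get(self, content_id: str, now: float) -> Optional[bytes]:
        entry = self._entries.get(content_id)
        if entry is None:
            return None
        data, expires_at = entry
        if now >= expires_at:
            # The content has aged out and must be refreshed from the CP.
            del self._entries[content_id]
            return None
        return data


cache = TtlCache()
cache.put("stock-quote", b"quote", ttl_seconds=5, now=0)        # assumed "few seconds"
cache.put("movie", b"movie", ttl_seconds=24 * 3600, now=0)      # 24 hours
fresh_quote = cache.get("stock-quote", now=3)    # still valid
stale_quote = cache.get("stock-quote", now=6)    # expired, returns None
movie = cache.get("movie", now=6)                # still valid
```

Short TTLs keep fast-changing data such as trading prices current at the cost of more trips to the CP, while long TTLs suit content such as movies that rarely changes.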
REFERENCES
CDN
• “Content Delivery Functional Architecture in NGN,” Telecommunication Standardization Sector of ITU, White Paper, Sep. 2010.
• “Content delivery networks: Market dynamics and growth perspectives,” Informa Telecoms & Media, White Paper, Oct. 2012.
• Cisco, Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white_paper_c11-520862.pdf [Accessed June 1, 2015]
• Akamai, http://www.akamai.com/index.html/
• LimeLight, http://www.limelight.com/
• Level 3, http://www.level3.com/
• CDNetworks, http://www.us.cdnetworks.com/
CDN Hierarchical Content Delivery
CDN (Content Delivery Network)
Hierarchical Content Delivery
• It is not possible for a caching server to save all contents that the CP (Content Provider) serves
• Retrieving contents from the remotely located CP can cause a long content download time. In addition, a large amount of traffic will be generated by each server in support of the content’s packet routing
Hierarchical Content Delivery
• For the given cache size of each server, it is important to maximize the hit rate of the local caching server so that the requested contents do not have to be retrieved from the CP
• To accomplish this objective in the Internet in a scalable way, hierarchical cooperative content delivery techniques are used in providing content delivery to local caching servers
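To make the relation between cache size and hit rate concrete, the sketch below replays a fixed request trace against a cache of limited size and reports the fraction of requests served locally. The slides do not name a replacement policy; LRU (least recently used) is assumed here purely for illustration, and the function name `lru_hit_rate` and the sample trace are hypothetical.

```python
from collections import OrderedDict
from typing import Iterable


def lru_hit_rate(requests: Iterable[str], cache_size: int) -> float:
    """Fraction of requests served from a cache of `cache_size` items (LRU assumed)."""
    cache: "OrderedDict[str, None]" = OrderedDict()
    hits = 0
    total = 0
    for content_id in requests:
        total += 1
        if content_id in cache:
            hits += 1
            cache.move_to_end(content_id)  # mark as most recently used
        else:
            cache[content_id] = None       # fetched from the CP, then cached
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used item
    return hits / total if total else 0.0


trace = ["a", "b", "a", "c", "a", "b", "a", "d"]
rates = {size: lru_hit_rate(trace, size) for size in (1, 2, 3)}
```

On this trace the hit rate grows with the cache size, which is the trade-off the first bullet describes: every miss is a request that has to travel to the CP.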
Hierarchical Content Delivery
• CD & LCF (Content Distribution & Location Control Functions) controls the overall content delivery process, and has all content IDs of the CDN
• CCF (Cluster Control Function) controls multiple CDPFs (Content Delivery Processing Functions) and saves the content IDs of the cluster
• CDPF stores and delivers the contents to the users
Hierarchical Content Delivery
Hierarchical Content Delivery Network Example
Hierarchical Content Delivery
Content Delivery Procedures
• Case 1
  • Requested content is in the local cluster
  • Content request message is delivered to the CCF
  • CCF sends a session request message to the CDPF to deliver the content to the user
  • CDPF delivers the content to the user
Hierarchical Content Delivery
Content Delivery Procedures
• Case 1 Procedures
Hierarchical Content Delivery
Content Delivery Procedures
• Case 2
  • Requested content is not in the local cluster, but another local cluster (i.e., target cluster) has the content
  • Procedures
    • Content request message is redirected from the local cluster to the CD & LCF
    • Continued…
Hierarchical Content Delivery
Content Delivery Procedures
• Case 2
  • Procedures Continued…
    • CD & LCF checks if the requested content is in the other cluster
    • Requested content can be delivered from the target cluster to the user directly, or through the local cluster (the local cluster can store the requested content)
Hierarchical Content Delivery
Content Delivery Procedures
• Case 2 Procedures
Hierarchical Content Delivery
Content Delivery Procedures
• Case 3
  • When the requested content is not in the CDN
    • Content request message is sent from the CD & LCF to the CP
    • CP delivers the content to the user through the local cluster
      • The requested content can be stored in the local cluster
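The three delivery cases can be summarized as a single lookup order: local cluster first, then another cluster located through the CD & LCF, then the CP. The Python sketch below models only that decision logic. The names `Cluster`, `Cdn`, `find_cluster`, and `deliver` are hypothetical, each cluster's CCF and CDPFs are flattened into one set of content IDs, and the sketch always stores fetched content in the local cluster, although the slides only say the local cluster can store it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Cluster:
    """One CCF together with the content IDs held by its CDPFs (simplified)."""
    name: str
    content_ids: Set[str] = field(default_factory=set)


@dataclass
class Cdn:
    """Simplified CD & LCF view: all clusters plus the CP's full catalogue."""
    clusters: Dict[str, Cluster]
    cp_catalogue: Set[str]

    def find_cluster(self, content_id: str, exclude: str) -> Optional[Cluster]:
        """Return another cluster (the target cluster) holding the content, if any."""
        for cluster in self.clusters.values():
            if cluster.name != exclude and content_id in cluster.content_ids:
                return cluster
        return None

    def deliver(self, local_name: str, content_id: str) -> str:
        local = self.clusters[local_name]
        if content_id in local.content_ids:
            return f"case 1: {local.name}"      # served inside the local cluster
        target = self.find_cluster(content_id, exclude=local.name)
        if target is not None:
            local.content_ids.add(content_id)   # assumption: always store locally
            return f"case 2: {target.name}"     # served from the target cluster
        if content_id in self.cp_catalogue:
            local.content_ids.add(content_id)   # assumption: always store locally
            return "case 3: CP"                 # fetched from the content provider
        raise KeyError(content_id)


cdn = Cdn(
    clusters={"east": Cluster("east", {"x"}), "west": Cluster("west", {"y"})},
    cp_catalogue={"x", "y", "z"},
)
results = [cdn.deliver("east", cid) for cid in ("x", "y", "z", "z")]
```

Because the local cluster keeps a copy after cases 2 and 3, a repeated request for the same content falls back to case 1, which is how the hierarchy raises the local hit rate over time.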
Hierarchical Content Delivery
Content Delivery Procedure
• Case 3 Procedures
CDN Market
CDN (Content Delivery Network)
CDN Market
Measuring the CDN Market Value
• There are many ways to evaluate the value of the CDN market
• Evaluation is related to the diverse range of CDN industry participants
• Example of industry participants
  • CSP (Communications Service Provider)
  • Industry manufacturers
  • CDN service providers
  • Content providers
CDN Market
Measuring the CDN Market Value
• For communication service providers, the CDN’s value includes improving retail service delivery and supporting their efforts to win and retain customers
• For industry manufacturers, the market value is related to the demand from telcos, content providers, and other businesses
CDN Market
CDN Market Size
• 2014 CDN Market size was $3.71 billion
• CDN Market Components
  • Content delivery technologies, hardware, analytics, monitoring, encoding, transparent caching, DRM (Digital Rights Management), CMS (Content Management System), OVP (Online Video Platform), etc.
• CDN Market Estimations
  • Expected to grow to $12.16 billion by 2019
  • Predicted 26.3% CAGR (Compound Annual Growth Rate) from 2014 to 2019
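As a worked check of the market figures, the sketch below applies the standard CAGR formula, (end / start)^(1 / years) - 1, to the quoted 2014 and 2019 values. The function names `cagr` and `project` are illustrative only. With the quoted endpoints over five years the formula gives roughly 26.8%, while compounding $3.71 billion at the quoted 26.3% gives roughly $11.9 billion; the small gap may come from rounding or from a different base in the source report, which the slides do not specify.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `start_value` at `rate` per year for `years` years."""
    return start_value * (1 + rate) ** years


implied_rate = cagr(3.71, 12.16, 5)          # rate implied by the quoted endpoints
projected_2019 = project(3.71, 0.263, 5)     # 2019 value implied by the quoted CAGR
print(f"implied CAGR: {implied_rate:.1%}")
print(f"projected 2019 size: ${projected_2019:.2f} billion")
```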
CDN Market
CDN Service Providers
• Akamai has about 110,000 servers around the world. Akamai's services include cloud computing, HD video delivery, etc.
• Amazon CloudFront delivers static and streaming contents. Amazon CloudFront works seamlessly with other Amazon Web and Cloud Service solutions
  • S3 (Simple Storage Service)
  • EC2 (Elastic Compute Cloud)
CDN Market
CDN Service Providers
• CDNetworks has POPs (Points of Presence) in 6 continents, including 20 POPs in China. World’s 3rd largest, and Asia’s #1, full-service provider
• Level 3 supports a comprehensive encoding suite for video data, and intelligent traffic manager services (i.e., load balancing)
CDN Market
CDN Service Providers
• Limelight has 6,000 servers at 75 POPs (Points of Presence), and more than