Richard T. B. Ma
School of Computing, National University of Singapore
Content Delivery Networks
CS 4226: Internet Architecture
Motivation
Serving web content from a single location raises problems of:
- scalability (the "flash crowd" problem)
- reliability
- performance
Key ideas: cache content and serve requests from multiple servers at the network edge
- reduces demand on the site's infrastructure
- provides faster service to users
The middle mile problem
The last-mile problem is largely solved by high global broadband penetration, but that penetration creates a new problem of scale: demand.
The first mile (origin to Internet) is easy to provision for performance and reliability.
The bottleneck is stuck in the middle.
Inside the Internet
[Figure: the Internet's core, with Tier 1 ISPs interconnected among themselves, and Tier 2 ISPs, large content distributors, and IXPs attached around them]
The middle mile problem (cont.)
Stuck in the middle; potential solutions:
- "big data center" CDNs
- highly distributed CDNs
- how about P2P?
The challenge
The "fat file" paradox: even though bits travel at (nearly) the speed of light,
- the distance between user and server is critical
- latency and throughput are coupled because of TCP
Distance (server to user)    | Network RTT | Packet loss | Throughput                  | 4 GB DVD download time
Local: <100 mi.              | 1.6 ms      | 0.6%        | 44 Mbps (high-quality HDTV) | 12 min.
Regional: 500–1,000 mi.      | 16 ms       | 0.7%        | 4 Mbps (basic HDTV)         | 2.2 hrs.
Cross-continent: ~3,000 mi.  | 48 ms       | 1.0%        | 1 Mbps (TV)                 | 8.2 hrs.
Multi-continent: ~6,000 mi.  | 96 ms       | 1.4%        | 0.4 Mbps (poor)             | 20 hrs.
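The coupling of RTT, loss, and throughput in the table follows from TCP's steady-state behavior. As a rough illustration, the Mathis et al. model (throughput ≈ MSS/RTT · 1.22/√loss) can be evaluated on the table's RTT and loss figures. This is a back-of-the-envelope sketch, not the source of the table's numbers; the model predicts somewhat higher throughputs than the measured values, but the same steep fall-off with distance:

```python
import math

def tcp_throughput_bps(rtt_s: float, loss: float, mss_bytes: int = 1460) -> float:
    """Mathis et al. steady-state TCP throughput model:
    throughput ~= (MSS / RTT) * (1.22 / sqrt(loss))."""
    return (mss_bytes * 8 / rtt_s) * (1.22 / math.sqrt(loss))

def download_hours(size_gb: float, throughput_bps: float) -> float:
    """Time to move size_gb gigabytes at the given throughput."""
    return size_gb * 8e9 / throughput_bps / 3600

# RTT/loss pairs loosely matching the table's rows.
for label, rtt, loss in [("local", 0.0016, 0.006),
                         ("regional", 0.016, 0.007),
                         ("cross-continent", 0.048, 0.010),
                         ("multi-continent", 0.096, 0.014)]:
    bps = tcp_throughput_bps(rtt, loss)
    print(f"{label:16s} {bps/1e6:8.1f} Mbps  4GB in {download_hours(4, bps):6.2f} h")
```

Each step up in distance multiplies both RTT and loss, and modeled throughput falls accordingly, which is exactly why a CDN wants the server close to the user.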
Major CDNs (by '15 revenue)
Akamai: $1.03B total, $700M of CDN
Level 3: $8B total, $235M of CDN; tier-1 transit provider
Amazon: $6B total, $1.8B of CDN (but a big % is storage); cloud provider
ChinaCache: $270M total, $81M of CDN; also a cloud provider
EdgeCast: $180M total, $125M of CDN
Limelight: $174M total, $120M of CDN
Highwinds: $135M total, $95M of CDN
Fastly: $60M total, $9M of CDN
Rest of the smaller regional CDNs (MaxCDN, CDN77, etc.): $100M combined
Reference
Cheng Huang, Angela Wang, Jin Li and Keith W. Ross, "Measuring and Evaluating Large-Scale CDNs," Internet Measurement Conference 2008.
Erik Nygren, Ramesh K. Sitaraman and Jennifer Sun, "The Akamai Network: A Platform for High-Performance Internet Applications," ACM SIGOPS Operating Systems Review 44(3), July 2010.
How can we understand a CDN?
We don't know CDNs' internal structures, but we can "infer" them via measurement.
We know that CDNs use a DNS trick. For example:
- an end user types www.youtube.com
- the IP address is resolved via the local DNS (LDNS) server
- the LDNS queries YouTube's authoritative DNS
- YouTube uses a CDN if the answer is a CNAME like a1105.b.akamai.net or move.vo.llnwd.net
- the LDNS then queries the CNAME's authoritative DNS server and gets the IP address of a content server
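The CNAME-chasing at the heart of this trick can be sketched with a toy resolver over a hand-written record table. The records below are invented for illustration (the akamai-style name mirrors the lecture's example); a real measurement would issue live DNS queries instead:

```python
# Toy DNS table: name -> (record type, value). Records are made up for
# illustration; the akamai-style CNAME mirrors the lecture's example.
RECORDS = {
    "www.youtube.com":    ("CNAME", "a1105.b.akamai.net"),
    "a1105.b.akamai.net": ("A", "23.4.5.6"),
}

def resolve(name: str, records=RECORDS):
    """Follow CNAME records until an A record is found, recording the chain."""
    chain = [name]
    while True:
        rtype, value = records[name]
        if rtype == "A":
            return value, chain
        name = value          # CNAME: re-query with the canonical name
        chain.append(name)

ip, chain = resolve("www.youtube.com")
print(chain, "->", ip)        # a CDN-owned CNAME in the chain reveals the CDN
```

Spotting a known CDN suffix (akamai.net, llnwd.net, ...) in the chain is exactly how the measurement study decides that a hostname is CDN-hosted.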
DNS records
DNS is a distributed database storing resource records (RRs).
RR format: (name, value, type, ttl)
- Type=A (Address): name is a hostname; value is its IP address
- Type=NS (Name Server): name is a domain (e.g., foo.com); value is the hostname of the authoritative name server for that domain
- Type=CNAME (Canonical NAME): name is an alias for some "canonical" (real) name, e.g., www.ibm.com is really servereast.backup2.ibm.com; value is the canonical name
- Type=MX (Mail eXchange): value is the name of the mail server associated with name
Content Server Assignment
The returned content server will be one close to the local DNS (LDNS) server that issued the query.
Measurement Framework
Assumptions:
- the CDN chooses a nearby content server based on the location of the LDNS that originates the query
- the same LDNS might get different content servers for the same query at different times
1. Determine all the CNAMEs of a CDN
2. Query a large number of LDNSs all over the world, at different times of the day, for all of the CNAMEs found in step 1
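Step 2 is essentially a cross-product sweep over LDNSs and CNAMEs, collecting every content-server IP observed. A minimal sketch with a stand-in `query` function instead of real DNS traffic (all names below are hypothetical):

```python
import itertools
from collections import defaultdict

def survey(ldns_list, cnames, query):
    """Sketch of step 2: ask every LDNS about every CNAME and collect the
    content-server IPs returned. `query(ldns, cname)` stands in for a real
    DNS query sent through an open recursive resolver."""
    servers = defaultdict(set)            # cname -> set of observed server IPs
    for ldns, cname in itertools.product(ldns_list, cnames):
        servers[cname].add(query(ldns, cname))
    return servers

# Fake resolver for illustration: pretend each LDNS region sees its own server.
fake = lambda ldns, cname: f"ip-of-{cname}-near-{ldns}"
result = survey(["us-east", "eu-west"], ["a1.example-cdn.net"], fake)
print(result)
```

Deduplicating the observed IPs across all LDNSs and times is what yields the server-count estimates reported later.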
Finding CNAMEs and LDNSs
Find all the CNAMEs of a CDN:
- start from over 16 million web hostnames
- a DNS query tells whether a hostname resolves to a CNAME, and whether that CNAME belongs to the target CDN
- result: thousands of CNAMEs for Akamai and Limelight
Locate a large number of distributed LDNSs:
- need open recursive DNS servers
- start from over 7 million unique client IP addresses and over 16 million web hostnames
- use reverse DNS lookups and trial DNS queries to test them
Open recursive DNS servers
- many different DNS server names map to the same IP addresses
- result: 282,700 unique open recursive DNS servers
Measurement Platform
300 PlanetLab nodes, 3 DNS queries per second; the full measurement takes more than 1 day.
The Akamai Network
Three families of CNAMEs were found:

Type | CNAME pattern    | # of CNAMEs | # of IPs        | Usage
(a)  | *.akamai.net     | 1964        | ~11,500         | conventional content distribution
(b)  | *.akadns.net     | 757         | a few per CNAME | load balancing for customers who have their own networks
(c)  | *.akamaiedge.net | 539         | ~36,000         | dynamic content distribution / secure services

Type (a): each query returns 2 IP addresses, different for different locations; hundreds of IPs sit behind each CNAME, ~11,500 content servers in total.
Type (c): each query returns only 1 IP address, with 20–100 IPs per CNAME; the authors guess virtualization is used to provide isolated environments.
The Akamai Network
~27K content servers, ~6K of which also run DNS
60% in the US; 90% in the top 10 countries
flat distribution across ISPs: only 15% in the top 7
The Limelight Network
Easier to measure, since Limelight is a single Autonomous System (AS): simply obtain the IP addresses of that AS.
Only ~4K servers.
Measuring performance
Two metrics:
- availability: how reliable are the CDN servers?
- delay: how fast can content be retrieved?
The performance results were controversial:
- do the metrics sufficiently match overall system performance goals?
- how does each performance metric map to the performance customers actually perceive?
- both Akamai and Limelight issued statements to "correct" the research results
Availability
Monitor all servers for 2 months, pinging each once every hour.
If a server does not respond for 2 consecutive hours, it is considered "down".
But does a "down" server necessarily affect availability?
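The downtime rule above can be made concrete: a server counts as down for every hour in which it has failed at least 2 consecutive hourly pings. A small sketch of that bookkeeping (my reading of the rule, not the authors' code):

```python
def downtime_hours(pings):
    """Count hours a server is "down" under the study's rule: a server is
    down during any hour in which it has failed to respond for 2 or more
    consecutive hourly pings. `pings` is one boolean per hour."""
    down, run = 0, 0
    for ok in pings:
        run = 0 if ok else run + 1   # length of the current failure streak
        if run >= 2:
            down += 1
    return down

# A single missed ping is not downtime; streaks of >= 2 are.
print(downtime_hours([True, False, True, True]))          # isolated miss
print(downtime_hours([True, False, False, False, True]))  # 3-hour streak
```

Availability over a window is then 1 minus downtime hours divided by total hours, which is why the definition of "down" matters so much in the vendors' rebuttals.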
Akamai’s statement
- Availability cannot be judged from server uptime alone.
- Akamai's CDN has more servers, but that does not necessarily make it harder to maintain.
- The use of open resolvers misses many Akamai servers, hence over-estimates delay in Akamai's case.
- Akamaiedge is not a "virtualized network".
Limelight’s statement
- Overall performance can't be represented by just two dimensions (availability and delay).
- Server downtime does not necessarily affect availability; Limelight suggested an alternative measure and claimed availability in the 99.9% range.
- The RTT of a single packet can't represent the delay for whole objects; they suggest measuring with different object sizes.
- A more authoritative performance study should be based on customer trials.
Akamai vs. Limelight

                      Akamai              Limelight
# of servers          ~27K                ~4K
# of clusters         1158                18
95th-percentile delay ~100 ms             ~200 ms
average delay         ~30 ms              ~80 ms
penetration in ISPs   high                low
cost                  high                low
complexity            high                low
approach              highly distributed  "big data center"
Facts about Akamai (2014-2015)
A CDN company that evolved from MIT research to invent better ways to deliver Internet content and tackle the "flash crowd" problem.
Earned over US$1B in revenue in 2015, 25% of the whole CDN market.
Runs on ~150,000 servers in 1,200 networks across 92 countries.
Internet delivery challenge
The largest single network carries only 5% of access traffic; it takes over 650 networks to reach 90%.
Traffic follows a "long tail" distribution across networks.
[Figure: % of access traffic from top networks]
Other challenges
- Peering point congestion: little economic incentive to invest in the middle mile
- Inefficient routing protocols: how does BGP actually work?
- Unreliable networks: e.g., de-peering between ISPs
- Inefficient communication protocols
- Scalability
- Application limitations and slow rate of adoption
Delivery network as a virtual network
Works as an overlay: compatible, transparent to users, adaptive to changes.
The untaken clean-slate approach suffers from the adoption problem and high development cost.
The Akamai Network at ~2010
A large distributed system consisting of ~60,000 servers in ~1,000 networks across ~70 countries.
Can also be regarded as multiple delivery networks for different types of content: static web, streaming media, dynamic applications.
Anatomy of a Delivery Network
- edge servers: globally deployed across thousands of sites
- mapping system: assigns requests to edge servers, using historic data and current system conditions
- transport system: moves content from origin to edge; may cache data
- communication and control system: disseminates status and control messages, and configuration updates
- data collection and analysis: collects and processes data (e.g., logs); used for monitoring, analytics, billing, …
- management portal: gives customers visibility and fine-grained control; updates edge servers
System Design Principles
Goals: scalable and fast data collection & management; safe, quick & consistent configuration updates; enterprise visibility & fine-grained control.
Assumption: a significant number of failures is expected to be occurring at all times (machine, rack, cluster, connectivity, or network).
Philosophy: failures are normal, and the delivery network must operate seamlessly despite them.
System Design Principles
- Design for reliability: ~100% end-to-end availability; full redundancy and fault-tolerant protocols
- Design for scalability: handle large volumes of traffic, data, control, …
- Limit the need for human management: automation is needed to scale and to respond to faults
- Design for performance: improve bottlenecks, response time, cache hit rate, resource utilization, and energy efficiency
Streaming and content delivery
Architectural considerations for cacheable web content and streaming media.
Principle: minimize long-haul communication through the middle-mile bottleneck of the Internet. This is feasible with pervasive, distributed architectures where servers sit as "close" to users as possible.
Key question: how distributed does it need to be?
How distributed does it need to be?
Akamai's approach: deploy server clusters not only in Tier 1 and Tier 2 data centers, but also at network edges in thousands of locations, at the price of more complexity and cost.
Reasons:
- Internet traffic is highly fragmented: e.g., the top 45 networks only account for half of all access traffic
- the distance between server and users is the bottleneck for video throughput, due to TCP
- P2P is not good for management and control
Video-grade scalability
Content providers' problem:
- YouTube receives 2 billion views per day
- video needs high rates, e.g., 2–40 Mbps for HDTV
- need to scale with user requests
- high capital and operational costs to over-provision enough to absorb on-demand spikes
Akamai's throughput: 3.45 Tbps in April 2010; ~50–100 Tbps of throughput is needed now.
Akamai's challenges: must consider throughput along the entire path, because bottlenecks are everywhere:
- origin data centers, peering points, networks' backhaul capacity, ISPs' upstream connectivity
- a data center's egress capacity has little impact on real throughput to end users
- even 50 well-provisioned, well-connected data centers cannot achieve ~100 Tbps
- IP-layer multicast does not work in practice; Akamai needs its own transport system
Transport system for content
Tiered content distribution:
- targets "cold" or infrequently-accessed content
- efficient caching strategy with high hit rates
- well-provisioned, highly connected "parent" clusters are utilized between origin and edge
- origin servers are offloaded in the high 90s (percent)
- helpful during flash crowds for large objects
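The offloading effect of tiering is easy to see in a toy model: edge clusters miss to a shared parent, the parent misses to the origin, and a flash crowd then costs the origin a single fetch. A hypothetical sketch, not Akamai's implementation:

```python
class TieredCache:
    """Toy two-tier CDN cache: regional edge caches share one parent cache,
    and only a parent miss reaches the origin."""
    def __init__(self, origin):
        self.edges = {}                      # region -> edge cluster cache
        self.parent = {}                     # shared, well-provisioned parent
        self.origin, self.origin_hits = origin, 0

    def get(self, region, key):
        edge = self.edges.setdefault(region, {})
        if key in edge:
            return edge[key]                 # edge hit: nothing upstream
        if key not in self.parent:           # parent miss: fetch from origin
            self.origin_hits += 1
            self.parent[key] = self.origin(key)
        edge[key] = self.parent[key]         # fill the edge on the way back
        return edge[key]

cdn = TieredCache(origin=lambda k: f"content:{k}")
for region in ["us", "eu", "asia"]:          # flash crowd from three regions
    for _ in range(1000):
        cdn.get(region, "video.mp4")
print(cdn.origin_hits)                       # origin served the object once
```

Even with many edge regions, the shared parent means the origin sees one request per object, which is how offload "in the high 90s" is achieved.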
Transport system for streaming
An overlay network for live streaming:
- once a stream is captured and encoded, it is sent to a cluster of servers called the entrypoint
- automatic failover among multiple entrypoints
- within an entrypoint cluster, distributed leader election is used to tolerate machine failures
- publish-subscribe (pub-sub) model: an entrypoint publishes its available streams, and each edge server subscribes to the streams it requires
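The pub-sub relationship between entrypoints and edge servers can be sketched with a toy broker: edges register interest in streams, and a publish reaches only those subscribers. Names and API here are invented for illustration, not Akamai's protocol:

```python
from collections import defaultdict

class StreamBroker:
    """Toy publish-subscribe broker for the entrypoint/edge relationship."""
    def __init__(self):
        self.subscribers = defaultdict(list)   # stream -> delivery callbacks

    def subscribe(self, stream, deliver):
        self.subscribers[stream].append(deliver)

    def publish(self, stream, packet):
        # Only edges that subscribed receive the packet: no wasted replication.
        for deliver in self.subscribers[stream]:
            deliver(packet)

broker = StreamBroker()
received = []
broker.subscribe("live/event1", received.append)   # an edge server subscribes
broker.publish("live/event1", b"frame-0")          # the entrypoint publishes
broker.publish("live/other", b"frame-x")           # no subscribers: dropped
```

The design choice is bandwidth: streams are only replicated toward edges that actually have viewers, rather than broadcast everywhere.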
Transport system for streaming (cont.)
Reflectors act as intermediaries between the entrypoints and the edge clusters:
- scaling: a reflector layer enables rapidly replicating a stream to a large number of edge clusters, to serve popular events
- quality: reflectors provide alternate paths between each entrypoint and edge cluster, enhancing end-to-end quality via path optimization
Application delivery network
Targets dynamic web applications and non-cacheable content.
Two complementary approaches:
- speed up long-haul communications by using the Akamai platform as a high-performance overlay network, i.e., the transport system
- push application logic from the origin server out to the edge of the Internet
Transport system for app acceleration
Path optimization:
- overcome BGP's limitations by collecting topology and performance data from the mapping system
- dynamically select potential intermediate nodes for a particular path, or use multiple paths
- ~30–50% performance improvement from the overlay; also used for packet-loss reduction
- e.g., routed around the Middle East cable cut in 2008
Transport system for app acceleration (cont.)
Transport protocol optimizations:
- proprietary transport-layer protocol
- pools of persistent connections, to eliminate connection setup and teardown overhead
- optimal TCP window sizing using global knowledge
- intelligent retransmission after packet loss
Application optimizations:
- parse HTML and prefetch embedded content
- content compression reduces the number of round trips
- implement application logic at the edge, e.g., authentication
Distributing applications to the edge
EdgeComputing services of Akamai: e.g., deploy and execute request-driven Java J2EE apps on Akamai's edge servers.
Not all apps can run entirely on the edge.
Some use cases: content aggregation/transformation, static databases, data collection, complex applications.
Other platform components
Edge server platform
Mapping system
Communications and control system
Data collection and analysis system
Additional systems and services
Edge server platform
Functionalities controlled by metadata:
- origin server location and response to failures
- cache control and indexing
- access control
- header alteration (HTTP)
- EdgeComputing
- performance optimization
Mapping system
Global traffic director:
- uses historic and real-time data about the health of the Akamai network and the Internet
- objective: create maps that direct traffic on the Akamai network in a reliable, efficient, and high-performance manner
- a fault-tolerant distributed platform: runs in multiple independent sites, with leader election based on the current health status of each site
- two parts: a scoring system plus real-time mapping
Mapping system (cont.)
Scoring system: builds a picture of the current Internet topology
- collects and processes data: ping, BGP, traceroute
- monitors latency, loss, and connectivity frequently
Real-time mapping: creates the actual maps used to direct end users' requests to the best edge servers; also selects intermediates for tiered distribution and the overlay network
- first step, map to cluster: based on scoring-system info, updated every minute
- second step, map to server: based on content locality, load changes, etc.
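The two-step mapping can be sketched as a pair of choices over invented data structures (the regions, scores, and server lists below are hypothetical): first pick the best-scoring cluster for the user's region, then pick a server inside it by hashing the content key, so the same object tends to hit the same cache:

```python
def map_request(user_region, cluster_scores, cluster_servers, content_key):
    """Two-step mapping sketch: (1) map to cluster using scoring-system data
    for this region; (2) map to a server in the cluster by content locality."""
    # Step 1: best (lowest-score) cluster for the user's region.
    region_scores = cluster_scores[user_region]
    cluster = min(region_scores, key=region_scores.get)
    # Step 2: hash the content key so repeat requests for an object land on
    # the same server, keeping its cache hot.
    servers = cluster_servers[cluster]
    return cluster, servers[hash(content_key) % len(servers)]

# Hypothetical scoring data: lower score = healthier/closer path.
scores = {"eu": {"fra-1": 12.0, "lon-2": 9.5}}
servers = {"fra-1": ["10.0.1.1"], "lon-2": ["10.0.0.1", "10.0.0.2"]}
cluster, server = map_request("eu", scores, servers, "/video.mp4")
print(cluster, server)
```

Splitting the decision this way matches the slide: the cluster choice can be refreshed every minute from global data, while the server choice stays a cheap local function of load and content locality.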
Communications and control system
Real-time distribution of status and control information:
- small real-time messages throughout the network
- solution: a pub-sub model
Point-to-point RPC and web services.
Dynamic configuration updates: quorum-based replication ("another whole paper").
Key management infrastructure; software/machine configuration management.
Data collection and analysis system
Log collection: over 10 million HTTP lines/sec, 100 TB/day; compression, aggregation, pipelining, and filtering; used for reporting and billing.
Real-time data collection and monitoring: a distributed real-time relational database that supports SQL queries ("another whole paper").
Analytics and reporting: enables customers to view traffic and performance; uses the log and query systems, plus e.g. MapReduce.