Performance - cse.tkk.fi · web performance translates directly to dollars and cents—e.g., a...

Performance

Mobile Cloud Computing 14.11.2014 Jukka K. Nurminen

Today

•  General
   –  What is performance?
   –  Why does it matter?
   –  What can you do?
•  Especially for mobile web

What is Performance?

•  Response time
   –  Processing
      •  Mobile & Server
   –  Communication
      •  Latency vs. bandwidth
   –  Feedback
      •  Real response time vs. hiding the delay
•  Resource use
   –  Especially for concurrent apps
   –  Memory and cache use, CPU load
•  Money?
•  Energy
   –  Specific for mobile
   –  Visible only after longer time of use
   –  => harder to use for marketing

Bandwidth and Latency

•  Latency: the delay between the sender transmitting a message and the receiver decoding it. It is mainly a function of the signal's travel time and the processing time at any nodes the information traverses. “How fast can you drive?”

•  Bandwidth: the maximum rate at which information can be transferred, commonly measured in bits/second. “How many lanes does the road have?”

•  Extreme example: “Sneakernet”
   –  Copy data to DVD disks, load them onto a truck
   –  Hours of latency but very high and cheap bandwidth
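The latency/bandwidth trade-off can be sketched with a simple model: delivery time = latency + size / bandwidth. All numbers below (50 ms and 10 Mbit/s for the network, a 5 hour drive, 4.7 GB per DVD) are illustrative assumptions, not values from the slides.

```python
def transfer_time_s(size_bytes, latency_s, bandwidth_bps):
    """Delivery time = fixed latency + time to push the bits through the link."""
    return latency_s + size_bytes * 8 / bandwidth_bps

# Assumed example values (not from the slides).
NETWORK_LATENCY_S = 0.05        # 50 ms
NETWORK_BANDWIDTH_BPS = 10e6    # 10 Mbit/s
TRUCK_DRIVE_TIME_S = 5 * 3600   # the truck's "latency": a 5 hour drive
DVD_BYTES = 4.7e9               # single-layer DVD

small_message = 1_000           # 1 KB
truckload = 1000 * DVD_BYTES    # 1000 DVDs

# Small message: the network wins easily, latency dominates.
small_over_network = transfer_time_s(small_message, NETWORK_LATENCY_S, NETWORK_BANDWIDTH_BPS)

# Huge payload: bandwidth dominates, and the truck wins.
load_over_network = transfer_time_s(truckload, NETWORK_LATENCY_S, NETWORK_BANDWIDTH_BPS)
truck_effective_bps = truckload * 8 / TRUCK_DRIVE_TIME_S

print(f"1 KB over network: {small_over_network:.4f} s")
print(f"1000 DVDs over network: {load_over_network / 86400:.1f} days")
print(f"1000 DVDs by truck: {TRUCK_DRIVE_TIME_S / 3600:.0f} h "
      f"(effective {truck_effective_bps / 1e9:.2f} Gbit/s)")
```

For small web resources the latency term dominates; the bandwidth term only takes over for very large payloads.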


Time is money

Delay            User perception
0–100 ms         Instant
100–300 ms       Small perceptible delay
300–1000 ms      Machine is working
1,000+ ms        Likely mental context switch
10,000+ ms       Task is abandoned

Well-publicized studies from Google, Microsoft, and Amazon all show that web performance translates directly to dollars and cents—e.g., a 2,000 ms delay on Bing search pages decreased per-user revenue by 4.3%! Similarly, an Aberdeen study of over 160 organizations determined that an extra one-second delay in page load times led to 7% loss in conversions, 11% fewer page views, and a 16% decrease in customer satisfaction!

What can an end-user do?

•  Get a faster device
•  Get a faster network connection
•  Get more efficient software, e.g. browser

•  All of these approaches have problems

What can a developer do?

•  Ensure that enough server resources are available
   –  Own or cloud providers
•  Do the application in a smart way
   –  Appropriate algorithms and good code
   –  Consider the special needs of different platforms, e.g. mobile
      •  Hard to manage multiple versions
      •  Responsive web design
         –  but still long way to go
   –  Use new platform features?
      •  Can bring major simplifications and resource savings (e.g. WebSocket)
      •  May not work with old platforms

Difficulty: compatibility slows down adoption of new features
Time to have the feature in the browser + Time until users have the latest browser = Many years
•  This is easier if you have a dedicated client
   –  but then the users have to install your app

[Figure: HTML5test.com score over the years, Jan 2009 to Jan 2014, for Chrome, Firefox, Internet Explorer, Maxthon, Opera, and Safari. Vertical axis: score (points), 0 to 600. Source: http://html5test.com/results/desktop.html, retrieved 11/13/2014]

World wide device shipments

Mobile-first?

Web structure

Firewalls NATs

Browser Architecture

•  Generate and submit web requests to web servers
•  Accept responses from web servers and produce visual presentations out of them
•  Render the results

Techniques for responsive web apps

•  Push technologies
•  AJAX

Push Technologies

•  How can the browser know if something changed on the server?
•  What if we wanted the browser to react to changes that happen on the server side?
   –  E.g. new email arrives, location of an object changes
•  Simple (and inefficient) way
   –  Browser polls the server

Push technologies

•  In normal cases the client (= browser) is the initiating party making the request.
•  A number of alternative possibilities (sometimes Comet is the umbrella term)
   –  A set of “hacks”
      •  Leaving connection open
      •  Long polling (a query that only returns when results are available)
   –  XMPP based pub-sub techniques and other protocols
•  With HTML5
   –  WebSocket
   –  Server-sent events
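The difference between plain polling and long polling can be sketched with an in-process simulation. This is a minimal sketch, not a real HTTP implementation: the `Server` class, its method names, and the timing values are hypothetical, and a thread-safe queue stands in for the network.

```python
import queue
import threading
import time

class Server:
    """Hypothetical in-process stand-in for a web server with pending events."""
    def __init__(self):
        self._events = queue.Queue()
        self.requests_served = 0

    def publish(self, event):
        self._events.put(event)

    def poll(self):
        """Plain polling: answer immediately, possibly with nothing."""
        self.requests_served += 1
        try:
            return self._events.get_nowait()
        except queue.Empty:
            return None

    def long_poll(self, timeout_s):
        """Long polling: hold the request open until an event arrives or it times out."""
        self.requests_served += 1
        try:
            return self._events.get(timeout=timeout_s)
        except queue.Empty:
            return None

def wait_by_polling(server, interval_s):
    while True:
        event = server.poll()
        if event is not None:
            return event
        time.sleep(interval_s)

def wait_by_long_polling(server, timeout_s):
    while True:
        event = server.long_poll(timeout_s)
        if event is not None:
            return event

def demo(wait_fn, arg):
    server = Server()
    # The event ("new email") appears on the server after 0.3 s.
    threading.Timer(0.3, server.publish, args=("new email",)).start()
    event = wait_fn(server, arg)
    return event, server.requests_served

polled_event, polled_requests = demo(wait_by_polling, 0.02)
long_event, long_requests = demo(wait_by_long_polling, 5.0)
print(f"polling: {polled_requests} requests, long polling: {long_requests} request(s)")
```

Both clients receive the same event, but the polling client issues many empty requests while the long-polling client needs only one.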

AJAX Introduction

•  Rationale:
   –  Why should a complete web page be loaded, if only a small portion of it needs updating?
   –  Why should all the data be loaded to a page, if the user is likely not to go through it all?
•  Although the acronym is “Asynchronous JavaScript and XML”, it is increasingly common to return JSON rather than XML

AJAX: Asynchronous JavaScript and XML

•  Enables more interactive web applications
•  Moves data in the background

1. Submit XMLHttpRequest
2. Run server side code to generate proper response to request
3. Send the response in JSON or XML
4. Response processing typically modifies DOM tree, which influences what the user sees
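The four AJAX steps can be modelled without a browser. The sketch below is only a simulation in Python: a dictionary stands in for the DOM tree, a function call stands in for the XMLHttpRequest, and the path `/inbox/unread` and all names are hypothetical.

```python
import json

# Hypothetical server-side state.
INBOX = {"unread": 3}

def handle_request(path):
    """Steps 2-3: server-side code builds a small JSON response."""
    if path == "/inbox/unread":
        return json.dumps({"unread": INBOX["unread"]})
    return json.dumps({"error": "not found"})

def apply_response(dom, body):
    """Step 4: update only the affected node, not the whole page."""
    data = json.loads(body)
    if "unread" in data:
        dom["unread-badge"] = str(data["unread"])
    return dom

# A dictionary standing in for the page's DOM tree.
dom = {"header": "Mail", "unread-badge": "0", "footer": "Example footer"}

# Step 1: in a browser this would be an asynchronous XMLHttpRequest.
body = handle_request("/inbox/unread")
apply_response(dom, body)
print(dom["unread-badge"], len(body), "bytes transferred")
```

Only a few bytes of JSON travel over the network, and the rest of the page stays untouched.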

Mobile Web Clients


Challenges and Opportunities

•  Limited bandwidth
•  Battery consumption of communication and computing
•  Amount of data may influence phone bill
   –  And operators are more sensitive to the amounts of data
   –  Although flat rate tariffing is spreading
•  Mobile display size and other limitations of the mobile UI

Opportunities:
•  Mobile first
•  Context (e.g. location) specific services
•  Always-on
•  Access to human user
•  Novel innovations

Separate content for mobile devices - Responsive web design

Impact of mobile versions www.websiteoptimization.com

An average web application, as of early 2013, is composed of the following: 90 requests, fetched from 15 hosts, with 1,311 KB total transfer size.

Type         Requests   Size
HTML         10         52 KB
Images       55         812 KB
JavaScript   15         216 KB
CSS          5          36 KB
Other        5          195 KB

Proxy to transcode the content

•  Compresses the page content
•  Pros:
   –  Much less data needs to be sent
   –  The resulting page fits the mobile screen
   –  Does not require much processing on the phone side
•  Cons:
   –  Not the same page as what was created => functionality may suffer
   –  All data goes through Opera servers => single point of failure

RabbIT proxy

•  Compress text pages to gzip streams. This reduces size by up to 75%
•  Compress images to 10% JPEG. This reduces size by up to 95%
•  Remove advertising
•  Remove background images
•  Cache filtered pages and images
•  Uses keepalive if possible
•  Easy and powerful configuration
•  Multi-threaded solution written in Java
•  Modular and easily extended
•  Complete HTTP/1.1 compliance
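The text compression step can be illustrated with Python's standard `gzip` module. This is a sketch of the idea only, not RabbIT's code; the sample page is made up and highly repetitive, so it compresses better than a typical real page would.

```python
import gzip

# Made-up sample page; real pages are less repetitive and compress less well.
html = (
    "<html><body>"
    + "<p>Mobile web performance matters.</p>" * 200
    + "</body></html>"
).encode("utf-8")

compressed = gzip.compress(html)
saving = 1 - len(compressed) / len(html)

# The client can restore the original bytes exactly (lossless).
restored = gzip.decompress(compressed)

print(f"{len(html)} bytes -> {len(compressed)} bytes ({saving:.0%} saved)")
```

Unlike the image recompression step, gzip is lossless, so the page text itself is not altered.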

Web Performance
Based on Ilya Grigorik: High Performance Browser Networking
Available: http://chimera.labs.oreilly.com/books/1230000000545
Sections 10-12
A good source of modern web related information

Latency is the main bottleneck, not bandwidth

What if we could reduce cross-atlantic RTTs from 150 ms to 100 ms? This would have a larger effect on the speed of the internet than increasing a user’s bandwidth from 3.9 Mbps to 10 Mbps or even 1 Gbps.

www.webpagetest.org

www.aalto.fi

www.aalto.fi (with www.webpagetest.org)

First View

Second View

www.google.com searching for “Aalto”

How to speed things up?

Modern browser techniques to speed up things

•  Resource pre-fetching and prioritization
   –  Document, CSS, and JavaScript parsers may communicate extra information to the network stack to indicate the relative priority of each resource: blocking resources required for first rendering are given high priority, while low-priority requests may be temporarily held back in a queue.
•  DNS pre-resolve
   –  Likely hostnames are pre-resolved ahead of time to avoid DNS latency on a future HTTP request. A pre-resolve may be triggered through learned navigation history, a user action such as hovering over a link, or other signals on the page.
•  TCP pre-connect
   –  Following a DNS resolution, the browser may speculatively open the TCP connection in anticipation of an HTTP request. If it guesses right, it can eliminate another full roundtrip (TCP handshake) of network latency.
•  Page pre-rendering
   –  Some browsers allow you to hint the likely next destination and can pre-render the entire page in a hidden tab, such that it can be instantly swapped in when the user initiates the navigation.

Some techniques

•  Critical resources such as CSS and JavaScript should be discoverable as early as possible in the document.

•  CSS should be delivered as early as possible to unblock rendering and JavaScript execution.

•  Noncritical JavaScript should be deferred to avoid blocking DOM and CSSOM construction.

•  The HTML document is parsed incrementally by the parser; hence the document should be periodically flushed for best performance.

HTTP/1.1 performance improvement mechanisms

•  Persistent connections to allow connection reuse
•  Chunked transfer encoding to allow response streaming
•  Request pipelining to allow parallel request processing
•  Byte serving to allow range-based resource requests
•  Improved and much better-specified caching mechanisms

Persistent connections

Saving: (N-1) * RTT, where N = number of resources (avg 90)
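A worked instance of the formula, assuming each new TCP connection costs one round trip for the handshake and resources would otherwise be fetched one after another on fresh connections. The 100 ms RTT is an assumed value; N = 90 is the average from the slides.

```python
def persistent_connection_saving_s(num_resources, rtt_s):
    """Handshake round trips avoided by reusing one connection: (N - 1) * RTT."""
    return (num_resources - 1) * rtt_s

N = 90          # average number of resources per page (from the slides)
RTT_S = 0.100   # assumed round-trip time of 100 ms

saving = persistent_connection_saving_s(N, RTT_S)
print(f"Saved handshake time: {saving:.1f} s")
```

With these numbers the avoided handshakes add up to almost nine seconds, which is why connection reuse matters more as RTT grows.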

Pipelined: Additional requests sent before replies arrive

Parallel processing at server side: Head of line blocking as replies cannot be multiplexed

Alternative to pipelining

•  Up to N parallel TCP connections from a client to a server
   –  N is browser dependent, typically 6
•  Sharding: resources split under multiple host names (which could even reside on the same server)
   –  Allows any number of parallel TCP connections (more than 6)
•  What number is optimal?
   –  Application specific
   –  Latency and bandwidth specific
   –  Each new host name causes a DNS lookup (+ TLS handshake in case of HTTPS)

Header overhead

•  Average header overhead: 500-800 bytes per HTTP request

•  With cookies much more (can be limited to be < 8 KB by servers or proxies)

Headers: 352 bytes Payload: 15 bytes 96% protocol overhead
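The 96% figure follows directly from the two byte counts above:

```python
header_bytes = 352
payload_bytes = 15

# Share of the transferred bytes that is protocol overhead rather than payload.
overhead = header_bytes / (header_bytes + payload_bytes)
print(f"Protocol overhead: {overhead:.0%}")
```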

Reduce number of loaded resources

•  Eliminate unnecessary requests
•  Concatenation
   –  Multiple JavaScript or CSS files are combined into a single resource.
•  Spriting
   –  Multiple images are combined into a larger, composite image.
•  Inlining
   –  Embed JavaScript, CSS, and other resources in the HTML
•  Drawbacks
   –  Cache performance
      •  Single update invalidates whole cache
   –  Loading of unnecessary resources
   –  JavaScript and CSS parsing only when whole file is downloaded (HTML processed incrementally)
   –  Complicated management + search for optimal strategy

HTTP 2.0

Jukka K. Nurminen

Heavily influenced by Chapter 12 of High performance browser networking

4.2.2014

SPDY & HTTP2.0

•  By Google, since 2009
•  Goals:
   –  Target a 50% reduction in page load time (PLT).
   –  Avoid the need for any changes to content by website authors.
   –  Minimize deployment complexity, avoid changes in network infrastructure.
   –  Develop this new protocol in partnership with the open-source community.
   –  Gather real performance data to (in)validate the experimental protocol.

•  “When we download the top 25 websites over simulated home network connections, we see a significant improvement in performance—pages loaded up to 55% faster.”

For more see e.g. http://www.webpronews.com/google-spdy-gaining-adoption-2012-01

SPDY Today

•  Supported in Chrome, Firefox, and Opera browsers (maybe more)

•  Many large web destinations (e.g., Google, Twitter, Facebook) offer SPDY to compatible clients

•  Many people are using SPDY (but they don’t know that)
•  Work towards HTTP 2.0 standard starts
   –  Based on SPDY lessons learned

HTTP 2.0 standard

•  All HTTP 1.1 concepts are available
   –  HTTP methods, status codes, URIs, and header fields
•  All existing applications can be delivered without modification.
•  HTTP 2.0 main focus is on performance
•  HTTP 2.0 modifies how the data is formatted (framed) and transported between the client and server
•  Standardization is still on-going
•  Heavily influenced by Google and its SPDY protocol

HTTP 2.0 targets

•  Substantially and measurably improve end-user perceived latency in most cases, over HTTP 1.1 using TCP.
•  Address the "head of line blocking" problem in HTTP.
•  Not require multiple connections to a server to enable parallelism, thus improving its use of TCP, especially regarding congestion control.
•  Retain the semantics of HTTP 1.1, leveraging existing documentation, including (but not limited to) HTTP methods, status codes, URIs, and where appropriate, header fields.
•  Clearly define how HTTP 2.0 interacts with HTTP 1.x, especially in intermediaries.
•  Clearly identify any new extensibility points and policy for their appropriate use.

Performance

Compatibility

Extensibility

Binary Framing Layer

Terminology

•  All communication is performed with a single TCP connection.
•  The stream is a virtual channel within a connection, which carries bidirectional messages. Each stream has a unique integer identifier (1, 2, …, N).
•  The message is a logical HTTP message, such as a request, or response, which consists of one or more frames.
•  The frame is the smallest unit of communication, which carries a specific type of data, e.g., HTTP headers, payload, and so on.
   –  Frames use binary encoding, and header data is compressed
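As an illustration of binary framing, the sketch below packs and unpacks a frame using the 9-byte header layout of the HTTP/2 specification as later published in RFC 7540 (24-bit length, 8-bit type, 8-bit flags, 1 reserved bit, 31-bit stream identifier). The drafts current when these slides were written differed in details, and the function names are hypothetical.

```python
import struct

# Frame type and flag values as defined in RFC 7540.
DATA = 0x0
END_STREAM = 0x1

def pack_frame(frame_type, flags, stream_id, payload):
    """Build one frame: 9-byte binary header followed by the payload."""
    if len(payload) >= 2 ** 24:
        raise ValueError("payload does not fit in a 24-bit length field")
    if not 0 <= stream_id < 2 ** 31:
        raise ValueError("stream id must fit in 31 bits")
    length = struct.pack(">I", len(payload))[1:]   # low 3 bytes = 24-bit length
    return length + struct.pack(">BBI", frame_type, flags, stream_id) + payload

def unpack_frame(data):
    """Parse one frame and return (type, flags, stream id, payload)."""
    length = int.from_bytes(data[0:3], "big")
    frame_type, flags, stream_id = struct.unpack(">BBI", data[3:9])
    stream_id &= 0x7FFFFFFF                        # drop the reserved bit
    return frame_type, flags, stream_id, data[9:9 + length]

frame = pack_frame(DATA, END_STREAM, 1, b"hello")
print(frame.hex(), unpack_frame(frame))
```

Because every frame carries its stream identifier, frames of different messages can share one connection and still be told apart by the receiver.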

Request and Response Multiplexing

•  HTTP 1.x: Only one response can be delivered at a time (response queuing) per connection –  Head-of-line blocking, inefficient TCP use

•  HTTP 2.0: full request and response multiplexing
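Multiplexing can be sketched as interleaving frames from several streams on one connection and reassembling them by stream identifier on the other side. This is a simplified model: frames are plain (stream id, chunk) tuples, the round-robin order and the 4-byte frame size are arbitrary assumptions, and all names are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest

def to_frames(stream_id, message, frame_size):
    """Split one message into (stream_id, chunk) frames."""
    return [(stream_id, message[i:i + frame_size])
            for i in range(0, len(message), frame_size)]

def multiplex(streams, frame_size=4):
    """Interleave the frames of all streams round-robin on one connection."""
    per_stream = [to_frames(sid, msg, frame_size) for sid, msg in streams.items()]
    wire = []
    for one_round in zip_longest(*per_stream):
        wire.extend(frame for frame in one_round if frame is not None)
    return wire

def demultiplex(wire):
    """Reassemble each message from its frames using the stream id."""
    messages = defaultdict(bytes)
    for stream_id, chunk in wire:
        messages[stream_id] += chunk
    return dict(messages)

streams = {1: b"<html>page</html>", 3: b"body{color:red}", 5: b"ok"}
wire = multiplex(streams)
received = demultiplex(wire)
print([sid for sid, _ in wire])
```

The short response on stream 5 is delivered in the first round instead of waiting behind the longer responses, which is exactly what response queuing in HTTP 1.x prevents.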

Request Prioritization

•  Each stream can be assigned a 31-bit priority value
   –  0 represents the highest priority stream.
•  Client and server can apply different strategies to process individual streams, messages, and frames in an optimal order
•  Strategies
   –  HTML document itself is critical to construct the DOM; the CSS is required to construct the CSSOM; JavaScript is often needed for both. Remaining resources, such as images, are often fetched with lower priority
   –  Modern browsers prioritize requests based on type of asset, its location on the page, and even learned priority from previous visits, e.g., if the rendering was blocked on a certain asset in a previous visit, then the same asset may be prioritized higher in the future.
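One possible processing strategy is a priority queue in which 0 is served first, as described above. This is a minimal sketch: the `PriorityScheduler` class, the resource names, and the priority values assigned to them are hypothetical.

```python
import heapq
import itertools

class PriorityScheduler:
    """Serve pending streams in priority order; 0 is the highest priority."""
    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()   # tie-breaker: first come, first served

    def submit(self, priority, resource):
        if not 0 <= priority < 2 ** 31:
            raise ValueError("priority must fit in 31 bits")
        heapq.heappush(self._heap, (priority, next(self._arrival), resource))

    def drain(self):
        while self._heap:
            _, _, resource = heapq.heappop(self._heap)
            yield resource

scheduler = PriorityScheduler()
scheduler.submit(2, "hero.jpg")       # images: lower priority
scheduler.submit(0, "index.html")     # the document itself: highest priority
scheduler.submit(1, "style.css")
scheduler.submit(1, "app.js")

order = list(scheduler.drain())
print(order)
```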

One connection per origin

•  HTTP 2.0 connections are persistent, and only one connection should be used between the client and server
   –  No more multiple TCP connections like in HTTP 1.1
•  Less overhead
   –  Fewer sockets to manage along the connection path, smaller memory footprint, and better connection throughput
•  Consistent prioritization between all streams
•  Better compression through use of a single compression context
•  Improved impact on network congestion due to fewer TCP connections
•  Less time in slow-start and faster congestion and loss recovery
•  Most HTTP transfers are short and bursty, whereas TCP is optimized for long-lived, bulk data transfers. By reusing the same connection between all streams, HTTP 2.0 is able to make more efficient use of the TCP connection.

Server Push

•  Push additional resources to the client without the client explicitly asking for them
   –  A typical web application consists of dozens of resources, all of which are discovered by the client by examining the document provided by the server. So why not eliminate the extra latency and let the server push the associated resources to the client ahead of time?
•  Pushed content goes to browser cache
   –  Invisible to client application
•  Different strategies to apply server push

Header compression

•  HTTP 1.x:
   –  Headers are sent as plain text
   –  Adds around 500–800 bytes of overhead per request, and kilobytes more if HTTP cookies are required
•  HTTP 2.0
   –  Only changes to previous request are sent. HTTP 2.0 uses "header tables" on both the client and server to track and store previously sent key-value pairs.
   –  Header tables persist for the entire HTTP 2.0 connection and are incrementally updated both by the client and server.
   –  Each new header key-value pair is either appended to the existing table or replaces a previous value in the table.
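The header-table idea can be sketched as sending only the difference to the previous request. This is a deliberately simplified model of the description above, not the actual HTTP/2 header compression format (which additionally uses static and dynamic index tables and Huffman coding); the function names are hypothetical.

```python
def encode_headers(table, headers):
    """Sender side: emit only changed pairs and removed keys, then update the table."""
    delta = {k: v for k, v in headers.items() if table.get(k) != v}
    removed = [k for k in table if k not in headers]
    table.update(delta)
    for key in removed:
        del table[key]
    return delta, removed

def decode_headers(table, delta, removed):
    """Receiver side: apply the same update to its own copy of the table."""
    table.update(delta)
    for key in removed:
        table.pop(key, None)
    return dict(table)

client_table, server_table = {}, {}

request1 = {":method": "GET", ":path": "/index.html", "user-agent": "demo"}
request2 = {":method": "GET", ":path": "/style.css", "user-agent": "demo"}

sent1 = encode_headers(client_table, request1)
seen1 = decode_headers(server_table, *sent1)

sent2 = encode_headers(client_table, request2)
seen2 = decode_headers(server_table, *sent2)

print("second request sends only:", sent2[0])
```

The first request transmits every header, while the second transmits only the changed `:path`; both sides still reconstruct the full header set.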

Initial SPDY mechanism did not work because of security problems

•  Early versions of SPDY used zlib, with a custom dictionary, to compress all HTTP headers, which delivered 85%–88% reduction in the size of the transferred header data, and a significant improvement in page load time latency
•  In the summer of 2012, a "CRIME" security attack was published against TLS and SPDY compression algorithms, which could result in session hijacking. As a result, the zlib compression algorithm was disabled
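The effect of a preset dictionary can be tried with Python's `zlib` module, which accepts one through the `zdict` argument. The dictionary and the request below are made-up examples, not SPDY's actual dictionary, so the sizes only illustrate the trend.

```python
import zlib

# Made-up request headers and a made-up dictionary of strings common in headers.
headers = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"User-Agent: demo-browser/1.0\r\n"
    b"Accept: text/html\r\n"
    b"Accept-Encoding: gzip\r\n\r\n"
)
DICTIONARY = (
    b"GET HTTP/1.1\r\nHost: User-Agent: Accept: text/html\r\n"
    b"Accept-Encoding: gzip\r\n"
)

def deflate(data, zdict=None):
    comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return comp.compress(data) + comp.flush()

def inflate(data, zdict=None):
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(data) + decomp.flush()

plain = deflate(headers)
with_dict = deflate(headers, DICTIONARY)
restored = inflate(with_dict, DICTIONARY)

print(f"raw {len(headers)} B, zlib {len(plain)} B, zlib + dictionary {len(with_dict)} B")
```

The same property that makes this effective, output size depending on how much of the input repeats known content, is what CRIME exploited when attacker-controlled and secret data shared one compression context.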

Other features

•  Flow-control in a rather similar fashion as in TCP
•  Application Layer Protocol Negotiation (ALPN) is used to discover and negotiate HTTP 2.0 support as part of the regular HTTPS negotiation

Status of standardization

•  Lots of open issues still
•  Performance is not always as good as has been the target
•  Using SPDY as an intermediate option still makes a lot of sense
•  If you only develop web apps, there is little need to care. If the performance of those apps is important, it is good to understand what is happening
