JAZOON'13 - Abdelmonaim Remani - The Economies of Scaling Software
The Economies of Scaling Software
Abdelmonaim Remani
@PolymathicCoder
Creative Commons Attribution Non-Commercial License 3.0 Unported
The graphics and logos in this presentation belong to their rightful owner
• Platform Architect at just.me Inc.
• JavaOne RockStar and frequent speaker at many developer events and
conferences including JavaOne, JAX, OSCON, OREDEV, 33rd Degree, etc...
• Open-source advocate and contributor
• Active community member
  • The NorCal Java User Group
  • The Silicon Valley Dart Meetup
Bio:
http://about.me/PolymathicCoder
Twitter: @PolymathicCoder
Email: abdelmonaim.remani@gmail.com
SlideShare:
http://www.slideshare.net/PolymathicCoder/
| @PolymathicCoder
About Me
Follow @PolymathicCoder
http://speakerscore.com/jazoon-scalability
• The Economies of Scale
  • “In microeconomics, economies of scale are the cost advantages that enterprises obtain due to size [...] often operational efficiency is [...] greater with increasing scale [...]” -Wikipedia
The Title of the Talk
Let’s Go!
• Only the enterprise used to worry about scalability
• The rise of social and the abundance of mobile
  • An exponential growth of internet traffic
  • The creation of a spoiled user base
    • “I want to see the closest Moroccan restaurants to my current location on a map, along with consumer ratings and whether any of my friends have checked in within the last 30 days”
• The lines are blurred between consumer applications and enterprise applications
Blurred Lines…
Scalability is everyone’s problem…
The Bar Is Higher!
What is Scalability?
• The ability of an application to handle an increasing amount of work without performance degradation
• Not a good definition! It implies:
  • You’ll need to scale forever
    • Scalability is relative; it is bound by one’s specific needs
  • You’ll need to be fully scalable from day one
    • Scalability is evolutionary; it is a gradual process
  • There are no external constraints
    • Unrealistic
The Common Definition
• The ability of an application to gracefully evolve within the constraints of its ecosystem in order to handle the maximum potential amount of work without performance degradation
• Work?
  • Simultaneous requests
• Performance degradation?
  • Increased latency or decreased throughput
A Better Definition
• Don’t be surprised if
  • Your application supports one million users
  • You add one more feature
  • A 500,000-user load crashes your system or renders it unusable
A Black Art!
Latency Is Your Enemy
• To scale is to reduce latency
• To reduce latency is to address bottlenecks
• To scale is to address bottlenecks
• The usual suspects
  • The CPU
  • The Storage I/O
  • The Network I/O
  • Inter-related
Syllogismo
Overcoming The CPU Bottleneck
• Nothing affects the CPU more than the instructions it is summoned to execute
• This is about your application
  • How it is written (Architecture, code base, etc…)
  • How it is deployed
Overcoming the CPU Bottleneck
A Scalable Architecture
• “Things that people perceive as hard-to-change” -Martin Fowler
  • http://martinfowler.com/ieeeSoftware/whoNeedsArchitect.pdf
• Decisions you commit to; the ones you will be stuck with forever
Architecture?
• Choose the right technologies
  • Platform
  • Languages
  • Frameworks
  • Libraries
• Make the right abstractions
  • Loosely-coupled components
  • Functional abstractions
  • Technical abstractions
  • Make sure that the latter is subordinate to the former and not the other way around
Be Wise… Think Twice…
Write Good Code
• Think your algorithms through and mind their complexity (Asymptotic Complexity, Cyclomatic Complexity, etc…)
• SOLIDify your design
  • Single Responsibility, Open-Closed, Liskov Substitution, Interface Segregation, and Dependency Inversion
• Understand the limitations of your technology and leverage its strengths
Write Good Code
• Obsess over testing
  • TDD/BDD
• Tools
  • Static code analyzers (PMD, FindBugs, etc…)
  • Profilers (Detect memory leaks, bottlenecks, etc…)
  • Etc…
Quality… Quality… Quality!
• Read
  • The Classics (The Mythical Man-Month, etc…)
  • GoF’s “Design Patterns”
  • Eric Evans’ “Domain-Driven Design”
  • Every book by Martin Fowler
  • Uncle Bob’s “Clean Code”
  • Josh Bloch’s “Effective Java”
  • Brian Goetz’s “Java Concurrency in Practice”
  • Tech Papers/Blogs
  • Etc...
Know Thy S#!t
The Inevitable
You’ll end up with…
At best…
The fading tradition of making cow dung piles
http://news.ukpha.org/2011/01/the-fading-tradition-of-making-cow-dung-piles/
You do all that…
Still better than…
• What is it?
  • The quick-and-dirty you are not proud of
  • What you would have done differently had you had the time
  • It’s a matter of time before it starts to smell really bad
• What to do?
  • The fact that you recognize it as debt is a good thing in itself
  • Keep tabs and refactor often
  • Cut the right corners
  • Don’t mortgage architecture (Don’t lock yourself out)
Technical Debt
Write Code That Scales Up
• Vertical Scaling (Scaling Up)
  • On a single-node system
  • Adding more computing resources to the node (Getting a beefier machine)
  • Writing code to harness the full power of the one node
Vertical Scaling
• Writing concurrent code, that is, code that executes simultaneously
• Simple business logic within containers is already multi-threaded
• Executing complex business logic within a reasonable time
  • Break it into smaller steps
  • Execute them in parallel
  • Aggregate data back
Parallelism At The Node Level
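The three steps above (break into smaller steps, execute in parallel, aggregate) can be sketched roughly as follows. This is only an illustration in Python using the standard-library `concurrent.futures` module; the function names and the sum-of-squares workload are hypothetical stand-ins for real business logic, not something taken from the talk.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def sum_of_squares(chunk):
    """The per-chunk step; stands in for any independent piece of work."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, chunk_size=1000, workers=4):
    # 1. Break the work into smaller steps.
    chunks = chunked(items, chunk_size)
    # 2. Execute the steps in parallel on a pool of workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    # 3. Aggregate the partial results back into one answer.
    return sum(partials)
```

In CPython, threads do not run CPU-bound code on several cores at once, so for compute-heavy steps `ProcessPoolExecutor` (same interface) is the usual substitute; the thread pool is used here only to keep the sketch easy to run.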
• Moore’s Law
  • Performance gain is automatically realized by software (Code is faster on faster hardware)
  • Nothing is forever…
• The era of the multi-core chip
  • We need to write code to take advantage of all cores
Easier Said Than Done…
• Synchronize state across threads across multiple cores
  • Good luck!
• Rely on frameworks and libraries (Fork/Join, Akka, etc…)
• Go immutable
  • Not always straightforward or possible
• Go functional (Scala, Clojure, etc…)
Easier Said Than Done…
• Amdahl’s Law
  • Throwing more cores at a problem does not necessarily result in a performance gain
  • Returns diminish at some point no matter how many cores you throw in
It Gets More Interesting…
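Amdahl's Law is usually written as S(n) = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of cores. A small sketch of that formula, with hypothetical function names, makes the diminishing return concrete:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Theoretical speedup when `parallel_fraction` of the work is spread over `cores`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def speedup_limit(parallel_fraction):
    """Upper bound on the speedup as the core count grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0 else 1.0 / serial_fraction
```

For example, if 90% of the work is parallelizable, the serial 10% caps the speedup at roughly 10x regardless of how many cores are added.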
• Leverage Probabilistic data structures and algorithms
  • Bloom Filters, Quotient filters, etc…
• Go Reactive
  • http://www.reactivemanifesto.org/
  • RxJava, Spring Reactor, etc…
Miscellaneous
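As an illustration of the probabilistic data structures mentioned above, here is a minimal Bloom filter sketch. The sizes, the class name, and the choice of salted SHA-256 as the hash family are assumptions made for readability, not tuned values or a reference implementation.

```python
import hashlib


class BloomFilter:
    """A set-membership sketch: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive several bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))
```

The trade-off is memory for certainty: the filter never stores the items themselves, so a `True` answer still needs a check against the real data store when exactness matters.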
Write Code That Scales Out
• Horizontal Scaling
  • On a distributed system (A cluster)
  • Adding more nodes
  • Writing code to harness the full power of the cluster
Horizontal Scaling
• A typical cluster consists of
  • A number of identical application server nodes behind a load balancer
Topology
• A typical cluster consists of
  • A number of identical application server nodes behind a load balancer
A number?
• It depends on how many you actually need and can afford
• Elastic Scaling / Auto-Scaling
  • The number of live nodes within the cluster shrinks and grows depending on the load
  • Nodes are provisioned or terminated as needed
Topology
• A typical cluster consists of
  • A number of identical application server nodes behind a load balancer
Identical?
• Application nodes are cloned off of image files (Ex. AWS EC2 AMIs, etc...)
• Configuration Management tools (Chef, Puppet, Salt, etc...)
Topology
• A typical cluster consists of
  • A number of identical application server nodes behind a load balancer
Load balancer?
• Load is evenly distributed across live nodes according to some algorithm (Round-Robin typically)
Topology
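Round-robin distribution, as named above, can be sketched in a few lines. The class below is a hypothetical illustration over a fixed node list; a real load balancer would also track node health and cluster membership changes.

```python
import itertools


class RoundRobinBalancer:
    """Hands out nodes in a fixed rotation, one per incoming request."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self._rotation = itertools.cycle(list(nodes))

    def next_node(self):
        return next(self._rotation)
```

Each call to `next_node()` returns the next node in the list and wraps around at the end, so every node receives the same share of requests over time.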
• Session data
  • Session Replication
  • Session Affinity / Sticky Session
    • Requests from the same client are routed to the same node
    • When the node dies, the session data dies with it
  • Shared Session / Distributed Session
    • Session data is in a “centralized” location
  • Go Stateless
    • No session data (Any node would do)
Managing State
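One way to picture session affinity is a router that picks the node from a stable hash of a client identifier. This is an assumption made for illustration (load balancers commonly rely on a cookie or the source IP instead), and the function name is hypothetical.

```python
import hashlib


def pick_node(client_id, nodes):
    """Route a client to a node chosen by a stable hash of its identifier."""
    if not nodes:
        raise ValueError("no live nodes")
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    # The same client_id always maps to the same index while `nodes` is unchanged.
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]
```

The sketch also shows the weakness listed above: once a node leaves the list, its clients are routed elsewhere and any session data held only on that node is gone.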
• Leverage Map/Reduce
  • “A programming model for processing large data sets with a parallel, distributed algorithm on a cluster”
  • Apache Hadoop
Parallelism At The Cluster Level
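The shape of the Map/Reduce programming model can be shown with a single-process word count. This is not the Hadoop API; the function names are hypothetical, and on a real cluster the map and reduce calls would run on different nodes with the framework doing the grouping in between.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key."""
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each `map_phase` call looks at one document and each `reduce_phase` call at one key, both phases can be spread across nodes without shared state.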
• How to HTTPS?
  • Terminate at the load balancer
  • Wildcard SSL
• Distributed Lock Manager (DLM)
  • Synchronize access to shared resources
  • (Google Chubby, Apache ZooKeeper, etc…)
• Distributed Transactions
  • X/Open XA
Miscellaneous
Deployment
• Multiple Environments
  • Development, Test, Stage, and Production
  • Automatic Configuration Management
• Practice Continuous Delivery
• Leverage The Cloud
  • IaaS, PaaS, SaaS, and NaaS
Deployment
Overcoming The Storage I/O Bottleneck
• The storage I/O bottleneck is usually the most significant
The Storage I/O Bottleneck
The Persistent Datastore
• Relational of course!
  • Normalized schema guaranteeing data integrity
  • ACID Transactions
  • Not biased towards specific access patterns
  • Flexible query language
• As datasets grow
  • Scale up (Buy beefier machines)
  • Database tuning / query optimization
  • Create materialized views
  • De-normalize
  • Etc…
What Datastore to Use?
• No other choice but scaling out RDBMS
  • Master/Slave clusters
  • Sharding
• Failed big time!
  • RDBMS is designed to run on one machine
  • Eric Brewer’s CAP Theorem of distributed systems
    • Pick 2 out of 3: Consistency, Availability, and Partition Tolerance
    • The relational model is designed to favor CA, hence can never support P
Mucho Data!
• A wide range of specialized datastores with the goal of addressing the challenges of the relational model
• “The whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for” –Eric Evans
• A wide variety
  • Key-Value Datastores
  • Columnar Datastores
  • Document Datastores
  • Graph Datastores
NoSQL
• Within the application
  • Data is complex and accessed in many different ways
  • Why should we fit it into one storage model?
• Polyglot Persistence is about
  • Leveraging multiple data stores based on the specific way the data is stored and accessed
• For more info:
  • Check out my talk on YouTube from JAX Conf 2012
  • “The Rise of NoSQL and Polyglot Persistence”
  • http://bit.ly/PCWtWi
Polyglot Persistence
Caching
• A cache is typically a simple key-value data structure
• Instead of incurring the overhead of data retrieval or computation every time, you check the cache first
• You can’t cache everything; caches can be configured to use different eviction algorithms depending on the use case (LRU, LFU, Bélády's Algorithm, etc...)
• Use aggressively!
• What to cache?
  • Frequently accessed data (Session data, feeds, etc…)
  • Results of intensive computations
Caching
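Of the eviction algorithms listed above, LRU (least recently used) is perhaps the easiest to sketch. The class below is a hypothetical, single-threaded illustration built on the standard-library `OrderedDict`, not the implementation used by any particular cache product.

```python
from collections import OrderedDict


class LRUCache:
    """A bounded key-value cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used
```

Reads count as use: a key that was just fetched moves to the back of the queue and survives the next eviction.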
• Where to cache?
  • On disk
    • File System: Slow and sequential access
    • DB: A bit better (Data is arranged in structures designed for efficient access, indexes, etc…)
    • Generally a terrible idea (SSDs make things a bit better)
  • In-Memory: Fast and random access, but volatile
  • Something in between: Persistent caches (Redis, etc…)
• What type of cache?
  • Local, Replicated, Distributed, and Clustered
Caching
• How to cache?
  • Most caches implement a very simple interface
  • Always attempt to get from the cache first using a key
    • If it is a hit, you saved yourself the overhead
    • If it is a miss, compute or read from the data store, then put in the cache for subsequent gets
  • When you update, you can evict stale data
  • You can set a TTL when you put
  • Many other common operations...
Caching
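The get-first flow described above, with a TTL on put and eviction on update, might look like the sketch below. The class, the `read_through` helper, and the injectable `clock` parameter are hypothetical names chosen for the example; `None` is treated as a miss, so this sketch cannot cache `None` values.

```python
import time


class TTLCache:
    """A dictionary-backed cache whose entries expire after a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def put(self, key, value, ttl_seconds):
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def evict(self, key):
        self._entries.pop(key, None)


def read_through(cache, key, load, ttl_seconds=60):
    """Try the cache first; on a miss, load from the data store and cache the result."""
    value = cache.get(key)
    if value is None:
        value = load(key)
        cache.put(key, value, ttl_seconds)
    return value
```

A write path would call `cache.evict(key)` (or `cache.put` with the new value) after updating the data store, so that the next read does not return stale data.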
• Caching Query Results
  • Key: Hash of the query itself
  • How about parameterized queries?
    • Key: Hash of the query itself + Hash of parameter values
• Method/Function Memoization
  • Key: Method name
  • How about methods with parameters?
    • Key: Hash of the method name + Hash of parameter values
• Caching Objects
  • Key: Identity of the object
Caching Patterns
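The key schemes above can be sketched as follows. Both helpers are hypothetical; the query-key function assumes the parameter values are JSON-serializable, and the memoizer only handles hashable positional arguments.

```python
import functools
import hashlib
import json


def query_cache_key(query, params=()):
    """Hash of the query text plus a hash of its parameter values."""
    query_hash = hashlib.sha256(query.encode("utf-8")).hexdigest()
    params_hash = hashlib.sha256(json.dumps(list(params)).encode("utf-8")).hexdigest()
    return f"{query_hash}:{params_hash}"


def memoize(func):
    """Cache results keyed by the function name and its positional arguments."""
    results = {}

    @functools.wraps(func)
    def wrapper(*args):
        key = (func.__qualname__, args)
        if key not in results:
            results[key] = func(*args)  # computed once per distinct argument tuple
        return results[key]

    return wrapper
```

Python's standard library already ships a bounded memoizer, `functools.lru_cache`; the hand-written version is shown only to make the key construction visible.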
• Time-series datasets (Ex. Real-time feed)
  • Most of the time pseudo/near real-time is enough
  • Use caching to throttle access to resources
    • Cache query results with an expiry of t
    • Fresh data is only read every t
Caching Patterns
• Profile your code to assess what to cache, and whether you need to cache to begin with
• Stale state might bite you hard
  • Incoherence: Inconsistent copies of objects cached with multiple keys
  • Stale nested aggregates
• Network overhead of misses might outweigh the performance gain of hits
• Consider writing/updating cache when writing/updating the persistence store
Caching Gotchas
• EhCache
• Memcached
• Oracle Coherence
• Redis
  • A persistent NoSQL datastore
  • Built-in data structures like sets and lists
  • Supports intelligent keys and namespaces
Featured Solutions
Overcoming The Network I/O Bottleneck
• The network I/O can bring you down just as much
The Network I/O Bottleneck
Asynchronous Processing
• Resource-intensive tasks cannot be handled practically during an HTTP session
• Synchronous processing is overused and not necessary most of the time
Asynchronous Processing
• Pseudo-Asynchronous Processing
  • Flow
    • Process data / operations in advance
    • User requests data or operation
    • Respond synchronously with pre-processed result
  • Sometimes not possible (Dynamic content, etc...)
Asynchronous Processing Patterns
• True Asynchronous Processing
  • Flow
    • User requests data or operation
    • Acknowledge
      • Ex. A REST endpoint that returns a “202 Accepted” HTTP status code
    • Do processing at your own convenience
    • Allow the user to check progress
    • Optionally notify when processing is completed
Asynchronous Processing Patterns
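The acknowledge-then-process flow above can be sketched without any web framework: `submit` plays the role of the endpoint that answers “202 Accepted”, a background worker does the processing, and `check_progress` is what a status endpoint would call. All names are hypothetical, and the in-memory queue stands in for a real job queue.

```python
import queue
import threading
import uuid

jobs = {}                 # job_id -> {"status": ..., "result": ...}
pending = queue.Queue()   # work waiting to be processed


def submit(payload):
    """Accept the request immediately and return (status_code, job_id)."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued", "result": None}
    pending.put((job_id, payload))
    return 202, job_id


def check_progress(job_id):
    """Let the client poll for the state of its job."""
    return jobs[job_id]["status"]


def worker():
    """Process queued jobs at the server's own pace."""
    while True:
        job_id, payload = pending.get()
        jobs[job_id]["status"] = "running"
        jobs[job_id]["result"] = payload.upper()  # stand-in for the expensive task
        jobs[job_id]["status"] = "done"
        pending.task_done()


threading.Thread(target=worker, daemon=True).start()
```

The caller gets its answer as soon as the job is queued; the expensive part no longer holds the HTTP request open.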
• Leverage Job/Work/Task Queues
  • JMS (Java Message Service) – JSR 914
  • AMQP (Advanced Message Queuing Protocol): RabbitMQ, ActiveMQ, etc…
  • AWS SQS
  • Redis Lists
  • Etc…
• Task Scheduling
  • Jobs triggered periodically (Cron, Quartz, etc…)
• Batch Processing
Techniques
Content Delivery Network
• Static content
  • Binary (Video, Audio, etc…)
  • Web objects (HTML, JavaScript, CSS, etc…)
• Do NOT serve through your application server
• Use a CDN
  • “A large distributed system of servers deployed in multiple data centers across the internet”
  • Akamai
  • AWS CloudFront
Content Delivery Network (CDN)
• Dirty Caches
  • script.js is a script file deployed on the CDN
  • Multiple copies of script.js will be replicated across all edge nodes of the CDN
  • Clients/browsers will keep their own copies of script.js locally
  • We update script.js
  • Since the new and old versions have the same URI
    • New clients will be served the old version by the CDN
    • Old clients will continue to use the old version from their local cache
CDN Gotchas
• Dirty Caches
  • What to do?
    • Simply append a version number to file names
      • script-v1.js, script-v2.js, etc…
    • Force invalidation of all copies on edge nodes
    • Set HTTP caching headers properly
CDN Gotchas
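The file-naming fix above is easy to automate at build time. The first helper follows the `script-v1.js` scheme from the slide; the second, which derives the suffix from a hash of the file's bytes, is a common variation added here as an assumption and is not part of the talk. Both function names are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(filename, version):
    """script.js, 2 -> script-v2.js"""
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}-v{version}{path.suffix}"))


def fingerprinted_name(filename, content):
    """Derive the suffix from the file's bytes, so the name changes only when the content does."""
    path = PurePosixPath(filename)
    digest = hashlib.sha256(content).hexdigest()[:8]
    return str(path.with_name(f"{path.stem}-{digest}{path.suffix}"))
```

Either way, a new release gets a new URI, so neither the edge nodes nor the browsers can serve the old copy in its place.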
Domain Name System
• Do NOT rely on your free domain name registrar DNS
• Use a scalable DNS solution
  • AWS Route 53
  • DynECT
  • UltraDNS
  • Etc…
• Domain Sharding
  • Browsers limit the number of connections per host (Max of 6 usually)
  • Creating multiple subdomains (CNAME entries) allows more resources to be downloaded in parallel
  • Watch out for: DNS lookup overhead, HTTPS cost, Browser’s Same-Origin Policy, etc…
Domain Name System (DNS)
Remoting
• In a SOA (Service Oriented Architecture)
  • RPC calls to multiple services
  • Data Exchange (Plain vs. Binary)
    • SOAP / REST with XML or JSON
    • Google Protocol Buffers, Apache Thrift, Apache Avro, etc…
  • Protocol
    • JMS
    • HTTP
    • SPDY
Remoting
Qualifying Scalability
• Instrumentation: Bake it into the code early
• Monitoring
  • Health (Application / Infrastructure)
  • Key Performance Indicators (KPIs)
    • Number of requests handled, throughput, latency, Apdex Index, etc...
  • Logs
• Testing
  • Load/Stress testing
Qualifying Scalability
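Of the KPIs listed above, the Apdex score is the least self-explanatory. It classifies each response time against a target threshold T: at most T counts as satisfied, up to 4T as tolerating, anything slower as frustrated; the score is (satisfied + tolerating / 2) / total. A minimal sketch, with a hypothetical function name:

```python
def apdex(latencies, threshold):
    """Apdex score in [0, 1] for a list of response times, given the target threshold T."""
    if not latencies:
        raise ValueError("no samples")
    satisfied = sum(1 for t in latencies if t <= threshold)
    tolerating = sum(1 for t in latencies if threshold < t <= 4 * threshold)
    # Frustrated samples (slower than 4T) contribute nothing to the score.
    return (satisfied + tolerating / 2) / len(latencies)
```

With a 0.5 s target, the samples 0.1, 0.2, 0.6, and 3.0 seconds give two satisfied, one tolerating, and one frustrated request.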
Disaster Recovery
• Goal
  • Fault-tolerant system
  • Restore service and recover data ASAP in case of a disaster
• Be proactive
  • Develop a Disaster Recovery Plan (DRP)
  • Practice and test your DRP by doing failure drills
When Disaster Hits…
Scaling Teams
• Hiring
  • Always hire top talent
    • You are as strong as your weakest link
  • Develop a process to bring people in
    • Turnkey Hardware/Software Setup (Vagrant, etc...)
    • Arrange for proper access/accounts
    • Develop a knowledge base (Architecture documentation, FAQs, etc...)
• Development Process
  • Be Agile
  • Refine in the spirit of Six Sigma
Scaling Teams
• Team Structure
  • Small is good
  • Form ad-hoc teams from pools of Agile breeds
    • Product Owners
    • Team Members
      • Team Lead (Scrum Master)
      • Engineers
      • QAs
    • Architecture Owners
  • Give them ownership of their DevOps
Scaling Teams
The Take-home
• The early bird gets the worm
  • Design to scale from day one
  • Plan for capacity early
• Your needs determine how scalable “your scalable” needs to be
  • Do not over-engineer
• Do not bite off more than you can chew
  • Building a scalable system is a process
  • Commit to a road map around bottlenecks
  • Guided by planned business features
• Learn from others’ experiences (Twitter, Netflix, etc...)
The Take-home Message
Work smarter not harder…
Take it slow… You’ll get there…
Questions?
Thanks for the attention!
Follow @PolymathicCoder
http://blog.polymathiccoder.com
http://speakerscore.com/jazoon-scalability