APPLICATION ARCHITECTURE FOR THE REST OF US
Presented by
M N Islam Shihan
Introduction
Target Audience
What is Architecture?
Architecture is the foundation of your application
Applications are not like skyscrapers
Enterprise vs. Personal Architecture
Why look ahead in Architecture?
  Adaptability with growth
  Maintainability
  Requirements never end
Enterprise Architecture (cont…)
Security
Responsiveness
Extendibility
Availability
Load Management
Distributed Computation
Caching
Scalability
Security
Security (cont…)
Think about security first of all
Network security: implement a firewall and a reverse proxy for your network
SQL injection: never forget to escape field values in your queries
XSS (Cross-Site Scripting): never trust user-provided data (or data grabbed from third-party sources), and never display it without sanitizing/escaping
CSRF (Cross-Site Request Forgery): never let your forms be submitted from third-party sites
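The SQL injection and XSS points above can be sketched in a few lines. This is a minimal illustration, not a complete defense: it uses bound query parameters (a stricter alternative to escaping field values by hand) and HTML escaping on output. The `users` table and the function names are hypothetical.

```python
import html
import sqlite3


def find_user(conn, username):
    # The "?" placeholder keeps the value out of the SQL text entirely,
    # so quotes inside `username` cannot change the query structure.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchall()


def render_comment(comment):
    # Escape user-provided text before embedding it in HTML output.
    return "<p>" + html.escape(comment) + "</p>"


# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

rows = find_user(conn, "alice")
injected = find_user(conn, "alice' OR '1'='1")  # treated as a literal name
safe_html = render_comment("<script>alert(1)</script>")
```

The injection attempt simply matches no user, because the whole string is compared as a value.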
Security (cont…)
DDoS (Distributed Denial of Service): enable real-time monitoring of access to detect and prevent DDoS attacks
Session fixation: implement session key regeneration for every request
Always hash your security tokens/cookies with new random salts on a per-request or per-session basis (or at a fixed interval)
Stay tuned and up to date with security news and releases for all the tools and technologies you use
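One way to read the advice on salted tokens is sketched below. It is an assumption-laden illustration rather than the presenter's method: it swaps a plain hash for a keyed hash (HMAC), draws a fresh random salt for every issued token, and ties the token to a session id. `SERVER_SECRET`, `issue_token` and `verify_token` are hypothetical names.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side key; a real deployment would load it from
# protected configuration instead of generating it at import time.
SERVER_SECRET = secrets.token_bytes(32)


def _digest(salt, session_id):
    message = (salt + session_id).encode()
    return hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()


def issue_token(session_id):
    # A new random salt per token means two tokens for the same
    # session never look alike.
    salt = secrets.token_hex(16)
    return salt + ":" + _digest(salt, session_id)


def verify_token(session_id, token):
    salt, _, digest = token.partition(":")
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(digest, _digest(salt, session_id))


token = issue_token("session-1")
```

A form would embed `token` in a hidden field and the server would call `verify_token` on submission, which also covers the CSRF concern of forms being posted from third-party sites.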
Responsiveness
Responsiveness (cont…)
Web applications should be as responsive as Desktop Applications
Plan well and apply good use of JavaScript to achieve Responsiveness
Detect browsers and provide separate response/interface depending on detected browser type
Implement unobtrusive use of JavaScript
Implement optimal use of Ajax
Use Comet programming instead of polling
Implement deferred/asynchronous processing of large computations using a job queue
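The job queue idea can be shown in-process with the standard library. This is only a sketch of the pattern: the request handler enqueues work and returns immediately, while a background worker does the heavy computation. A production setup would more likely use a dedicated job server such as Gearman or a message queue such as RabbitMQ; the job names and the sum-of-squares "computation" here are made up for illustration.

```python
import queue
import threading

jobs = queue.Queue()
results = {}


def worker():
    # Pull jobs until a None sentinel arrives.
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        job_id, n = job
        # Stand-in for a large computation.
        results[job_id] = sum(i * i for i in range(n))
        jobs.task_done()


def submit(job_id, n):
    # Called from the request path: enqueue and return right away.
    jobs.put((job_id, n))
    return job_id


thread = threading.Thread(target=worker, daemon=True)
thread.start()

submit("report-1", 1000)
jobs.join()      # wait for the queued work (only needed in this demo)
jobs.put(None)   # ask the worker to stop
thread.join()
```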
Extendibility
Implement and use a robust data access interface, so that it can be exposed easily via web services (like REST, SOAP, JSONP)
Use architectural patterns & best practices
  SOA (Service Oriented Architecture)
  MVC (Model View Controller)
Modular architecture with pluggability
Allow hooks and overrides through events
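Hooks and overrides through events can be as small as a listener registry. The sketch below is one possible shape, with hypothetical names (`EventDispatcher`, `before_save`): each listener receives the payload and returns a possibly modified copy, so plug-ins can change behavior without touching the core code.

```python
from collections import defaultdict


class EventDispatcher:
    def __init__(self):
        self._listeners = defaultdict(list)

    def on(self, event, listener):
        # Register a hook for the named event.
        self._listeners[event].append(listener)

    def emit(self, event, payload):
        # Pass the payload through every listener in registration order.
        for listener in self._listeners[event]:
            payload = listener(payload)
        return payload


dispatcher = EventDispatcher()
dispatcher.on("before_save", lambda post: {**post, "title": post["title"].strip()})
dispatcher.on(
    "before_save",
    lambda post: {**post, "slug": post["title"].lower().replace(" ", "-")},
)

saved = dispatcher.emit("before_save", {"title": "  Hello World "})
untouched = dispatcher.emit("after_delete", {"id": 7})  # no listeners registered
```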
Availability
Availability (cont…)
Implement a well-planned disaster recovery policy
Use version control for your sources
Use RAID for your storage devices
Keep a hot standby fallback for each of your primary data/content servers
Perform periodic backups of your source repository, files & data
Implement periodic archiving of your old data
Provide a mechanism for users to switch between current and archived data when possible
Load Management
Load Management (cont…)
Monitor and benchmark your servers periodically and find the peak usage time
Optimize to support at least 150% of peak-time load
Use web servers with high I/O performance
Introduce a load balancer to distribute load among multiple application servers
  Start with a software load balancer (a.k.a. reverse proxy), then grow to a hardware load balancer only if necessary
Use CDNs to serve your static content
  Use public CDNs to serve open source JavaScript or CSS files when possible
Caching
To cache or not to cache?
  Analyze the nature of the content and responses generated by your application very well
  What to cache?
  Analyze and set a proper expiry time
  Invalidate the cache whenever content changes
  Partial caching will also bring you speed
  When is caching bad?
Understand the various types of web caches
  Browser cache
  Proxy cache
  Gateway cache
Caching (cont…)
Implement server-side caching
  Runtime in-memory cache
    Per request: global variables
    Shared: Memcached
  Persistent cache
    Per server: file based, APC
    Shared: DB based, Redis
  Optimizers and accelerators: eAccelerator, XCache
  Reverse proxy/gateway cache
    Varnish Cache
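To make the caching advice concrete (set an expiry time, invalidate when content changes), here is a minimal runtime in-memory cache. It is a single-process sketch with a hypothetical `TTLCache` class; a shared cache such as Memcached or Redis plays the same role across servers.

```python
import time


class TTLCache:
    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock  # injectable so expiry can be tested

    def set(self, key, value, ttl):
        # Remember the value together with its expiry time.
        self._store[key] = (value, self._clock() + ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped lazily on read.
            del self._store[key]
            return default
        return value

    def invalidate(self, key):
        # Call this whenever the underlying content changes.
        self._store.pop(key, None)


now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.set("home_page", "<html>v1</html>", ttl=10)
fresh = cache.get("home_page")
now[0] = 11.0
expired = cache.get("home_page")
```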
Distributed Computing
Scalability
What the heck is this?
Scalability is the soul of enterprise architecture
Scalability pyramid
Scalability (cont…)
Vertical Scalability (scaling up)
Scalability (cont…)
Horizontal Scalability (scaling out)
Scalability (cont…)
Scalability
Scaling up (vertical) vs. Scaling out (horizontal)
Scalability
Database Scalability
  Vertical: add resources to the server as needed
    In most cases this produces a single point of failure
  Horizontal: distribute/replicate data among multiple servers
  Cloud services: store your data in third-party data centers and pay according to your usage
Scalability (cont…)
Scaling Database
Scaling options
  Master/Slave
    Master for writes, slaves for reads
  Cluster computing
    Single storage with multiple server nodes
  Table partitioning
    Large tables are split among partitions
  Federated tables
    Tables are shared among multiple servers
  Distributed key-value stores
  Distributed object DB
  Database sharding
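The master/slave option ("master for write, slaves for read") comes down to a routing decision in the data access layer. The sketch below shows that decision only, under simplifying assumptions: server names are plain strings, statements are classified by their first keyword, and replication lag and transactions are ignored. `ReplicatedRouter` is a hypothetical name.

```python
import itertools


class ReplicatedRouter:
    def __init__(self, master, slaves):
        self.master = master
        # Rotate through the read replicas in round-robin order.
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        # Classify the statement by its first keyword.
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            return next(self._slaves)
        # INSERT, UPDATE, DELETE and everything else go to the master.
        return self.master


router = ReplicatedRouter("master", ["slave1", "slave2"])
targets = [
    router.route("SELECT * FROM books"),
    router.route("select 1"),
    router.route("INSERT INTO books (title) VALUES ('x')"),
    router.route("SELECT * FROM authors"),
]
```

A read that must see a write made moments earlier would still need to go to the master, which this sketch does not handle.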
Scalability (cont…)
Database Sharding
Smaller databases are easier to manage
Smaller databases are faster
Database sharding can reduce costs
Needs one or more well-defined shard functions
"Don't do it, if you don't need to!" (37signals.com)
"Shard early and often!" (startuplessonslearned.blogspot.com)
Scalability (cont…)
Database Sharding
When appropriate?
  High-transaction database applications
  Mixed workload database usage
    Frequent reads, including complex queries and joins
    Write-intensive transactions (CRUD statements, including INSERT, UPDATE, DELETE)
    Contention for common tables and/or rows
  General business reporting
    Typical "repeating segment" report generation
    Some data analysis (mixed with other workloads)
What to analyze?
  Identify all transaction-intensive tables in your schema.
  Determine the transaction volume your database is currently handling (or is expected to handle).
  Identify all common SQL statements (SELECT, INSERT, UPDATE, DELETE), and the volumes associated with each.
  Develop an understanding of the "table hierarchy" contained in your schema; in other words, the main parent-child relationships.
  Determine the "key distribution" for transactions on high-volume tables, to determine whether they are evenly spread or concentrated in narrow ranges.
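The "key distribution" check can be approximated by bucketing the keys seen in a transaction log and looking at how much traffic the busiest bucket takes. This is a rough sketch under assumptions: keys are integers, the log is already available as a non-empty list, and the bucket size is chosen by hand. Both function names are hypothetical.

```python
from collections import Counter


def key_distribution(keys, bucket_size):
    # Count how many transactions fall into each key range.
    return Counter(key // bucket_size for key in keys)


def max_bucket_share(keys, bucket_size):
    # Fraction of all transactions that hit the busiest key range.
    counts = key_distribution(keys, bucket_size)
    return max(counts.values()) / len(keys)


even_keys = list(range(1000))                    # spread across ten ranges
skewed_keys = [5] * 90 + list(range(100,110))   # concentrated in one range

even_share = max_bucket_share(even_keys, 100)
skewed_share = max_bucket_share(skewed_keys, 100)
```

A share close to `1 / number_of_buckets` suggests an even spread, while a share near 1 suggests that range-based sharding on this key would create a hot shard.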
Scalability (cont…)
Database Sharding
Challenges
  Reliability
    Automated backups
    Database shard redundancy
    Cost-effective hardware redundancy
    Automated failover
    Disaster recovery
  Distributed queries
    Aggregation of statistics
    Queries that support comprehensive reports
Scalability (cont…)
Database Sharding
Challenges (cont…)
  Avoidance of cross-shard joins
  Auto-increment key management
  Support for multiple shard schemes
    Session-based sharding
    Transaction-based sharding
    Statement-based sharding
  Determine the optimum method for sharding the data
    Shard by a primary key on a table
    Shard by the modulus of a key value
    Maintain a master shard index table
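Two of the sharding methods listed above, sharding by the modulus of a key value and maintaining a master shard index table, can be sketched as follows. The shard names, `shard_by_modulus` and `ShardIndex` are hypothetical, and the index is an in-memory dict standing in for a real lookup table.

```python
SHARDS = ["shard0", "shard1", "shard2", "shard3"]  # hypothetical shard names


def shard_by_modulus(customer_id):
    # Cheap and stateless, but changing the shard count remaps most keys.
    return SHARDS[customer_id % len(SHARDS)]


class ShardIndex:
    """Master shard index: an explicit key-to-shard lookup."""

    def __init__(self):
        self._index = {}

    def assign(self, key, shard):
        self._index[key] = shard

    def lookup(self, key):
        return self._index[key]


index = ShardIndex()
index.assign(42, "shard1")
```

The modulus function needs no extra storage, while the index table costs one lookup per request but lets individual keys be moved between shards.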
Scalability (cont…)
Database Sharding
Example: Bookstore schema showing how data is sharded
Tools
Application framework
Load balancer with multiple application servers
Continuous integration
Automated testing
  TDD (Test Driven Development)
  BDD (Behavior Driven Development)
Monitoring
  Services
  Servers
  Error logging
  Access logging
Content Delivery Networks (CDN)
FOSS
Think Ahead
Think Ahead (cont…)
Understand the business model
Analyze requirements in the greatest detail
Plan for extendibility
Be agile, do incremental architecture
Create/use frameworks
SQL or NoSQL?
Sharding or clustering or both?
Cloud services?
Guidelines
Enrich your knowledge: read, read & read. Read anything available, from jokes to religions.
Follow patterns & best practices
Mix technologies
  Don’t let your tools/technologies limit your vision
  Invent/customize technology if required
Use FOSS
  Don’t expect ready solutions
  Find the closest match
  Customize as needed
Guidelines (cont…)
Database Optimization
Use established & proven solutions
  MySQL
  PostgreSQL
  MongoDB
  Redis
  Memcached
  CouchDB
Understand and utilize indexing & full-text search
Use optimized DB structures & algorithms
  Modified Preorder Tree Traversal (MPTT)
  MapReduce
ORM or not?
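For readers who have not met Modified Preorder Tree Traversal, the sketch below numbers a small category tree with left/right bounds and then finds all descendants of a node with a simple range comparison, which is the property that makes MPTT attractive for read-heavy hierarchies. The tree data and function names are made up for illustration, and the in-memory dict stands in for `lft`/`rgt` columns in a table.

```python
def number_tree(tree, root):
    """Assign (lft, rgt) bounds; `tree` maps a node to its list of children."""
    bounds = {}
    counter = 1

    def visit(node):
        nonlocal counter
        lft = counter          # number the node on the way down
        counter += 1
        for child in tree.get(node, []):
            visit(child)
        bounds[node] = (lft, counter)  # and again on the way back up
        counter += 1

    visit(root)
    return bounds


def descendants(bounds, node):
    # A descendant's bounds lie strictly inside its ancestor's bounds.
    lft, rgt = bounds[node]
    return sorted(n for n, (l, r) in bounds.items() if lft < l and r < rgt)


tree = {"Food": ["Fruit", "Meat"], "Fruit": ["Red", "Yellow"]}
bounds = number_tree(tree, "Food")
fruit_children = descendants(bounds, "Fruit")
```

In SQL the same lookup becomes a single `WHERE lft > ? AND rgt < ?` filter instead of a recursive walk.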
Guidelines (cont…)
Database Optimization
Optimize your queries
  One big query is faster than repetitive smaller queries
  Never be too lazy to write optimized queries
  One Ring to Rule 'em All
Use a runtime in-memory cache
  Filtering an in-memory cached dataset is much faster than executing a query in the DB
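The in-memory filtering advice can be sketched with a small lookup table: load it with one query, keep the rows in process memory, and answer later lookups by filtering the cached list. This assumes the dataset is small and changes rarely; the `countries` table and both function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?)",
    [("BD", "Bangladesh", "Asia"), ("DE", "Germany", "Europe"), ("JP", "Japan", "Asia")],
)

_cache = {}
query_count = 0


def all_countries():
    # One query fills the cache; later calls never touch the database.
    global query_count
    if "countries" not in _cache:
        query_count += 1
        _cache["countries"] = conn.execute(
            "SELECT code, name, region FROM countries ORDER BY code"
        ).fetchall()
    return _cache["countries"]


def countries_in(region):
    # Filter the cached rows in memory.
    return [name for _, name, r in all_countries() if r == region]


asia = countries_in("Asia")
europe = countries_in("Europe")
```

Two lookups cost a single query here; the cache entry would have to be dropped whenever the table changes.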
Guidelines (cont…)
One Ring to Rule 'em All
Perform Selection, then Projection, then Join
Table sizes
  A: 1,000 records
  B: 1,000,000 records
  C: 1,000,000,000 records
A simple example
Write a standard SQL query to find all records with fields A.a1, B.b1 and C.c1 from tables A (id, a1, a2, a3, …, aP), B (id, a_id, b1, b2, b3, …, bQ), and C (id, b_id, c1, c2, c3, …, cR), given that A.aX, B.bY and C.cZ match the values 'X', 'Y' and 'Z' respectively.
Assume that tables A, B and C each have a primary key defined by the id column, and that a_id and b_id are the foreign keys in B from A and in C from B respectively.
Guidelines
One Ring to Rule 'em All (cont…)
Solution 1
SELECT A.a1, B.b1, C.c1
FROM A, B, C
WHERE A.id = B.a_id AND B.id = C.b_id
  AND A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'
Why does it suck?
• Remember the sizes of tables A, B and C?
• A cross product of tables is always memory intensive. Why?
  • A x B x C will have 1,000 x 1,000,000 x 1,000,000,000 records with (P + 1) + (Q + 2) + (R + 2) fields
  • Can you imagine the size of the in-memory result set of the joined tables?
  • It will be HUGE
Guidelines
One Ring to Rule 'em All (cont…)
Solution 2
SELECT A.a1, B.b1, C.c1
FROM A
INNER JOIN B ON A.id = B.a_id
INNER JOIN C ON B.id = C.b_id
WHERE A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'
Why does it still suck?
• A ⋈ B ⋈ C will produce (1,000 x 1,000,000) records to perform A ⋈ B, then produce another (1,000 x 1,000,000,000) records to compute (A ⋈ B) ⋈ C, and only then filter the records as defined by the WHERE clause.
• The number of fields, that is P + 1 in A, Q + 2 in B and R + 2 in C, will also contribute to memory consumption.
• It is optimized, but it will still be HUGE with respect to memory consumption and computation.
Guidelines
One Ring to Rule 'em All (cont…)
Optimal Solution
SELECT A.a1, B.b1, C.c1
FROM (SELECT id, a1 FROM A WHERE aX = 'X') AS A
INNER JOIN (SELECT id, b1, a_id FROM B WHERE bY = 'Y') AS B ON A.id = B.a_id
INNER JOIN (SELECT id, c1, b_id FROM C WHERE cZ = 'Z') AS C ON B.id = C.b_id
Why does this solution outperform?
• Let’s keep the explanation as an exercise
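As a starting point for that exercise, the three query shapes can be run side by side on a tiny SQLite database to confirm that they return the same rows. This is only a correctness check on made-up data: the subquery aliases are renamed to fa, fb and fc for readability, the tables carry just the columns the queries need, and how differently a given engine actually executes the three forms depends on its query optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (id INTEGER PRIMARY KEY, a1 TEXT, aX TEXT);
CREATE TABLE B (id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES A(id), b1 TEXT, bY TEXT);
CREATE TABLE C (id INTEGER PRIMARY KEY, b_id INTEGER REFERENCES B(id), c1 TEXT, cZ TEXT);
INSERT INTO A VALUES (1, 'a-one', 'X'), (2, 'a-two', 'other');
INSERT INTO B VALUES (1, 1, 'b-one', 'Y'), (2, 1, 'b-two', 'other'), (3, 2, 'b-three', 'Y');
INSERT INTO C VALUES (1, 1, 'c-one', 'Z'), (2, 1, 'c-two', 'other'), (3, 3, 'c-three', 'Z');
""")

# Solution 1: comma join with every condition in WHERE.
SOLUTION_1 = """
SELECT A.a1, B.b1, C.c1 FROM A, B, C
WHERE A.id = B.a_id AND B.id = C.b_id
  AND A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'
"""

# Solution 2: explicit joins, filters still applied in WHERE.
SOLUTION_2 = """
SELECT A.a1, B.b1, C.c1 FROM A
INNER JOIN B ON A.id = B.a_id
INNER JOIN C ON B.id = C.b_id
WHERE A.aX = 'X' AND B.bY = 'Y' AND C.cZ = 'Z'
"""

# Optimal solution: select and project each table first, then join.
OPTIMAL = """
SELECT fa.a1, fb.b1, fc.c1
FROM (SELECT id, a1 FROM A WHERE aX = 'X') AS fa
INNER JOIN (SELECT id, b1, a_id FROM B WHERE bY = 'Y') AS fb ON fa.id = fb.a_id
INNER JOIN (SELECT id, c1, b_id FROM C WHERE cZ = 'Z') AS fc ON fb.id = fc.b_id
"""


def run(sql):
    return sorted(conn.execute(sql).fetchall())


results = [run(query) for query in (SOLUTION_1, SOLUTION_2, OPTIMAL)]
```

Prefixing each statement with `EXPLAIN QUERY PLAN` in SQLite (or `EXPLAIN` in MySQL and PostgreSQL) is a reasonable next step for comparing how the engine treats them.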
Reference: Tools
Security
  Nmap: http://nmap.org/
  Nikto: http://cirt.net/Nikto2
  List of tools: http://sectools.org/
Caching
  APC: http://php.net/manual/en/book.apc.php
  XCache: http://xcache.lighttpd.net/
  eAccelerator: http://sourceforge.net/projects/eaccelerator/
  Varnish Cache: https://www.varnish-cache.org/
  Memcached: http://memcached.org/
  Redis: http://redis.io/
Load Balancer
  HAProxy: http://haproxy.1wt.eu/
  Pound: http://www.apsis.ch/pound/
Reference: Tools (cont…)
NoSQL
  MongoDB: http://www.mongodb.org/
  CouchDB: http://couchdb.apache.org/
  A complete list: http://nosql-database.org/
Distributed Computing
  Gearman: http://gearman.org/
Message Queue/Job Server
  RabbitMQ: http://www.rabbitmq.com/
  ActiveMQ: http://activemq.apache.org/
Monitoring
  Nagios: http://www.nagios.org/
Testing
  Selenium: http://seleniumhq.org/
  Cucumber: http://cukes.info/
  Watir: http://watir.com/
  PHPUnit: http://www.phpunit.de/manual/3.7/en/
MPTT
  Shameless promotion: https://github.com/mnishihan/phpMptt
Reference: Articles
Caching
  http://www.mnot.net/cache_docs/
  http://bit.ly/9cTJfA
Load Balancing
  http://en.wikipedia.org/wiki/Load_balancing_%28computing%29
  http://1wt.eu/articles/2006_lb/index.html
Scalability & Architecture
  http://www.diranieh.com/DistributedDesign_1/Scalability.htm
  http://www.infoq.com/presentations/Facebook-Software-Stack
  http://99designs.com/tech-blog/blog/2012/01/30/infrastructure-at-99designs/
  http://bit.ly/16cKu
Database Sharding
  http://www.codefutures.com/database-sharding/
  http://bit.ly/Y3b3J
  http://www.startuplessonslearned.com/2009/01/sharding-for-startups.html
CDN
  http://bit.ly/sMRyxC
MPTT
  http://www.sitepoint.com/hierarchical-data-database/
Join phpXperts [http://bit.ly/phpxperts]
Follow me on Twitter [http://twitter.com/mnishihan]
Subscribe on Facebook [http://fb.me/mnishihan]
Thank You
Questions???
I will be glad to answer