Running applications in a production environment
Nikola Krgović
https://joind.in/talk/64924
How do most web applications start
• A CMS (WordPress, Drupal, etc.)
• A small custom-made site
• Using rapid development tools (frameworks and ORMs)
• Agile development: minimum viable product
• Deployed to a single server
Once the real traffic arrives
• Need for performance
• Price constraints force horizontal scalability
• High availability becomes a necessity
Changes in methodology
• Agile is embraced
• 12-Factor App
• Continuous delivery and testing
• Continuous deployment
• DevOps
Continuous deployment
• Continuous delivery (CD or CDE) is a software engineering approach in which teams produce software in short cycles, ensuring that the software can be reliably released at any time; the release itself is triggered manually.
• Continuous deployment (CD) is a software engineering approach in which software functionalities are delivered frequently through automated deployments.
Continuous deployment
• Creates a need for more complex tools
• Automated testing is mandatory
• Both unit tests and integration tests are necessary
DevOps
• DevOps is a software development methodology that combines software development (Dev) with information technology operations (Ops). The goal of DevOps is to shorten the systems development life cycle while also delivering features, fixes, and updates frequently in close alignment with business objectives. The DevOps approach is to include automation and event monitoring at all steps of the software build.
• DevOps is a methodology, not a job title. :)
DevOps
DevOps practices change the life of a developer:
• Configuration management tools create the environment
• Deployments are automated: no manual “touch-ups” on the server
• No direct access to servers. Code is on shared storage, deployed through a “jumpbox”, or immutable inside a container
• Logs are centralised and available through a dedicated app, usually the ELK stack: you need to master regular expressions
• Application performance monitoring becomes a regular practice
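Searching centralised logs mostly comes down to pattern matching. A minimal sketch of the idea, assuming an access-log line in the widely used common-log layout; the sample line, the pattern and the field names are illustrative and not taken from the talk:

```python
import re

# Pattern for a common-log-format style line (assumed layout, for illustration only).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return the named fields of a log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])  # make the status code comparable as a number
    return fields


# Hypothetical sample line using a documentation-range IP address.
sample = '203.0.113.7 - - [10/Oct/2019:13:55:36 +0200] "GET /index.php HTTP/1.1" 500 1043'
parsed = parse_log_line(sample)
```

The same named-group approach carries over to the grok-style patterns typically used when shipping logs into a search stack.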
DevOps : Configuration Management
• Configuration deployment tools like Ansible guarantee that all environments are set up the same way
• Configuration management tools like Puppet use agents, which add assurance that the environment will remain the same throughout use
• Very little effect on developers, other than the guarantee that the system will be deployed and maintained in a consistent manner
DevOps : Monitoring
DevOps : APM
DevOps : Kibana
DevOps : Logs
Development Environment
12-Factor App :
X. Dev/prod parity
Keep development, staging, and production as similar as possible
Development Environment
Typical:
• Developer's machine (VirtualBox + Vagrant / Minikube / OKD*)
• Code / CI (GitLab with tests)
• Test systems (“Beta”)
• Staging system
• Production
*OKD is the Kubernetes distribution previously known as OpenShift Origin
High Availability
• Highly available systems have no single point of failure
• Well-designed HA systems don’t keep idle redundant or “hot-standby” components: the design is “active-active”
• Well-designed apps can scale horizontally
High Availability
Typical Components
• Load Balancers
• Content Delivery Network and Object Storage
• Application Servers
• Relational Database Management System
• Key-Value Storage
• Queue
• Document Storage / Object Storage / NoSQL
• Full-Text Search
• Shared Storage
High Availability System
Load Balancers
NGINX or HAProxy
• Distributes connections to application servers
• Checks application servers for health
• Terminates TLS connections
• Does cookie manipulation
• Redirects if needed
• Acts as a web application firewall
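The first two duties, distributing connections and skipping unhealthy servers, can be sketched in a few lines. This is a toy illustration under stated assumptions, not how NGINX or HAProxy are implemented: the class name, the backend names and the health-check callable are all hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Toy round-robin balancer that skips backends failing a health check."""

    def __init__(self, backends, is_healthy):
        self._backends = list(backends)
        self._is_healthy = is_healthy  # callable: backend -> bool (assumed interface)
        self._cycle = itertools.cycle(self._backends)

    def pick(self):
        """Return the next healthy backend, or raise if none is healthy."""
        # Try each backend at most once per call so a fully failed pool cannot loop forever.
        for _ in range(len(self._backends)):
            backend = next(self._cycle)
            if self._is_healthy(backend):
                return backend
        raise RuntimeError("no healthy backends available")


# Hypothetical pool in which "app2" is currently failing its health check.
down = {"app2"}
balancer = RoundRobinBalancer(["app1", "app2", "app3"], lambda b: b not in down)
picks = [balancer.pick() for _ in range(4)]
```

Real load balancers run health checks in the background on a timer rather than at pick time; checking inline just keeps the sketch short.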
Load Balancers
NGINX or HAProxy
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
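On the application side, these headers are the only way to learn the visitor's real address and scheme, because the TCP peer is now the load balancer. A minimal sketch of reading them, assuming the headers arrive as a plain dict; the function name and the `trusted_proxies` parameter are hypothetical, and falling back to `"http"` for direct connections is an assumption of this sketch:

```python
def client_info(headers, peer_ip, trusted_proxies):
    """Return (client_ip, scheme), honouring proxy headers only from trusted peers."""
    if peer_ip not in trusted_proxies:
        # Headers from unknown peers can be forged, so ignore them entirely.
        return peer_ip, "http"

    lowered = {name.lower(): value for name, value in headers.items()}

    # X-Forwarded-For is a comma-separated chain; the left-most entry is the
    # client address as reported by the first proxy.
    chain = [part.strip() for part in lowered.get("x-forwarded-for", "").split(",")]
    chain = [part for part in chain if part]

    ip = lowered.get("x-real-ip") or (chain[0] if chain else peer_ip)
    scheme = lowered.get("x-forwarded-proto", "http")
    return ip, scheme


# Hypothetical request that passed through a load balancer at 10.0.0.5.
info = client_info(
    {"X-Real-IP": "198.51.100.4", "X-Forwarded-Proto": "https"},
    peer_ip="10.0.0.5",
    trusted_proxies={"10.0.0.5"},
)
```

Most frameworks ship a "trusted proxies" setting that does this for you; the point is that the application must be told which peers are allowed to set these headers.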
CDN and Object Storage
• Object storage uses an API (usually S3-compatible) to store data
• Usually used as a service, but can be hosted on-premises
• Simple and easy to use from concurrent locations
• CDNs offer faster loading times for data
• Should be used for all static assets (images, CSS, JS)
• Served from a different, cookie-less domain
• Requires versioning, due to long caching times
• When used as a service, offers a simple way to geo-distribute data and significantly speed up loading times
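One common way to get versioning with long cache times is to put a content hash into the file name, so a changed file gets a new URL and the old one can stay cached indefinitely. A minimal sketch of that technique; the function name and the `static.example.net` domain are hypothetical:

```python
import hashlib
from pathlib import PurePosixPath


def versioned_asset_url(cdn_base, asset_path, content):
    """Build a CDN URL whose file name embeds a short hash of the asset's content."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    path = PurePosixPath(asset_path)
    # css/site.css -> css/site.<digest>.css
    versioned = path.with_name(f"{path.stem}.{digest}{path.suffix}")
    return f"{cdn_base.rstrip('/')}/{str(versioned).lstrip('/')}"


# Hypothetical cookie-less static domain and stylesheet content.
url_v1 = versioned_asset_url("https://static.example.net/", "css/site.css", b"body { color: black; }")
url_v2 = versioned_asset_url("https://static.example.net/", "css/site.css", b"body { color: navy; }")
```

Build tools usually generate these names and a manifest at deploy time; computing them on every request would defeat the purpose.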
Application Servers
• Application servers must be stateless
• Applications can be stateful with shared session storage
• Deployment is done via automation
• Non-container deployments often use shared storage
• If using interpreted systems like PHP, you need to flush the opcode cache after a deploy:
opcache_reset()
Application Servers
• Unix privileges are not an enemy
• SELinux security contexts and Mandatory Access Control are your friends too
httpd_sys_content_t httpd_sys_rw_content_t
Application Servers
• A pool of servers is ~100X more powerful than your machine
• A pool of servers will serve ~10,000X the visitors of your machine
• Memory is a very critical resource. Talk about it with Ops!
Key-Value storage
• Redis is the default choice; use memcached only if you must
• Redis does have high availability options
• Almost never disk persistent; disk is used for cache warm-up
• Can be deployed shared or per-instance
• Shared Redis is needed if servers are stateful, for session storage
• Per-instance Redis is more performant, but complicates cache invalidation
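The reason a shared store is needed for sessions is that any application server may handle the next request. A minimal sketch of the idea, using an in-memory class as a stand-in for the shared key-value store so the example runs on its own; the class, its methods and the 30-second TTL are hypothetical, and a real deployment would talk to Redis over the network instead:

```python
import time


class SharedSessionStore:
    """In-memory stand-in for a shared key-value store such as Redis (illustration only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be demonstrated without sleeping
        self._data = {}      # session_id -> (expires_at, payload)

    def save(self, session_id, payload):
        """Store a copy of the session payload with a fresh expiry time."""
        self._data[session_id] = (self._clock() + self._ttl, dict(payload))

    def load(self, session_id):
        """Return the session payload, or None if it is missing or expired."""
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if self._clock() >= expires_at:
            del self._data[session_id]
            return None
        return dict(payload)


# Two hypothetical app servers share one store: server A writes, server B reads.
now = [0.0]
store = SharedSessionStore(ttl_seconds=30, clock=lambda: now[0])
store.save("session-abc", {"user_id": 42})   # handled by server A
seen_by_b = store.load("session-abc")        # handled by server B
now[0] = 31.0                                # advance the fake clock past the TTL
after_expiry = store.load("session-abc")
```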
Queue
ZeroMQ, RabbitMQ, AWS SNS
• Highly available by design
• Centralized and scalable
• Provides a simple method of asynchronously processing messages
• Provides a built-in mechanism for retrying
• Should be used instead of in-database queues
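The retry behaviour that a broker provides can be illustrated in-process. This is a sketch under stated assumptions, using Python's standard `queue.Queue` rather than any of the brokers named above; the function name, the `max_attempts` parameter and the dead-letter list are hypothetical:

```python
import queue


def process_with_retries(messages, handler, max_attempts=3):
    """Process messages; re-enqueue failures until max_attempts, then dead-letter them."""
    pending = queue.Queue()
    for message in messages:
        pending.put((message, 1))  # (payload, attempt number)

    done, dead = [], []
    while not pending.empty():
        message, attempt = pending.get()
        try:
            handler(message)
            done.append(message)
        except Exception:
            if attempt < max_attempts:
                pending.put((message, attempt + 1))  # retry later, behind other work
            else:
                dead.append(message)                 # give up and keep for inspection
    return done, dead


# Hypothetical handler: "b" fails once and then succeeds, "c" always fails.
calls = {}


def flaky_handler(message):
    calls[message] = calls.get(message, 0) + 1
    if message == "b" and calls[message] < 2:
        raise ValueError("transient failure")
    if message == "c":
        raise ValueError("permanent failure")


done, dead = process_with_retries(["a", "b", "c"], flaky_handler)
```

An in-database queue has to reimplement this bookkeeping with row locks and polling, which is one reason to prefer a dedicated queue.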
Full-Text Search
Elasticsearch or Sphinx
• Usually used in a read-only fashion
• Elasticsearch has high availability clustering
• Sphinx can be made HA with HAProxy
• Loading data into FTS needs a separate process
Document Storage
MongoDB
• Not a relational database
• Fully ACID compliant
• Great for storing objects, poor with relations and “join”-like queries
• Has built-in high availability using quorum; the initial change is just the connection string:
mongodb://s1.example.net:27017,s2.example.net:27017,s3.example.net:27017/
MongoDB
• Not a relational database
• Poor performance with relations and “join”-like queries
• Queries that require object manipulation can be slow
• It is advisable to use readPref() and send slow queries to secondary instances
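Both points, listing every replica set member and routing reads to secondaries, can be expressed in the connection string. A minimal sketch that only assembles the string; the helper name is hypothetical, while `readPreference=secondaryPreferred` is a standard connection string option:

```python
from urllib.parse import urlencode


def mongo_uri(hosts, database="", **options):
    """Build a multi-host MongoDB connection string with optional query options."""
    uri = "mongodb://" + ",".join(hosts) + "/" + database
    if options:
        uri += "?" + urlencode(options)
    return uri


hosts = ["s1.example.net:27017", "s2.example.net:27017", "s3.example.net:27017"]

# Default: reads and writes go to the primary.
primary_uri = mongo_uri(hosts)

# Separate connection for slow, lag-tolerant queries that prefers secondaries.
reporting_uri = mongo_uri(hosts, readPreference="secondaryPreferred")
```

Keeping a second connection for reporting queries leaves the default connection on the primary, so code that needs to read its own writes is unaffected.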
Relational Databases
MySQL or PostgreSQL
• Used to store relational data
• Always design using normal forms (1NF, 2NF, 3NF, BCNF)
• Usually has asynchronous replication
• In-app logic usually scales better than in-db, but…
Relational Databases
Indexing
• Primary keys are a must
• Covering index vs row read
• An index too many
• Master vs slave index
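The difference between a covering index and a row read shows up directly in the query plan. A small sketch using SQLite from the Python standard library as a stand-in, since it runs without a server; MySQL and PostgreSQL expose the same idea through their own EXPLAIN output. The table, index and column names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL, note TEXT)"
)
conn.execute("CREATE INDEX idx_orders_customer_total ON orders (customer_id, total)")


def query_plan(sql, params=()):
    """Return the human-readable detail column of the EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Every selected column lives in the index, so the table rows are never touched.
covered = query_plan("SELECT total FROM orders WHERE customer_id = ?", (1,))

# "note" is not in the index, so each match needs an extra read of the table row.
row_read = query_plan("SELECT note FROM orders WHERE customer_id = ?", (1,))
```

Adding `note` to the index would make the second query covered too, at the price of a larger index and slower writes, which is the "an index too many" trade-off.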
Relational Databases
Privilege Separation
• GRANT SELECT, INSERT, UPDATE, DELETE, CREATE TEMPORARY TABLES ON `schema`.* TO 'application_user'@'10.%';
• Forget about migrations from code
Relational Databases
Asynchronous replication
• Assume the slave will always have ~30s replication lag
• High availability can provide connectivity but can’t do a read-write split
• Automated solutions exist (ProxySQL, mysqlnd_ms) but still require hinting for some cases, like SELECT after INSERT
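The SELECT-after-INSERT problem can be handled with a simple rule: after a write, keep reads on the primary for as long as the replica may be behind. A minimal sketch of such a router, not the behaviour of ProxySQL or mysqlnd_ms; the class, the `force_primary` hint and the returned labels are hypothetical, and the 30-second window mirrors the lag assumption above:

```python
import time


class ReadWriteRouter:
    """Route writes to the primary and reads to a replica, pinning reads after a write."""

    def __init__(self, pin_seconds=30.0, clock=time.monotonic):
        self._pin_seconds = pin_seconds
        self._clock = clock  # injectable so the example needs no real waiting
        self._last_write = None

    def route(self, sql, force_primary=False):
        """Return "primary" or "replica" for the given statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        now = self._clock()
        if not is_read:
            self._last_write = now
            return "primary"
        recently_wrote = (
            self._last_write is not None
            and now - self._last_write < self._pin_seconds
        )
        if force_primary or recently_wrote:
            return "primary"  # the replica may not have the new row yet
        return "replica"


now = [0.0]
router = ReadWriteRouter(pin_seconds=30.0, clock=lambda: now[0])
before_write = router.route("SELECT * FROM users")
write = router.route("INSERT INTO users (name) VALUES ('x')")
right_after = router.route("SELECT * FROM users")
now[0] = 45.0  # advance the fake clock past the pin window
much_later = router.route("SELECT * FROM users")
```

In practice the pin would be tracked per user session rather than globally, so one user's write does not push everybody's reads onto the primary.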
Relational Databases
ORMs as a disaster in production
• An ORM can be viewed as a rapid prototyping tool, but that’s it
• ORMs can slow down JOINs by orders of magnitude
• Even in very small cases with ~100,00 rows you get:
Bulk inserts were tested at 2x the speed of ORM
Joins sometimes go over 10x faster than ORM
Relational Databases
Optimizing in production
• Work with DBAs (or Ops) on indexing
• EXPLAIN is your friend
• Temporary tables can speed things up massively
Conclusions
• Keep the stack as small as possible
• Use the right tool for the right job
• Don’t use multiple tools for the same job
• Always consider that you’ll have millions of users
• When in doubt, scale horizontally
Questions…?