Ansible at Scale
Ansible Israel, May 9, 2016
David Melamed
Senior Research Engineer, CTO Office, CloudLock
@dvdmelamed
Who is this guy?
4 B
Where is he working?
Founded: 2011
Corporate Headquarters: Waltham, Mass. (U.S.A.)
R&D Headquarters: Tel Aviv
Employees: 140 (30 in TLV)
Trusted by major brands:
157K APPS
10M USERS
ACTIVITIES
01 Ansible main notions
What is Ansible?
● Open-source configuration automation tool
● Written in Python and easily extensible
● Agentless (only requires SSH / WinRM)
● Idempotent modules
● Ad hoc task execution
● Reusable list of tasks
● Code deployment
Inventory
WEB SERVERS DAEMON SERVERS FILE SERVERS
COMPUTING CLUSTER
[webservers]
192.168.1.12
192.168.1.13
192.168.1.19

[daemonservers]
192.168.1.34
192.168.4.24

[vpc:children]
webservers
daemonservers
Static inventory
VPC
Task, play & playbook
- name: check server is alive
  action: ping
- name: update app configuration
  action: copy src=myapp.conf dest=/etc/myapp/prod.conf
...
task
play
playbook
Role
- tasks/main.yml
- handlers/main.yml
- templates/template.conf.j2
- files/file1.txt
- vars/main.yml
Vault
● Put all secrets in one place
● Store encrypted secrets in git
02 Our requirements
CloudLock requirements
● Multiple environments (AIO vs. VPC, AWS vs. AppEngine)
● Multiple environment types (local / stage / prod)
● 10 different VPCs with different access levels
● VPCs with ~ 100 machines of several types
● Multiple small repos (python package) with dependencies
● Zero-downtime deployment as much as possible
Multiple stacks & environments
Stack components: Web server (Angular app), API server (Flask app), Database (PostgreSQL or RDS), Cache server (Redis or ElastiCache), Message Queue (RabbitMQ)
Environment types: LOCAL, STAGE, PRE-PROD, PROD
Targets shown: My laptop (OSX), Your laptop (Ubuntu), AIO in AWS, Multi-tier env. in AWS
03 Ansible profiling
Profiling Ansible (1)
● Install callback plugin https://github.com/jlafon/ansible-profile
● Other interesting plugins:
  ○ Human-readable plugin
  ○ Ansible-report
Profiling Ansible (2)
PLAY [Deploy | Ensure database and user] ***************************
Thursday 15 October 2015 09:51:01 +0000 (0:00:01.786) 0:00:12.318 ******
===============================================================================
TASK: [storage/postgresql-database | Create | Ensure database from database variable] ***
Thursday 15 October 2015 09:51:01 +0000 (0:00:00.011) 0:00:12.329 ******
ok: [sandbox]
TASK: [storage/postgresql-database | Create | Ensure database user from database.user variable] ***
Thursday 15 October 2015 09:51:01 +0000 (0:00:00.163) 0:00:12.493 ******
ok: [sandbox]
TASK: [storage/pgbouncer | Start pgBouncer] ***********************************
Thursday 15 October 2015 09:51:09 +0000 (0:00:00.242) 0:00:20.782 ******
ok: [sandbox]
TASK: [storage/pgbouncer | Bump file descriptor limits] ***********************
Thursday 15 October 2015 09:51:09 +0000 (0:00:00.177) 0:00:20.960 ******
changed: [sandbox] => (item=hard)
changed: [sandbox] => (item=soft)
...
PLAY RECAP ********************************************************************
module1 | Install | Ensure modules ------------------------------------- 13.14s
module2 | Install pgBouncer --------------------------------------------- 7.51s
module3 | Install | Clean/uninstall modules ----------------------------- 6.85s
module4 | Install | Ensure core installed ------------------------------ 4.66s
...
Thursday 15 October 2015 09:52:49 +0000 (0:00:00.023) 0:02:00.236 ******
===============================================================================
sandbox : ok=142 changed=82 unreachable=0 failed=0
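The recap above ranks tasks by elapsed time. The bookkeeping behind such a ranking can be sketched in a few lines of Python. This is not the ansible-profile plugin's code: the class name `TaskTimer` and its methods are hypothetical, and the clock is injectable only to keep the sketch deterministic.

```python
import time


class TaskTimer:
    """Record how long each named task takes and report the slowest ones."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable clock (assumption, for testing)
        self._durations = {}     # task name -> accumulated seconds
        self._current = None     # (name, start time) of the running task

    def start(self, name):
        """Close the previous task, if any, and start timing a new one."""
        self.stop()
        self._current = (name, self._clock())

    def stop(self):
        """Stop timing the running task and add its elapsed time."""
        if self._current is not None:
            name, started = self._current
            elapsed = self._clock() - started
            self._durations[name] = self._durations.get(name, 0.0) + elapsed
            self._current = None

    def slowest(self, limit=10):
        """Return (name, seconds) pairs, slowest first."""
        ranked = sorted(self._durations.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:limit]

    def report(self, limit=10):
        """Format the ranking as dash-padded lines similar to the recap."""
        return [
            "{0:-<60} {1:.2f}s".format(name + " ", seconds)
            for name, seconds in self.slowest(limit)
        ]
```

A callback plugin would call `start()` whenever a task begins and print `report()` at the end of the run.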
04 Tips for scale support
(faster & easier to maintain)
Factors impacting ansible speed
● SSH connection
● Facts gathering
● Tasks performed serially
● Redundant tasks
Improving SSH speed
● Persistent connection (default on for SSH)
  ○ ControlMaster=auto
  ○ ControlPersist=60s
● SSH pipelining (1 connection per task)
  ○ Requires disabling requiretty in sudoers
Ansible configuration
● Commit your ansible.cfg
● Control facts gathering (gathering)
  ○ implicit (default) - always gather facts unless the play disables it
  ○ explicit - gather facts only when the play asks for it (gather_facts: yes)
  ○ smart - reuse the facts cache, gather facts only for new hosts
● Control the number of parallel processes (forks)
  ○ default is 5
  ○ we use 25
● SSH args / SSH pipelining
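The settings on this slide map to keys in the `[defaults]` and `[ssh_connection]` sections of ansible.cfg. A sketch that generates such a file with Python's `configparser`: `forks = 25` and `ControlPersist=60s` come from the slides, while `gathering = smart`, the output path, and the function names are illustrative assumptions.

```python
import configparser


def build_ansible_cfg(forks=25, gathering="smart"):
    """Return a ConfigParser holding the tuning options discussed above."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names exactly as written
    cfg["defaults"] = {
        "forks": str(forks),     # parallel processes (Ansible default: 5)
        "gathering": gathering,  # implicit | explicit | smart
    }
    cfg["ssh_connection"] = {
        # persistent connections via OpenSSH multiplexing
        "ssh_args": "-o ControlMaster=auto -o ControlPersist=60s",
        # fewer SSH operations per task; needs requiretty disabled in sudoers
        "pipelining": "True",
    }
    return cfg


def write_ansible_cfg(path="ansible.cfg", **options):
    """Write the configuration to `path` so it can be committed with the playbooks."""
    with open(path, "w") as handle:
        build_ansible_cfg(**options).write(handle)
```

Calling `write_ansible_cfg()` in the repository root produces a file that can be committed alongside the playbooks, so every deployer runs with the same settings.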
Inventory
● Make your ansible code environment agnostic
● Machine grouping by environment or by “role” type
● Hierarchical inventory
● Vault per environment
● Dynamic inventory for better cloud support
● Use dedicated machine to deploy (ansible-workstation)
CloudLock static inventory overview
inventory/
|
|---- environments
|       |----- allinone
|       |----- beta
|       |----- demo
|       |----- dev1
|       |----- dev2
|       |----- qa1
|       |----- qa2
|
|---- group_vars
        |----- allinone/
        |----- beta/
        |----- demo/
        |----- dev1/
        |----- dev2/
        |----- qa1/
        |----- qa2/
+ use of route53 for internal DNS
EC2 dynamic inventory
● Python script using boto
● List of instances + hostvars
● Use instance names or IPs
● Groups by instance tags, vpc, …
● List cached
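A dynamic inventory script prints JSON that maps group names to hosts. Below is a minimal sketch of the grouping step only. The instance records are hard-coded stand-ins for what a boto query would return, and the field names, instance IDs, IPs (taken from a documentation range), and function names are all hypothetical. The `_meta.hostvars` key follows the format Ansible expects from inventory scripts.

```python
import json
import re


def to_safe(text):
    """Replace characters that are not allowed in group names with '_'."""
    return re.sub(r"[^A-Za-z0-9_]", "_", text)


def build_inventory(instances):
    """Group instances by tag and collect per-host variables."""
    inventory = {"ec2": [], "_meta": {"hostvars": {}}}
    for instance in instances:
        host = instance["ip"]
        inventory["ec2"].append(host)
        inventory["_meta"]["hostvars"][host] = {"ec2_id": instance["id"]}
        for key, value in sorted(instance["tags"].items()):
            group = "tag_{0}_{1}".format(to_safe(key), to_safe(value))
            inventory.setdefault(group, []).append(host)
    return inventory


# Hypothetical sample data; a real script would query the EC2 API here.
SAMPLE_INSTANCES = [
    {"id": "i-0001", "ip": "203.0.113.10",
     "tags": {"Environment": "prod", "Name": "prod-bastion"}},
    {"id": "i-0002", "ip": "203.0.113.11",
     "tags": {"Environment": "prod", "Name": "devpi"}},
]

if __name__ == "__main__":
    # Ansible calls the script with --list and reads JSON from stdout.
    print(json.dumps(build_inventory(SAMPLE_INSTANCES), indent=2))
```

Saved as an executable file, a script like this can be passed to `ansible-playbook -i` in place of a static inventory file.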
"ec2": [ "52….", "52….", "52….", ], "tag_Environment_prod": [ "52….", "52…..", "54….." ], "tag_Name_prod_bastion": [ "54…." ], "tag_Name_Report_Decryptor": [ "52….." ], "tag_Name_devpi": [ "52….." ]
Playbooks
● Tasks executed synchronously
  ○ Segment roles/groups to leverage parallel forks
● Use tags to add modularity (e.g. config, deploy…)
● Name each task
● Limit conditional execution in roles, put them in the playbooks instead
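Because each task runs on at most `forks` hosts at a time, the number of rounds per task gives a rough feel for how parallelism affects run time. A back-of-the-envelope sketch using figures from these slides (~100 machines, forks of 5 vs. 25): the task count and the 10-second task duration are invented values, and the model ignores SSH and fact-gathering overhead.

```python
import math


def batches_per_task(hosts, forks):
    """Number of rounds needed to run one task on every host."""
    return math.ceil(hosts / forks)


def estimated_seconds(hosts, forks, tasks, seconds_per_task):
    """Rough run time when each task finishes on all hosts before the next starts."""
    return batches_per_task(hosts, forks) * tasks * seconds_per_task


if __name__ == "__main__":
    # Hypothetical workload: 10 tasks of 10 seconds each on 100 hosts.
    for forks in (5, 25):
        total = estimated_seconds(hosts=100, forks=forks, tasks=10, seconds_per_task=10)
        print("forks={0}: about {1} s".format(forks, total))
```

In this simplified model, splitting hosts into groups targeted by separate plays only pays off if every group is large enough to keep all forks busy.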
Tasks & Roles
● Make your role generic and simple
● Role should be decoupled from inventory
● Keep your configuration separate
● Tasks should be idempotent
● Use “include” for sub-roles
● Try to avoid redundant tasks (use AMI)
● Share handlers with a global role
● Avoid using command and shell and use appropriate modules instead
- roles/
ci/
jenkins/
jobs/
monitor/
cloudwatch/
nagios/
platform/
base/
component-a/
component-b/
events/
setup/
teardown/
system/
web/
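An idempotent task reports a change only when it actually modifies the system. Dedicated modules give you that behaviour, whereas raw `command` or `shell` calls usually do not. A sketch of the idea in plain Python, not an Ansible module: the function name, file name, and sample line are hypothetical.

```python
import os
import tempfile


def ensure_line(path, line):
    """Make sure `line` is present in the file at `path`.

    Returns True if the file was changed, False if it was already
    in the desired state.
    """
    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()
    if line in content.splitlines():
        return False  # desired state already reached, nothing to do
    with open(path, "a") as handle:
        if content and not content.endswith("\n"):
            handle.write("\n")  # do not glue the new line onto the last one
        handle.write(line + "\n")
    return True


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "limits.conf")
    print(ensure_line(target, "* soft nofile 65536"))  # first run changes the file
    print(ensure_line(target, "* soft nofile 65536"))  # second run is a no-op
```

Running the function twice leaves the file in the same state as running it once, which is the property that makes repeated playbook runs safe.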
Vault
● Encrypt only what is necessary
● No way to merge 2 encrypted files
● Several tools to improve vault management
  ○ https://github.com/building5/ansible-vault-tools
  ○ https://gist.github.com/benzado/7bf5aa15e15d2d0d0380
ansible-playbook vs ansible-pull
● Regular mode: connect to server and deploy
● “Pull” mode: pull from repo on remote and execute
● Syntax: ansible-pull -U git://github.com/REPO.git -d DEST_DIR
● Example of cron install using ansible: https://github.com/ansible/ansible-examples/blob/master/language_features/ansible_pull.yml
CI for Ansible
● Test locally with Vagrant / Docker
● PR reviews (issue with vault changes)
● Jenkins job deploying to AIO + GitHub hook
● Coming soon: unit tests (ansible-kitchen)
Ansible 1.9 vs. Ansible 2.0
● Some breaking changes
● A lot of new cloud modules (e.g. ECS, VPC)
Results
● Before: deployment to VPC took several hours
● After: ~ 20 min for a full deployment
More about Ansible
● Awesome Ansible: https://github.com/jdauphant/awesome-ansible
● Ansible for DevOps: https://leanpub.com/ansible-for-devops
CloudLock is looking for talent
Questions/feedback