How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views....

27
Rae Yip Sony Gaikai SCaLE 13x How To Build A Better SRE

Transcript of How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views....

Page 1: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Rae YipSony GaikaiSCaLE 13x

How To Build A Better SRE

Page 2: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Gaikai – Who we are

● Subsidiary of Sony● Developed PS4 Remote Play● Operates Playstation Now

● Disclaimer: This talk represents my own views

Page 3: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Agenda

● Introduction● Toolbox● Concepts● Application Stacks● Other Abilities ● Resources● Q & A

Page 4: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

What's an SRE?

● Availability, Scalability, and Security● Managing Deployment and Changes● Troubleshooting● Influencing Architecture and Implementation

Page 5: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Related Jobs

● Sysadmin and Webadmin– Significant overlap of skills

– Different focus

● Release / Integration Engineer– Builds, packaging, integration

● Software Engineer / Developer– DevOps ~ SRE?

Page 6: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Why SRE?

● Traditional 3-tier model may not be best fit● For people who like challenges● Cross-disciplinary exposure● Role tends to reward excellence

Page 7: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Toolbox - Languages

● Languages– C / C++

– Java

– Python

– Javascript, Ruby, etc.

● Fluency– Reading & Writing

– Running & Debugging

Page 8: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Toolbox – Unix (1)

● Commands– ls, find, man

– df, du

– ping, traceroute, mtr

– ssh

– curl, wget

– rpm, dpkg, ldd

– svn, git

● Scripting– grep, cut, tr

– awk, sed, jq

– xargs

– bash

– cron

Page 9: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Toolbox – Unix (2)

● System Visibility– vmstat, iostat, dstat, top

– netstat, lsof

– strace, ltrace

– tcpdump

● Many morehttp://www.brendangregg.com/USEmethod/use-linux.html

Page 10: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Toolbox – Unix (3)

● Debuggers– gdb

– jstack

– pdb

● Profilers– oprofile

– dtrace

– ftrace

– perf events

– Poor Man's Profiler

Page 11: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Toolbox – Unix (4)

● Kernel– dmesg

– modinfo, lsmod

– modprobe, insmod

● Hardware– lspci

– lshw

– dmidecode

Page 12: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Toolbox – Network (1)

● Host Networking– ip vs. ifconfig

– Static routing

– iptables● Firewall & NAT

– VIP failover

– Bonding

Page 13: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Toolbox – Network (2)

● OSI Stack● TCP vs. UDP● ICMP, PMTUD● SCTP● IPv6

Page 14: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Toolbox – Network (3)

● Application Protocols– HTTP(S)

– SPDY, HTTP 2.0

– DNS

– NTP

– DHCP

– SNMP

● Routing Protocols– BGP

– OSPF

– MPLS

Page 15: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Toolbox – Network (4)

● Routers & Switches– Managed vs.

Unmanaged

– Cisco

– Juniper

● Load Balancers– F5 Big-IP

– Netscaler

– Barracuda

Page 16: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Concepts – Operating Systems

● Scheduler● Drivers● User vs. Kernel space● Virtualization

Page 17: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Concepts – Queueing Theory

● Little's Law– Load = Arrival Rate x Service Time

● Universal Scalability Law (Gunther 1993)http://www.perfdynamics.com/Manifesto/USLscalability.html

Page 18: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Concepts – Cloud Computing

● What is it?● How is it different?● Difference of scale● Cloud vs. Enterprise

Page 19: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Concepts – Distributed Systems (1)

● Fallacies– http://en.wikipedia.org/wiki/Fallacies_of_distributed

_computing

● Not just theoretical– https://aphyr.com/posts/288-the-network-is-reliable

● Murphy's Law (De Morgan 1866?)

Page 20: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Concepts – Distributed Systems (2)

● Network is not reliable

– Two General's Problem (Akkoyunlu, Ekanadham, Huber 1975)

● Network is not secure

– Byzantine Generals' problem (Pease, Shostak, Lamport 1980)

● CAP principle (Brewer 1999)

Page 21: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Application Stacks (1)

● Apache, nginx● MySQL / MariaDB, PostgreSQL● Hadoop, HBase● Zookeeper● ElasticSearch, Logstash, Kibana● Kafka, RabbitMQ

Page 22: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Other Abilities & Traits (1)

● Communication– Concise writing

– Ability to avoid flame wars

– Interviewing

● Troubleshooting– USE Method

– Scientific Method

– How To Solve It (Polya)

Page 23: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Other Abilities & Traits (2)

● Dispatching– Ability to identify the party with the domain

expertise to solve problem

● Ownership– Ability to see problem through from end to end

● Statistics– Ability to use numbers to make your case

Page 24: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Additional Resources (1)

● High Scalability

http://highscalability.com● Jepsen articles

https://aphyr.com/tags/jepsen● Brendan Gregg's blog

http://www.brendangregg.com● ServerFault

http://serverfault.com

Page 25: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Additional Resources (2)

● Conferences– SCaLE

– Surge

– Velocity

– USENIX (LISA, SREcon)

● Academic papers– arxiv.org

– IEEE ACM

Page 26: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Anybody can be an SRE!?

● Not everyone should be an SRE, but good SREs can come from anywhere.

● Diversity of talent, breadth and depth of knowledge is key to solving problems of unanticipated nature

Page 27: How To Build A Better SREryip/building-sre.pdf · Disclaimer: This talk represents my own views. Agenda ... Influencing Architecture and Implementation . Related Jobs Sysadmin and

Thank You!

● Questions?● Swag