When Tools Attack: IT Infrastructure at Playground Games

Chris Makin, IT Infrastructure at Playground Games

An overview of how Playground Games used Perforce to help solve issues with tools

Chris Makin is a battle-scarred veteran of Perforce server planning, implementation and administration. With 11 years of IT service to the video games industry, he proudly serves at Playground Games as IT Infrastructure Administrator. When not elbow-deep in servers, he can be found trying to sample every craft beer under the sun.

•  Sharing real-world examples of custom tools with hearts of gold and hands of destruction!
•  The impacts & issues caused
•  Fixes, resolutions & preventative actions
•  Tips for server setup & tools testing

•  Founded in 2009 with 16 staff
•  20+ years experience in game development
•  Turn10 & Microsoft Studios
October 2012
September 30th 2014

•  Over 150 developers at peak
•  Outsource teams across the globe
•  17 Perforce Server Instances
    –  3 core & 4 supporting P4Ds
    –  Further 10 replicas for workload balancing, DR & HA
    –  Linux & Windows, VMware & bare metal
    –  Nimble, EqualLogic & DAS

•  Single P4D server
    –  4TB total size
•  100 developers
•  12 build servers
•  13 solid hours lock time per day for automated systems

•  3 Core P4D servers
    –  10TB total size
    –  1TB at peak change
•  24 build servers
•  Builds over 150GB
•  485,000 ops completed
•  340,000 automated
•  Lock contention – gone

•  Deadliest Catch
    –  Trawling the depths.
•  Tor’s Hammer
    –  An internal cyber attack!
•  Skynet
    –  Ignore it for too long and it’s taken over the world.

Trawling the depths

•  Tool built to create a depot heat map
•  “p4 files @=clnumber”
•  Started at CL 1
    –  Worked its way upwards.
    –  9 streams
•  High latency connection

•  Unresponsive server
•  Commands queue
•  Human element
    –  F5, F5, F5, F5…
•  Database lock contention

•  Lower tool polling frequency
•  Lower thread count
•  Forward commands to replica, on or offsite
•  Trigger or P4broker limiting number of concurrent operations per user
•  Upgrade to 2013.3 or higher for lockless reads
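
The first two fixes (lower polling frequency, lower thread count) can be applied inside the tool itself. Here is a minimal Python sketch, assuming the tool issues one `p4 files @=N` call per changelist as described earlier. `Throttle`, `heat_map_commands` and the limits shown are hypothetical names and values for illustration, not the tool that was actually run.

```python
import threading
import time
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


class Throttle:
    """Caps concurrent calls and enforces a minimum gap between call starts."""

    def __init__(self, max_concurrent: int, min_interval_s: float) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._pace_lock = threading.Lock()
        self._min_interval_s = min_interval_s
        self._next_start = 0.0

    def run(self, fn: Callable[[], T]) -> T:
        with self._slots:  # lower thread count: at most max_concurrent in flight
            with self._pace_lock:  # lower polling frequency: space out the starts
                now = time.monotonic()
                wait = self._next_start - now
                if wait > 0:
                    time.sleep(wait)
                self._next_start = max(now, self._next_start) + self._min_interval_s
            return fn()


def heat_map_commands(first_cl: int, last_cl: int) -> Iterator[List[str]]:
    """Yields one `p4 files @=N` argument list per changelist (hypothetical helper)."""
    for cl in range(first_cl, last_cl + 1):
        yield ["p4", "files", f"@={cl}"]


if __name__ == "__main__":
    throttle = Throttle(max_concurrent=1, min_interval_s=0.1)
    for argv in heat_map_commands(1, 3):
        # A real tool would pass argv to subprocess.run; printing keeps the sketch harmless.
        throttle.run(lambda argv=argv: print(" ".join(argv)))
```

Sharing one `Throttle` across all worker threads means the limits hold for the tool as a whole instead of per thread. Pointing the same commands at a replica, as the third bullet suggests, is a separate change to the tool's server address.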

Internal cyber attack!

•  Level editor with Perforce integration
    –  Have my files been updated?
    –  Queries workspace #have against #head
•  Query was being carried out on each file individually
    –  25,000 files
    –  150 developers
    –  3,750,000 queries
    –  Every second
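
The query count above is simply files multiplied by developers. A rough Python sketch of that arithmetic follows, together with one way the editor could compare #have against #head for a whole workspace from a single batched result. It assumes each record carries the `depotFile`, `headRev` and `haveRev` fields that `p4 fstat` reports. The function names are hypothetical, and this is not the fix the tools team actually shipped.

```python
from typing import Dict, Iterable, List


def per_file_queries(files: int, developers: int) -> int:
    """Server requests per polling tick when every file is queried on its own."""
    return files * developers


def batched_queries(developers: int) -> int:
    """Server requests per polling tick when each editor sends one batched query."""
    return developers


def out_of_date(records: Iterable[Dict[str, str]]) -> List[str]:
    """Returns depot paths whose synced revision is behind the head revision.

    Each record is assumed to describe one file from a single batched query:
    depotFile, headRev, and haveRev (absent when the file was never synced).
    """
    stale = []
    for rec in records:
        have = int(rec.get("haveRev", "0"))
        head = int(rec["headRev"])
        if have < head:
            stale.append(rec["depotFile"])
    return stale


print(per_file_queries(25_000, 150))  # 3750000, matching the slide
print(batched_queries(150))           # 150
```

With batching, the request count scales with the number of editors instead of the number of files, so adding content to the game no longer multiplies server load.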

•  You are now under DDoS attack!
•  Perforce server stops responding to all requests
    –  P4Auth stops responding
•  TCP flood
•  OS level/network stack issue
    –  TCP/IP port exhaustion

•  Hang, draw and quarter tools programmer
•  Local firewall
    –  Flood protection rules
•  Tune network stack
    –  Max available ports
    –  Min keep alive time
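
To see why the number of available ports matters, a back-of-the-envelope Python sketch may help. It assumes a Linux-style port range string (the format of `/proc/sys/net/ipv4/ip_local_port_range`) and a hypothetical `hold_seconds` for how long a port stays tied up after use. Which setting governs that time depends on the operating system, and the numbers below are illustrative, not values taken from the Playground Games servers.

```python
def ephemeral_port_count(port_range: str) -> int:
    """Parses a 'low high' range string and returns how many ports it spans."""
    low, high = (int(part) for part in port_range.split())
    return high - low + 1


def sustainable_connects_per_second(ports: int, hold_seconds: float) -> float:
    """Rough ceiling on new outbound connections per second to one destination
    if each used port stays unavailable for hold_seconds afterwards."""
    return ports / hold_seconds


ports = ephemeral_port_count("32768 60999")
print(ports)                                            # 28232
print(int(sustainable_connects_per_second(ports, 60)))  # 470
```

Under these assumptions the ceiling is a few hundred new connections per second, far below the query rate on the earlier slide. Widening the port range or shortening the hold time raises the ceiling, but it does not remove the need to fix the tool.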

Ignore it for too long and it’s taken over the world

•  World editor
•  Designed so every file is self descriptive with a UID
    –  \\game\level1\walls\wall1_walls_level1_123456789.png
•  In the backend this turns into
    –  D:\p4depot\game\level1\walls\wall1_walls_level1_123456789.png,d\1.12345.gz

•  In practice
    –  \\game1\mainline\data\level_data\level1\objects\textures\walls\wall1_walls_textures_objects_level1_level_data_data_mainline_1234567890.png
    –  \\game1\mainline\data\level_data\level1\objects\textures\walls\wall1_walls_textures_objects_level1_level_data_data_mainline_mainline_data_level_data_level1_objects_textures_walls_wall1_1234567890.png
•  199 chars compared to original 54

•  Project was heavily branched & continuously integrated with > 100,000 files
•  Integrations took longer
•  Exponential metadata growth – GB per day
•  Higher RAM, swap & CPU utilization
•  Windows OS & proxy path length issues

•  Send a cyborg back in time to stop the tools change submit
•  Trigger/P4Broker rule inspecting file & path length for irregularities/max allowed length
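
The length check at the heart of such a rule can be sketched in a few lines of Python. `find_overlong_paths` and both limits are hypothetical. How the trigger obtains the list of files in the submitted change, and how it is registered on the server, is left out here. A trigger script that exits with a non-zero status causes the submit to be rejected.

```python
import re
from typing import Iterable, List, Tuple


def find_overlong_paths(
    depot_paths: Iterable[str], max_path_len: int, max_name_len: int
) -> List[Tuple[str, str]]:
    """Returns (path, reason) pairs for paths that break either limit."""
    problems = []
    for path in depot_paths:
        # Accept both depot-style '/' and the '\' separators used on the slides.
        name = re.split(r"[\\/]", path)[-1]
        if len(path) > max_path_len:
            problems.append((path, f"path is {len(path)} chars, limit {max_path_len}"))
        elif len(name) > max_name_len:
            problems.append((path, f"file name is {len(name)} chars, limit {max_name_len}"))
    return problems


def exit_code_for(depot_paths: Iterable[str], max_path_len: int, max_name_len: int) -> int:
    """0 lets the submit through; 1 would make a trigger reject it."""
    problems = find_overlong_paths(depot_paths, max_path_len, max_name_len)
    for path, reason in problems:
        print(f"rejected: {reason}: {path}")
    return 1 if problems else 0
```

Checking the file name separately from the full path catches the runaway self-describing names early, before the total path gets anywhere near an operating system limit.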

•  Perforce Support
•  P4D 2013.3 or higher
    –  Lockless reads!
•  Replica & Edge servers
    –  Offload locks, CPU & I/O intensive tools and workloads
•  P4Broker & Triggers
    –  Don’t like a command? Block or re-direct it!
•  “Side-track” server instance

•  Metadata replica
    –  Offline checkpoints, additional replicas, no live interruption
•  Enable process monitoring
•  Monitor server
•  Pay attention to your type map
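
Once process monitoring is enabled, its output can be used to spot a single user or tool flooding the server. The Python sketch below assumes lines in the layout printed by `p4 monitor show` (pid, status, user, elapsed time, command). The sample lines, user names and limit are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List


def running_commands_per_user(monitor_lines: Iterable[str]) -> Counter:
    """Counts running ('R') commands per user.

    Assumed line layout: pid, status, user, elapsed time, command.
    """
    counts: Counter = Counter()
    for line in monitor_lines:
        fields = line.split()
        if len(fields) >= 5 and fields[1] == "R":
            counts[fields[2]] += 1
    return counts


def users_over_limit(counts: Counter, limit: int) -> List[str]:
    """Returns the users with more than `limit` commands running at once."""
    return sorted(user for user, n in counts.items() if n > limit)


sample = [
    " 1001 R heatmap_tool 00:03:12 files",
    " 1002 R heatmap_tool 00:03:10 files",
    " 1003 R heatmap_tool 00:02:58 files",
    " 1004 R artist1      00:00:01 sync",
]
print(users_over_limit(running_commands_per_user(sample), limit=2))  # ['heatmap_tool']
```

A check like this could feed an alert, or inform the per-user concurrency limit discussed earlier.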

•  Every tool has an impact – TEST!
•  Test against real data
    –  Metadata & full replicas
•  Set a high level of logging
    –  Utilize Perforce Server Log Analyzer
•  Monitor system utilization
    –  CPU, RAM, disk I/O…

Chris Makin [email protected]

•  http://answers.perforce.com/articles/KB_Article/Setting-Up-a-Side-track-Server
•  http://answers.perforce.com/articles/KB_Article/How-to-Monitor-a-Swamped-Perforce-Server
•  http://answers.perforce.com/articles/KB_Article/Installing-P4Broker-on-Windows-and-Unix-systems
•  http://answers.perforce.com/articles/KB_Article/Using-P4Broker-With-Replica-Servers
•  https://kb.perforce.com/psla/
•  http://www.perforce.com/blog
