Section: Architecture - HighLoad++ · 2013-10-27 · Monitoring: Zabbix API, Pinba + API, Graylog...

Section: Architecture

About Sockets and Millions of Packets per Second on One CPU Core

Alexandr Krizhanovsky (NatSys Lab)

Abstracts

select(2) and poll(2) are slow; epoll(7) is fast. To handle 10k requests, we use threads, multiplexing

and multicore servers. But what if those 10k requests arrive at the same time (for example, during a

DDoS attack)? How quickly can the server establish all these connections and process the requests?

How many packets per second can ordinary sockets handle on one processor core?

The report will cover:

* how a packet in Linux travels from the network adapter to a process's TCP socket;

* how new connections are established, and how sockets are multiplexed and read;

* ways to speed up an application server (accept() optimizations, MSI-X and RPS/RFS, GRO);

* why this is still not enough and higher speed is possible: how do Oracle’s Reliable Datagram Sockets

work?

* how everything can be made faster still: a move to fully synchronous sockets (not to be confused with

blocking I/O).
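
As a rough illustration of the multiplexing mentioned above, the sketch below uses Python's standard selectors module, whose DefaultSelector picks epoll on Linux. It is not taken from the report: the socket pair standing in for a client connection and the echo_upper and serve_once helpers are assumptions made for this example.

```python
import selectors
import socket


def echo_upper(conn):
    """Hypothetical handler: read what is available and echo it back upper-cased."""
    data = conn.recv(4096)
    if data:
        conn.sendall(data.upper())


def serve_once(selector, timeout=1.0):
    """Wait for ready sockets, run each registered handler once, return the event count."""
    events = selector.select(timeout)
    for key, _mask in events:
        handler = key.data          # the callback stored at registration time
        handler(key.fileobj)
    return len(events)


selector = selectors.DefaultSelector()       # epoll-based on Linux
client, server_side = socket.socketpair()    # stands in for an accepted connection
server_side.setblocking(False)
selector.register(server_side, selectors.EVENT_READ, echo_upper)

client.sendall(b"ping")
handled = serve_once(selector)
reply = client.recv(4096)

selector.unregister(server_side)
selector.close()
client.close()
server_side.close()
```

A real server would register a listening socket and call accept() in its handler; the socket pair only keeps the example self-contained.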

Target Audience

C/C++ Linux system developers.

Alexandr Krizhanovsky

Founder and System Architect of NatSys Lab, an expert in high-performance computing and UNIX

systems. Before NatSys Lab, he worked as a Senior Software Developer at IBM, Yandex and Parallels.

Speaker’s Company Profile

NatSys Lab is a consulting company specializing in the development of high-performance, fault-tolerant

and distributed systems. Recently, NatSys Lab has also been developing its own Deep Packet Inspection

solution, which can analyze and modify gigabits of traffic even on the simplest hardware.

Scaling compiled applications

Joe Damato

Abstracts

When we hear the word scaling, we typically think about scaling backend services, databases, and even

data centers. But what happens when you have to scale a compiled application out to a heterogeneous

environment with thousands of nodes that you can't control? How do you debug an application running

on a machine you don't have access to that uses different versions of libraries with custom patches that

you don't have?

It might surprise you to learn that bugs in libpcap prevent you from sniffing packets efficiently from a

network interface, but it doesn't stop there. NIC drivers are also rife with bugs that disappear and

reappear in different kernel versions on the same major operating system release. Ethernet bonding

also has its own set of interesting bugs when using NICs that support hardware VLAN tagging.

I've run into interesting problems with everything from compilers, kernels, linkers, and loaders, all the

way up to autotools bugs, bugs in automation systems, even sharp edges in the packaging systems

themselves.

Engineers have a well defined toolset they can reach for when building and scaling services. This talk

aims to define that toolset for scaling compiled applications while offering some best practices, caveats

to watch out for, and strategies for dealing with failure.

Target Audience

Developers working with compiled applications.

Joe Damato

Low-level hacker



Skype's Web Stack

Pavel Brylov (Skype)

Abstracts

In this report, I will describe how we build web applications at Skype: the practices, technologies and

processes that let us scale them dynamically. I will cover our “cloud” PHP framework, SOA, and our

approach to design, testing and deployment across hundreds of machines.

The key points of the report are as follows:

- General architecture, its advantages over our previous solutions, basic principles. SOA and REST.

- Today's PHP code reads almost like Java; a brief overview of the interesting things we use.

- CI: development cycle, packaging, testing, (auto)deployment.

- A brief look at working with the database: our approach to releases and versioning.

- Integration tests.

- Chef: how we use it, and our nonstandard solutions for gradual rollout.

Target Audience

PHP programmers, architects.

Pavel Brylov

He spent seven years at the Kazakhstan Stock Exchange as Head of the Internet Project Department,

then moved to Tallinn, where he became a Software Development Engineer at Skype/Microsoft.


Skype needs no introduction :)

Mature Optimization

Carlos Bueno (Facebook)

Abstracts

Too often, performance work is treated as a one-off activity. This talk describes the design and

implementation of a layered measurement system for reliably detecting and fixing performance

problems. It includes an overview of what the system should look like, pitfalls to avoid, and many tales

of triumph and clownish folly. If you work in operations for Internet or web systems or care about

performance this talk is for you.

Define the Problem: Optimization relies almost entirely on the quality of the measurements you make.

Before you can optimize anything you need a way to measure what you are optimizing. Otherwise you

are just shooting in the dark. But before you can measure you need a clear, explicit statement of the

problem you are trying to solve. Otherwise, in a very real sense, you don’t know what you are doing.

Systems are Cyclical: When a major component of your system is the entire world and the people in it,

you can’t really put a copy on the bench. On the other hand, getting live data and iterating on what you

learn has never been easier. This new (old) programming model has important consequences. To

understand them, it’s possible we should rely less on computer science and more on operations

research and process control.

Raw Data: A good measurement regime is built up in layers. Throwing away your raw data and storing

only some aggregate metrics to save disk space is a common mistake. Collect measurements, not

aggregate metrics.
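
A small sketch of why raw measurements matter. The latency values and the nearest-rank percentile helper below are assumptions made for illustration and are not part of the talk. With the raw samples at hand, a single slow request stays visible in the tail percentile, while a stored mean alone would hide where it came from.

```python
import statistics


def percentile(sorted_values, p):
    """Nearest-rank percentile (p in 1..100) of an already sorted list."""
    if not sorted_values:
        raise ValueError("no measurements")
    rank = max(1, -(-len(sorted_values) * p // 100))  # integer ceiling of n * p / 100
    return sorted_values[rank - 1]


# Hypothetical raw request latencies in milliseconds; one request was very slow.
latencies_ms = [12, 14, 13, 15, 11, 13, 12, 14, 900, 13]

ordered = sorted(latencies_ms)
mean_ms = statistics.mean(latencies_ms)   # the only number an aggregate would keep
p50_ms = percentile(ordered, 50)          # typical request
p99_ms = percentile(ordered, 99)          # tail latency, recoverable only from raw data
```

Here the mean lands far from both the typical request and the outlier, which is the kind of distortion that keeping only aggregates makes impossible to diagnose later.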

Check Your Yardsticks: Measurement software is just as likely to have bugs as anything else we write,

and we should take more care than usual. A bug in user-facing code results in a bad experience. A bug in

measurement code results in bad decisions.

Unusually Effective Debugging: Performance optimization in the small is a kind of debugging. You're

finding something specific to fix, or fixing something specific you've found. Slowness is a bug. It's an

unexpected behavior of the system and you want to remove it while disturbing as little as possible.

Target Audience

All developers of highload projects interested in high performance.

Carlos Bueno

Performance Engineer at Facebook


Facebook

Distributed Systems in Scala with Finagle

Julio Capote (Twitter)

Abstracts

Twitter's various backend services work in concert to serve hundreds of thousands of requests per

second to millions of concurrent users. This is made possible by Finagle, an open source, extensible RPC

system for the JVM built on top of Netty.

By providing both a client and server framework with features like functional concurrency primitives,

detailed metrics, failover/retry, load balancing and connection pooling, building out complex yet robust

asynchronous systems becomes straightforward and reasonable.

This talk will focus on how Twitter builds its high performance concurrent network services with Finagle.

Attendees will be given an overview of functional programming concepts and the core concurrency

abstractions introduced by the framework, so no knowledge of Scala is required.

The talk will conclude with a tour of a fully working Finagle service and client giving attendees the

necessary tools to build backend systems at massive scale.

Target audience

Scala developers and other specialists.

Julio Capote

API team Scala writer at Twitter.


Twitter

Managing a Computing Cluster of DSP Processors in Erlang

Ilya Shcherbak (Eltex)

Abstracts

The world will never be the same. There are many computers, “smart” and not so “smart”, expensive

and cheap. How do you control them all and stay sane? There is probably a mechanism for this. The

report explains why a monolithic architecture no longer works and what to do about it.

The key points of the report:

- the problem of complex computing tasks (using video encoding as an example);

- Erlang and C integration (zero-copy and VM safety issues in this integration);

- cluster architecture and the interaction between the DSP processors and the host (transport level:

DMA, PCI-e level; OS: Linux driver; application level: our own framework);

- a framework for working with shared memory.

A new world, new abstractions, new tools.

Target Audience

Telecommunication developers, embedded system developers, real-time system developers, DSP

integrators.

Ilya Shcherbak

Developer of the VCS product (videoconference server) at Eltex.

Main professional interests: video processing and transport; HA & HL architectures in telecom, FP, DSP.


Eltex, together with the world's leading chip vendors, provides complex hi-tech solutions integrated into

modern telecommunication equipment. For details, please visit: http://eltex.nsk.ru/o-kompanii

Parsing Protection System for the 2GIS API

Dmitry Barkhatov (2GIS)

Abstracts

When a database contains 1.3 million company contacts from all over Russia, it is no wonder that

somebody tries to parse (scrape) it from time to time. This raises a problem: how do you tell good

users from bots?

In our report, we will describe the evolution of our parsing protection system. We will consider the

following stages and approaches:

— special location in Nginx;

— PHP + Redis (keyed counter);

— Nginx + Redis (configuration file);

— Nginx + Lua + Redis: more complex protection logic with no loss of response speed.
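
The keyed counter named in the list can be sketched roughly as follows. This is an assumption-laden illustration, not the 2GIS implementation: a plain dict stands in for Redis, the KeyedCounterLimiter class, the limit and the window length are hypothetical, and requests are keyed by a client identifier such as an IP address.

```python
import time


class KeyedCounterLimiter:
    """Fixed-window request counter keyed by client id.

    A dict stands in for Redis here. With Redis the same idea is usually an
    INCR on the key plus an EXPIRE for the window, which also evicts old keys;
    this in-memory sketch never evicts them.
    """

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit              # hypothetical: max requests per window
        self.window = window_seconds    # hypothetical: window length in seconds
        self.clock = clock              # injectable so the sketch can be tested
        self.counters = {}              # (client, window index) -> hits

    def allow(self, client):
        """Count one request and report whether the client is within the limit."""
        window_index = int(self.clock() // self.window)
        key = (client, window_index)
        hits = self.counters.get(key, 0) + 1
        self.counters[key] = hits
        return hits <= self.limit


# Usage with a fake clock: two requests per 10-second window are allowed.
now = [0.0]
limiter = KeyedCounterLimiter(limit=2, window_seconds=10, clock=lambda: now[0])
first = limiter.allow("203.0.113.7")
second = limiter.allow("203.0.113.7")
third = limiter.allow("203.0.113.7")      # over the limit within the window
other = limiter.allow("198.51.100.9")     # a different key has its own counter
now[0] = 10.0                             # the next window starts
after_reset = limiter.allow("203.0.113.7")
```

A fixed window is the simplest variant; it lets a client burst at a window boundary, which is one reason protection logic tends to grow more complex over time.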

We will also talk about Lua embedded in Nginx, not only for parsing protection but also for other

common cases where starting up the heavy parent application is undesirable.

The 2GIS Reference API is the largest REST API on the Russian Internet.

It has more than 300 partners, including 2GIS-Online, Mail.ru, NGS and E1.ru. The monthly audience

is 14 million users.

The service provides information on 1.3 million companies and 1.8 million POIs in 200 cities of Russia,

Padua (Italy), and some cities in Ukraine and Kazakhstan.

Target Audience

The report is intended for developers, team leads and system administrators.

Dmitry Barkhatov

Web developer in the 2GIS API team.


2GIS is a directory of companies plotted on city maps, developed for Russia, Ukraine, Kazakhstan

and Italy.

The total audience is 14 million users per month across offline and online platforms (maps.2gis.ru and

the 2GIS API) and smartphones (iOS, Android, WP).

The Architecture of 2GIS Reference API

Sergey Korzhnev (2GIS)

Abstracts

In our report, we will look at the service architecture and the basic infrastructure processes.

Architecture: the Yii framework and its components, PgSQL, Sphinx, C++ daemons for multi-objective

search.

Deployment: servers (Novosibirsk, Moscow, Amsterdam), Phing, Chef.

Monitoring: Zabbix API, Pinba + API, Graylog methods profiling utility.

Caching: Nginx + Lua, Redis, APC, cache sharding and invalidation.

We will also explain how we manage to ship stable releases every Tuesday and update the data for all

cities daily. And much more...

The 2GIS Reference API is the largest REST API on the Russian Internet.

It has more than 300 partners, including 2GIS-Online, Mail.ru, NGS and E1.ru. The monthly audience

is 14 million users.

The service provides information on 1.3 million companies and 1.8 million POIs in 200 cities of Russia,

Padua (Italy), and some cities in Ukraine and Kazakhstan.

Target Audience

The report is intended for developers, architects and system administrators.

Sergey Korzhnev

Reference 2GIS API Architect. Also worked in the 2GIS-Online and Flamp teams.


2GIS is a directory of companies plotted on city maps, developed for Russia, Ukraine, Kazakhstan

and Italy.

The total audience is 14 million users per month across offline and online platforms (maps.2gis.ru and

the 2GIS API) and smartphones (iOS, Android, WP).

TopGun = the Terabit DPI Platform Architecture from Shaping to PRISM

Leonid Yuriev (ZAO PETER-SERVIS)

Abstracts

The report presents an overview of the architecture of a multipurpose DPI platform on which both

“spy” SORM/PRISM applications and commercial systems for PCEF/TDF, safe Internet (smart content

filtering), advertisement targeting, etc. can be built. Distinctive features of the solution include

multi-terabit scaling, its balancing method, “swarm” processing (Swarm Intelligence) and failover by

means of replicating finite-state machines.

Roadmap:

- off topic: who needs DPI and why?

- off topic: legitimacy and moral issues.

- which “moon” should we land on, what do we want to do?

- traffic distribution using switches and MAC rewriting.

- Swarm Intelligence to control balancing and data processing.

- replication of finite-state machines (virtual micromachines).

- a distributed “Table” as the operational store, with key-value and eventual consistency elements,

lock-free in shared memory.

- a transport facade unifying DPDK, netmap, InfiniBand (RDMA), 0mq and zero-copy, as well as lock-free

exchange via shared memory.

- latency and throughput, asynchronous decision-making.

Better the devil you know than the devil you don't – be ready to get acquainted ;-)

Target Audience

Architects and developers interested in new (pretentiously innovative) approaches and solutions.

Leonid Yuriev

Lead system architect at Peter-Servis R&D. Previously worked at the Natalya Kasperskaya Centre of

Innovations, Infowatch, KB Chronics and ElCat.


ZAO PETER-SERVIS has been developing solutions for the telecommunications sector since 1992,

specializing in the development, implementation and maintenance of systems for large

telecommunication operators.

For details, please visit: http://www.billing.ru/company


The Experience of Livestreet Social Media: Migration from PHP/MySQL to NodeJS/Redis/Lua

Dmitry Degtyaryev (Habikasa)

Abstracts

We took a typical PHP/MySQL web application (the Livestreet social network engine) and removed

everything slow and bad from it:

- continuous PHP - MySQL - Memcached data marshaling;

- framework initialization on every request;

- caches and their invalidation mechanisms;

- normalized data structures and multiple table JOINs;

- file system access, including on the database side;

- clumsy, ugly programming languages that cannot be JIT-compiled.

The resulting solution is as follows:

- NodeJS, Redis and all the logic in Lua inside Redis (as stored procedures).

- NodeJS gets JSON from Redis and sends it to the client without any transformation, where it is

handled by an Ember.JS-style MVC framework.

- Page generation became 30-100 times faster.

- It became possible to send push notifications to the client via pub/sub in Redis.

Our experience

- We did this for a Livestreet-based project with 3k topics, 20k registrations and 10k comments.

- A Lua script library for working with bundled objects in Redis will be published on GitHub

before the conference begins.

Problems, possible solutions and topics for further research

- Redis executes Lua scripts strictly sequentially, so it does not scale across several processors.

- When replicating, Redis replicates every script in full, so native replication becomes pointless.

- A replication mechanism at the application level is needed. It suits applications where read

operations prevail.

Target Audience

Developers, system architects.

Dmitry Degtyaryev

Development manager and architect. He has experience building web applications on the EJB, .NET,

LAMP and NodeJS platforms. He was responsible for the development and operation of the following

websites: Svyaznoy.ru, Dostavka.ru, 003.ru. He is currently building his own startup in the blog

hosting area.


Habicasa is a young company that aims to launch niche social networks and blog hosting services

with technical and other advantages over existing forums and general-purpose social media.

One of the first projects launched by our company is http://www.7dach.ru


Garbage Collection in Java without Pauses

Alexey Ragozin

Abstracts

A report on garbage collection pauses was already presented at one of the previous HighLoad++

conferences.

Stop-the-world pauses are an inherent feature of automatic memory management.

Or can they be avoided after all? They can!

There are memory management algorithms that require no pauses, and there are real JVMs that

implement them.

Contents of the report

- Principles of automatic memory management (garbage collection).

- "Metronome": a classic pauseless garbage collection algorithm.

- C4: the garbage collection algorithm of the Zing JVM (Azul Systems).

- Peculiarities of efficient implementation on the x86 architecture.

- Additional sources of problems: weak references, fragmentation, etc.

Target Audience

Java developers and other specialists.

Alexey Ragozin

Alexey Ragozin graduated from Bauman Moscow State Technical University. He has many years of

experience designing and developing high-load information systems in sectors such as finance,

telecommunications and e-commerce.


Alexey is an independent advisor on implementing DataGrid solutions, building distributed systems and

solving the application performance problems. In addition, he actively participates in the Open Source

Projects (including OpenJDK).

Backward Compatibility

Sergey Konstantinov (Yandex)

Abstracts

1. Why: backward compatibility as a competitive advantage.

2. A method for designing an architecture that minimizes the risk of breaking backward

compatibility.

3. One and a half methods of safe refactoring.

Target Audience

Large project developers.

Sergey Konstantinov

Head of the Yandex.Maps API development team. Graduated from South Ural State University. He has

been involved in the development of the Yandex.Maps API since 2008. In 2013, he was elected to the W3C

TAG. He has an interest in classical music.


Yandex is a Russian IT company that owns the search engine and internet portal of the same

name.

Aviasales Metasearch Engine Design

Boris Kaplunovsky (Aviasales.ru)

Abstracts

- AviaSales is an air ticket metasearch engine. To process one search query, our service queries between

one and four dozen external services and processes their responses. A single service's response is 200 to

2,000 kilobytes of XML/JSON. The company's profitability depends directly on how fast this data is

processed.

- To work efficiently with the remote service APIs and process data quickly in parallel, we developed

the “Yasen” engine.

Among the requirements imposed on the system are:

- high fault tolerance;

- easy scalability;

- hot deployment;

- simple configuration;

- a low entry threshold for programmers.

To serve a single search query, requests must be sent to a large set of external APIs and their results

processed, and these requests must run in parallel. Besides parallel execution, sequential execution is

also required. Some of the data can be processed after the results have already been returned to the

user; this is deferred execution. The DSL used in Yasen contains these three constructs

(parallel/sequential/delayed). A workflow is built from independent, isolated modules (units) using the

DSL. A typical workflow consists of 10-100 units combined into sequential, parallel and deferred chains.
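
The three constructs can be sketched as small combinators. This is only an illustration under stated assumptions, not the Yasen DSL itself: the function names (as_unit, sequential, parallel, delayed, execute), the use of threads for parallelism and the fake "gate" units are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def as_unit(fn):
    """Adapt a plain function to the (data, deferred) signature used by units."""
    return lambda data, deferred: fn(data)


def sequential(*steps):
    """Run steps one after another, feeding each result into the next step."""
    def run(data, deferred):
        for step in steps:
            data = step(data, deferred)
        return data
    return run


def parallel(*steps):
    """Run steps concurrently on the same input; results keep the step order."""
    def run(data, deferred):
        with ThreadPoolExecutor(max_workers=len(steps)) as pool:
            futures = [pool.submit(step, data, deferred) for step in steps]
            return [future.result() for future in futures]
    return run


def delayed(step):
    """Queue a step to run later, after the response has been produced."""
    def run(data, deferred):
        deferred.append(lambda: step(data, deferred))
        return data  # pass the data through unchanged
    return run


def execute(workflow, data):
    """Run a workflow; return its result and the list of queued deferred tasks."""
    deferred = []
    return workflow(data, deferred), deferred


# Hypothetical units: two fake "gates" queried in parallel, then a sort,
# then a statistics step that can wait until the user has the results.
stats_log = []
workflow = sequential(
    parallel(
        as_unit(lambda query: f"{query}:gate_a"),
        as_unit(lambda query: f"{query}:gate_b"),
    ),
    as_unit(sorted),
    delayed(as_unit(lambda offers: stats_log.append(len(offers)))),
)

result, deferred_tasks = execute(workflow, "MOW-HKT")
pending_before = len(stats_log)   # deferred work has not run yet
for task in deferred_tasks:       # run it after the "response" is ready
    task()
```

Because every construct returns a callable with the same signature, chains can be nested freely, which is one plausible way a workflow of 10-100 units stays composable.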

The need to react quickly to user feedback and to the performance of the external services drives the

requirement for simple configuration. Things such as changing the ticket markup for various suppliers,

or managing the set of agencies searched depending on user locale, traffic source, the user’s

geographical position and many other factors, must be done at runtime without the help of technical

specialists. To solve this problem, a workflow in “Yasen” is built from units, each of which has a URL for

reading and updating its current configuration. To simplify the development and testing of units, each

unit can also be called via HTTP, in addition to being configured via its URL. Units interact over HTTP,

and the data is encoded in JSON.

The fault tolerance requirements made us reconsider the way we had worked with databases before.

The data was split into three types:

- references: data that the application reads constantly but updates quite rarely (information on

airports, cities, countries, airlines);

- logs: data that is written constantly but never read by the application (information on search queries,

clicks, user behavior);

- dynamics: short-lived data that is actively written and read (information on searches, links to the

agencies’ websites).

A separate fault tolerance technique was developed for each data type.

The resulting system let us cut the machine fleet by a factor of three, simplified connecting new

services and made the system's behavior considerably more predictable.

Target Audience

Programmers and high-load system architects.

Boris Kaplunovsky

CTO at AviaSales.ru, with more than 10 years in the industry; ready to program in anything that can

be compiled or interpreted. He has worked at Sun Microsystems and Intel, but two years ago he left for

a small company named AviaSales to get a real drive.


Aviasales.ru is the first and largest search engine for cheap air tickets in Russia. We search for tickets

across 728 airlines, 40 ticket agencies and 5 booking systems (GDS). The technology stack

includes RoR/Redis/RabbitMQ/MySQL.

Aviasales.ru is visited by more than 4 million users monthly and also offers search for hotels,

insurance and car rental. In 2012, tickets worth RUR 5.1 billion were sold through the search

engine. An English-language air ticket search engine, JetRadar.com, operates for the Western market

as well. The company has approximately 70 employees. The headquarters and 80% of the employees are

located on the beautiful sunny island of Phuket in the Kingdom of Thailand.

Developing and Maintaining an API in a Large Service (Wamba)

Vitaly Katorgin (Wamba)

Abstracts

A theater begins with the cloakroom, and a high-load service for multiple platforms (mobile, desktop

and even console) begins with an API, which must be thoroughly designed, documented and developed

with the needs of each consumer in mind.

- Automatic collection of documentation for third-party developers:

- A web console.

A console where the developer can experiment with the API, generated using the apigee.com service

from phpdoc documentation with the help of ApiGen:

-- structure descriptions;

-- HTTP routing service descriptions.

- Routing generated from the documentation (phpdoc).

The report will demonstrate how to create a beautiful documentation interface for third-party

developers and how to make it impossible for a service to run when it has no documentation.

- Dynamic form building with states (FormBuilder).

Without switching to hybrid applications, we can add the dynamics of web forms to native applications

using a lightweight JSON protocol.

- Functional testing with Behat.

Testing the API in natural language.

- Structure version control.

A method for changing the API protocol gently without breaking production clients.

Target Audience

Developers, testers.

Vitaly Katorgin

I manage the mobile service development department at Wamba. My main task is developing an API so

fast and flexible that developers on many very different platforms can use it.

This search leads us to interesting and unique discoveries in API architecture and testing.

Our frequent motto is: “We know how to do it, but how can we do it better?!”


Wamba is one of the leading services for meeting new people.

The System for Fraud Detection in High-Frequency Stock Trading

Iosif Itkin (Exactpro Systems)

Abstracts

The main objective of exchange monitoring and control is to ensure an orderly and stable market

and, in particular, to support the analytical work of the departments responsible for

detecting possible manipulation.

The monitoring system must collect information on all incoming requests, system responses, data

obtained from external sources, and the relevant internal state of the trading platform.

In high-frequency exchange trading this amount of information is rather large, and

processing it requires a scalable infrastructure.

Exchange operators and traders expect risk monitoring and control systems to have minimal

impact on the core functioning of the exchange platform and on its response time.

Dishonest market participants try to conceal market abuse and manipulation by disguising them as

legitimate financial transactions; some legitimate operations also show many signs of violations. Market

monitoring and control systems have to produce analytical conclusions under high load,

based on fuzzy logic whose parameters must be continuously adapted to new trading

situations and trader behavior patterns.

Our report reviews the subject area and the basic requirements for exchange control and

monitoring systems, and describes the system we are developing, its

architecture and the chosen technology stack, including Akka, LevelDB, Play, ZeroMQ and ExtJS.

Target Audience

A wide range of specialists interested in high-load systems and, in particular, in high-frequency

trading systems.

Iosif Itkin

Cofounder of Exactpro Systems.

http://www.linkedin.com/in/iosifitkin

His interests: development of analysis tools and methods for high-load trading systems for leading

financial companies.

He has managed testing projects for the systems of the London Stock Exchange, large investment banks

and leading brokers.

He organizes the conferences "EXTENT Trading Technology Trends & Quality Assurance"

(http://extentconf.com) and "Tools & Methods of Program Analysis" (http://tmpaconf.org).

The co-speaker is Anton Sitnikov.


Exactpro Systems (http://exactpro.com)

All the information about Exactpro Systems on one slide: http://www.slideshare.net/IosifItkin/what-makes-exactpro-different

JIT Compilation in a Java Virtual Machine

Alexey Ragozin

Abstracts

It’s not an easy task to ensure the efficient execution of a dynamically typed high-level language. Just-in-

time (JIT) compilation, i.e. dynamic machine code generation performed with account for the

information gathered during the application execution phase, is a key element of the virtual machine

performance (be it Java, .NET or even JavaScript). To offset the language's "dynamism", a JIT compiler

needs an impressive set of tricks and optimizations.

The report will examine HotSpot JVM (free JVM from Oracle) and its JIT-compilation architecture. In

particular, the following topics will be expanded on:

- multi-level compilation;

- on-the-fly executable code rewriting;

- method call devirtualization;

- escape analysis & scalar replacement;

- memory management system (garbage collection) and compiler interaction.

Target Audience

Java developers, everyone interested in modern Just-in-Time compilation achievements.

Alexey Ragozin

Alexey Ragozin graduated from Moscow State Technical University n.a. N.E. Bauman. With many years

in the industry, he has acquired expertise in high-load systems development & design in the fields such

as finance, telecommunication and ecommerce.


Alexey is an independent DataGrid solutions deployment consultant specializing in building distributed

systems and managing application performance. Also, he is an active participant of Open Source

projects (including OpenJDK).

Heavy Load in the Cloud: Simultaneous Delivery of Personalized Mobile Notifications to Millions of iOS, Android, Windows & Windows Phone Users

Kirill Gavrylyuk (Microsoft Corporation)

Abstracts

Mobile notifications are extremely important for building applications that people want to use all the

time. Come and see how high-load mobile solutions such as Bing News solve the problem of

simultaneously delivering personalized mobile notifications to millions of iOS, Android and Windows

Phone users with Windows Azure Notification Hubs. We will examine the architecture of these

solutions and of Notification Hubs itself, including parallel low-latency delivery,

"rich" pub-sub routing, audience segmentation and notification customization that enables

multi-platform support.

Target Audience

Developers, architects.

Kirill Gavrylyuk

Kirill Gavrylyuk, our compatriot, graduated from Moscow State University in 1997. Later on, Kirill moved

to the USA for work. In 2000, he joined Microsoft. During his career at Microsoft, Kirill

Gavrylyuk has participated in the development of many products, such as System.Xml, Web Services

Enhancements and Windows Communication Foundation.

Today, Kirill Gavrylyuk's duties at Microsoft include creating Windows Azure services for

mobile software development, including Windows Azure Mobile Services and Notification Hubs,

designed to make it easier for mobile developers to build features such as authorization,

data storage and high-volume push notifications.


Microsoft Corporation builds software, services, and devices worldwide.

“Badoo in the Cloud” (Solutions for Running CLI Scripts in a Cloud Developed by Badoo)

Yuri Nasretdinov (Badoo)

Abstracts

Badoo is the world's largest social network for meeting new people. We have thousands of servers in

two data centers, and some of them go down all the time. Our machine cluster consists of different

groups: machines that handle web requests, database servers, image storage, servers that execute

cron tasks, machines for C/C++ services and some others. For scheduled task processing, we use

so-called script machines, which run PHP scripts in CLI mode and carry out all the required

actions. Until recently, we ran scheduled scripts with regular cron, plus an in-house

utility that automated writing the required lines to cron. However,

developers had to choose, based on some criteria, one or a few machines on which to run

these scripts, and the scripts were strictly tied to those servers. If one of the machines went down, we

had to move its scripts to other machines manually. To balance load across servers and to

ensure automatic recovery after a failure, we decided to build our own cloud to

fix these problems. This report is about how we built the cloud and the first results we

have achieved with it.

Report outline

- Cloud requirements.

- Existing solutions.

- Balancing load across machines.

- Handling hardware failures.

- Cloud monitoring.

Target Audience

Developers and lead developers.

Yuri Nasretdinov

PHP Developer


Badoo is the largest and fastest-growing social network for meeting new and interesting people. Badoo

connects 170 million users from 180 countries.

The Badoo website ranks as the 137th most popular website in the world, according to Alexa.com. With

46 million unique users per month, the website ranks 59th in Google TOP 1000, the list of the

most visited websites. Every day 150,000 new users sign up to Badoo, 3 million pictures are

uploaded, and 50 million messages are sent.

Badoo is a technically complex, high-load project. Stable operation is ensured by 2 thousand

servers located in two geographically remote data centers (Miami, Prague). During peak hours, the

dynamic load on the backend exceeds 40 thousand requests per second. Several billion

events are uploaded to Badoo's analytical processing systems daily.

Several of Badoo's in-house developments have been released under free licenses. The best known are

the FastCGI process manager for PHP (php-fpm), the Pinba server for collecting statistics in real time, and the

Blitz fast template engine.

Badoo now has two headquarters, in London and in Moscow, with more than 200 employees. London

is responsible for business development and all product concept decisions, while 90% of the staff in the Moscow

office are developers who implement these ideas.

The Ecosystem of Sochi 2014. Dreaming of

Space

Mikhail Chekanov, Olga Kulikova (Organizing Committee

Sochi 2014, Articul Media digital agency)

Abstracts

The Sochi 2014 website is one of the greatest events in modern Russian history, both offline and online!

We will talk about the demands that the Organizing Committee for the Olympic Games placed on the

Sochi 2014 web portal. High load is by no means the whole story, and high load is not only about website

traffic.

We will describe how the solution evolved and which options we considered,

discuss the selection criteria, and finally reveal our choice: everything that concerns

technologies, architectures, and people.

In addition, we will debunk many popular myths about this grandiose online construction site!

Target Audience

Everyone interested in web projects and the coming Olympic Games.

Mikhail Chekanov, Olga Kulikova

Mikhail Chekanov (Organizing Committee Sochi 2014),

Olga Kulikova (Articul Media)


Organizing Committee Sochi 2014 and Articul Media digital agency

Section: Databases, Storage Systems

The Whole Truth about Indexes in

PostgreSQL

Oleg Bartunov, Fedor Sigaev, Alexandr Korotkov

(PostgreSQL)

Abstracts

Databases are an integral part of any modern information system and usually sit at the lowest

level of the software stack, so they directly affect the performance of the whole system.

Besides the obvious hardware factors, DBMS performance depends on the structures that store the

data, the data volume, and the types, number, and degree of concurrency of requests. Indexes are often

seen as a magic wand that helps you take the lead in the continuous chase for performance.

However, you first need to cast and load this silver bullet, then take careful aim, smoothly

squeeze the trigger, and finally hit the target. PostgreSQL offers many ways to index

various kinds of data, and lets you create new data types with their own indexes. This very

diversity is what makes it hard for junior developers to understand the subject and choose the

best option.

Our report will explain what indexes are, which index types already exist in PostgreSQL and

which you can write yourself, how indexes should and should not be used, and which

techniques make index management efficient in various database scenarios.

Target Audience

DBAs, system administrators, programmers.

Oleg Bartunov, Fedor Sigaev, Alexandr Korotkov

Oleg Bartunov is a leading PostgreSQL developer and creator of the Astronet project. He specializes in

high-load information projects and very large databases, and is also a research scientist at the Sternberg

Astronomical Institute, MSU.

Alexandr Korotkov is a leading developer at Intaro Soft Ltd. He has been working on the following

projects: new website of the State Duma, order processing system for re:Store, federal insurance web

portal strahovka.ru, human DNA testing service i-gene.ru.

Fedor Sigaev has been participating in the PostgreSQL project for many years. For more information

about the speaker visit http://www.sigaev.ru/


PostgreSQL is a leading open-source DBMS. The PostgreSQL community comprises

thousands of users and developers, as well as numerous organizations around the globe. The PostgreSQL

project has existed for 25 years. Its development began at the University of California, Berkeley,

and today the project is growing at an unprecedented rate. In its features, PostgreSQL is not just equal but

even superior to leading commercial DBMSs, thanks to its rich functionality, extensibility, safety, and

stability. For more information about PostgreSQL and how to join the PostgreSQL community,

visit www.postgresql.org.

An Overview of Popular Modern Algorithms

for Storing Data on Disks: LevelDB, TokuDB,

LMDB, Sophia

Konstantin Osipov (Mail.Ru, Tarantool.org)

Abstracts

If you are building a high-load project, sooner or later you have to either create a specialized DBMS or

choose an existing product. In my report I'll give an overview of the operating principles

of popular systems available today: LevelDB, TokuDB, and others. I will show how

storage, atomicity, and fast data access are organized, and discuss the advantages and disadvantages of

the different systems.

Target Audience

Architects, C/C++ programmers

Konstantin Osipov

Tarantool, MySQL


Mail.Ru,

Tarantool.org

Dissecting the Rabbit: RabbitMQ Internal

Architecture

Alvaro Videla (Pivotal, Inc.)

Abstracts

RabbitMQ is a Messaging and Queueing server implemented in Erlang. This talk will explore how

RabbitMQ uses Erlang and OTP to build a highly reliable message broker.

We are going to review areas such as:

RabbitMQ Boot System: How the broker boots until it's ready to accept messages.

A day in the life of a message: The path a message takes while passing across RabbitMQ.

RabbitMQ's own message store: persistent messages, transient messages and the in memory cache for

fast delivery.

Supervisor Trees, RabbitMQ's own behaviors and more.

If you ever wondered what goes on inside a robust Erlang application then this is the talk to attend.

Target Audience

All specialists interested in RabbitMQ.

Alvaro Videla

Cloud Foundry Developer Advocate at Pivotal Inc.


Pivotal Inc.

The MySQL and MariaDB Story

Michael Widenius (Monty Program Ab)

Abstracts

The story of how MySQL was created, why it was successful, and how it grew until it was sold to Sun,

which was then acquired by Oracle.

The talk will also cover how and why MariaDB was created and what we are doing to ensure that there will

always be a free version of MySQL (under the name of MariaDB).

The talk will also explain the challenges we faced in making this fork, especially the merge with MySQL

5.5, the various systems (like buildbot) that we use to build the binaries, and how we work

with the MariaDB/MySQL community.

Target Audience

Everyone living in the MySQL and MariaDB world.

Michael "Monty" Widenius

CEO & VP Community at Monty Program Ab.


Monty Program Ab.

How to talk about database storage

performance MySQL versus the

alternatives

Mark Callaghan (Facebook)

Abstracts

Write-optimized databases and flash storage have arrived. Moving from disk to flash is usually easy.

Changing a database implementation has much more uncertainty when that includes moving from the

well-understood B+ Tree algorithm as implemented by InnoDB to a write-optimized algorithm like a Log

Structured Merge tree as implemented by WiredTiger, LevelDB and HBase. I did such comparisons on

paper and using real implementations. I will describe what I learned and the framework I developed to

describe and compare the alternatives. My focus is on storage efficiency.

Target audience

All specialists using modern database storage.

Mark Callaghan

Software Engineer at Facebook.


Facebook

Intergalactic Data Speak

David Fetter (Disqus)

Abstracts

Disqus runs the comments section on 75% of the world's web sites. To handle this volume of data, we

use many different data storage and motion technologies including Memcached, Cassandra, Redis,

Storm, and of course PostgreSQL at the center. Find out how we're federating all this and what for with

foreign data wrappers.

Target audience

Everyone interested in PostgreSQL and Disqus.

David Fetter

Data Architect at Disqus


Disqus

Road Network in Neo4j Graph Database

Vadim Shashenko (2GIS)

Abstracts

In my report, I will explain why we chose the Neo4j graph database to manage the road graph of Russian

cities (all localities with a population of more than 300,000). With Neo4j we solve the

main problem of connectivity and traffic control.

Report outlines:

— SQL versus graph databases;

— neo4j graph database review;

— architecture of a solution where a graph database is used;

— executing algorithms on graph under conditions of its frequent changes

The report is based on the results of work on the Fiji project. Fiji is an in-house system that allows

2GIS staff cartographers to create, store, and export maps to external software products, i.e. the online,

desktop, and mobile versions of 2GIS.

Target Audience

Developers, team leads, architects, system administrators.

Vadim Shashenko

Fiji project team leader (in-house system of 2GIS map creation).

Technologies: C++, C#.


2GIS is a directory of companies on maps of the cities of Russia, Ukraine, Kazakhstan, and Italy. Total audience is 14 million users per month. Platforms: online, offline (maps.2gis.ru & API 2GIS), smartphones (iOS, Android, WP).

Calculation of Unique Combinations through

the Example of Statistics System for Groups

in the Odnoklassniki Social Media

Alexandr Sharak (Odnoklassniki.ru)

Abstracts

Odnoklassniki has over 1 million active groups with more than 700 million actions taken per day. For

group administrators, we calculate various unique combinations for different time periods and with

different frequencies. Below are the examples of some of them: group, sex, age, activity per period (per

hour, per 24 hours, per 365 days). In case of calculating unique combinations per 365 days we have to

process more than 200 billion records daily.
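
A minimal sketch of what a "unique combination" count means here: the number of distinct users per (group, sex, age band) key within a period. The record layout and all names are assumptions made for illustration; the system described in the talk runs on MS SQL, not on this code. The last lines check the order of magnitude quoted above for a 365-day window at 700 million actions per day.

```python
from collections import defaultdict


def unique_users(actions):
    """Count distinct users per (group, sex, age_band) combination.

    `actions` is an iterable of (group, sex, age_band, user_id) tuples,
    a hypothetical layout chosen for this example.
    """
    seen = defaultdict(set)
    for group, sex, age_band, user_id in actions:
        seen[(group, sex, age_band)].add(user_id)
    return {key: len(users) for key, users in seen.items()}


sample = [
    ("cats", "f", "18-24", 1),
    ("cats", "f", "18-24", 1),  # repeated action by the same user
    ("cats", "f", "18-24", 2),
    ("cats", "m", "25-34", 3),
]
counts = unique_users(sample)

# Size of a 365-day window at the quoted daily rate.
ACTIONS_PER_DAY = 700_000_000
records_in_window = ACTIONS_PER_DAY * 365
```

The window holds about 255 billion records, which is consistent with "more than 200 billion". Keeping a set of users per key, as above, works only for small data; at the scale in the abstract the de-duplication has to be done inside the database or with incremental aggregates.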

These calculations are done on two MS SQL 2012 AlwaysOn Availability Group servers and on one MS

Windows 2008 R2 server with .NET 4.0. This report examines how we built this system and how

we solved the problems we faced along the way.

Target Audience

Developers and database administrators.

Alexandr Sharak

Head of business analysis department at Odnoklassniki


Odnoklassniki.ru

How We Store 60,000 Events per Second

Arsen Mukuchan (AdRiver)

Abstracts

1. Introduction.

1.1. About AdRiver.

1.2. Defining an event as a system informational unit.

1.3. Conceptual division of the system into two parts: a real-time part that serves banners and

creates events, and a statistical subsystem that stores and processes them.

1.4. Why events need to be stored, with a few examples of the analytical data provided to

clients.

1.5. Why data consistency is the main business requirement for the storage

subsystem. A story from the distant past, when events were stored in text logs and we regularly had to

explain to our clients why the numbers in different reports didn't agree.

2. Problem statement and requirements.

2.1. Description of operating data volume: 4 billion events per day or approximately 500 Gb of serialized

data per day. Minimal storage period is 1 year.

2.2. Ensuring data consistency.

2.3. Ensuring random data access.

2.4. Ensuring reliability, fault-tolerance, and scalability.

3. Implementation and key solutions.

3.1. Application environment benchmarking study. Experiments with map/reduce frameworks and

DBMS. Justification for developing an in-house storage system.

3.2. General description of the developed History system.

3.3. Rationale for the importance of algorithmic optimization and for dropping "excess" logic on

high-load servers.

3.3.1. Our development principle: every component should perform one function WELL instead

of performing many of them BADLY.

3.3.2. The server that provides random data access is agnostic of the data and does not process it in any

way. This reduces the CPU load when handling many data requests.

3.4. Brief description of the serialization mechanisms used. Analogy with Google Protobuf.

3.5. Description of the indexing mechanism used, which allows any system event to be identified. This is a key

solution for ensuring consistency.

3.6. Streaming storage and paging as a cure for RAM-related problems. It is better

to hit the limits of the disk subsystem or the processor than to hit "Out of Memory", because the

former only degrades system performance, while the latter causes denial

of service.

3.7. Data compression, paged storage, and the server's data agnosticism allow any of our nodes to deliver one

gigabyte of data per second while serving 200 clients and processing 1,000 files. At the same time, CPU,

disk, and memory utilization stay minimal. The server sends the client compressed

data pages, which the client must unpack and process before requesting the next portion of data.

During this time the server can prepare the next portion of data and serve other

clients.

4. Localization directions.

4.1. Once we broke our own development principle by loading the server with a data-reading

optimization (grouping by keys). This opened Pandora's box: configuration errors can now make

the system "sink" into I/O wait.

4.2. Disk I/O is performed asynchronously with POSIX AIO. We are planning to switch to kernel

AIO and to compare the performance characteristics.

5. Conclusions and future directions.

5.1. The problem is solved, everyone is satisfied.

5.2. Summary of system operating characteristics: one year of data is 200 Tb, client

requests from 1,000 clients are handled in parallel, total subsystem output traffic is 20 Gb/second. Estimated cost.

5.3. Brief overview of the analytics tools built on top of History. A discussion of how

all the "excess" logic dropped in item 3.3 moved to higher levels.

5.4. It's necessary to significantly increase data acquisition rate. Descriptions of the options that allow

implementing this requirement.

5.5. It's necessary to reduce system latency without data consistency loss.
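
The volumes in items 2.1 and 5.2 can be cross-checked with simple arithmetic. The sketch assumes that "Gb" and "Tb" in the outline mean decimal gigabytes and terabytes and that the load is spread evenly over the day; both are assumptions, not statements from the talk.

```python
EVENTS_PER_DAY = 4_000_000_000   # item 2.1
BYTES_PER_DAY = 500 * 10**9      # item 2.1, assuming decimal gigabytes
SECONDS_PER_DAY = 24 * 60 * 60

avg_events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY
avg_bytes_per_event = BYTES_PER_DAY / EVENTS_PER_DAY
terabytes_per_year = BYTES_PER_DAY * 365 / 10**12

print(f"{avg_events_per_second:,.0f} events/s on average")
print(f"{avg_bytes_per_event:.0f} bytes per serialized event")
print(f"{terabytes_per_year:.1f} TB per year")
```

Under these assumptions the average is about 46 thousand events per second, so the 60,000 in the title reads as a peak rather than an average, and roughly 180 TB per year is in line with the yearly figure in item 5.2.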

Target Audience

Developers of high-load systems.

Arsen Mukuchan

Lead programmer at AdRiver. 8 years of experience in commercial software development, 7 years of

experience in high-load systems development in telecommunications and digital advertising.


AdRiver is a recognized leader in technical solutions for digital advertising in Russia. Today the

company serves approximately 4,000,000,000 ad impressions daily, processing more

than 60,000 requests per second.

Query Optimizer in MariaDB: No Indexes

Anymore!

Sergey Golubchik (Monty Program Ab)

Abstracts

It's commonly known that to optimize queries in MariaDB/MySQL you need to use indexes. Indexes

allow fast data access and, moreover, provide statistics. Of course, index maintenance

can sometimes slow performance considerably, but what can you do?

The query optimizer in MariaDB 10.0 supports statistics regardless of the storage engine used.

First, this means that the optimizer can collect and use statistics for all tables in all

engines in the same way, independently of the capabilities of a specific engine. Most importantly,

it can collect statistics on non-indexed columns! And it can obtain not just a single figure,

the cardinality, as for indexes, but also histograms that precisely describe the value distribution in a column!

In my report I will demonstrate how to enable these fundamentally new optimizer features and

explain how they affect the execution speed of your queries.
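
To illustrate why a histogram tells an optimizer more than a single cardinality figure, here is a toy equi-height histogram in Python. It is not MariaDB's implementation or storage format; the function names, the bucket count and the sample data are made up for the example.

```python
from bisect import bisect_right


def build_equi_height_histogram(values, buckets):
    """Return bucket upper bounds so that each bucket holds about the same number of rows."""
    if not values:
        raise ValueError("cannot build a histogram from an empty column")
    ordered = sorted(values)
    n = len(ordered)
    # The upper bound of bucket i is the last value of the i-th equal-sized slice.
    return [ordered[max(0, (i + 1) * n // buckets - 1)] for i in range(buckets)]


def estimate_selectivity_le(bounds, x):
    """Estimate the fraction of rows with value <= x, at whole-bucket granularity."""
    return bisect_right(bounds, x) / len(bounds)


# A skewed column: 90% of the rows hold the value 1, the rest are spread over 2..11.
skewed = [1] * 90 + list(range(2, 12))
bounds = build_equi_height_histogram(skewed, 10)
share_of_ones = estimate_selectivity_le(bounds, 1)
```

With only the cardinality (11 distinct values) and a uniformity assumption, the share of rows with value 1 would be guessed as 1/11. The histogram estimate is 0.9, which matches the data, and that difference is what can change the chosen query plan.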

Target Audience

DBMS developers and administrators (DBA), newbies, and skilled professionals.

Sergey Golubchik

He has been one of the lead MySQL developers since 1998. He started his career at MySQL AB, then

moved to Sun Microsystems. Since 2009 he has been working on MariaDB at Monty Program Ab. Over

these years, he has implemented many projects in almost every subsystem of the MySQL server.

These include full-text search, XA, HANDLER, the arbitrary-precision calculation library, parallel index

building in MyISAM, indexes in MERGE, the plugin API, pluggable authentication, microsecond-precision

TIME, DATETIME, and TIMESTAMP storage, and others. Sergey wrote the book MySQL 5.1 Plugin

Development.


Monty Program Ab.

Distributed Data Storage Systems for

Analytics: Vertica and other systems

Alexandr Zaitsev (LifeStreet)

Abstracts

Many companies need to store and analyze large amounts of data (a terabyte or

more). Sooner or later the storage system outgrows the capacity of a single server, and developers and architects

have to choose a distributed system and examine the usual

questions of performance, scalability, fault tolerance, and so on. There are several approaches to

building distributed data storage systems, and each meets these requirements in a different

way. However, not all of them are well suited to analyzing large amounts of data. There are no

universal tools.

This report will examine the specific requirements for distributed data storage systems that follow

from example analysis tasks, specifically multidimensional analysis. These requirements reveal

particular technical problems that can be solved in different ways. I will talk about how to overcome

them using several systems as examples, including Vertica, a specialized analytical RDBMS that I

will focus on in this report (other distributed systems used for comparison: Dynamo-like key-value stores,

ShardQuery, Hadoop, HadApt, MemSQL, Paraccel). Our company has used Vertica as its main

platform for analyzing and optimizing the performance of its advertising network for three

years, and we are satisfied with the system. It allows us to process and analyze up to 10 Tb of raw data, or

3.5–4 billion events, per day (see my report from last year:

http://www.highload.ru/2012/abstracts/430.html). Having tried different solutions, we have gained a

clear understanding not just of the advantages but also of the compromises that have to be accepted. Vertica's

architectural solutions are well thought out and are well suited to the distributed

analysis of large data. Understanding its limitations, advantages, and disadvantages is

useful not just for Vertica users but also for users and developers of other distributed systems.

Target Audience

Specialists in databases and storage systems, distributed systems architects and users.

Alexandr Zaitsev

Development director at LifeStreet. Specialist in analytical architecture. Specialist in distributed systems.


LifeStreet is a leading company in the market of in-app advertising for Facebook and mobile devices. Our

advertising network processes around four billion transactions per day. Our employees, robots and

optimization algorithms constantly analyze transaction statistics in order to achieve maximum efficiency

of advertising campaigns and optimum monetization of advertising space for our clients.

How to Put Database Migration on the Flow

Ilya Kosmodemyansky (PostgreSQL-Consulting.com),

Roman Druzyagin (404 Group)

Abstracts

Many storage system vendors offer simple wizard-like tools for switching your application to their technology. However, when our clients ask us how to migrate from one database to another, we always start by explaining how complex and unique this process is in each particular case. A good book can give you insight into operating a database or basic knowledge of database internals. However, books will not teach you how to plan the migration of one of the main system components, the data handling layer, from one technology to another. Besides, no such book exists.

In this talk, we will share the unique experience of one of our clients, 404 Group. They managed to make database migration a routine rather than a one-off effort: over the last few years, a number of independent projects were successfully moved to PostgreSQL from different data storage technologies.

Very often, people who propose changing a data storage system are seen as professional revolutionaries: somewhat fanatical, eager to raze everything to the ground and then..., (often) unwilling to work in a team, and (sometimes) unable to develop a system in an evolutionary way. How can such extremes be avoided? How do you correctly estimate the necessity of a migration and its potential gains and costs? How do you carry out a migration with minimal damage to your business? We will answer all of these questions and try to present several points of view on them.

Together with Roman Druzyagin, chief technical officer at 404 Group, we will examine every case from two points of view: that of a technical manager and that of a database administrator. For example, you are a developer and you know for certain that optimal data storage requires a migration from MySQL, or PostgreSQL, or Oracle, or HBase to HandlerSocket, or DB2 z/OS, or MongoDB, or even to "write your simple storage engine" (tm). What should you think about before presenting this idea to your manager, and what arguments should you give for your proposal?

Another situation: you are a technical manager. Day after day, developers come to your office with shining eyes and a proposal to do it "like at Facebook", or "like at Vkontakte", or "like we were taught at university". What considerations should you rely on when making a decision? Where do you start if you are offered a technology that you are not yet familiar with?

We will show you all the inner workings: what reasons led to the migration, how exactly we planned and carried it out, what problems we faced during the process, and how the project's performance changed afterwards. We will describe what a team needs to do in order to carry out a migration. We will examine the process of adopting a new technology and discuss the pros and cons of migration that affect the development process.

Target Audience

Architects, developers, chief technical officers, field engineers.

Ilya Kosmodemyansky, Roman Druzyagin

Ilya Kosmodemyansky is a consultant at PostgreSQL-Consulting.com, he's a specialist in Oracle, DB2,

PostgreSQL databases.

Roman Druzyagin is a chief technical officer at 404 Group.


PostgreSQL-Consulting.com offers full professional support for the entire life cycle of PostgreSQL

databases, including support in Russian.

404 Group is a consolidated group of companies that share a

common center of strategic management and a concept of active mentorship and participation in

project development. Drawing on successful technical and management expertise in IT and

digital project development, 404 Group offers companies financial, operational, and administrative

support to build, launch, and grow their projects.

NoSQL Database Performance Testing

Denis Nelyubin (Thumbtack)

Abstracts

At Thumbtack we have been benchmarking the performance and testing the fault tolerance of

four NoSQL solutions: Aerospike, Couchbase, MongoDB, and Cassandra. Distinctive features of the study:

the proprietary key-value database Aerospike was included in the comparison, and performance was tested on SSDs.

Database performance was measured with YCSB (Yahoo! Cloud Serving Benchmark). We had to

make changes to YCSB: add Aerospike support, fix the MongoDB drivers, and write a wrapper for

starting YCSB tests in parallel across multiple clients and for visualizing the results. We

needed multi-client support after we discovered that fully loading a 4-node

cluster (4 cores per node) takes 8 client machines (256 threads).

The tests demonstrated that, for data held in RAM, key-value databases reach

up to a million operations per second. The performance of the more feature-rich document-oriented

and column-family databases is an order of magnitude lower. The latency of the fastest databases

doesn't exceed 1 msec (over 1 Gb Ethernet), and the average latency of the slowest ones doesn't exceed

10 msec.
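
The throughput and latency figures above are linked by Little's law: concurrency = throughput × latency. The sketch below applies it to the 256 client threads (streams) quoted earlier. It assumes a closed-loop load generator in which each thread issues one request and waits for the reply before sending the next; that is an assumption about the test setup, not a statement from the abstract.

```python
def max_throughput(client_threads, latency_us):
    """Upper bound on operations per second for a closed-loop load generator."""
    return client_threads * 1_000_000 / latency_us


def required_latency_us(client_threads, target_ops_per_second):
    """Average latency (in microseconds) needed to reach a target rate."""
    return client_threads * 1_000_000 / target_ops_per_second


CLIENT_THREADS = 256
bound_at_1ms = max_throughput(CLIENT_THREADS, 1_000)
latency_for_1m = required_latency_us(CLIENT_THREADS, 1_000_000)

print(f"at 1 ms latency: at most {bound_at_1ms:,.0f} ops/s")
print(f"for 1,000,000 ops/s: average latency of {latency_for_1m:.0f} microseconds")
```

Under that assumption, 256 synchronous threads at 1 msec each cannot exceed 256,000 operations per second, so a million operations per second implies an average latency well below 1 msec or more outstanding requests than threads.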

Fault tolerance was tested by killing (kill -9) and then restarting

one of the cluster nodes; downtime (while the node was being killed and restarted) and performance

degradation were measured. Different databases demonstrated quite different behavior, determined by

their architectures.

In the talk we will examine the testing techniques used and the methods of interpreting the results.

Moreover, we will announce the results and our personal impressions of using these

databases.

Target Audience

The talk will be interesting for architects and developers who are about to choose a NoSQL

database. It will contain absolute performance figures and several characteristics of

database behavior under high load.

Denis Nelyubin

He has been working in IT for over 10 years. He began his career as a system

administrator and is now chief technical officer at Thumbtack. He has participated in the creation of

web applications, storage systems, document search systems, ecommerce systems, CMSs, etc. He writes

his own mobile applications for Android and takes part in the activities of the MongoDB user group.


Custom software development, specialized in complex and high load distributed systems.

Cassandra vs. In-Memory Data Grid in

eCommerce

Alexandr Solovyov (Griddynamics.com)

Abstracts

- Description of the main system requirements and high-level architecture.

- Reasons for migrating from In-Memory Data Grid to Cassandra.

- Description of the actions that helped to improve performance (TPS and/or latency).

- Description of the actions that didn't bring significant results, with an explanation of why.

Target Audience

Everyone interested in practical aspects of the usage of Cassandra technology and those interested in

the comparison of Cassandra and In-Memory Data Grid within a real project.

Alexandr Solovyov

http://www.linkedin.com/profile/view?id=139775488

Interests: distributed systems, application languages, AI.


http://www.griddynamics.com/

Top 10 Popular Questions to DBAs, or Why

I’m against Free Distribution of Handguns

Ilya Kosmodemyansky (PostgreSQL-Consulting.com)

Abstracts

“Mr. Military, will they give us weapons?

— Three hundred thirty-five…”

(c) DMB

1. We have to implement multi-master replication.

2. Why do we need backups if we have a slave?

3. We’ve created an index—why is it not used?

4. “I was told that InnoDB can be tuned in such a way that it will be faster than Oracle…”

5. Can we use a join?

6. “We have to display 20 very important count(*) values on the main page…”

7. “So maybe it’s still possible anyway?”

8. “Can we rollback a commit?” “But it’s possible in git…”

9. “Let’s make it schemaless (or EAV): will this allow us to bypass the problem of adding columns?”

10. “SQL is slow and obsolete, why don’t we read directly from the table?”

Target Audience

It’ll be useful to anybody having to do with the server side :-)

Ilya Kosmodemyansky

Consultant at PostgreSQL-Consulting.com, specialist in Oracle, DB2, PostgreSQL databases.


PostgreSQL-Consulting.com offers full-scale professional support covering the entire life cycle

of PostgreSQL databases, including support in Russian.

Nutanix: Creation of New Generation Clouds

Based on Big Data Working Principles

Maxim Shaposhnikov (Nutanix)

Abstracts

Servers and data storage systems with RAID are legacy (because of complex infrastructure support,

large engineering teams, and high demands on technical specialists).

There are lots of various hardware solutions in the market (servers, data storage systems,

communication hardware, etc.). Most often, nameplates and fractions of a percent in performance are

the only difference.

Almost all of them use x86 processors, similar types of memory, hard disks, external data storage

systems. There are lots of standard bottlenecks and restrictions.

As the data volume grows (to hundreds of terabytes per logical group), data loss on a RAID6

array is certain to happen in less than a year.

Approaches of such companies as Google, Facebook and Amazon are unique and free of almost all

aforesaid problems.

There are no RAID and data storage systems, but “dispersed data” is used instead; there are no

“servers” in traditional sense, but there are unified modules, Ethernet switching instead of building FC

SAN networks, the whole logic lives in software, and not in dedicated chipsets.

Result: unique performance and reliability.

Problem: such technologies are, as a rule, inaccessible to other businesses and projects (they are

technological know-how), and developing similar solutions is feasible only for the world’s largest

companies.

Nutanix has gathered the best engineers (including some who came from those companies), and the

result is a solution that employs the same principles but is much more accessible for mass

use.

It makes maximum use of existing solutions and technologies (such as Linux, Cassandra, Apache

ZooKeeper, Python, and several others), is oriented toward flash storage, and is tailored to private

clouds (KVM, VMware, Hyper-V).

Result: even rather small companies and businesses can build their own turnkey infrastructure

within a few minutes, with high packing density (hundreds of virtual machines per rack-mount

unit), thus saving resources and focusing on solving business problems rather than technology problems.

Target Audience

Engineers, infrastructure specialists

Maxim Shaposhnikov

Nutanix Technical Director and Solutions Architect for Eastern Europe and Russian Federation.


Nutanix develops a next-generation computing platform.

MySQL Performance Analysis and

Optimization

Peter Zaitsev (Percona)

Abstracts

A database must operate quickly. It definitely must operate quickly, always. Unfortunately, it’s not

always achievable in real life.

In our report, we’ll try to answer a number of the following questions:

— How can we know whether MySQL really restricts performance of our application?

— Which factors affect performance?

— What’s the correct way to test MySQL performance and which metrics should be considered?

— How to determine which queries create problems?

— How to increase MySQL performance?

— How can MySQL performance be predicted and planned?

Target Audience

Developers, DB administrators, and others.

Peter Zaitsev

CEO at Percona


Percona

Section: Video

Broadcasting Video at 10 GBit/s

Maxim Lapshin (Erlyvideo)

Abstracts

Modern objectives of video streaming require new solutions. People want to watch multi-bitrate SD/HD

video with ability to select languages and subtitles, where old solutions fail.

The first part of the report will briefly review the most archaic solutions for video file streaming

that are still in use today. It will then review newer solutions and explain why they

have ceased to meet today’s needs since the advent of HD video and the fall in the price of internet

channels.

The second part of the report will describe the hardware, OS-level, and algorithmic

problems that arise when streaming 10 Gbps of traffic from a single server.

Namely:

— Why does one have to abandon hardware RAID in favor of primitive JBOD solutions, and how does one

manage them?

— How did it happen that there is an event-driven epoll API for the fast network but only a blocking

API for the slow disks, and what can be done about it?

— How can one use expensive SSDs effectively alongside cheap HDDs? Load profiles differ between sites

with different content: on some it is possible to serve all traffic from a mirrored SSD, while on

others the hot zone exceeds 5 terabytes.

Approaches to the placement problem: which content should be put on an SSD, and which

should continue to be served from the HDD?

How to build a program so that it never reports an elapsed timeout and clearly diagnoses its own

restart?

Target Audience

Programmers, system administrators, video engineers, video project managers.

Maxim Lapshin

I have been working on video streaming for a long time.


Erlyvideo is one of the few modern video streaming servers in existence.

Reliable Video (and not only) Distribution by

Simple Means

Evgeny Potapov (Itsumma.ru)

Abstracts

In my report I’ll tell how to build reliable, fast and cheap distribution of static data (using the system

we have built for carambatv.ru as an example) under conditions where you want to do everything

cheaply, reliably and quickly (choose two of three!), all at the same time.

1. Video preparation: uploading, conversion and subsequent distribution.

1.1. How do we check integrity of the uploaded video?

1.2. Uploading to distribution servers: how we tried distribution via torrents.

1.3. Rationality of connecting clouds for video conversion.

2. Distribution: delivery of video to a user.

2.1. Cheap hosting sites for cost minimization.

2.2. Several hosting sites as a method to increase failure tolerance.

2.3. Balancing: why we settled on DNS Round Robin. Advantages and problems.

3. Failover and scaling.

3.1. Traffic volume monitoring and prediction of distribution volume growth. Automatic shutdown of

servers that have “eaten” the traffic.

3.2. Peak time distribution: hybrid of hardware and cloud.

3.3. Last mile monitoring: “how to verify that the users don’t have lagging video?”

Target Audience

Managers and project developers interested in organizing a data distribution system for large traffic

volumes and bandwidth.

Evgeny Alexeyevich Potapov

General Director of Summa IT.

Web development experience: 10 years.

Speaker at Highload++ 2010, RIT++ 2011, Highload++ 2012.


Itsumma.ru is a company providing technical support for web sites. The company currently has a staff of 30.

The sites we support are visited daily by 140 million unique visitors.

A Platform for Our Video Service Made in a Quarter

Alexandr Tobol (Odnoklassniki)

Abstracts

My report won’t present any special technology or a magic algorithm. I’ll tell how it took slightly over a

quarter for a rather small team to re-launch the 24/7, not-so-small video service of Odnoklassniki

(“Classmates”) on a written-from-scratch platform deployed on over 200 servers distributed among

several data centers.

I’d like to talk about our successes and failures in ensuring uninterrupted

video loading, transformation, storage, distribution and monitoring, and also to dwell on the features

related to a load of 1000 views per second, with a daily audience of 8 million

geographically distributed users in the Russian Federation and abroad. I’ll also address some of the

technologies we have employed.

Target Audience

Java programmers, people who’d like to write their own video service, or have done it already.

Alexandr Alexandrovich Tobol

Engineer on the Odnoklassniki project at Mail.Ru Group. He has been working on video processing and information

storage for over 6 years.

Since 2012 he has been in charge of the software video platform that underlies the video service of the Odnoklassniki

portal. Earlier he worked for Zodiac Interactive as lead developer of a multimedia service platform

for tru2way® interactive television. Prior to that, he developed the CAS (content-addressable storage) of

EMC Centera with 99.9999% availability at EMC. He participated in ACM ICPC international student

team programming contests as a member of the team of St. Petersburg State Electrotechnical

University “LETI”.


We’re all Classmates (‘Odnoklassniki’) here!

Section: Search

Multi-Terabyte Sphinx HA Cluster

Vyacheslav Kryukov (Ivinco)

Abstracts

Ivinco, which I represent, cooperates closely with the Sphinx team, who have developed several

features at our request, including Sphinx HA, which will be the subject of the

presentation.

Synopsis

1. Background.

2. Architecture.

— Ordinary Sphinx cluster.

— Sphinx HA cluster, implementation options, their pros and cons.

— Cluster scaling, rebalancing.

3. Data processing:

— uploading indexable data into MySQL DB;

— indexing and index synchronization;

— index updating;

4. Ensuring performance and robustness:

— increasing throughput due to HA;

— control of performance and cluster’s critical parameters;

5. Prospective developments:

— index distribution system;

— resource consumption control;

— RLP.

Target Audience

System architects, developers of scalable high-load systems.

Vyacheslav Kryukov

I am a developer at Ivinco; a long time ago I worked for SpyLOG.

Information about Speaker’s Company

Ivinco works in the field of consulting and development of scalable solutions based on Sphinx.

Full-text Search in the Mail.Ru Mail Service

Dmitry Kalugin-Balashov (Mail.Ru Group)

Abstracts

The requirements for mail search differ from those usually imposed on “big” search

systems. This leads to the use of quite different, non-standard technological solutions. In my

report I will describe how full-text search is arranged in the mail hosted by Mail.Ru.

Major subjects we will touch upon are listed below.

– How does it differ from “big” web search? Data is not stored in one large index. Every

mailbox has its own index: filling up a “large” index would effectively mean keeping it constantly in

memory, which is expensive. It’s also important to note that mail is a very private thing, so it would

be absolutely intolerable to allow “mixing” of the data (say, due to a developer’s mistake). In addition,

mail search imposes more serious requirements on the completeness of the search output. It’s

unacceptable for even a single message to get lost (a bug involving a lost message always has

blocker priority).

– It’s impractical to cache indexes. Users hardly ever make more than 2 mail search queries in a row,

so storing their index in memory would be expensive. For this reason, the index is arranged in such a

way that the necessary information can be read from disk in minimal

time (2 to 3 disk reads; the entire index is never read, only a small part of it).

– Tokenization, that is splitting of texts into separate words. We’ll review the algorithm and the

reasons to choose it.

– Low-cost index replenishment and the reasons for index rebuilds. The index must be

updated as soon as changes occur. Such frequent index changes increase disk load

(the index file has to be rewritten completely, since the data stored in it is sorted). To avoid this,

we store newly arrived data unsorted (as a transaction log), which is searched

sequentially. Adding information to that log is just one write operation at the end of the file. The index is

rebuilt only when the search speed drops below a specified threshold.

– We’ll review the very process of index search and subsequent operations on the received selection:

applying filters, ranking (unlike the “big” web search, ranking in mail search is trivial: it’s just sorting

by date), highlighting of found words in the output.

– Building the index of search suggestions, justification of its optimality (maximum result output speed at

a reasonable index size), and the algorithm for selecting search suggestions.

– We’ll also discuss robustness and measuring search quality.
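
The index replenishment scheme described in the list above (a sorted main index plus an unsorted, append-only transaction log that is merged in only when searching gets too slow) can be illustrated with a minimal sketch. This is not Mail.Ru’s implementation: the class name MailboxIndex, the in-memory lists standing in for on-disk files, and the use of log length as a stand-in for measured search speed are all assumptions made for illustration.

```python
import bisect


class MailboxIndex:
    """Toy per-mailbox index: sorted postings plus an unsorted append-only log.

    Hypothetical illustration only; real data would live on disk, not in lists.
    """

    def __init__(self, max_log_entries=1000):
        self.sorted_postings = []  # sorted list of (term, message_id) pairs
        self.log = []              # newly arrived pairs, in arrival order
        # Assumption: log length is used as a proxy for "search became too slow".
        self.max_log_entries = max_log_entries

    def add(self, term, message_id):
        # Adding is a single append to the end of the log;
        # the sorted part is not rewritten.
        self.log.append((term, message_id))

    def search(self, term):
        hits = []
        # Binary search in the sorted part: (term,) sorts before any (term, id).
        i = bisect.bisect_left(self.sorted_postings, (term,))
        while i < len(self.sorted_postings) and self.sorted_postings[i][0] == term:
            hits.append(self.sorted_postings[i][1])
            i += 1
        # Sequential scan of the unsorted log.
        hits.extend(mid for t, mid in self.log if t == term)
        # Rebuild only once the sequential part has grown past the threshold.
        if len(self.log) > self.max_log_entries:
            self.rebuild()
        return sorted(hits)

    def rebuild(self):
        # Full rewrite of the sorted part: the expensive step that is deferred.
        self.sorted_postings = sorted(self.sorted_postings + self.log)
        self.log = []
```

With a small threshold such as MailboxIndex(max_log_entries=2), three add() calls followed by one search() trigger a rebuild, after which the log is empty again and later additions start a fresh log.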

Target Audience

Developers of high-load systems.

Dmitry Andreyevich Kalugin-Balashov

Programmer, Mail.Ru Group.

Graduated from Siberian Federal University in 2008 (Department of Information Science and Computing,

specialization in Applied Mathematics); he completed his post-graduate studies at the same university in

2011.

He has been employed at Mail.Ru since 2011, working on the development of full-text search in mail.


Mail.Ru Group is the largest internet company in the Russian language segment of the net and the

Russian internet’s leading player by the number of unique monthly visitors.

Sphinx 2013

Andrey Axenov (Sphinx Technologies Inc.)

Abstracts

Lemmatization, compound word splitting, JSON attributes, nested samples, fast search for phrases,

wildcard search, a heap of new ranking signals, ALTER TABLE, OPTIMIZE INDEX, the GROUP <N> BY extension,

optimization for extra-large schemes, SHOW PLAN, SHOW PROFILE and several dozen other new features

(but just as a list!) of the Sphinx search server that we completed and rolled out in 2013.

Target Audience

Anyone interested in what’s new in the Sphinx search engine.

Andrey Axenov

Author of Sphinx, an open-source full-text search engine.


Sphinx Technologies Inc.

Section: System Administration

Performance Testing of DNS Servers

Andrey Leskin (QratorLabs/HLL)

Abstracts

Despite the fact that DNS is a service hardly noticed by many, it’s also a cornerstone of almost the whole

Internet. In the event of any problems with that system, users lose their habitual ability to access the

sites they need. 2013 has shown how vulnerable this area is: there were DDoS attacks on the DNS hosting

of companies and spoofing of entries to redirect clients, all of which entails enormous losses.

For this reason, the robustness and performance of DNS servers are an important research subject

nowadays. By now, there are at least 20 DNS servers available on the market, to suit every taste: from simple

makeshifts to large systems that include a DNS server as one of their components. Yet competent,

fine-tuned server configuration is a sensitive issue: if the settings are not quite correct, you may

not only allow an intruder to edit real data, but also become an involuntary participant

in a DDoS attack.

The report examines several DNS servers from the standpoint of performance, given identical initial

data. The appraisal takes into account not only the best RPS, but also the ability to

reduce server load by adjusting the respective settings. It also shows interesting examples of

incorrect server behavior in response to certain requests. The reviewed DNS servers are: Knot-1.2, Knot-1.3,

NSD-3, NSD-4, pdnsd, PowerDNS, TinyDNS, Unbound, Yadifa. The choice of those servers is based on

community opinion, source code quality and the number of installations.

Servers are compared from the standpoint of possible protection against known attack types: DNS

Cache Poisoning, DNS Amplification, etc. The main problem of DNS is that it runs over the UDP

protocol, which guarantees effectively nothing. Thus we have to decide whether it’s worth

responding to an incoming request at all (and if it is, then how and with what specifically) and how to do it as

quickly as possible.

From the standpoint of DDoS, DNS gives the intruder a chance to obtain attack leverage

ranging from 30 to 60 (and even more): the DNS server may respond to a small incoming packet with a

much larger one. Combined with spoofing of the target’s IP address, this immediately gives an opportunity

to create a network-level attack at virtually no cost.
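
The leverage mentioned above (often called the amplification factor) is simply the ratio of response size to request size. The short sketch below works through that arithmetic. The packet sizes and attacker bandwidth are hypothetical numbers chosen for illustration, not measurements from the report, and the function names are made up for this example.

```python
def amplification_factor(request_bytes, response_bytes):
    """Ratio of the response size to the request size that triggered it."""
    if request_bytes <= 0:
        raise ValueError("request_bytes must be positive")
    return response_bytes / request_bytes


def reflected_traffic(attacker_bits_per_second, factor):
    """Traffic arriving at the spoofed target for a given attacker send rate."""
    return attacker_bits_per_second * factor


# Hypothetical sizes: a 64-byte query that elicits a 3000-byte response.
factor = amplification_factor(64, 3000)

# Hypothetical attacker uplink of 100 Mbit/s.
victim_bps = reflected_traffic(100e6, factor)

print(f"amplification factor: {factor:.1f}")
print(f"traffic at target: {victim_bps / 1e9:.2f} Gbit/s")
```

Under these assumed sizes the factor falls inside the 30 to 60 range quoted above, which is why limiting response size or response rate is the usual lever on the server side.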

To prevent adversaries from exploiting the protocol, the community has suggested several types

of defense, each of which adds some data to the response. By balancing between reducing the amount of data

through the exclusion of checks and adding external limitations, you can get a clever server that is best

suited to the specified tasks.

Target Audience

IT specialists engaged in DNS server administration, hosting providers, internet project owners.

Andrey Viktorovich Leskin

Developer at High Load Labs (QratorLabs/HLL).

Earlier he worked on remote sensing of the Earth: receiving and processing Earth remote sensing data

from satellites, and developing software for unpacking, analysis and visualization of the received data.

Image processing specialist.


Protection against DDoS attacks, appraisal of project security, and consulting and development,

maintenance and support of information security of web projects with high loading.

What’s New in nginx?

Maxim Dunin (Nginx, Inc.)

Abstracts

As many have heard, nginx is a fast (some even say, a very fast) http server. Frequently, knowledge of it

ends here.

In fact, if you really need a fast server, take nginx, and any other options, even if they were available,

could be safely crossed out.

Is it really so? Or, probably, there are some subtleties?

There are subtleties. There’s a server that’s even faster and even more efficient—and always at your

service. And that’s… nginx! Just wipe off the dust^W^W… take the new version!

What new has appeared in nginx recently, and what is it all for? The report addresses major new

features that appeared in nginx 1.1.x… 1.5.x.

1.1.x: cache loader improvements, support of elliptic curve cryptography (greetings to NSA!),

optimization of memory consumption by SSL connections, simultaneous application of multiple

limit_conn and limit_req, MP4 streaming “out of the box”, proxy_cache_lock, disable_symlinks for

shared hosting, and support of PCRE JIT for those who love regular expressions.

1.3.x: improvements in IPv6 support, least_conn upstream server balancer, support of ETag (and,

therefore, download resume in IE9 +), gunzip and ability to store compressed resources, OCSP Stapling,

SPDY, support of request body transfer as chunks, WebSocket proxying and ability to write pre-

compressed logs.

1.5.x (still counting): support of EPOLLRDHUP and use of O_PATH for disable_symlinks on Linux, multiple

error_log directives simultaneously, SMTP pipelining, still more SSL optimizations, proxy_ssl_protocols

and proxy_ssl_ciphers, auth request module, unbuffered work with FastCGI back ends

(fastcgi_buffering) for those who need response streaming.

Everything you always wanted to know about new functionality in nginx, but were afraid to ask!

Target Audience

System administrators, developers, architects.

Maxim Dunin

nginx developer.


Established in 2011, Nginx, Inc. is engaged in development and support of nginx server.

Comparing Distributed File Systems

Marian Marinov (1H Ltd.)

In this talk I will show performance comparisons between the following file systems:

GlusterFS;

XtreemFS;

FhgFS;

Ceph.

Note that I only had servers with Gbit interfaces for the comparison, so I could not measure their

performance on 10Gbit.

Target Audience

Developers looking for the best distributed file system for a future project.

Marian Marinov

Co-founder & CEO at 1H Ltd.


1H Ltd. has developed a set of software systems for automation and optimization of a web hosting

business by time.

Vagrant and Packer for Stable, Portable,

Multi-Cloud Environments

Mitchell Hashimoto (HashiCorp)

Abstracts

We now firmly live in a "cloud" world. Even with physical hardware, we're focusing on more quickly

deploying and imaging these machines. But installing software and managing our servers across multiple

environments (dev, test, staging, production) is stagnating, following practices established nearly a

decade ago. Tools like Vagrant and Packer are a new way to approach these problems.

Automation and an abundance of available compute resources demand a new approach towards

workflows around development, test, staging, and production environments. If you're not automating,

then you're losing out on huge potential, not just on speeding up your work, but making it significantly

easier as well.

How long does it take you to bring up a new web server? 2 minutes? 5 minutes? What if I told you that you can

do it in 15 seconds without any real downsides? Packer makes this easy. How similar are your

development and production environments? Are you using the same ops scripts to set up both? Does it

run on Mac, Windows, and Linux? With Vagrant, this is easy.

In this talk, I'll show how Vagrant and Packer can be used to produce stable, portable environments for

development, test, staging, and production. These images will run in multiple "clouds" such as

VirtualBox, VMware, AWS, and even physical hardware.

Target Audience

Mitchell Hashimoto

Founder at HashiCorp. Creator of Vagrant.


HashiCorp.

Statistics in Practice to Search Anomalies in

Load Testing and Production

Anton Lebedevich

Abstracts

Modern monitoring systems can collect performance data and send notifications if indicator

values go beyond predetermined thresholds. A breakdown or an unexpected change in behavior does not

necessarily lead to a threshold being exceeded, but a human sees such a change on graphs. Yet a person is unable

to constantly monitor thousands of graphs, so the talk is about automating that process.
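
To make the distinction above concrete, here is a minimal sketch of one simple way to flag a change that a fixed threshold misses: each point is compared with the mean and standard deviation of the window preceding it. This is an assumed, generic technique chosen for illustration, not necessarily one of the methods covered in the report; the function name find_anomalies, the window size, the 3-sigma rule and the sample series are all hypothetical.

```python
from statistics import mean, pstdev


def find_anomalies(values, window=10, sigmas=3.0):
    """Return indexes of points far from the mean of the preceding window."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sd = pstdev(history)
        if sd == 0:
            # Perfectly flat history: any different value is a change.
            if values[i] != mu:
                anomalies.append(i)
        elif abs(values[i] - mu) > sigmas * sd:
            anomalies.append(i)
    return anomalies


# Hypothetical response times in ms; the alert threshold is assumed to be 90.
series = [50, 51, 49, 50, 52, 50, 49, 51, 50, 50, 70, 50]
threshold_alerts = [i for i, v in enumerate(series) if v > 90]
print("threshold alerts:", threshold_alerts)
print("statistical anomalies:", find_anomalies(series))
```

In this made-up series the jump to 70 never crosses the fixed threshold, yet it stands out clearly against the recent history, which is the kind of change a person would spot on a graph.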

For load testing, we managed to pick out some mathematical methods that help to find out what

broke during the test and when.

Analyzing behavior of a system with real users is much more complicated. There are some popular

methods that have several deficiencies. There are rather new methods which produce good results, but

have limited application range.

For both cases (testing and production), it is sometimes useful not only to know what broke and

when, but also to find internal dependencies in the system, so the report includes a

story about correlation and metric clustering methods.

We’ll review the Kale stack (Skyline, Oculus) from Etsy as an implementation example; it’s an open source

tool for identifying and correlating anomalies.

Target Audience

System administrators, load testing specialists, developers.

Anton Lebedevich

Currently I’m working on a log management service. I was engaged in distributed high-load systems in

stock trading, hh.ru, Video and Music projects at yandex.ru, and web development.


Independent consultant in distributed system development and performance optimization.

DDoS Attacks in Russia in 2013

Alexandr Lyamin (QratorLabs/HLL)

Abstracts

We will sum up the results of the first three quarters of 2013.

We will explain in which respects this year turned out to be really outstanding.

We will consider popular methods of carrying out DDoS attacks, how they work, and possible

counteraction strategies.

We will talk about how TO MAKE THE WORLD BETTER AND KINDER.

We will take a look into the future.

Target Audience

DevOps

Alexandr Lyamin

General Director of QratorLabs


QratorLabs/HLL is currently the only company in Russia that specializes in developing

effective solutions for countering DDoS attacks. By analyzing how attacks are carried out, the

technologies they use, and the dynamics of their growth and development, our specialists improve the

filtering algorithms that have provided our customers with reliable defence against DDoS attacks since 2009.

SmartOS/Solaris Technologies for System

Tuning of Applications

Sergey Zhitinsky, Alexandr Chistyakov (Git in Sky)

Abstracts

1) Service problems are still the same – lags in unknown places, failures, recovery time.

2) How can SmartOS/Solaris technologies help to solve these problems?

3) ZFS – file system of the future. Brief overview of key features – automatic control of integrity,

deduplication, snapshots, file transfer.

4) ZFS snapshot: how to make quick snapshots based on copy-on-write?

5) ZFS send/recv: how to find a dataset quickly in a private cloud or even in a public cloud.

6) A couple of undocumented facts about ZFS from the personal experience.

7) DTrace – a tool for debugging applications, drivers and the kernel – finding where the trouble is!

8) DTrace – example 1. Why did the database lag? Debugging functions in the disk driver.

9) DTrace – example 2. It is not clear which error code a kernel function has returned – what’s the matter?

Dealing with it.

10) Briefly about Solaris Zones – virtualization technologies in SmartOS.

11) Our experience in solving problems. Effectiveness of implementations.

12) Several examples of hosting fast growing American startups on SmartOS in Joyent - LinkedIn, Voxer,

Digital Chocolate, GoldFire.

Target Audience

System engineers, system administrators, technical directors and internet business owners.

Sergey Alexandrovich Zhitinsky

I am 45. In 1993 I created my first Internet business, an Internet provider. My colleagues and I

have always been interested in advanced technologies. In 1996 we were among the first to develop a large

web project, the website www.spb.ru.

Later I took part in ROL projects, made one of the first browser games, ‘Vladyki Pentakora’, joined the

‘Bobrdobr’ project and brought it up to 100 000 unique visitors per day.

I am interested in OpenSolaris/Illumos/SmartOS stack technologies, because I see a lot of advantages

they give to businessmen for developing new interesting products.

My co-speaker and colleague - Alexandr Chistyakov.


Git in Sky is a new company that specializes in system tuning of websites under Linux/Unix. It has its

own cloud on SmartOS for hosting customer services and backups, and it offers products for startups.

The company hosts DevOps hangouts in Saint Petersburg and takes part in the Moscow ones, together

with Express42.

Our engineers have extensive experience in system engineering and regularly communicate with

colleagues.

Section: Testing

Piling to One Request: Vulnerability of Web

Applications, Leading to DoS

Ivan Novikov (ONsec)

Abstracts

The fault tolerance of web applications is often considered in the context of distributed

denial-of-service (DDoS) attacks. Recently there has been a trend toward fewer botnets and greater attack

efficiency per request. The aim of this work is to demonstrate examples of logical and other

errors in code or web applications that cause service failure after only one or a few requests. The

report gives examples of real vulnerabilities from information security audits conducted from 2009 to 2013.

All the described vulnerabilities affect the availability of a web application and/or its components and

infrastructure parts, and they can be classified into four groups: vulnerabilities in application logic,

architecture or administration vulnerabilities, vulnerabilities in data format processing, and classical

vulnerabilities. We will discuss the reasons these vulnerabilities appear and ways of overcoming them,

recommendations for eliminating them, and results of work with customers from our

experience.

Target Audience

Web applications developers, system administrators.

Ivan Novikov

Founder, leader and chief expert of ONsec. Since 2004 he has been doing research in the

domain of web application security. He is the author of many studies on web application security.

He has received awards from Google for identifying vulnerabilities in the Chrome browser, from Yandex for winning the

‘A month of vulnerability search’ competition, from Trustwave for achievements in the ModSecurity SQLi

Challenge, and from ‘1C-Bitrix’ for winning a competition on circumventing proactive security. He is now

actively working on developing self-learning systems for detecting attacks on web applications and on

heuristic analysis.


ONsec performs security audits (safety assessment) of web applications of any complexity: websites,

site management systems, social media, portals, internet banks etc. Its narrow specialization in

web application analysis makes it possible to achieve the highest level of work quality and provide

unique services, in particular the analysis of large volumes of source code.

The advantage of ONsec is a young and highly qualified team whose experience includes many

vulnerabilities found in products of such famous companies as Google, Adobe, Yandex, Opera and

1C-Bitrix. The company conducts regular research that is assessed by well-known foreign experts in the

domain of information security.

A/B Testing Architecture: Do It Yourself

Sergey Averin (Badoo)

Abstract

Badoo is a large social network with more than 190 million users. In our company we first assess most

new functionality by A/B testing. For almost a year we have been using our own heavily loaded

framework, and in my opinion it is very simple and understandable and does not require huge resources for

development and support. In my report I will tell you why we arrived at our own solution, and I will explain its

architecture and principles of operation. I am sure that each of you can build something similar for your

project and begin to make better-founded decisions.

Theses

- How did we test earlier?

- Why did we make our own instrument?

- Architecture: API, graphical interface, transport, scripts, database.

- Test structure.

- Main rules of A/B testing.

- Results assessment, examples of reports.

- Final part: why we cannot entirely do without a thinking human.

Difficulty

Although the conference is called Highload++, I assure you that the architecture described here is suitable

even for a project with 1000 visitors per day and three programmers on staff. Everything

described here can be implemented in PHP by one person in less than a week.

By the way, the result can be measured: it is expressed in real income.

Target Audience

The report is intended for developers and technical managers of social media, ads websites, blogs with

distribution, projects with sale by email distribution, different community websites, banks and any

projects where cooperation with each customer is planned as long-term.

Sergey Averin

I manage various projects at Badoo; during the last year I have been working closely on A/B

testing. As a hobby I make funny reports.

70 years on the Internet, I have developed small and big websites. I have taken part in creating the

following projects: habrahabr.ru, dirty.ru, leprosorium.ru, autokadabra.ru, dribbler.ru, trendclub.ru, and

in several unnamed startups.

All previous reports at http://www.slideshare.net/rybaxek.

Also I have accounts at Moi Krug (http://averin.moikrug.ru/), Facebook

(http://www.facebook.com/ryba.xek) and Twitter (http://twitter.com/ryba_xek).


Badoo is not only the biggest but also one of the most progressive, innovative and high-tech

companies in the social media industry, and it is among the world’s 150 largest projects. It has approximately 193 million

users, and more than 150 000 new users join it every day.

Changing the Process of Development and

Testing. Main Problems

Vladislav Chernov (Badoo)

Abstracts

In this report I will speak about the main problems of changing the development and testing process in

big or small development and testing teams. The main mistakes made when switching to a continuous integration

scheme will be considered. The whole process will be described, from selecting a development scheme in the version control

system (several scheme variants) to automating deployment to production servers.

The content of the report is the following.

1) Do you need to change business processes in your company?

2) The human factor as a main problem in changing the process.

3) Basic schemes of software development in the version control system.

Advantages and disadvantages.

4) Main steps of changeover to a new workflow. Where is an error unacceptable?

5) Integration of auxiliary systems and process automation. Main problems.

6) Continuous integration and Continuous delivery. Time and business are against the team of

development and testing.

7) Why change anything in spite of all the problems?

In this report I am going to consider both the main technical problems and problems of staff motivation,

using Badoo as an example.

Target Audience

The report is intended for a wide audience.

Vladislav Chernov

Release engineer


Badoo is the largest and fastest-growing social network for meeting new interesting people. Badoo unites

more than 170 million users in 180 countries.

According to Alexa.com, the Badoo website ranks 137th in the world rating of the most popular

websites. It ranks 59th in the Google TOP 1000 world rating of the most visited websites, with

46 million unique users per month. Every day more than 150 000 new users

register on Badoo, more than 3 million pictures are uploaded, and more than 50 million messages are

sent.

Badoo is a technically complex, highly loaded project. Stable operation of the project is provided by 2000

servers located in two geographically remote data centers (Miami, Prague). The daily dynamic load on the

backends during peak hours is more than 40 000 requests per second. Several billion events are

loaded into Badoo’s analytical systems per day.

Several internal Badoo developments have been released under free licenses; the most popular products are

the FastCGI process manager for PHP (php-fpm), the Pinba server for real-time statistics collection, and the fast template engine

Blitz.

Badoo now has two main offices, in London and in Moscow, where more than 200 employees

work. The London office is responsible for business growth and all conceptual product decisions,

while 90 percent of Moscow office employees bring these ideas to life by developing them.
