Techno Arms Dealers and High Frequency Traders


Today represents the hottest time to be in financial markets – nanosecond response

times, the ability to affect global markets in real time, and lucrative spot deals in dark

pools being all the rage. For companies that do business in these times, it is a technical

arms race, worthy of a Reagan-era analogy.

With High-Frequency Trading firms locked into an effective “Space Race”, the

challenges for these firms are now far reaching, extending beyond traditional regulatory,

compliance, and government boundaries.

Regulatory requirements must still be met, but serious fines for non-compliance,

and even enforceable undertakings by third parties to halt trading activities on

markets, are still outweighed by the potential upside for combatant firms playing in the

race.

Increasingly, the most marginal of technical errors can spell doom for market

participants. In a market where risk is ever-present and often measured

in millions of dollars, glitches are a regular occurrence, resulting in lost revenue,

disappointed customers, and the rapid destruction of once high-profile market leaders.

Recently, this was brought to the public’s awareness by the spectacular failure of

Knight Capital: in August 2012, erroneous trades were sent to the New York Stock

Exchange, leading to the obliteration of nearly 60% of the firm’s value in under an hour.

The firm’s catastrophe has forced an attitude change among investors and corporate

technology leadership, with a focus on compliance controls and board-level

accountability. Tiny lapses in controls are expensive mistakes that can disrupt

markets. Add the immense losses and liability suits that often trail such

events, and the stakes are higher than ever to develop software in a controlled way and get it

to market in the shortest time possible.

With regulatory changes imminent, the need for clear, actionable reporting at all levels

of technology organizations requires a more rigorous approach than the traditional ones taken in

the past.

The Landscape of Failure:

In the last 2 years alone, there have been numerous incidents of technology

misconfiguration that led markets astray. Institutional investors aside, failures in the mechanisms

that govern software development for brokerage firms and markets have far-reaching

and damaging consequences. From ill-prepared recovery protocols to poorly governed

front, middle, and back offices, there are several noteworthy incidents in recent times

that have led to greater scrutiny for trading companies.

November 2012 – NYSE Euronext

A newly implemented market matching engine, UTP (Universal Trading Platform, the

core trading platform employed by the NYSE), caused a day-long disruption and forced

the Big Board operator to establish closing prices for more than 200 stocks using a

fallback to its old system, Super Display Book (SDBK). Trading never resumed during the

day for the 216 stocks affected, and the exchange determined the official closing price

for each of the affected securities based on a consolidated reading of last-sale prices

instead of the auction normally used to close stocks. Manual intervention was required to

revalidate positions for venues and participants.

Overview of the Root Cause: Poor Testing/Quality Assurance/Release Management

Failure.

2007 – 2010 London Stock Exchange (LSE) Multiple Outages & the Move to Linux

Over the course of a four-year period, the London Stock Exchange began to earn a

reputation as the most unreliable exchange in the market. A raft of

technology problems manifested in

regular outages. In fact, the LSE ultimately had to change its entire operating stack to a

new platform and institute a raft of new, mature processes to achieve the kind of

reliability it needed.

LSE Migration to Linux

August 2012 – Knight Capital

In the span of 45 minutes, a little over four hundred million dollars was lost when an

algorithmic trading program designed for testing environments was released to their

production environment. The blunder was followed by a roughly seventy-five percent fall in the

stock price over two trading days, as attempts to salvage the situation got under

way. The error involved high-frequency trading (HFT) in up to 140 stocks, and is

just the latest in a string of such errors.

The Root Cause: Poor Configuration Management, Inconsistent Testing Approach, Poor

Release Management

But How?

Most brokerages apply several layers of risk mitigation when developing and deploying

software. I’ll give a high-level overview of a traditional approach in another post

(I won’t go into the details of settlement, vetting, market matching, etc.). Trading firms are

complex beasts, with multiple market participants, multiple exchanges, and a plethora of

investment instruments to use, and going into detail on the actual technologies detracts

from the message. What is apparent is that the process life-cycles used to

achieve releases are governed by mechanisms from a different time and place, with

inconsistent controls not designed for rapid release schedules, leaving gaps in

organizational capabilities that are open to failure.

The “New Old Ways” to Manage These Problems:

Application Lifecycle Management (ALM), a relatively recent play, is a means of ensuring

that software remains relevant. A vital aspect of the Software Development Life-Cycle

(SDLC), ALM is an integral part of ensuring that firms overcome the challenges of

developing top-notch software at a fast pace. The new-wave premise of ALM follows a

design, build, run mentality, and pushes the paradigm to encompass all activities in the

development cycle under one roof, whereas previous approaches often stitched together

different best-of-breed solutions.

The benefits of this, with regard to trading systems, are clear. Greater visibility and

consistency between tools means more bugs fixed, and ultimately fewer glitches. The

unfortunate reality is that underlying configurations are still not maintained well in this

approach, and configuration errors would not necessarily have been caught by the tools of traditional

ALM technology vendors.

ITIL is a widely accepted approach to IT service management in these organizations. An

ITIL-enabled process centres on what is called a Configuration Management

Database (CMDB), which contains all information pertaining to an information system. It

helps the organization identify and comprehend the relationships between system-level

components and applications, and it is designed to track relationships between

technology services and, at a micro level, items called CIs (Configuration Items). This

process is known as configuration management, but as it typically lives in the

operational part of the equation (Application Support, Infrastructure Operations &

Service Management), the process usually only gets invoked at a high level in the

pre-production environments. There is another discipline called Software Configuration

Management, which has applicable components in ITIL and ALM; however, the tools and

processes rarely meet, as the disciplines are very much either

software or infrastructure orientated.

The conceptual CMDB enables the control and specification of configuration items in a

systematic and detailed manner, reducing configuration drift. As mentioned previously,

problems with this approach manifest in the ITIL world, as the CMDB typically does not

converge with the version control repositories in the development life-cycle, and more

often than not is not version controlled itself, leaving further

inconsistencies.
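
As a rough illustration of the idea, a CMDB can be pictured as a set of configuration items plus the dependency links between them. The sketch below is a minimal Python model, not any vendor's schema: the ConfigItem fields, the item names, and the revision field (which ties a CI back to a version control commit) are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigItem:
    """One configuration item (CI). All field names here are hypothetical."""
    name: str
    kind: str                       # e.g. "application" or "host"
    revision: str = ""              # version control commit for this CI, if recorded
    depends_on: list = field(default_factory=list)


def impacted_by(cmdb, changed):
    """Return every CI that depends, directly or indirectly, on `changed`."""
    impacted = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for ci in cmdb.values():
            if current in ci.depends_on and ci.name not in impacted:
                impacted.add(ci.name)
                frontier.append(ci.name)
    return impacted


def untracked(cmdb):
    """CIs with no recorded revision, i.e. not traceable to version control."""
    return sorted(ci.name for ci in cmdb.values() if not ci.revision)


# Illustrative data only; these items are invented.
cmdb = {ci.name: ci for ci in [
    ConfigItem("linux-host-01", "host"),
    ConfigItem("order-router", "application", "a1b2c3", ["linux-host-01"]),
    ConfigItem("market-maker", "application", "d4e5f6", ["order-router"]),
]}

print(impacted_by(cmdb, "linux-host-01"))
print(untracked(cmdb))
```

The untracked() helper shows the gap described above: a CI that exists in the database but cannot be traced to a versioned source is where inconsistencies tend to creep in.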

Okay Okay We Get That, So What Went Wrong at Knight?

Basically, Knight accidentally released simulation software, which it used to verify that its

market-making software functioned properly, into NYSE’s live system.

Within Knight Capital’s development environments lived a software program called a

“market simulator”, designed to send spread patterns of buy and sell orders to its

counterpart market-making software, called RLP in this case. The trade executions were

recorded and potentially used for performance validation prior to new releases of

the market-making software. This is probably how they could stress test how well their

new market-making software worked under load before deploying it to the live system

connected to the NYSE.

Prior to August the 1st, a number of teams would have progressively migrated software

between environments for release into the “live” environment. Potentially, a manual

step in the deployment process picked up a copy of the simulation software and pushed it

into “live”. Most companies do not employ baseline configuration

tests in the later environment stages, thus (probably at a later stage in the process)

someone added the program to the release package and deployed it unnoticed.
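
To make the missing control concrete, here is a minimal sketch of a pre-deployment check that refuses a release package containing components not approved for the target environment. It does not describe Knight's actual process: the component names, the environment names, and the ALLOWED_ENVIRONMENTS catalogue are hypothetical.

```python
# Hypothetical catalogue: which environments each component may be deployed to.
ALLOWED_ENVIRONMENTS = {
    "market-maker": {"dev", "test", "staging", "live"},
    "order-router": {"dev", "test", "staging", "live"},
    "market-simulator": {"dev", "test"},   # test tooling, never meant for live
}


def check_release(package, target_env):
    """Return a list of violations; an empty list means the package passes."""
    violations = []
    for component in package:
        allowed = ALLOWED_ENVIRONMENTS.get(component)
        if allowed is None:
            violations.append(f"{component}: not in the catalogue")
        elif target_env not in allowed:
            violations.append(f"{component}: not permitted in {target_env}")
    return violations


# A release that accidentally bundles the simulator is rejected for "live".
problems = check_release(["market-maker", "market-simulator"], "live")
for line in problems:
    print(line)
```

A check like this is only as good as the catalogue behind it, which is why ownership of that configuration state matters.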

This is exacerbated in large teams, and is simply a consequence of the fact that typically no

one team owns the configuration state of both the applications and the operating

systems/platforms that they run on. The closest team is usually the systems

administration team, but as they have a production environment to manage, these

“lesser” environments get sidelined behind more important problems.

Combined with the fact that there are very few tools that actually focus on

configuration testing, so that people use collections of scripts or home-brew

solutions, it is easy to see where this went wrong.

The lack of a well-defined configuration baseline, and of a set of configuration tests that cover the

differences between the environments, is the likely cause (well, from an outsider’s

perspective) of the problem.
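
A configuration baseline test can be as simple as comparing what is actually installed in an environment against what was agreed for it. The following is a minimal sketch under assumed data: the item names and version strings are invented, and a real baseline would cover far more than package versions.

```python
def diff_baseline(baseline, actual):
    """Compare an environment's actual configuration against its agreed baseline."""
    return {
        "missing": sorted(k for k in baseline if k not in actual),
        "unexpected": sorted(k for k in actual if k not in baseline),
        "changed": sorted(
            k for k in baseline if k in actual and baseline[k] != actual[k]
        ),
    }


# Hypothetical item names and versions, for illustration only.
live_baseline = {"market-maker": "2.4.0", "order-router": "1.9.2"}
live_actual = {
    "market-maker": "2.4.0",
    "order-router": "1.9.3",
    "market-simulator": "0.7.1",
}

report = diff_baseline(live_baseline, live_actual)
for category, items in report.items():
    print(category, items)
```

In this example the "unexpected" category is the interesting one: it flags software that is present in the environment but was never part of the baseline.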

On the morning of August 1st, the release was successfully deployed and the simulator

inadvertently bundled with the release was ready to do its job: exercise the market-making

software.

This time, however, it was no longer in one of the test environments; it was actually

executing live trades on the market, with real orders and real dollars.

For stocks where Knight was the only one running market-making software as an RLP,

and the simulator was the only algo trading that crossed the bid/ask spread, we

saw consistent buy and sell patterns of trade executions, all marked regular, all from the

NYSE, and all occurring at prices just above the bid or just below the ask.

Examples include EXC and NOK, and you can see these patterns in charts here. The

simulator was functioning just as it did in the test environments, and Knight’s

market-making software was intercepting these orders and executing them. Knight’s net loss on

such trades is minor at simple volumes; on this day, however, the problem was compounded, as the

software kept operating and was generating a lot of wash sales.

For stocks where Knight was not the only market-maker, or where other

algorithmic trading software was actively trading (and crossing the bid/ask spread),

some or all of the orders sent by the simulator were executed by someone other than

Knight, and Knight now had a position in the stock, meaning it could have been making

or losing money. The patterns generated for these stocks depended greatly on the

activity of the other players.

Because the simulator was buying indiscriminately at the ask and selling at the bid,

and because the bid/ask spreads were very wide during the open, we now understand

why many stocks moved violently at that time. The simulator was simply hitting the bid or

offer, and the side it hit first determined whether the stock opened sharply up or down.
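
The arithmetic behind this is simple: every round trip that buys at the ask and sells at the bid loses the spread, so cost = (ask - bid) x shares. The sketch below works this through in Python; the prices, share counts, and number of round trips are invented for illustration and are not Knight's actual figures.

```python
from decimal import Decimal


def round_trip_cost(bid, ask, shares):
    """Loss from buying `shares` at the ask and selling them back at the bid."""
    return (Decimal(ask) - Decimal(bid)) * shares


def total_cost(bid, ask, shares, round_trips):
    """Loss from repeating the same round trip `round_trips` times."""
    return round_trip_cost(bid, ask, shares) * round_trips


# Made-up numbers: a 5 cent spread, 100 shares per order.
print(round_trip_cost("10.00", "10.05", 100))    # 5.00
print(total_cost("10.00", "10.05", 100, 1000))   # 5000.00
```

A small loss per trade becomes a large one once it is repeated thousands of times, and wider spreads at the open make each round trip more expensive.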

Since the simulator didn’t think it was dealing with real dollars, it didn’t have to keep

track of its net position. Its job was to send buy and sell orders in waves across

pre-defined positions.

This explains why Knight didn’t know right away that it was losing a lot of money.

They didn’t even know the simulator was running.
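
For contrast, this is roughly what keeping track of a net position involves. It is a minimal sketch, not a description of any real risk system: the PositionLedger class, the per-symbol limit, and the fill prices are assumptions for illustration (EXC and NOK are simply the symbols mentioned earlier).

```python
from collections import defaultdict
from decimal import Decimal


class PositionLedger:
    """Tracks net shares and net cash flow per symbol from executed fills."""

    def __init__(self, max_abs_position):
        self.max_abs_position = max_abs_position   # hypothetical per-symbol limit
        self.position = defaultdict(int)           # symbol -> net shares (+long, -short)
        self.cash = defaultdict(Decimal)           # symbol -> net cash flow

    def record_fill(self, symbol, side, shares, price):
        """Apply one executed order; `side` is "buy" or "sell"."""
        signed = shares if side == "buy" else -shares
        self.position[symbol] += signed
        self.cash[symbol] -= signed * Decimal(price)

    def breaches(self):
        """Symbols whose absolute net position exceeds the limit."""
        return sorted(
            s for s, p in self.position.items() if abs(p) > self.max_abs_position
        )


ledger = PositionLedger(max_abs_position=500)
ledger.record_fill("EXC", "buy", 400, "38.10")    # prices are made up
ledger.record_fill("EXC", "buy", 400, "38.12")
ledger.record_fill("NOK", "buy", 100, "2.40")
ledger.record_fill("NOK", "sell", 100, "2.38")

print(ledger.breaches())      # symbols over the limit
print(ledger.cash["NOK"])     # cash lost on a flat round trip
```

A component that records fills this way can raise an alarm as soon as exposure or losses build up, which is exactly the feedback a test simulator has no reason to provide.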

When they realized they had a problem, the first suspect was likely the new

market-making software. We think the two periods of time when there was a sudden

drop in trading (9:48 and 9:52 AM) are when they restarted the system. Each time it came

back, the simulator, being part of the package, fired up and continued trading.

Finally, just moments before a news release at 10 AM, someone found and killed the

simulator.

We can fully appreciate the nightmare their team must have experienced that morning: a

lack of visibility, inconsistent sources of what was actually running in production, and

poor insight into what the “successful” release actually contained.

Regulated Controls Against Flash Crashes

Knight Capital was once THE retail market-maker in the US; like others that failed

before it, its reputation has now been irreparably damaged. It’s prudent to note that the error

was entirely avoidable, had the relevant controls been put in place.

Several factors played into this scenario, namely:

- Poor configuration management

- A set of loose controls around the release management process within the firm

- A lack of visibility into the makeup of the changes that were being introduced into the

market

- An inability to isolate the configurations that were deployed

- A lack of configuration testing

- A lack of operational acceptance testing

Automated Governance is the Way Forward

DevOps, a recent answer to the challenges of collaboration across the release cycle,

stresses the seamless integration of software development and collaboration between IT

teams, with a view towards enabling a rapid rollout of products via automated release

mechanisms. It recognizes the existing gap between activities considered part of the

development life-cycle and those characterized as operational activities. Historically, the

separation of development and operations has manifested itself as a form of conflict, as

can be clearly seen by the sheer number of frameworks developed to address the

problem, and this conflict ultimately predisposes entire systems to errors.

What’s currently lacking in each approach is a mechanism to gather systems knowledge

in environments where skills and capabilities vary significantly between teams.

For orchestration and deployment, Puppet, Chef, BladeLogic and Electric Cloud go a long

way towards improving upon the existing configuration components of ALM models, but

often neglect the interaction with ITIL. Puppet has been making strides in recent months

with integrations into tools of this nature. Yet the existing suites of tools require specific

knowledge of declarative domain-specific languages to enable a user to describe system

resources and their state. In the case of Puppet, discovery of system information and

compilation into a usable format is possible, but is a daunting task for a novice user in

these fast-paced corporate environments.

Over time, heavily regulated environments governed by strict auditing requirements

will need a validation mechanism that can be clearly maintained and used across

the varying capability levels of an organization, put in place to ensure that

configuration drift between environments is caught early and reported back.

Increasingly smart automation will be deployed, ensuring that state is forcefully

maintained by testing, recording, and safe auto-provisioning. This is a unique means of

peer-based systems configuration, and a measure of prevention before configuration

errors affect running systems, that very few companies are experimenting with (aka

configuration-aware systems).
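
The test, record, and auto-provision cycle described above can be sketched as a small reconciliation loop. This is an illustrative outline only, not how any particular product works: the reconcile function, the read_state and apply_fix callbacks, and the setting names are all assumptions, and the "system" here is just an in-memory dictionary.

```python
from datetime import datetime, timezone


def reconcile(desired, read_state, apply_fix, audit_log):
    """Test each setting, record the result, and repair any drift.

    Returns True if drift was found (and repaired), False otherwise.
    """
    drift_found = False
    for key, want in desired.items():
        have = read_state(key)
        ok = have == want
        audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "key": key,
            "expected": want,
            "found": have,
            "ok": ok,
        })
        if not ok:
            drift_found = True
            apply_fix(key, want)
    return drift_found


# Stand-in for a real system: a dictionary of settings (names are invented).
system = {"ntp": "on", "market-simulator": "enabled"}
desired = {"ntp": "on", "market-simulator": "absent"}
log = []

first_run = reconcile(desired, system.get, system.__setitem__, log)
second_run = reconcile(desired, system.get, system.__setitem__, log)
print(first_run, second_run, system)
```

The audit log doubles as evidence for auditors, since every check is recorded whether or not it found drift.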

Our own tool, ScriptRock, complements the existing workflow tools and offers the

simplest way for Developers, Configuration Managers & Systems Administrators to gain

real-time validation of configuration state. It enables the creation and

running of configuration tests, collaborative configurations for teams, and, in coming months, a robust

community option, as well as the creation of detailed documents that

act as reports to satisfy audit standards. Applying ScriptRock to these environments

helps processes mature quickly for developing seamless system configurations and

requires no new syntax or code; everything is available as a version-controlled

test that can be executed under strict security contexts on the target system.

Governing Bodies

The Knight crisis is not an isolated event. However, it has been seen as a rallying

call for greater visibility into the processes and compliance measures implemented

by trading participants.

With the increasing complexity of trading algorithms, which are the backbone of trading

procedures, the necessity of controls to govern these technology organizations is

becoming more apparent each day.

Mary Schapiro, the outgoing Chair of the SEC, called for a review of the SEC’s

automation review policies, which were put in place with exchanges after the 1987

market crash and require venues to notify the regulator of trading failures or security

lapses. Portions of those policies will serve as the basis for the new rules.

The implementation of a powerful trading platform rests on many pillars. The

remarkable effectiveness of these platforms has led firms to rely on legacy solutions to deal

with the rapid release schedules they now face to stay recognized as leading

systems. This comes at a cost, as the increased pressure to deliver innovation has

exposed these systems and, more importantly, the processes and tools that govern

them, to the risk of failure when glitches occur. As a result, the concepts

outlined in DevOps are clearly needed in order to continue delivering key features

and key components of financial markets. Proper execution will help avert crises such as

the Knight fiasco in future.

The comprehension and adoption of the various frameworks, the integration of IT

Automation, and clear governance of development and operational environments will go

a long way towards ensuring that a fiasco such as the Knight crisis remains solely a

problem of the past, never to be replicated. Unfortunately, we still have a long way to go

in this journey.