NoSQL in Financial Industry - Pierre Bittner

SCALED RISK: Next Generation of Financial Platform. NoSQL in Financial Industry. Distributed Matters, Barcelona, 21 November 2015. Pierre Bittner, CTO, SCALED RISK.

Transcript of NoSQL in Financial Industry - Pierre Bittner

Page 1

SCALED RISK

Next Generation of Financial Platform

NoSQL in Financial Industry

Distributed Matters - Barcelona – 21 November 2015

Pierre Bittner - CTO

Page 2

What is Scaled Risk?

WHAT? Integrated Big Data & Analytics Platform

HOW? Hadoop/HBase + Low latency + External Consistency + Flexible Data Schema + In-Memory OLAP

FOR? Real-Time Risk Management

WHERE? SaaS or On-Premise

Page 3

Why NoSQL Matters in the Financial Industry

• Volume / Velocity
§ The New York Stock Exchange generates about 4 to 5 terabytes of data per day.
§ Algorithmic and high-frequency trading accounted for 50% of all US equity trading volume in 2012.
§ Trade execution in milliseconds and even microseconds.

• Coherency / Availability / Security
§ Regulatory reporting: intraday monitoring
§ MTTR, data spikes on market events, disaster recovery, ACLs

• Mixed workloads: streaming and historical analysis, point-in-time comparison
§ Backtesting, replay (UTC timestamping of all events, FIFO)
§ Lambda architecture, Kappa architecture

• Need for multi-tenancy / data & process governance (data lake / data-centric architecture)
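The replay bullet relies on every event carrying a UTC timestamp and being processed in FIFO order. A minimal Python sketch of that idea follows, assuming a hypothetical `Event` record with a UTC timestamp plus an arrival sequence number that breaks ties between equal timestamps; it is an illustration, not Scaled Risk's implementation. Replaying up to a chosen instant yields the point-in-time state used for backtesting and comparison.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(order=True)
class Event:
    ts: datetime                          # UTC timestamp assigned when the event is captured
    seq: int                              # arrival sequence number: FIFO tie-breaker for equal timestamps
    payload: dict = field(compare=False)  # business content, ignored when ordering


def replay(events, until=None):
    """Yield events in (UTC timestamp, arrival) order, stopping after `until` if given."""
    for ev in sorted(events):
        if until is not None and ev.ts > until:
            break
        yield ev


def last_price_as_of(events, as_of):
    """Point-in-time view: last known price per contract at `as_of`."""
    state = {}
    for ev in replay(events, until=as_of):
        state[ev.payload["contract"]] = ev.payload["price"]
    return state


if __name__ == "__main__":
    def t(second):
        return datetime(2015, 11, 21, 9, 0, second, tzinfo=timezone.utc)

    events = [
        Event(t(2), 2, {"contract": "X", "price": 101}),
        Event(t(1), 1, {"contract": "X", "price": 100}),
        Event(t(2), 3, {"contract": "X", "price": 102}),
    ]
    print(last_price_as_of(events, t(1)))  # {'X': 100}
    print(last_price_as_of(events, t(2)))  # {'X': 102}
```

Sorting on (timestamp, sequence) keeps the replay deterministic even when several events share the same timestamp.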

Page 4

Why NoSQL Matters in the Financial Industry

Real-Time Enterprise-Wide Risk Management: an improved, trustworthy view of global risk that supports implementation of upcoming regulations

Real-Time Fraud Detection: pre-check, real-time and historical data verification for trades, payments, orders, …

Real-Time Market Analytics: on-demand live and historical data analysis on global markets

Customer Story: Market Exchange, Market Surveillance

Page 5

Today’s Trading Challenge: On-Demand Live Analysis & Alerts

Structured Data Feeds
• OTC Market: Foreign Exchange, Debt Market (Bond), Commodities, Bloomberg, FXAll, …
• Regulated Market: Securities, Options, NYSE, Eurex, LSE, …

Booking Systems: Trader Positions, Intraday Events, Valorization, Volatility, Correlation

Referential Data: Counterparts, Analytical Structure, Products Definition, Mappings

Unstructured Data Feeds
• News & Mkt Analysis: Reuters, BBG, Research
• Social Media: Twitter, LinkedIn, …

RT Aggregated Positions
• Trading: Global Positions, Intraday funding & forecasting, Collateral Optimization
• Sales: Credit Line, Profitability Indicator, Customer Interests
• Global: Market Flows, Analyst/Market Correlation

On-Demand Analysis
• Market Risk Analysis: Stress per Counterparty
• Sales: Customer alerts on Market Trends, Recommendation & Lead Generation

Live Report & Alerts: On Market Events, Custom scenario, Market Surveillance

Risk: CVA, Counterparty Exposure, Limit, Stress Test

Intraday Limit Risk: Automatic Monitoring
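The RT Aggregated Positions and Intraday Limit Risk boxes imply incremental aggregation: each trade updates the running positions and the limit is checked immediately, instead of being recomputed in a batch. A minimal sketch under stated assumptions: the `PositionAggregator` class, the per-trader limit on absolute net position per contract, and the alert text are all hypothetical.

```python
from collections import defaultdict


class PositionAggregator:
    """Incrementally aggregates trades into net positions per (trader, contract)
    and gross traded volume per trader, checking an intraday limit on every update."""

    def __init__(self, limits):
        self.limits = limits              # trader -> max absolute net position per contract (hypothetical)
        self.net = defaultdict(int)       # (trader, contract) -> signed net quantity
        self.gross = defaultdict(int)     # trader -> total traded quantity

    def on_trade(self, trader, contract, qty, side):
        """Apply one trade leg; return an alert string if the limit is breached, else None."""
        signed = qty if side == "Buy" else -qty
        key = (trader, contract)
        self.net[key] += signed
        self.gross[trader] += qty
        limit = self.limits.get(trader)
        if limit is not None and abs(self.net[key]) > limit:
            return f"ALERT {trader} {contract}: net {self.net[key]} exceeds limit {limit}"
        return None
```

Because each update touches only the affected keys, the cost per trade stays constant regardless of how many positions are held, which is what makes the alert "real time".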

Page 6

Customer Story: On-Demand Market Surveillance for an Exchange

Context: extreme performance and resilience; peak activity > 1M orders per second; low latency

Objective: on-demand market analytics from real-time & historical data; resilient primary storage

Problems: high volumes, difficult access to history; SLAs for data & service availability

Solution: Scaled Risk at the outflow of the matching engine

Benefits: streamlined process, consistent view; high availability and scalability; reduced TCO

Result: a single system for storage and computation of spot & historical data for market surveillance

Page 7

On-Demand Market Surveillance: Pilot Perimeter

High-Level Architecture Candidate

Page 8

On-Demand Market Surveillance: Pilot Perimeter

Pilot Perimeter: suitability of HBase and Scaled Risk in terms of properties and performance.
• Confirm the Hadoop/HBase technical stack
• Evaluate Scaled Risk performance
• Explore Scaled Risk features

• Focus on evaluating the HBase framework
§ HBase read/write performance
§ HBase behavior during a node failure
§ HBase process isolation
§ Global consistency

• Key parts of the architecture
§ Message bus (Kafka)
§ Storage system (HDFS)
§ Operational database (HBase)
§ Real-time analytics tool (Scaled Risk)
§ History & data analytics tool (SR & Spark)

Benefits of the architecture (streamlined process, cost, …) are not covered in this step.

Pilot duration: 2 months

Page 9

HBase: Random Access to your Planet-Wide Data

HBase in a few words: HBase is an open-source, distributed, versioned, non-relational, scalable, wide-column data store.
§ It is the Hadoop database, relying mainly on HDFS.
§ It is based on the Google Bigtable storage system.

• Key-value data organization per row. Table is a namespace.
• Each cell is timestamped
• ACID per row; rowkey for fast access and data distribution
• Four primary operations: Get, Put, Delete and Scan
• Server-side operations with coprocessors (Observer, Endpoint)
• Linear scalability, automatic sharding and failover support
• Strictly consistent
• Hadoop ecosystem integration (YARN, MapReduce, Hive, Spark)
• Phoenix for a SQL flavor
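To make the data model concrete, here is a toy Python model of the concepts on this slide: rows addressed by a rowkey, cells grouped by column, every cell version timestamped, and Get/Put/Delete/Scan over rows kept in sorted key order. It is an illustration only, not the HBase client API (the real Java client exposes `Get`, `Put`, `Delete` and `Scan` classes); the class name `ToyTable` and the inclusive `as_of` bound are assumptions made for the sketch.

```python
import bisect


class ToyTable:
    """Toy in-memory model of an HBase-style table (illustrative, not the HBase client API):
    row key -> column ("family:qualifier") -> {timestamp: value}, rows kept sorted by key."""

    def __init__(self):
        self.rows = {}          # row key -> {column -> {timestamp -> value}}
        self.sorted_keys = []   # row keys in lexicographic order, as scans return them

    def put(self, row, column, value, ts):
        """Write one timestamped cell version; older versions are kept."""
        if row not in self.rows:
            self.rows[row] = {}
            bisect.insort(self.sorted_keys, row)
        self.rows[row].setdefault(column, {})[ts] = value

    def get(self, row, column, as_of=None):
        """Return the newest version with timestamp <= as_of (or the newest overall)."""
        versions = self.rows.get(row, {}).get(column, {})
        eligible = [ts for ts in versions if as_of is None or ts <= as_of]
        return versions[max(eligible)] if eligible else None

    def delete(self, row):
        """Remove a whole row (real HBase writes tombstone markers instead)."""
        if self.rows.pop(row, None) is not None:
            self.sorted_keys.remove(row)

    def scan(self, start, stop):
        """Return row keys in [start, stop), in sorted order."""
        lo = bisect.bisect_left(self.sorted_keys, start)
        hi = bisect.bisect_left(self.sorted_keys, stop)
        return self.sorted_keys[lo:hi]
```

Because rows are sorted by key, a shared key prefix such as `ORD#` turns "all orders" into a single contiguous range scan; the timestamped versions are what make an as-of-date read possible.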

Page 10

On-Demand Market Surveillance: Functional Architecture

NoSQL Wide Column Store
• Dynamic Data Schema
• Schema on read and write
• Fast, Random R/W access

Real-Time Distributed OLAP
• Fast In-Memory Data Processing
• Full Consistency; Linear Scalability
• Open API (Valuation)

Architecture components: Injector (Thrift); Low Latency Internal Bus; Read-Isolations; As Of Date, HBase As Storage; REST / API / WebSocket

Real-Time Indexing
• Advanced Index and search for Data Classification and Correlation
• Semantic reconciliation

Real-Time Alerting
• Alert on Data
• Alert on Analytics

[Chart: Volume, Matching and Cancel Rate for Contract 1 to Contract 4]
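The Cancel Rate series in the chart suggests alerts computed on analytics rather than on raw data. A minimal sketch, assuming cancel rate is defined as cancel messages divided by new orders per contract and that an alert fires above a fixed threshold; both the definition and the `cancel_rate_alerts` function are hypothetical.

```python
from collections import Counter


def cancel_rate_alerts(messages, threshold, min_new_orders=1):
    """Return {contract: cancel_rate} for contracts whose cancel rate exceeds `threshold`.
    Cancel rate is taken here as cancels / new orders per contract (an assumption)."""
    new_orders = Counter()
    cancels = Counter()
    for msg in messages:
        if msg["type"] == "New":
            new_orders[msg["contract"]] += 1
        elif msg["type"] == "Cancel":
            cancels[msg["contract"]] += 1
    rates = {
        contract: cancels[contract] / count
        for contract, count in new_orders.items()
        if count >= min_new_orders   # skip contracts with too few orders to be meaningful
    }
    return {contract: rate for contract, rate in rates.items() if rate > threshold}
```

In a streaming setting the two counters would be updated per message and the ratio re-evaluated only for the contract that changed.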

Page 11

On-Demand Market Surveillance: Technical Architecture

• Head Node: Name Node
• Head Node: Secondary Name Node
• Head Node: HBase Master
• 6 x Worker Node: Region Server + Data Node
• HP Loader

3 x Hadoop head nodes: HP ProLiant DL360 Gen9 Server, 8 x 900GB 10k rpm SAS, 128 GB RAM, 2 x (10 cores) Intel Xeon CPU E5-2660 v3 @ 2.60GHz, 4 x 1GbE ports and 2 x 10GbE ports

6 x Hadoop worker nodes: HP ProLiant DL380 Gen9 Server, 2 x 120GB SSD (OS), 15 x 3TB 7.2k rpm SATA, 128 GB RAM, 2 x (10 cores) Intel Xeon CPU E5-2660 v3 @ 2.60GHz, 4 x 1GbE ports and 2 x 10GbE ports, 1 x HP Smart HBA H240ar, 1 x HP Smart HBA H240

1 x HP Loader: HP ProLiant DL380 Gen8, 14 x 1TB 7.2k rpm SAS, 128 GB RAM, 2 x (10 cores) Intel Xeon CPU E5-2670 v2 @ 2.50GHz

Cluster size and components

Hadoop cluster details:
• Hadoop HDFS usable size: 60TB (block replication 3, no compression)
• Hadoop HDFS data disk raw size: 241TB
• Hadoop cluster memory: 6 x 128GB = 768GB

Hadoop components and associated services:
• Hadoop distribution: Hortonworks HDP 2.2 stack
• Cluster management: HP Insight CMU v7.3
• HDFS v2.6.0
• HBase v0.98.4
• ZooKeeper v3.4.6

Other details:
• OS: RHEL (Red Hat Enterprise Linux) v6.5, 64-bit
• Linux filesystem for Hadoop data: ext4
• JVM used for Hadoop: Oracle Java 1.7.0_67
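A quick check of the cluster figures: memory is 6 x 128GB, and with block replication 3 every HDFS block is stored three times, so usable space cannot exceed a third of the raw disk size. The sketch below only does that arithmetic with the slide's numbers. The resulting ceiling (about 80TB) is above the 60TB usable size reported on the slide; the slide does not explain the gap, and space reserved for non-HDFS use is one plausible reason (an assumption).

```python
def replication_ceiling_tb(raw_tb, replication):
    """Upper bound on usable HDFS space: every block is stored `replication` times."""
    return raw_tb / replication


WORKERS, RAM_PER_WORKER_GB = 6, 128
RAW_TB, REPLICATION = 241, 3   # figures from the slide

cluster_memory_gb = WORKERS * RAM_PER_WORKER_GB
ceiling_tb = replication_ceiling_tb(RAW_TB, REPLICATION)

print(f"cluster memory: {cluster_memory_gb} GB")      # cluster memory: 768 GB
print(f"replication ceiling: {ceiling_tb:.1f} TB")    # replication ceiling: 80.3 TB
```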

Page 12

On-Demand Market Surveillance: Functional Consistency

• Market Exchange Data types
§ A single data flow containing all message types
§ Order messages
§ Trade messages
§ The test injector generates 1.5M messages in 7 minutes (client limitation)

• Scaled Risk Data exhaustiveness control
§ Dynamic data model with two tables
§ Trade and Order messages are split
§ Test method: message count

Message Type | Message Sub-type | Count
Order | New | 792,546
Order | Replace | 645,889
Order | Status | 40,821
Order | Others | 80
Order | (unique order ids) | 792,886
Cancel | n/a | 680,626
Trade | n/a | 137,573

Table | Count
Order Table | 792,886
Trade Table | 137,573

• Order and Trade Life-cycle Control
§ Message fields consistency control
§ Test method: data sampling

Order Id | Trader | Contract | Qty | Price | Side
A | 6C9 | JFFCE150500000F | 1 | 49350 | Buy
B | W90 | JFFCE150500000F | 2 | 49350 | Sell
C | MAT | JFFCE150500000F | 1 | 49345 | Buy

Trade Id | Trader | Contract | Qty | Price | Side
1630 | 6C9 | JFFCE150500000F | 1 | 49350 | Buy
1630 | W90 | JFFCE150500000F | 1 | 49350 | Sell
1631 | MAT | JFFCE150500000F | 1 | 49345 | Buy
1631 | W90 | JFFCE150500000F | 1 | 49345 | Sell
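The message-count method for the exhaustiveness control can be sketched as follows. It assumes the order table holds one row per unique order id and the trade table one row per trade message, which matches the counts above (792,886 unique order ids and 792,886 order rows; 137,573 trade messages and 137,573 trade rows). The `exhaustiveness_report` function and the `type` / `order_id` field names are hypothetical.

```python
from collections import Counter


def exhaustiveness_report(messages, order_table_rows, trade_table_rows):
    """Message-count check: derive expected row counts from the source flow and
    compare them with the row counts found in the two target tables.
    Assumes one order row per unique order id and one trade row per trade message."""
    counts = Counter(m["type"] for m in messages)
    unique_order_ids = {m["order_id"] for m in messages if m["type"] == "Order"}
    expected_orders = len(unique_order_ids)
    expected_trades = counts["Trade"]
    return {
        "expected_orders": expected_orders,
        "expected_trades": expected_trades,
        "orders_ok": order_table_rows == expected_orders,
        "trades_ok": trade_table_rows == expected_trades,
    }
```

Counting unique order ids rather than order messages matters because New, Replace and Status messages for the same order all update a single row.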

Page 13

On-Demand Market Surveillance: Performance Indicators

Sender/Trade (per region)
• 130K trades per second
• 800K on cluster

Test Scenario
• 7 minutes
• 1,479,335 messages
• Stats only on Order Table

End-to-end
• Nominal latency ~200ms
• 90% of messages under 412ms
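Two calculations sit behind these indicators: the average injection rate of the test run and a latency percentile such as the 90th. A minimal sketch, assuming the nearest-rank percentile definition (the slide does not say which definition was used); the function names are hypothetical. The average of about 3,522 messages per second is far below the 130K per-region figure, consistent with the client limitation noted for the test injector.

```python
import math


def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def average_rate(messages, seconds):
    """Average messages per second over a test run."""
    return messages / seconds


# Test scenario from the slide: 1,479,335 messages in 7 minutes.
print(f"{average_rate(1_479_335, 7 * 60):,.0f} msg/s")   # 3,522 msg/s
```

Applied to per-message end-to-end latencies, `nearest_rank_percentile(latencies_ms, 90)` returns the kind of figure reported as "90% of messages under 412ms".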

Page 14

On-Demand Market Surveillance: Fault Tolerance Test

HBase is designed to be fault tolerant.
• The node fails where a white stripe appears across the whole width of the graph.
• All nodes are impacted by the failure, not only the killed node (as expected: remember that HBase is CP in CAP terms).
• Another white rectangle appears before the node failure.
• It represents all the messages that were correctly inserted before the failure but never flushed to disk.
• Because the WAL was deactivated by the trade injector (an option), those messages were lost when the regions were moved from the killed node to other nodes.

X axis: rowkey prefix, showing how insertions are distributed across the cluster. Y axis: time. Points spread over the entire width of the X axis mean that the distribution is correct.
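The data loss described above follows from how writes are buffered: an edit held only in memory survives a region move only if it was flushed to disk or recorded in the write-ahead log (WAL). The toy model below (a hypothetical `ToyRegionServer`, not HBase internals) replays the same scenario with and without a WAL. In the HBase Java client, WAL writing can be skipped per mutation, for example with `Put.setDurability(Durability.SKIP_WAL)`; the slide does not say how the trade injector disabled it.

```python
class ToyRegionServer:
    """Toy write path: edits go to a volatile memstore and, optionally, first to a WAL.
    Only flushed store files and the WAL survive a crash."""

    def __init__(self, use_wal):
        self.use_wal = use_wal
        self.wal = []          # durable log of edits not yet flushed
        self.memstore = []     # in-memory buffer, lost on crash
        self.store_files = []  # durable, flushed edits

    def put(self, row):
        if self.use_wal:
            self.wal.append(row)   # log first, so the edit can be replayed after a crash
        self.memstore.append(row)

    def flush(self):
        self.store_files.extend(self.memstore)
        self.memstore.clear()
        self.wal.clear()           # flushed edits no longer need the log

    def crash_and_recover(self):
        """Simulate the region moving to another server: memstore is lost, WAL is replayed."""
        self.memstore.clear()
        return self.store_files + self.wal


for use_wal in (True, False):
    rs = ToyRegionServer(use_wal)
    rs.put("r1")
    rs.flush()
    rs.put("r2")
    rs.put("r3")
    print(use_wal, rs.crash_and_recover())
# True ['r1', 'r2', 'r3']
# False ['r1']
```

Skipping the WAL saves a durable write per edit, which is why it is tempting for injection throughput, at the price of losing unflushed edits on a node failure.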

Page 15

On-Demand Market Surveillance: Fault Tolerance Test

X axis: rowkey prefix, showing how insertions are distributed across the cluster. Y axis: time. Points spread over the entire width of the X axis mean that the distribution is correct.

A second test confirms that HBase remains available even if a node fails. The test consists of inserting data into HBase from both YCSB and trade injector clients.
• YCSB inserts data into a table distributed on 5 nodes.
• The trade injector inserts data into a table distributed on 4 nodes.
• The killed node does not impact trade injection.
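The X axis uses a rowkey prefix to show how inserts spread across the cluster. One common way to obtain such a spread is salting: prefixing each key with a stable hash bucket so that monotonically increasing ids do not all land in the same region. The slides do not say how the prefixes were built; the `salted_rowkey` function, the MD5 hash and the two-digit bucket format are assumptions for illustration.

```python
import hashlib


def salted_rowkey(key: str, buckets: int) -> str:
    """Prefix the natural key with a stable hash bucket so that sequential keys
    (e.g. increasing order ids) spread across regions instead of hitting one."""
    if not 1 <= buckets <= 100:
        raise ValueError("this sketch formats the bucket with two digits (1..100 buckets)")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{bucket:02d}|{key}"
```

The trade-off: a range scan over natural keys now has to be issued once per bucket, since consecutive ids live under different prefixes.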

Page 16

On-Demand Market Surveillance: Next Steps

Deeper evaluation of HBase
• Impact of volumes on performance
• Evaluation of HA Region Servers for data access

Wider view of the targeted architecture
• Overall resilience
• Overall latency
• Simplification
• Hot zone/Cold zone
• TCO

Business requirements of the project
• MiFID II impact
• New services

Page 17

Overview of Scaled Risk Implementation

Open Architecture
• Open standards: seamless integration with HBase (coprocessor)
• Open API (Valuation, FIFO), toolkit approach

Extreme flexibility thanks to our OLAP cube and data schema
• 360° view of the position (As Of Date, explain, multi-aggregation level)
• In-memory distributed calculation
• Sub-second end-to-end (push architecture)

Low latency internal bus
• UDP unicast, acknowledgement by UDP
• No region location pain
• Exactly-once delivery, no message resent, multicast storm prevention

Resiliency
• HBase RPC poll on message losses
• HDFS message storage on overflow and region events
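The bus properties listed above (exactly-once delivery without sender resends, with losses recovered by polling HBase) can be illustrated with sequence numbers. In this sketch, assumed for illustration and not Scaled Risk's protocol, a hypothetical `SequencedReceiver` drops duplicates, detects gaps, and fills them through a `recover` callback standing in for the HBase RPC poll. UDP transport and acknowledgements are not modeled.

```python
class SequencedReceiver:
    """Per-sender receiver that delivers each message exactly once, in order.
    Duplicates are dropped; gaps are filled by polling a recovery source
    instead of asking the sender to resend."""

    def __init__(self, recover):
        self.recover = recover   # callable(seq) -> payload, e.g. a lookup in storage
        self.next_seq = 0        # next sequence number to deliver
        self.pending = {}        # received or recovered messages not yet delivered
        self.delivered = []

    def on_packet(self, seq, payload):
        if seq < self.next_seq or seq in self.pending:
            return               # duplicate or late copy: already delivered or buffered
        self.pending[seq] = payload
        # This sketch treats any gap as a loss at once; a real bus would likely
        # wait briefly for reordered packets before polling.
        for missing in range(self.next_seq, seq):
            if missing not in self.pending:
                self.pending[missing] = self.recover(missing)
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
```

A late copy of a message that was already recovered arrives with a sequence number below `next_seq` and is discarded, which is what keeps delivery exactly once.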

Page 18

www.scaledrisk.com

SCALED RISK