Grid Asia2008 Low Latency Data Grid
-
Upload
jags-ramnarayan -
Category
Technology
-
view
1.792 -
download
2
description
Transcript of Grid Asia2008 Low Latency Data Grid
Copyright © 2006, GemStone Systems Inc. All Rights Reserved.
Low Latency Data Grids in Finance
Jags RamnarayanChief Architect
GemStone [email protected]
Copyright © 2008, GemStone Systems Inc. All Rights Reserved.
Background on GemStone Systems
• Known for its Object Database technology since 1982
• Now specializes in memory-oriented distributed data management
• Over 200 installed customers in global 2000
• Grid focus driven by:– Very high performance with predictable
throughput, latency and availability• Capital markets• Large e-commerce portals – real time fraud• Federal intelligence
Copyright © 2008, GemStone Systems Inc. All Rights Reserved.
Use of Grid computing in finance
• Two primary areas in tier 1 investment banks– Risk Analytics– Pricing
Copyright © 2008, GemStone Systems Inc. All Rights Reserved.
State of affairs – Risk Analytics
• Deluge of data (market data, trade data, etc)
• Overnight batch job doesn’t cut it– Want intra-day risk metrics– In some cases, real-time risk
• Explosion in simulation scenarios– More accurate risk exposure– Compliance
• Increasing number of smaller calculations
Copyright © 2008, GemStone Systems Inc. All Rights Reserved.
State of affairs – Pricing (derivatives)
• Too many products
• Increasing complexity in products– Too many underliers– Many relationships
• Hunger for latency reduction– Calculating the new price with lowest possible
latency– Pushing the prices to distributed applications
Copyright © 2008, GemStone Systems Inc. All Rights Reserved.
Where is the problem?
Grid
Scheduler
Compute farm
Data warehouses Rational databases
• Database/file access contention– Too many concurrent
connections– Large database server
bottlenecks on network– Queries results are large
causing CPU bottlenecks– Even a parallel file system
throttled by disk speeds• Too much data transfer
– Between tasks, Jobs– Between Grid and file systems,
databases– Data consistency issues
File system
CPU bound job turns into a IO bound Job
Copyright © 2008, GemStone Systems Inc. All Rights Reserved.
Data Fabric for Risk Analytics
OtherDatabases
File system
JavaClient
C++Client
C#Client When data is stored, it is
transparently replicated and/or partitioned;
Redundant storage can be in memory and/or on disk—ensures continuous availability
Keep reference data replicated on many; partition trade data
Machine nodes can be added dynamically to expand storage capacity or to handle increased client load
Pool memory (and disk) across cluster ; parallelize
data access and computation to achieve very high
aggregate throughput
Copyright © 2008, GemStone Systems Inc. All Rights Reserved.
Data Fabric for Risk Analytics
OtherDatabases
File system
JavaClient
C++Client
C#Client
TaskFlow - As results are generated push events to compute nodes to initiate subsequent computation Avoid bulk data transfer
across tasks or Jobs
Thousands of compute nodes can maintain local cache of most frequently used data; Optionally use local disk for overflow
Move reference data to local cache
Synchronous read through, write through or
Asynchronous write-behind to other data sources and
sinks
Copyright © 2008, GemStone Systems Inc. All Rights Reserved.
Move business logic to data
f1 , f2 , … fn
FIFO Queue
Data fabric Resources
Exec function
s
Sept TradesSubmit (f1) ->
AggregateHighValueTrades(<input data>,
“where trades.month=‘Sept’)
Function (f1)
Function (f2)
Principle: Move task to computational resource with most of the relevant data before considering other nodes where data transfer becomes necessary
Parallel function execution service (“Map Reduce”) Data dependency hints
• Routing key, collection of keys, “where clause(s)” Serial or parallel execution
Copyright © 2008, GemStone Systems Inc. All Rights Reserved.
Key lessons
• Apps should think about capitalizing memory across Grid (it is abundant)
• Keep IO cycles to minimum through main memory caching of operational data sets– Scavange Grid memory and avoid data source access
• Achieve linear scaling for your Grid apps by horizontally partitioning your data and behavior– Read “Pat helland’s – Life beyond Distributed transactions” (
http://www-db.cs.wisc.edu/cidr/cidr2007/papers/cidr07p15.pdf)
• Get more info on the GemFire data fabric– http://www.gemstone.com/gemfire