Pivotal's effort on Apache Geode
Apache Geode (incubating) and Pivotal's leadership role in open sourcing GemFire
Nitin Lamba
Agenda
• Pivotal's Open Source strategy
• What is Apache Geode?
• History
• Differentiators
• Basic Concepts
• Resources
• Q & A
In 2015, Pivotal contributed the components of its Big Data Suite to open source:
• 6 million lines of code
• 4 new open source communities
• Milestones: May 2015, Sept 2015, Sept 2015, Oct 2015
From GEMFIRE to GEODE…
What is GEODE?
A distributed, memory-based data management platform for data-oriented apps that need:
• high performance, scalability, resiliency and continuous availability
• fast access to critical data sets
• location-aware distributed data processing
• an event-driven data architecture
Incubating but ROCK solid…
• 1000+ systems in production (real customers)
• Cutting-edge use cases
Timeline (<2000 to 2016):
• Early drivers: data volumes, margins/transactions, IT maintenance costs, elasticity needs
• Real-time needs: real-time response, time-to-market needs, flexible data models, persistent + in-memory
• Global data: visibility across DCs, fast ingest, device to enterprise, uptime (always on)
• Open Source!: Apache incubation, GemFire > Geode, Geode M1 release, 1st Geode Summit
Use cases: Financial Services, US DoD, Trade Clearing, Travel Portal, Online Gambling, Telcos, Manufacturing, Auto Insurance, Payroll processing, Rail systems
…with both SCALE and SPEED, …
• 40K transactions per second
• 3TB data in-memory
• 17B records in-memory
• 120K concurrent users
… and impacting a LOT of people!
• China Railway Corporation
• Indian Railways
• Together, 36% of the world population (17% + 19%)
High-level Architecture
Powerful app development kit
• APIs: Java & REST
• Adapters: Redis, Lucene*, Spark*, …
Multiple persistence options
• Filesystem, RDBMS or HDFS*
• Sync: read-through, write-through
• Async: write-behind
Durable <K,V> cache/store
• Data replicated or partitioned
• Redundant storage in-memory/disk
• Flexible data retention policies
[Diagram: A Peer-2-Peer in-memory Distributed System, with a Locator, several Servers and REST access]
* Experimental and awaiting community feedback
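The sync and async persistence options above can be illustrated with a small, single-threaded Java sketch. It is a toy model under stated assumptions, not Geode's API: `ThroughCache` and its methods are hypothetical names, a plain `Map` stands in for the RDBMS or filesystem, and `flush()` is called by hand where a real write-behind queue would drain asynchronously.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy cache showing read-through, write-through and write-behind.
// Not thread-safe and not Geode's API; a Map stands in for the external store.
public class ThroughCache {
    private final Map<String, String> memory = new HashMap<>();
    private final Map<String, String> store;          // stand-in for an RDBMS or filesystem
    private final boolean writeBehind;                 // false = write-through, true = write-behind
    private final List<String[]> pending = new ArrayList<>();

    public ThroughCache(Map<String, String> store, boolean writeBehind) {
        this.store = store;
        this.writeBehind = writeBehind;
    }

    // Read-through: a miss is loaded from the store and kept in memory.
    public String get(String key) {
        String v = memory.get(key);
        if (v == null) {
            v = store.get(key);
            if (v != null) memory.put(key, v);
        }
        return v;
    }

    // Write-through updates the store before returning; write-behind only queues the change.
    public void put(String key, String value) {
        memory.put(key, value);
        if (writeBehind) pending.add(new String[] {key, value});
        else store.put(key, value);
    }

    // Drains the write-behind queue; a real system would do this asynchronously in batches.
    public void flush() {
        for (String[] kv : pending) store.put(kv[0], kv[1]);
        pending.clear();
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>(Map.of("k1", "v1"));
        ThroughCache cache = new ThroughCache(db, true);
        System.out.println(cache.get("k1"));         // v1 (loaded from db)
        cache.put("k2", "v2");
        System.out.println(db.containsKey("k2"));    // false (still queued)
        cache.flush();
        System.out.println(db.get("k2"));            // v2
    }
}
```

Write-behind trades a window in which queued writes exist only in memory for lower put latency; write-through keeps the store current at the cost of a synchronous store update on every put.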
What makes it go FAST?
• Minimize copying
• Minimize contention points
• Run user code in-process
• Partitioning & parallelism
• Avoid disk seeks
• Automated benchmarks
Let’s talk about a few BASIC CONCEPTS…
• Cache
• Region
• Member
• Client Cache
• Persistence
• Functions
What is a CACHE?
• In-memory storage and management for your data
• Configurable through XML, Java API or CLI
• A collection of Regions
What is a REGION?
• A distributed java.util.Map on steroids (key/value)
• Consistent API regardless of where or how data is stored
• Observable (reactive)
• Highly available, with redundant copies on cache Member(s)
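To make "Map on steroids" and "Observable (reactive)" concrete, here is a minimal Java sketch of a key/value region that notifies listeners after each write. `ObservableRegion` and its `Listener` interface are hypothetical; the callback names only loosely echo Geode's `CacheListener`, and distribution and redundancy are left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical observable key/value region (local only, no distribution).
public class ObservableRegion<K, V> {
    public interface Listener<K, V> {
        void afterCreate(K key, V newValue);
        void afterUpdate(K key, V oldValue, V newValue);
    }

    private final Map<K, V> data = new ConcurrentHashMap<>();
    private final List<Listener<K, V>> listeners = new CopyOnWriteArrayList<>();

    public void addListener(Listener<K, V> listener) { listeners.add(listener); }

    // Same put/get shape as java.util.Map, plus an event for every write.
    public V put(K key, V value) {
        V old = data.put(key, value);
        for (Listener<K, V> l : listeners) {
            if (old == null) l.afterCreate(key, value);
            else l.afterUpdate(key, old, value);
        }
        return old;
    }

    public V get(K key) { return data.get(key); }

    public static void main(String[] args) {
        ObservableRegion<String, String> region = new ObservableRegion<>();
        List<String> events = new ArrayList<>();
        region.addListener(new Listener<String, String>() {
            @Override public void afterCreate(String k, String v) { events.add("create " + k + "=" + v); }
            @Override public void afterUpdate(String k, String o, String v) { events.add("update " + k + ": " + o + " -> " + v); }
        });
        region.put("k1", "v1");
        region.put("k1", "v3");
        System.out.println(events);  // [create k1=v1, update k1: v1 -> v3]
    }
}
```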
Region: Types & Options
• Local, Replicated or Partitioned
• In-memory or persistent
• Redundant
• LRU
• Overflow
Region shortcuts:
LOCAL, LOCAL_HEAP_LRU, LOCAL_OVERFLOW, LOCAL_PERSISTENT, LOCAL_PERSISTENT_OVERFLOW
PARTITION, PARTITION_HEAP_LRU, PARTITION_OVERFLOW, PARTITION_PERSISTENT, PARTITION_PERSISTENT_OVERFLOW, PARTITION_PROXY, PARTITION_PROXY_REDUNDANT, PARTITION_REDUNDANT, PARTITION_REDUNDANT_HEAP_LRU, PARTITION_REDUNDANT_OVERFLOW, PARTITION_REDUNDANT_PERSISTENT, PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
REPLICATE, REPLICATE_HEAP_LRU, REPLICATE_OVERFLOW, REPLICATE_PERSISTENT, REPLICATE_PERSISTENT_OVERFLOW, REPLICATE_PROXY
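As a rough illustration of what the LRU and overflow options mean, the sketch below evicts the least-recently-used entry once an in-memory limit is reached and moves it to an overflow map instead of discarding it. Assumptions: the limit is a simple entry count (the Geode shortcuts named `HEAP_LRU` base eviction on heap usage instead), a `HashMap` stands in for disk, and `LruOverflowRegion` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU region with overflow; single-threaded toy model.
public class LruOverflowRegion<K, V> {
    private final Map<K, V> overflow = new HashMap<>();   // stand-in for on-disk storage
    private final Map<K, V> memory;

    public LruOverflowRegion(int capacity) {
        // accessOrder=true: iteration order runs from least- to most-recently used.
        this.memory = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > capacity) {
                    overflow.put(eldest.getKey(), eldest.getValue());  // overflow, don't drop
                    return true;
                }
                return false;
            }
        };
    }

    public void put(K key, V value) {
        overflow.remove(key);
        memory.put(key, value);
    }

    // A miss in memory faults the value back in from overflow (which may evict another entry).
    public V get(K key) {
        V v = memory.get(key);
        if (v == null && overflow.containsKey(key)) {
            v = overflow.remove(key);
            memory.put(key, v);
        }
        return v;
    }

    public boolean inMemory(K key) { return memory.containsKey(key); }

    public static void main(String[] args) {
        LruOverflowRegion<String, String> region = new LruOverflowRegion<>(2);
        region.put("k1", "v1");
        region.put("k2", "v2");
        region.get("k1");                            // k1 is now the most recently used
        region.put("k3", "v3");                      // over capacity: k2 overflows
        System.out.println(region.inMemory("k2"));   // false
        System.out.println(region.get("k2"));        // v2, faulted back in from overflow
    }
}
```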
Persistent Regions
• Durability
• WAL for efficient writing
• Consistent recovery
• Compaction
What is a MEMBER?
• A process that has a connection to the system
• A process that has created a cache
• Embeddable within your application
[Diagram: Server 1 through Server N]
What is a CLIENT CACHE?
• A process connected to the Geode server(s)
• Can have a local copy of the data
• Can run OQL queries on local data
• Can be notified about events on the servers
[Diagram: Client, Locator, Server]
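A client cache's local copy plus server notifications can be modeled with a short, in-process Java sketch: the client keeps values it has read, and the server pushes later updates to subscribed clients so those copies stay current. `ClientCacheSketch`, `Server` and `Client` are hypothetical names, not Geode's `ClientCache` API; networking, the locator and OQL are not modeled.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical in-process model of a client cache kept current by server events.
public class ClientCacheSketch {
    public static class Server {
        private final Map<String, String> data = new HashMap<>();
        private final List<Client> subscribers = new CopyOnWriteArrayList<>();

        public void subscribe(Client c) { subscribers.add(c); }
        public String get(String key) { return data.get(key); }

        public void put(String key, String value) {
            data.put(key, value);
            for (Client c : subscribers) c.onServerEvent(key, value);  // event notification
        }
    }

    public static class Client {
        private final Server server;
        private final Map<String, String> local = new HashMap<>();   // local copy of the data
        private int fetches = 0;                                      // server round trips

        public Client(Server server) { this.server = server; }

        public String get(String key) {
            String v = local.get(key);
            if (v == null) {                 // not cached locally: ask the server
                fetches++;
                v = server.get(key);
                if (v != null) local.put(key, v);
            }
            return v;
        }

        public int fetchCount() { return fetches; }

        // Refresh only keys this client already holds locally.
        void onServerEvent(String key, String value) {
            if (local.containsKey(key)) local.put(key, value);
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        server.put("k1", "v1");
        Client client = new Client(server);
        server.subscribe(client);
        client.get("k1");            // fetched from the server, then kept locally
        server.put("k1", "v3");      // pushed to the client's local copy
        System.out.println(client.get("k1") + " after " + client.fetchCount() + " fetch(es)");  // v3 after 1 fetch(es)
    }
}
```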
Persistence - Shared Nothing
[Diagram: Server 1, Server 2 and Server 3, each holding primary and secondary copies of B1, B2 and B3]
• Server 1 waits for others when it starts
• Fetches missed operations on restart
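A minimal sketch of shared-nothing partitioning with redundancy: keys hash into a fixed number of buckets, and each bucket gets a primary and a secondary copy on two different servers, so losing any one server still leaves a copy of every bucket. The round-robin placement rule and the `BucketPlacement` name are assumptions for illustration; Geode balances buckets its own way, and the restart behavior on the slide (waiting for peers, fetching missed operations) is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bucket placement: primary on server b % n, secondary on the next server.
public class BucketPlacement {
    private final int numBuckets;
    private final int numServers;

    public BucketPlacement(int numBuckets, int numServers) {
        if (numServers < 2) throw new IllegalArgumentException("redundancy needs at least 2 servers");
        this.numBuckets = numBuckets;
        this.numServers = numServers;
    }

    // Keys map to buckets by hash; floorMod keeps the result non-negative.
    public int bucketFor(Object key) { return Math.floorMod(key.hashCode(), numBuckets); }

    public int primaryServer(int bucket) { return bucket % numServers; }

    // Always a different server from the primary because numServers >= 2.
    public int secondaryServer(int bucket) { return (bucket + 1) % numServers; }

    // Every bucket a server stores, as primary or secondary.
    public List<Integer> bucketsOn(int server) {
        List<Integer> result = new ArrayList<>();
        for (int b = 0; b < numBuckets; b++) {
            if (primaryServer(b) == server || secondaryServer(b) == server) result.add(b);
        }
        return result;
    }

    public static void main(String[] args) {
        BucketPlacement p = new BucketPlacement(3, 3);   // three buckets on three servers
        for (int b = 0; b < 3; b++) {
            System.out.println("B" + (b + 1) + ": primary Server " + (p.primaryServer(b) + 1)
                    + ", secondary Server " + (p.secondaryServer(b) + 1));
        }
    }
}
```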
Persistence - Operational Logs
[Diagram: Member 1 handles Put k6->v6 and appends it to the operation log (Oplog1.crf, Oplog2.crf)]
Create k1->v1
Create k2->v2
Modify k1->v3
Create k4->v4
Modify k1->v5
Create k6->v6
• Append to operation log
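The append-only idea can be sketched in a few lines of Java: every write becomes a record at the end of the log, and recovery replays the records in order so the last write for each key wins. The record layout and the `OpLog` class are hypothetical and do not reflect Geode's `.crf` file format; destroys and rolling over to a new file (Oplog1, Oplog2) are omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of an append-only operation log with replay.
public class OpLog {
    public static final class Op {
        public final String kind, key, value;
        public Op(String kind, String key, String value) {
            this.kind = kind; this.key = key; this.value = value;
        }
    }

    private final List<Op> records = new ArrayList<>();

    // Writes only ever append, which on disk means sequential I/O rather than seeks.
    public void append(String kind, String key, String value) {
        records.add(new Op(kind, key, value));
    }

    public List<Op> records() { return records; }

    // Recovery: replay records in order; a later record for a key overwrites an earlier one.
    public Map<String, String> replay() {
        Map<String, String> state = new LinkedHashMap<>();
        for (Op op : records) state.put(op.key, op.value);
        return state;
    }

    public static void main(String[] args) {
        OpLog log = new OpLog();
        log.append("Create", "k1", "v1");
        log.append("Create", "k2", "v2");
        log.append("Modify", "k1", "v3");
        log.append("Create", "k4", "v4");
        log.append("Modify", "k1", "v5");
        log.append("Create", "k6", "v6");   // the Put k6->v6 from the slide
        System.out.println(log.replay());   // {k1=v5, k2=v2, k4=v4, k6=v6}
    }
}
```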
Persistence - Operational Logs: Compaction
[Diagram: the same operation log as the previous slide (Oplog1.crf, Oplog2.crf); live entries are copied forward]
• Copy live data forward
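"Copy live data forward" can be sketched as follows, assuming a record is live only when it is the latest write for its key: compaction keeps those records, in log order, in a new list so the old log could be discarded. `OpLogCompaction` is a hypothetical name; a real compactor also has to handle destroyed entries and work file by file, which this toy version ignores.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical oplog compaction: keep only the latest record per key.
public class OpLogCompaction {
    public static final class Op {
        public final String key, value;
        public Op(String key, String value) { this.key = key; this.value = value; }
        @Override public String toString() { return key + "->" + value; }
    }

    public static List<Op> compact(List<Op> oldLog) {
        // First pass: index of the latest record for each key.
        Map<String, Integer> latest = new HashMap<>();
        for (int i = 0; i < oldLog.size(); i++) latest.put(oldLog.get(i).key, i);
        // Second pass: copy only live records forward, preserving log order.
        List<Op> newLog = new ArrayList<>();
        for (int i = 0; i < oldLog.size(); i++) {
            if (latest.get(oldLog.get(i).key) == i) newLog.add(oldLog.get(i));
        }
        return newLog;
    }

    public static void main(String[] args) {
        List<Op> log = List.of(new Op("k1", "v1"), new Op("k2", "v2"), new Op("k1", "v3"),
                new Op("k4", "v4"), new Op("k1", "v5"), new Op("k6", "v6"));
        System.out.println(compact(log));   // [k2->v2, k4->v4, k1->v5, k6->v6]
    }
}
```

Here k1->v1 and k1->v3 are garbage because k1->v5 supersedes them, so the compacted log shrinks from six records to four.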
Functions
• Used for distributed concurrent processing (Map/Reduce, stored procedures)
• Highly available
• Data oriented
• Member oriented
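Function execution is easiest to picture as "ship the code to the data". The Java sketch below runs a function on each member's local data in parallel (the map step) and returns one result per member for the caller to combine (the reduce step). With no filter it is member oriented (every member runs it); with a key filter it is data oriented (only members holding those keys run it, on those keys only). Assumptions: plain maps stand in for members' local partitions, a thread pool stands in for the members, `executeOnMembers` is a hypothetical name rather than Geode's FunctionService API, and high availability (re-executing on another member after a failure) is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical model of member-oriented vs data-oriented function execution.
public class FunctionShipping {
    public static <R> List<R> executeOnMembers(List<Map<String, Integer>> members,
                                               Set<String> filter,
                                               Function<Map<String, Integer>, R> fn) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, members.size()));
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (Map<String, Integer> local : members) {
                Map<String, Integer> input = local;
                if (filter != null) {                       // data oriented: route by key
                    input = new HashMap<>();
                    for (Map.Entry<String, Integer> e : local.entrySet()) {
                        if (filter.contains(e.getKey())) input.put(e.getKey(), e.getValue());
                    }
                    if (input.isEmpty()) continue;          // member holds none of the keys
                }
                Map<String, Integer> arg = input;
                futures.add(pool.submit(() -> fn.apply(arg)));  // run where the data lives
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) results.add(f.get());   // gather per-member results
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> members = List.of(
                Map.of("k1", 10, "k2", 20),   // member 1's local partition
                Map.of("k3", 5),              // member 2
                Map.of("k4", 7, "k5", 1));    // member 3
        Function<Map<String, Integer>, Integer> localSum =
                data -> data.values().stream().mapToInt(Integer::intValue).sum();
        int all = executeOnMembers(members, null, localSum).stream().mapToInt(Integer::intValue).sum();
        int some = executeOnMembers(members, Set.of("k1", "k4"), localSum).stream().mapToInt(Integer::intValue).sum();
        System.out.println(all + " " + some);  // 43 17
    }
}
```

Summing the per-member results in the caller plays the role of the reduce step; only small partial sums cross the (simulated) network, not the data itself.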
Join the Community!
• Check out: http://geode.incubator.apache.org
• Subscribe: [email protected]
• Download: http://geode.incubator.apache.org/releases/
Thank you!
Additional Slides
Built for PERFORMANCE…
[Chart: operations per second (0 to 1,000,000) for YCSB workloads A Reads, A Updates, B Reads, B Updates, C Reads, D Inserts, D Reads, F Reads and F Updates, comparing Cassandra and Geode]
…and horizontal, consistent SCALABILITY!
Horizontal scaling for reads, consistent latency and CPU
[Chart: speedup, latency (ms) and CPU% versus number of server hosts (2 to 10)]
• Scaled from 256 clients and 2 servers to 1280 clients and 10 servers
• Partitioned region with redundancy and 1K data size
High Availability