Building Apps with Distributed In-Memory Computing Using Apache Geode

Apache Geode (incubating)

Nitin Lamba @nlamba9

William Markito @william_markito

Introduction (Nitin)
• WHAT? Overview & history
• WHY? Relevance & Differentiators
• HOW? Features & Basic Concepts
• SEE! Quick start

Hands-on (William)
• LEARN: Advanced Concepts (Persistence, f(x), PDX, …)
• SHOW: Demos (Docker, PDX)

Resources

Q & A

Agenda

Introduction: Nitin

From GEM to GEODE…

A distributed, memory-based data management platform for data-oriented apps that need:
• high performance, scalability, resiliency and continuous availability
• fast access to critical data sets
• location-aware distributed data processing
• event-driven data architecture

What is GEODE?

High-level Architecture

Powerful app development kit
• APIs: Java & REST
• Adapters: Redis, Lucene*, Spark*, …

Multiple persistence options
• Filesystem, RDBMS or HDFS*
• Sync: read-through, write-through
• Async: write-behind

Durable <K,V> cache/store
• Data replicated or partitioned
• Redundant storage in-memory/disk
• Flexible data retention policies

A Peer-2-Peer Distributed System

* Experimental and awaiting community feedback

• 1000+ systems in production (real customers) • Cutting edge use cases

Incubating but ROCK solid…

[Timeline: <2000, 2004, 2008, 2012, 2016]

Early drivers • Data Volumes • Margins/ transactions • IT maintenance costs • Elasticity needs

Real-time needs • Real-time response • Time to market needs • Flexible Data Models • Persistent+In-memory

Global Data • Visibility across DC • Fast Ingest • Device to enterprise • Uptime (always on)

Open Source! • Apache Incubation • Gemfire > Geode • M1 release • 1st Geode Summit

• Financial Services
• US DoD
• Trade Clearing
• Travel Portal
• Online Gambling
• Telcos
• Manufacturing
• Auto Insurance
• Payroll processing
• Rail systems

…with both SCALE and SPEED, …

Transactions per second

3 TB data in-memory

17B records in-memory

Concurrent users

… and impacting a LOT of people!

China Railway Corporation

Indian Railways

of the world population

Built for PERFORMANCE…

[Chart: YCSB workloads, Cassandra vs. Geode; axis ticks from 200,000 to 800,000]

…and horizontal, consistent SCALABILITY!

Horizontal scaling for reads, consistent latency and CPU

[Chart: speedup, latency (ms) and CPU % versus number of server hosts (2, 4, 6, 8, 10)]

• Scaled from 256 clients and 2 servers to 1280 clients and 10 servers
• Partitioned region with redundancy and 1K data size

• Minimize copying

• Minimize contention points

• Run user code in-process

• Partitioning & parallelism

• Avoid disk seeks

• Automated benchmarks

What makes it go FAST?

• Cache • Region • Member • Client Cache • Persistence • Functions • Events & Listeners • High Availability • Serialization

Let’s talk about a few (basic) CONCEPTS…

• In-memory storage and management for your data

• Configurable through XML, Java API or CLI

• Collection of Regions

What is a CACHE?


• Distributed java.util.Map on steroids (Key/Value)

• Consistent API regardless of where or how data is stored

• Observable (reactive)

• Highly available, redundant across cache Member(s)

What is a REGION?

[Diagram: a Region shown as a java.util.Map with Key/Value entries K01 -> May, K02 -> Tim]

• Local, Replicated or Partitioned

• In-memory or persistent

• Redundant

• LRU

• Overflow

Region: Types & Options
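The LRU option above can be illustrated with plain Java collections. This is only a sketch of the idea of least-recently-used eviction, not Geode code: the class name `LruRegionSketch` and the fixed entry limit are assumptions made for illustration, and Geode's own eviction and overflow behaviour may be driven by other thresholds (heap usage, for example).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a region with an LRU policy; not a Geode class.
class LruRegionSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries; // assumed limit, chosen for illustration

    LruRegionSketch(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: reads refresh recency
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each put; returning true drops the
    // least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruRegionSketch<String, String> region = new LruRegionSketch<>(2);
        region.put("K01", "May");
        region.put("K02", "Tim");
        region.get("K01");        // K01 becomes the most recently used entry
        region.put("K03", "Ann"); // exceeds the limit, so K02 is evicted
        System.out.println(region.keySet());
    }
}
```

An overflow-style policy would move the evicted value to disk instead of discarding it; that step is left out here.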


LOCAL, LOCAL_HEAP_LRU, LOCAL_OVERFLOW, LOCAL_PERSISTENT, LOCAL_PERSISTENT_OVERFLOW

PARTITION, PARTITION_HEAP_LRU, PARTITION_OVERFLOW, PARTITION_PERSISTENT, PARTITION_PERSISTENT_OVERFLOW, PARTITION_PROXY, PARTITION_PROXY_REDUNDANT, PARTITION_REDUNDANT, PARTITION_REDUNDANT_HEAP_LRU, PARTITION_REDUNDANT_OVERFLOW, PARTITION_REDUNDANT_PERSISTENT, PARTITION_REDUNDANT_PERSISTENT_OVERFLOW

REPLICATE, REPLICATE_HEAP_LRU, REPLICATE_OVERFLOW, REPLICATE_PERSISTENT, REPLICATE_PERSISTENT_OVERFLOW, REPLICATE_PROXY
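To make the PARTITION and REDUNDANT parts of these names concrete, here is a rough sketch of how a key could be routed to a bucket and how that bucket could be placed on a primary member plus redundant copies. The hashing rule, the round-robin placement and the names `bucketFor` and `ownersFor` are assumptions for illustration; Geode's actual bucket assignment is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical routing helper; not a Geode class.
class PartitionSketch {
    // Map a key to one of bucketCount buckets (assumed rule: hash modulo).
    static int bucketFor(Object key, int bucketCount) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    // Pick a primary plus redundantCopies secondaries on distinct members,
    // walking the member list round-robin from the bucket number.
    static List<String> ownersFor(int bucket, List<String> members, int redundantCopies) {
        int copies = Math.min(redundantCopies + 1, members.size());
        List<String> owners = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            owners.add(members.get((bucket + i) % members.size()));
        }
        return owners; // owners.get(0) plays the role of the primary
    }

    public static void main(String[] args) {
        List<String> members = Arrays.asList("server1", "server2", "server3");
        int bucket = bucketFor("K01", 8);
        System.out.println("bucket " + bucket + " -> " + ownersFor(bucket, members, 1));
    }
}
```

With `redundantCopies = 0` each bucket lives on a single member (plain PARTITION); with 1 it also has a secondary, which is what the REDUNDANT variants suggest.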

• Durability

• WAL for efficient writing

• Consistent recovery

• Compaction

Persistent Regions

[Diagram: Member 1 handles Put k4->v7. Entries shown: Modify k1->v5, Create k6->v6, Create k2->v2, Create k4->v4 (Oplog2.crf); Modify k4->v7 (Oplog3.crf)]

[Diagram: Server 1 … Server N, each hosting a Region (java.util.Map)]

• A process that has a connection to the system

• A process that has created a cache

• Embeddable within your application

What is a MEMBER?

[Diagram: Client, Locator, Server]

• A process connected to the Geode server(s)

• Can have a local copy of the data

• Run OQL queries on local data

• Can be notified about events on the servers

What is a CLIENT CACHE?

[Diagram: an Application with a Client Cache holding a Region, connected to a GemFire Server holding a Region]

• Clone & Build

• Start Services

• Create & Monitor Region

How to START? Easy as !!

git clone https://github.com/apache/incubator-geode
cd incubator-geode
./gradlew build -Dskip.tests=true

cd gemfire-assembly/build/install/apache-geode
./bin/gfsh
gfsh> start locator --name=locator
gfsh> start server --name=server

gfsh> create region --name=myRegion --type=REPLICATE
gfsh> start [pulse|jconsole]

Hands-on: William

• Cache • Region • Member • Client Cache • Persistence • Functions • Events & Listeners • High Availability • Serialization

More (advanced) CONCEPTS…

Persistence - Shared Nothing

[Diagram: Server 1, Server 2 and Server 3, each holding primary and secondary copies]

Server 1 waits for others when it starts

Fetches missed operations on restart

Persistence - Operational Logs

[Diagram: Member 1 handles Put k6->v6 by appending to the operation log (Oplog1.crf, Oplog2.crf)]

Logged operations, in order:
• Create k1->v1
• Create k2->v2
• Modify k1->v3
• Create k4->v4
• Modify k1->v5
• Create k6->v6

Persistence - Operational Logs: Compaction

[Diagram: Member 1 handles Put k6->v6 by appending to the operation log (Oplog1.crf, Oplog2.crf); compaction copies live data forward]

Logged operations, in order:
• Create k1->v1
• Create k2->v2
• Modify k1->v3
• Create k4->v4
• Modify k1->v5
• Create k6->v6
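The append-only log and the "copy live data forward" step can be sketched in a few lines of plain Java. This is a simplified model under stated assumptions, not Geode's oplog format: the `Op` type and the `replay` and `compact` helpers are hypothetical, and creates and modifies are both reduced to "key now maps to value".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of an append-only operation log; not a Geode class.
class OplogSketch {
    // One logged operation: the key and the value it was set to.
    static final class Op {
        final String key;
        final String value;
        Op(String key, String value) { this.key = key; this.value = value; }
    }

    // Recovery: replay the log in order, so the last write per key wins.
    static Map<String, String> replay(List<Op> log) {
        Map<String, String> region = new LinkedHashMap<>();
        for (Op op : log) {
            region.put(op.key, op.value);
        }
        return region;
    }

    // Compaction: copy only the live (latest) value of each key into a new log.
    static List<Op> compact(List<Op> oldLog) {
        List<Op> fresh = new ArrayList<>();
        for (Map.Entry<String, String> e : replay(oldLog).entrySet()) {
            fresh.add(new Op(e.getKey(), e.getValue()));
        }
        return fresh;
    }

    public static void main(String[] args) {
        List<Op> log = new ArrayList<>();
        log.add(new Op("k1", "v1")); // Create k1->v1
        log.add(new Op("k2", "v2")); // Create k2->v2
        log.add(new Op("k1", "v3")); // Modify k1->v3
        log.add(new Op("k4", "v4")); // Create k4->v4
        log.add(new Op("k1", "v5")); // Modify k1->v5
        log.add(new Op("k6", "v6")); // Create k6->v6
        System.out.println("entries before: " + log.size() + ", after: " + compact(log).size());
    }
}
```

Replaying the compacted log yields the same region contents as replaying the original one, which is the property compaction has to preserve.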

• Used for distributed concurrent processing (Map/Reduce, stored procedure)

• Highly available

• Data oriented

• Member oriented

Functions

[Diagram: Submit (f1); Execute Functions f1, f2, … fn]

Functions

FunctionService.onRegion.withFilter.execute

ResultCollector.getResult

[Diagram: a client region sends execute with filter = Keys X, Y into the server distributed system; a server acting as Partitioned Region Data Accessor forwards execute to the servers hosting Partitioned Region Data Store X and Data Store Y (a third server hosts Data Store Z), and their results flow back to the client]
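The data-oriented routing in this slide (execute only where the filter keys live, then collect the partial results) can be modelled with standard Java. This is a sketch of the idea rather than the FunctionService API: `executeOnRegion`, `sum` and the map-of-maps layout standing in for data stores are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical model of data-aware function routing; not a Geode class.
class FunctionRoutingSketch {
    // stores: member name -> entries that member hosts locally.
    // The function runs only on members hosting at least one filter key,
    // and sees only the matching local entries.
    static <R> List<R> executeOnRegion(Map<String, Map<String, Integer>> stores,
                                       Set<String> filter,
                                       Function<Map<String, Integer>, R> fn) {
        List<R> results = new ArrayList<>();
        for (Map<String, Integer> local : stores.values()) {
            Map<String, Integer> matching = new HashMap<>();
            for (String key : filter) {
                if (local.containsKey(key)) {
                    matching.put(key, local.get(key));
                }
            }
            if (!matching.isEmpty()) {
                results.add(fn.apply(matching)); // one partial result per member
            }
        }
        return results; // plays the role of the collected results
    }

    // Example function body: sum the local values.
    static int sum(Map<String, Integer> entries) {
        int total = 0;
        for (int v : entries.values()) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> stores = new LinkedHashMap<>();
        stores.put("storeX", Map.of("X", 10));
        stores.put("storeY", Map.of("Y", 20));
        stores.put("storeZ", Map.of("Z", 30));
        Set<String> filter = new TreeSet<>(List.of("X", "Y"));
        System.out.println(executeOnRegion(stores, filter, FunctionRoutingSketch::sum));
    }
}
```

The member hosting only Z never runs the function, which is the point of passing a filter.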

• Register Interest
  • Individual Keys OR RegEx for Keys
  • Updates Local Copy
  • Examples:
    • region.registerInterest("key-1");
    • region1.registerInterestRegex("[a-z]+");

• Continuous Query
  • Receive Notification when Query condition met on server
  • Example:
    • SELECT * FROM /tradeOrder t WHERE t.price > 100.00

Can be DURABLE

Events & Notifications
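The two subscription styles can be mimicked with standard Java to show what each one filters on: interest registration matches keys, while a continuous query matches values. The class `ClientNotificationSketch` and its method names are hypothetical, and the sketch assumes the regular expression must match the whole key.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical client-side model of interest registration and a continuous
// query; not a Geode class.
class ClientNotificationSketch {
    private final Pattern keyInterest;     // stands in for a registered key regex
    private final Predicate<Double> query; // stands in for the query's WHERE clause
    final Map<String, Double> localCopy = new LinkedHashMap<>();
    int queryHits = 0;

    ClientNotificationSketch(String keyRegex, Predicate<Double> query) {
        this.keyInterest = Pattern.compile(keyRegex);
        this.query = query;
    }

    // Invoked for each update happening on the server side.
    void onServerUpdate(String key, double price) {
        if (keyInterest.matcher(key).matches()) {
            localCopy.put(key, price); // interest: keep the local copy current
        }
        if (query.test(price)) {
            queryHits++;               // continuous query: condition met
        }
    }

    public static void main(String[] args) {
        ClientNotificationSketch client =
                new ClientNotificationSketch("[a-z]+", price -> price > 100.00);
        client.onServerUpdate("abc", 50.0);
        client.onServerUpdate("key-1", 150.0);
        System.out.println(client.localCopy.keySet() + " hits=" + client.queryHits);
    }
}
```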

• CacheWriter / CacheListener

• AsyncEventListener (queue / batch)

• Parallel or Serial

• Conflation

Listeners
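Conflation on an asynchronous event queue can be pictured as "keep only the newest event per key before the batch is handed to the listener". The sketch below shows that idea with plain Java; the `conflate` helper and the use of two-element string arrays as events are assumptions for illustration, and the ordering of the surviving events is a simplification.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of batch conflation; not a Geode class.
class ConflationSketch {
    // events: {key, value} pairs in arrival order.
    // Returns a batch in which only the latest value per key survives.
    static Map<String, String> conflate(List<String[]> events) {
        Map<String, String> batch = new LinkedHashMap<>();
        for (String[] event : events) {
            batch.remove(event[0]);        // drop the older event for this key
            batch.put(event[0], event[1]); // queue the newer one at the tail
        }
        return batch;
    }

    public static void main(String[] args) {
        List<String[]> events = new ArrayList<>();
        events.add(new String[] {"k1", "v1"});
        events.add(new String[] {"k2", "v2"});
        events.add(new String[] {"k1", "v3"});
        System.out.println(conflate(events));
    }
}
```

The trade-off is that a listener sees fewer events but loses intermediate values, which suits write-behind to a store that only needs the final state.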

High Availability

Fixed or Flexible schema?

id | name | age | pet_id

{id:1,name:"Fred",age:42,pet:{name:"Barney",type:"dino"}}

Portable Data eXchange (PDX)

C#, C++, Java, JSON

No IDL, no schemas, no hand-coding Schema evolution (Forward and Backward Compatible)

* domain object classes not required

| header | data |

| pdx | length | dsid | typeid | fields | offsets |

Efficient for queries

{id:1,name:"Fred",age:42,pet:{name:"Barney",type:"dino"}}

SELECT p.name FROM /Person p WHERE p.pet.type = "dino"

single field deserialization

But HOW to serialize data?

Benchmark: https://github.com/eishay/jvm-serializers

Schema Evolution

Member A Member B

Distributed Type Definitions

Application #1

Application #2

v2 objects preserve data from missing fields

v1 objects use default values to fill in new fields

PDX provides forwards and backwards compatibility, no code required
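The two compatibility rules can be sketched with a field map standing in for a serialized object. This is a model of the behaviour described on the slide, not the PDX wire format or API: `PersonV1`, `readAgeV2` and the map representation are hypothetical, with "v1" knowing only `id` and `name` and "v2" adding `age`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of schema evolution; not the PDX API.
class SchemaEvolutionSketch {
    // Version 1 of the application knows only id and name.
    static final class PersonV1 {
        int id;
        String name;
        // Fields written by a newer version that v1 does not understand.
        final Map<String, Object> unread = new LinkedHashMap<>();

        static PersonV1 read(Map<String, Object> fields) {
            PersonV1 p = new PersonV1();
            Map<String, Object> rest = new LinkedHashMap<>(fields);
            Object id = rest.remove("id");
            p.id = (id == null) ? 0 : (Integer) id;
            p.name = (String) rest.remove("name");
            p.unread.putAll(rest); // preserve data from fields v1 lacks
            return p;
        }

        Map<String, Object> write() {
            Map<String, Object> out = new LinkedHashMap<>(unread);
            out.put("id", id);
            out.put("name", name);
            return out;
        }
    }

    // Version 2 reads age and falls back to a default for v1 objects.
    static int readAgeV2(Map<String, Object> fields) {
        Object age = fields.get("age");
        return (age == null) ? 0 : (Integer) age;
    }

    public static void main(String[] args) {
        Map<String, Object> v2Object = new LinkedHashMap<>();
        v2Object.put("id", 1);
        v2Object.put("name", "Fred");
        v2Object.put("age", 42);
        PersonV1 p = PersonV1.read(v2Object);
        p.name = "Frederick";          // v1 app edits a field it knows
        System.out.println(p.write()); // age written by v2 is still present
    }
}
```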

Demo (Docker, PDX, …)

Code • New features • Bug fixes • Writing tests

Documentation • Wiki • Web site • User guide

How to CONTRIBUTE?

Community • Join the mailing list • Ask or answer • Join our HipChat • Become a speaker • Find bugs • Test an RC/Beta

Website: http://geode.incubator.apache.org/

JIRA: https://issues.apache.org/jira/browse/GEODE

Wiki: cwiki.apache.org/confluence/display/GEODE

GitHub: https://github.com/apache/incubator-geode

Mailing lists: mail-archives.apache.org/mod_mbox/incubator-geode-dev/

Where to BEGIN?

Thank you!

http://geode.incubator.apache.org

https://github.com/Pivotal-Open-Source-Hub
