Data-Centric Infrastructure for Agile Development

© COPYRIGHT 2014 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED. The Data-Centered Data Center Presented by: Jim Clark, Senior Director of Product Management

Description

Most data centers are filled with rigid data servers that are tightly linked to specific applications, leading to data duplication, lengthy development cycles, and unnecessary costs. Learn how you can use an Enterprise NoSQL database platform to help create a flexible, agile data fabric that will allow you to iterate your application development, optimize your data, and reduce costs. When your enterprise infrastructure is data-centric instead of application-centric, you make it easy for anyone to pull crucial data without spending unnecessary time and money on plumbing, freeing resources for building better applications.

Learn how other companies have built, and benefited from, a data-centric infrastructure for agile development:

Ingest and manage all your data, documents, and semantic triples in a flexible, schema-agnostic platform, without sacrificing the ACID transactions, granular security, database management tools, and other features you’ve come to expect in a mature database platform
Quickly build complex, interactive search applications
Deliver robust, real-time search and alerting within your applications
Use, and optimize, modern infrastructure including Hadoop and cloud to attain operational agility
Simplify implementation of data governance requirements around security, privacy, provenance, retention, continuity, and compliance, while reducing risk, cost, and time

Transcript of Data-Centric Infrastructure for Agile Development


SLIDE: 2

THE WORLD IS VERY APPLICATION-CENTRIC

SLIDE: 3

1. Design the application
2. Determine needed data
3. Determine needed queries
4. Design the schema and indexing strategy
5. Build a database
6. Design the ETL strategy
7. Load the data
8. Code the application

SLIDE: 4

OLTP
Warehouse
Data Marts
Archives
“Unstructured”: Video, Audio; Signals, Logs, Streams; Social; Documents, Messages
Metadata
Search
Reference Data

SLIDE: 5

HOW DO YOU DETERMINE IN ADVANCE WHAT'S USEFUL?

"Love the application... can you go back and include the data from 1990 – 1995?"

SLIDE: 6

TOO MUCH DATA TO BE COPYING FOR EVERY NEW APPLICATION

"Serious?! Third time this month I'm moving that data around!"

SLIDE: 7

ETL CONSUMES ALL RESOURCES

"With all of the new data we're trying to get into the database, there's no time to build new features!"

SLIDE: 8

TOO MANY TECHNOLOGIES CREATE SCALING HEADACHES

"To scale this system, we've got to buy new hardware. We can take the old hardware and move it to this other system. That one can't get any bigger. Period."

SLIDE: 9

TOO MUCH AND TOO MANY COPIES... YOU'VE LOST CONTROL

"Who's reading it? Who's editing it? Where's the master copy? What's happened to it over time? Is it reliable?"

"How up-to-date is this data store? Are the security models consistent? Are there different backup models? Are the lifecycles, retention, disposal policies the same?"

SLIDE: 10

APPLICATION-CENTRIC DATA CENTER

SLIDE: 11

APPLICATION-CENTRIC DATA CENTER

SLIDE: 12

The data-centered data center

SLIDE: 13

How?

1. Hadoop
2. Low-cost Tiered Storage
3. Elasticity with no downtime
4. Manage the data lifecycle
5. On-premises, Cloud... both!
6. Create powerful data services
7. Complete database platform
8. Enterprise Readiness

SLIDE: 14

Enter Hadoop…

Hadoop: Staging, Persistence, Analytics

SLIDE: 16

Legacy RDBMS: Indexes; Transactions; Security; Enterprise operations

“NoSQL”: Flexible data model; Commodity scale out; Distributed, fault-tolerant; Hadoop sink/source

Why must we choose?

SLIDE: 17

Enterprise NoSQL

Flexible data model, comprehensive indexes
o Documents: Hierarchy, text, values, tags; schema “when you need it”
o Scalars: Aggregates and range filters, including geospatial
o Triples: Linked facts and inferencing
o Permissions: Users, roles, compartments, and privileges
o Queries: Reverse indexes for alerting, matching

Ad hoc queries, lock-free reads
Real-time transformation
Strict consistency, security throughout
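
The "reverse indexes for alerting, matching" item can be pictured with a small sketch: instead of indexing documents and running a query against them, the queries are stored and indexed, and each incoming document is checked against them. The Python below is only an illustration of that idea under simplifying assumptions (queries are plain sets of required words, matching is whole-word and case-insensitive). The class name `ReverseQueryIndex` and the alert names are hypothetical and are not part of any MarkLogic API.

```python
from collections import defaultdict


class ReverseQueryIndex:
    """Toy reverse index: stores queries, then finds which ones match a new document."""

    def __init__(self):
        self._queries = {}                 # query id -> set of required terms
        self._by_term = defaultdict(set)   # term -> ids of queries that mention it

    def register(self, query_id, terms):
        """Store a query that matches when all of its terms occur in a document."""
        required = {t.lower() for t in terms}
        if not required:
            raise ValueError("a query needs at least one term")
        self._queries[query_id] = required
        for term in required:
            self._by_term[term].add(query_id)

    def match(self, text):
        """Return the sorted ids of stored queries whose terms all occur in text."""
        words = set(text.lower().split())
        # Only queries sharing at least one term with the document are candidates.
        candidates = set()
        for word in words:
            candidates |= self._by_term.get(word, set())
        return sorted(q for q in candidates if self._queries[q] <= words)


index = ReverseQueryIndex()
index.register("fx-alert", ["swap", "eur"])      # hypothetical alert definitions
index.register("credit-alert", ["default"])
matches = index.match("New EUR interest rate swap booked")
print(matches)
```

The cost of matching depends on the words in the incoming document rather than on the number of stored queries, which is what makes this shape attractive for alerting.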

SLIDE: 18

Data-centered

Enterprise NoSQL

Hadoop
MarkLogic

SLIDE: 19

Complementary approaches

NoSQL                  Hadoop
Online applications    Offline analytics
Delivery               Staging
Decision-making        Model-building
Real-time              Long-haul batch
Granular updates       Write-once, read-many
Distributed indexes    Distributed file system

SLIDE: 20

TIERED STORAGE

SLIDE: 21

With Tiered Storage You Can

Provide multiple Service Level Agreements (SLAs) in a single system
Decrease time and costs of ETL to bring offline content back online
Empower your operations team without imposing burdens on your developers

SLIDE: 22

Tiered Storage

Here’s how you enable tiered storage…

Define data tiers based on a range index
Have content balanced into forests by tier
Move an entire tier to different storage
Query one tier…
…or the other tier…
…or both at once!

All with no downtime, and 100% consistency!
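
The steps above can be sketched in a few lines of Python. This is a toy model rather than MarkLogic's implementation: a tier is a date range over one indexed field, documents are routed to the tier whose range covers them, a tier's storage is a single label that can be changed without touching its documents, and a query can target one tier or all of them. The names `Tier`, `assign`, `query`, `trade_date`, and the storage labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Tier:
    name: str
    start: date      # inclusive lower bound on the range-indexed value
    end: date        # exclusive upper bound
    storage: str     # hypothetical storage label, e.g. "ssd" or "hdfs"
    docs: list = field(default_factory=list)

    def accepts(self, value):
        return self.start <= value < self.end


def assign(tiers, doc):
    """Place a document in the tier whose range covers its indexed date."""
    for tier in tiers:
        if tier.accepts(doc["trade_date"]):
            tier.docs.append(doc)
            return tier.name
    raise ValueError(f"no tier covers {doc['trade_date']}")


def query(tiers, predicate, tier_names=None):
    """Search the selected tiers (all tiers when tier_names is None)."""
    selected = [t for t in tiers if tier_names is None or t.name in tier_names]
    return [d for t in selected for d in t.docs if predicate(d)]


tiers = [
    Tier("historical", date(2010, 1, 1), date(2014, 1, 1), storage="nas"),
    Tier("active", date(2014, 1, 1), date(2015, 1, 1), storage="ssd"),
]
assign(tiers, {"id": "t1", "trade_date": date(2012, 6, 1), "amount": 100})
assign(tiers, {"id": "t2", "trade_date": date(2014, 3, 15), "amount": 250})

# Moving a whole tier is a change to the tier, not to each document.
tiers[0].storage = "hdfs"


def is_large(doc):
    return doc["amount"] >= 100


active_only = [d["id"] for d in query(tiers, is_large, {"active"})]
both = [d["id"] for d in query(tiers, is_large)]
print(active_only, both)
```

Because the query function only varies in which tiers it visits, the same predicate works unchanged whether it runs against one tier or both.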

SLIDE: 23

Case Study: OPERATIONAL TRADE STORE

SLIDE: 24

Tier 1 Bank: Operational trade store

“What are the bank’s obligations?”

Diagram labels: ETL, Trade execution, Post-trade processing, Reporting, Analytics, Trade stores, Reference data

SLIDE: 25

Legacy trade store challenges

Long development cycles for new instrument types
Complex combinations of ETL and data models
Limited visibility across the business
Governance risk, maintenance costs of siloed infrastructure
Varied SLAs and access patterns created inefficiencies

SLIDE: 26

Preserving Context with Documents

Trade Cashflows
Fields: Trade ID, Party Identifier, Party Reference, Payer Party, Receiver Party, Payment Date, Payment Amount, Net Payment

Models: Application Model, Provider Model, Persistence Model
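
One way to read this slide is that a trade and its cashflows travel together as a single document instead of being split across joined tables. The sketch below shows what such a document might look like. The field names are borrowed from the slide labels, but the nesting, the identifiers, and every value are invented for illustration, and `net_payment` is a hypothetical helper, not something taken from the presentation.

```python
import json

# Field names follow the slide labels; every value is made up for illustration.
trade_cashflow = {
    "tradeId": "T-0001",
    "parties": [
        {"partyReference": "party1", "partyIdentifier": "BANK-A"},
        {"partyReference": "party2", "partyIdentifier": "BANK-B"},
    ],
    "cashflows": [
        {
            "payerParty": "party1",
            "receiverParty": "party2",
            "paymentDate": "2014-06-30",
            "paymentAmount": 1250.0,
        },
        {
            "payerParty": "party2",
            "receiverParty": "party1",
            "paymentDate": "2014-06-30",
            "paymentAmount": 1000.0,
        },
    ],
}


def net_payment(doc, party):
    """Net amount the given party receives (negative means it pays)."""
    total = 0.0
    for flow in doc["cashflows"]:
        if flow["receiverParty"] == party:
            total += flow["paymentAmount"]
        if flow["payerParty"] == party:
            total -= flow["paymentAmount"]
    return total


# The whole trade round-trips as one unit, so no joins are needed to rebuild it.
restored = json.loads(json.dumps(trade_cashflow))
print(net_payment(restored, "party2"))
```

Keeping parties and cashflows inside the trade means a derived value such as the net payment can be computed from the document alone.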

SLIDE: 27

Information lifecycle

Stages over time, with storage options:
Active: SSD, DAS, SAN, Hadoop
Historical: DAS, SAN, NAS, Hadoop, S3
Archive: NAS, Hadoop, S3

SLIDE: 28

Active

Local 10K SAS, RAID10
Replication for HA
Merge overhead for updates
20 hosts, 320 shards
4 TB of SSD cache

Chart (TB): Active 96

SLIDE: 29

Compliance

Shared NAS
63 hosts
Effective 8 TB/host

Chart (TB): Compliance 504, Active 96

SLIDE: 30

Analytic

Hadoop
120 hosts
Effective 12 TB/host
10 MarkLogic hosts

Chart (TB): Analytic 1,044, Compliance 504, Active 96

SLIDE: 31

Online migration

Chart (TB): Active, Compliance, Analytic

SLIDE: 32

                              Operational   Compliance   Analytic
Total Size (TB)               96            504          1,044
Total Cost ($000)             592           2,066        2,080
Effective Unit Cost ($/GB)    $25           $4           $1.50

SLIDE: 33

Align infrastructure with objectives

Data volumes are increasing, but IT budgets are not
Storage is the dominant factor in the overall cost
Value of data and pattern of access vary widely and change over time

Last month’s news
Current quarter’s open transactions
Latest message traffic

SLIDE: 35

ELASTICITY

SLIDE: 36

With Elasticity You Can

Know when to scale

How much to scale

Programmatically expand and contract

On premises or in the cloud

SLIDE: 37

Elasticity

Scale up and down with:

Tools to understand in detail how your cluster is performing, and to find bottlenecks
Fine-grained tuning parameters for optimization of indexes, cache sizes, etc.
Cloud orchestration APIs to expand and contract clusters programmatically on-premises or in the cloud
Continuous, online rebalancing of content across nodes in a cluster to keep performance optimal for your cluster size
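
To make the rebalancing item concrete, here is a minimal sketch of what evening out content after a cluster grows or shrinks could look like. It is a simplification under stated assumptions, not the product's rebalancer: documents are opaque ids, balance means equal document counts, and the function `rebalance` with its `placement` and `nodes` parameters is hypothetical. The goal it illustrates is moving only as many documents as the new cluster size requires.

```python
def rebalance(placement, nodes):
    """Even out document counts across nodes, moving as few documents as possible.

    placement maps node name -> list of document ids and is updated in place.
    nodes is the desired node list: new nodes start empty, missing nodes are drained.
    Returns the list of (doc_id, source, target) moves that were applied.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    for node in nodes:
        placement.setdefault(node, [])

    # Documents on nodes that are leaving the cluster must all be moved.
    pending = []
    for node in list(placement):
        if node not in nodes:
            pending.extend((doc, node) for doc in placement.pop(node))

    total = sum(len(docs) for docs in placement.values()) + len(pending)
    base, extra = divmod(total, len(nodes))
    # Let the fullest nodes keep the remainder so fewer documents have to move.
    ordered = sorted(nodes, key=lambda n: len(placement[n]), reverse=True)
    target = {n: base + (1 if i < extra else 0) for i, n in enumerate(ordered)}

    # Take the surplus from nodes that are above their target...
    for node in nodes:
        while len(placement[node]) > target[node]:
            pending.append((placement[node].pop(), node))

    # ...and hand it to nodes that are below their target.
    moves = []
    for node in nodes:
        while len(placement[node]) < target[node]:
            doc, source = pending.pop()
            placement[node].append(doc)
            moves.append((doc, source, node))
    return moves


placement = {"n1": ["a", "b", "c", "d"], "n2": ["e", "f"]}
moves = rebalance(placement, ["n1", "n2", "n3"])   # expand from two nodes to three
sizes = {node: len(docs) for node, docs in placement.items()}
print(sizes, len(moves))
```

Calling the same function with a shorter node list drains the removed node, so expansion and contraction share one code path.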

SLIDE: 39

The data-centered data center

Index once
Single security model
Flexible data model
Transactions
Elastic operations
…when you need them
Simplified governance

SLIDE: 40

THE DATA-CENTERED DATA CENTER

SECURE: Minimize duplication, costly ETL, reduce risk
REAL-TIME: Enterprise-class database for real-time search, delivery & analytics
RUN APPLICATIONS: Run mission critical applications directly on HDFS

SLIDE: 41

POWERFUL: Deliver more value, build more powerful applications
Full Text Search
Scalable
Analytic Functions
Alerting & Event Processing
Geospatial Query
In-database MapReduce
Visualization Widgets
Semantics: RDF & SPARQL
Flexible Indexes
JSON Storage
REST & Java APIs
Triple Index

AGILE: Prepare for and respond quickly to change
BI Integration
HDFS & Amazon S3 Storage
Elastic
Programmatic Controls & Metering
Application Builder
Information Studio
SQL Support
Hadoop Connector
Tiered Storage
Cloud Ready
Schema-Agnostic
mlcp Content Pump

TRUSTED: Enterprise-ready and secure for mission-critical apps
ACID Transactions
XA Distributed Transactions
Database Rollback
Backup/Restore
Automated Failover
Journal Archiving
Replication
Point-in-time Recovery
Monitoring & Management
Role-based Security & LDAP Support
Common Criteria Security Certification
Configuration Management

SLIDE: 42

Take-Aways

New and more data is both an opportunity and a threat
Last generation of data management is not sufficient
More copies, representations, transformations increase risk and slow innovation
Index once and reuse across workloads, lifecycle
NoSQL: indexing and updates for interactive apps
Hadoop: staging, persistence, and analytics

SLIDE: 43

SEARCH
DATABASE
APPLICATION SERVICES

SLIDE: 44

Any Questions?