The 5 Main Benefits of Apache Cassandra - DataStax · Introduction For decades, organizations...

The 5 Main Benefits of Apache Cassandra™

E B O O K | T H E 5 M A I N B E N E F I T S O F A PA C H E C A S S A N D R A ™

Introduction

For decades, organizations relied on traditional relational database management systems (RDBMS) to organize, store, and analyze their data.

But then Facebook came along, and an RDBMS was suddenly not quite enough. The social giant needed a powerful database solution for its Inbox Search feature, and Apache Cassandra—a distributed NoSQL database—was born.

Released as an open source project in July 2008, Cassandra, named after the Trojan prophetess of Greek mythology who was cursed never to be believed, became an Apache Incubator project in March 2009. It graduated to a top-level project in February 2010.

Since its 2010 release, Cassandra has gone through several iterations. As we approach the release of Cassandra 4.0, it’s worth checking out a brief overview of how the database evolved over the last several years:

June 2011: Cassandra 0.8 was released, adding support for the Cassandra Query Language (CQL), support for zero-downtime upgrades, and more.

October 2011: Cassandra 1.0 was released, adding improved read performance, integrated compression, and more.

September 2013: Cassandra 2.0 was released, adding lightweight transactions, improved compactions, and more.

November 2015: Cassandra 3.0 was released, adding a refactored storage engine, materialized views, and more.

June 2017: Cassandra 3.11 was released; it’s the latest release.

Upcoming: Cassandra 4.0 is expected to be released in the near future. It will include increased reliability, audit logging, and simplified repair operations, among other things.

As an open source project, Cassandra is freely available from the Apache Software Foundation. There are, however, various distributions of Cassandra—one of which is DataStax Distribution of Apache Cassandra™, which is distributed and supported by the same people who wrote the majority of Cassandra’s code.

Cassandra adoption has significantly increased over the last few years, and for good reason: the distributed database delivers a ton of value.

With that in mind, let’s take a look at five of the big benefits of Cassandra.

Scalability

When you scale easily, you win. Period. There’s no substitute for knowing you’ll be able to handle a surge of holiday season traffic, even when you’re asleep.

On the flipside, when scaling is difficult to achieve or adds significant risk like potential downtime, you panic. You never know when a large influx of traffic is headed your way. If your systems can’t scale to accommodate this traffic, your customers will go somewhere else.

Generally speaking, there are two ways to achieve scale at the database level:

1. You can scale up by adding capacity (e.g., memory, storage, and CPU) to a single machine. You won’t have to run multiple servers, so you’ll only need to manage a single system or a small number of systems. However, there’s a much bigger chance your infrastructure will fail under the increased strain, you’ll have a single point of failure, and you’ll likely spend a lot of money on expensive high-end hardware, to the point that you may end up completely locked in.

2. You can scale out by adding more servers. Of course, you’ll have more servers to run, and licensing fees and utility costs might go up, too. But overall you’ll likely spend a lot less than you would by scaling up. You’ll also gain resilience and fault tolerance, both of which can be built into the foundation of the database.

Cassandra enables organizations to scale out easily in a linear fashion, which is quickly becoming the preferred method of scalability for leading enterprises. Scaling out is simple: if you want to handle double the workload, just double the number of servers. You can add those servers without downtime and without impacting performance.

[Figure: throughput grows linearly with cluster size. 2 nodes: 100,000 ops/sec; 4 nodes: 200,000 ops/sec; 8 nodes: 400,000 ops/sec.]
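To make the arithmetic behind the node counts and throughput numbers above concrete, here is a minimal Python sketch. It assumes perfectly linear scaling and a per-node rate of 50,000 ops/sec, which is simply 100,000 ops/sec divided by 2 nodes. Real clusters will vary with hardware, data model, and workload, and the function names are hypothetical.

```python
import math

# Per-node rate implied by the figures above: 100,000 ops/sec on 2 nodes.
# Assumption: throughput scales perfectly linearly with node count.
OPS_PER_NODE = 100_000 // 2  # 50,000 ops/sec per node


def cluster_throughput(nodes: int, ops_per_node: int = OPS_PER_NODE) -> int:
    """Estimated cluster throughput under ideal linear scaling."""
    return nodes * ops_per_node


def nodes_needed(target_ops: int, ops_per_node: int = OPS_PER_NODE) -> int:
    """Smallest node count whose estimated throughput meets the target."""
    return math.ceil(target_ops / ops_per_node)


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, "nodes ->", cluster_throughput(n), "ops/sec")
    print("nodes for 650,000 ops/sec:", nodes_needed(650_000))
```

Under these assumptions, capacity planning reduces to a single division, which is the practical meaning of "double the workload, double the servers."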

Not only can Cassandra’s ability to scale save you a significant amount of money, but you also won’t have to worry about getting locked into a less-than-optimal vendor’s tech stack.

BUSINESS VALUE:

It’s estimated that Amazon lost up to $100 million during a roughly one-hour outage in 2018, ostensibly due to too many users flooding the site simultaneously. With scalable systems in place, your business won’t miss out on opportunities during heavily trafficked periods, and you’ll avoid extremely costly outages. Add opportunity. Subtract losses. That’s value.

High Availability via Data Replication

It’s a bit of a paradox, but the world is becoming increasingly connected as it becomes increasingly distributed.

This evolving reality demands a database that can handle data coming from multiple geographically distributed sources.

Traditionally, databases had master-slave architectures. Master nodes could read and write while slave nodes could only read. While this architecture helped ensure consistency, it also introduced serious problems. Database operations, for example, would grind to a halt in the event the master node failed.

That might have been something an enterprise could stomach in the 1980s. But as we approach 2020, no serious organization can absorb such a significant disruption.

Good news: Cassandra’s masterless architecture means that every node can perform read and write operations. This enables data to quickly be replicated across data centers and geographies.

[Figure: data replicated across data centers and clouds. Labels: San Francisco; New York; North America (Amazon EC2); EMEA (Microsoft Azure).]

As a result, team members and customers spread out across the world can expect an optimal experience each time they interact with applications. Data is always available, no matter where the physical infrastructure is located. In the event a node gets knocked offline, traffic is automatically rerouted to the nearest healthy node.
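The rerouting described above works because each piece of data lives on several nodes. The following Python sketch is a toy model of that idea, not Cassandra's actual implementation: the `ToyRing` class is hypothetical, MD5 stands in for the real partitioner, and replicas are simply the next nodes clockwise on the ring.

```python
import hashlib
from bisect import bisect_right


def token(value: str) -> int:
    """Deterministic ring position (MD5 is used here purely for illustration)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class ToyRing:
    """Toy model of a masterless ring in which every node can serve any request."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        self.ring = sorted((token(n), n) for n in nodes)
        self.down = set()  # names of nodes currently offline

    def replicas(self, key):
        """The rf nodes found by walking clockwise from the key's token."""
        tokens = [t for t, _ in self.ring]
        start = bisect_right(tokens, token(key)) % len(self.ring)
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def live_replicas(self, key):
        """Replicas that can still answer a request for this key."""
        return [n for n in self.replicas(key) if n not in self.down]


if __name__ == "__main__":
    ring = ToyRing(["node1", "node2", "node3", "node4"], replication_factor=3)
    owners = ring.replicas("user:42")
    ring.down.add(owners[0])  # simulate one replica going offline
    print("owners:", owners)
    print("still serving:", ring.live_replicas("user:42"))
```

With a replication factor of 3, losing one replica still leaves two nodes able to answer for the same key, which is why a single failure need not interrupt service.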

BUSINESS VALUE:

Recent IBM research revealed that bad data collectively costs U.S. organizations $3.1 trillion each year. Thanks to Cassandra, you won’t have to worry about duplicative work, lost intellectual property, or inaccessible customer data. Automatic data replication means data is never lost, and because of this, you don’t need to invest in a separate disaster recovery data center. Money saved is money earned.

High Fault Tolerance

In a perfect world, your systems would always run as designed—even when one part fails.

Cassandra gives you the ticket to that perfect world.

Thanks to its masterless, peer-to-peer architecture and data replication capabilities, applications never slow down or fail when nodes get knocked offline. If you use the leading distribution of Cassandra, DataStax Enterprise, you’ll have built-in repair services that fix problems immediately after they occur. Cassandra also has transparent fault detection and recovery—nodes that fail can easily be restored or replaced.

When a node goes down, master-slave architectures require administrators to invest a lot of time and energy repairing the database. Cassandra has no such requirements; there’s no need for any manual intervention when a node fails.

With Cassandra, you can forget about fault tolerance altogether. It’s automatic.
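How many node failures a request can actually ride out depends on the replication factor and on the consistency level the application asks for. The sketch below works through that arithmetic for three of Cassandra's consistency levels (ONE, QUORUM, ALL), using the standard quorum size of floor(RF / 2) + 1. The function names are hypothetical, and the model ignores details such as hinted handoff and multiple data centers.

```python
def required_acks(consistency: str, rf: int) -> int:
    """Number of replicas that must respond for the request to succeed."""
    levels = {
        "ONE": 1,
        "QUORUM": rf // 2 + 1,  # a majority of the replicas
        "ALL": rf,
    }
    if consistency not in levels:
        raise ValueError(f"unsupported consistency level: {consistency}")
    return levels[consistency]


def tolerated_failures(consistency: str, rf: int) -> int:
    """Replicas that can be offline while the request still succeeds."""
    return rf - required_acks(consistency, rf)


if __name__ == "__main__":
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, "with RF=3 tolerates", tolerated_failures(level, 3), "down replica(s)")
```

The trade-off is visible in the numbers: with a replication factor of 3, QUORUM survives one unavailable replica, while ALL survives none.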

BUSINESS VALUE:

According to Gartner, the average company loses $5,600 per minute of downtime. On the high end, an enterprise can lose as much as $540,000 per hour of downtime. Who can afford that?

High Performance

Suffice it to say: speed matters.

We expect prompt service at restaurants, quick delivery of packages, and zero lag from our applications. And when things don’t happen as quickly as we hoped, we are prone to switch to a better service.

The same holds true for websites and applications. Consider these statistics compiled by HubSpot:

- 47% of customers expect a website to load in two seconds or less
- 79% of customers are unlikely to support a business that has poor website performance
- A one-second delay in page load time translates into an 11% reduction in page views

The end result? Employees can get things done quickly and customers can enjoy positive user experiences in every interaction.

In a world that moves faster than lightning, yesterday’s data is already a dinosaur.

BUSINESS VALUE:

Thanks to Cassandra’s high performance, developer productivity increases as users don’t have high latency or bottlenecks slowing them down. From the customer’s perspective, websites and applications will work as they’re expected to, translating into positive user experiences and improved customer retention.

Multi-Data Center and Hybrid Cloud Support

In an age where hybrid cloud is quickly becoming the go-to data management environment, this is key.

Cassandra is designed as a distributed system for deployment of large numbers of nodes across multiple data centers. Key features of Cassandra’s distributed architecture are specifically tailored for multiple-data center deployment. These features are robust and flexible enough that you can configure the cluster for optimal geographical distribution, for redundancy for failover and disaster recovery, or even for creating a dedicated analytics center replicated from your main data storage centers.

Cassandra characteristics that are key to multi-data center deployment include:

- Replication factor and replica placement strategy – NetworkTopologyStrategy (the default placement strategy) has capabilities for fine-grained adjustment of the number and location of replicas at the data center and rack level.
- Snitch – For multi-data center deployments, it is important to make sure the snitch has complete and accurate information about the network, either by automatic detection (RackInferringSnitch) or details specified in a properties file (PropertyFileSnitch).
- Consistency level – Cassandra provides consistency levels that are specifically designed for scenarios with multiple data centers.

Your specific needs will determine how you combine these ingredients in a “recipe” for multi-data center operations.
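As one possible "recipe," the Python sketch below builds a CQL `CREATE KEYSPACE` statement that uses NetworkTopologyStrategy with a separate replica count per data center, and computes how many replicas a LOCAL_QUORUM request would need in each one. The keyspace name, data center names, and replica counts are hypothetical. In a real cluster the data center names must match what the snitch reports, and the statement would be sent through a driver or cqlsh rather than printed.

```python
from typing import Dict


def create_keyspace_cql(name: str, replicas_per_dc: Dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement with per-data-center replica counts."""
    dc_part = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {dc_part}}};"
    )


def local_quorum(rf_in_dc: int) -> int:
    """Replicas in the local data center that must answer a LOCAL_QUORUM request."""
    return rf_in_dc // 2 + 1


if __name__ == "__main__":
    # Hypothetical layout: three replicas in one data center, two in another.
    layout = {"us_east": 3, "eu_west": 2}
    print(create_keyspace_cql("shop", layout))
    for dc, rf in layout.items():
        print(dc, "LOCAL_QUORUM needs", local_quorum(rf), "of", rf, "replicas")
```

Because LOCAL_QUORUM only waits on replicas in the coordinator's own data center, requests avoid a cross-region round trip while still reaching a majority locally.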

BUSINESS VALUE:

Being able to reliably serve a distributed, global audience with powerful, always-on applications means using multiple data centers. Running across multiple data centers, in turn, requires a database that scales easily across geographic regions.

How DataStax Distribution of Apache Cassandra Adds Value

As previously mentioned, there are also various distributions of Cassandra out there, one of which is DataStax Distribution of Apache Cassandra.

DataStax Distribution of Apache Cassandra is 100% open source compatible and allows organizations to unlock the true power of Cassandra.

Enterprises that use DataStax Distribution of Apache Cassandra benefit from a production-ready version of Cassandra that’s gone through an intensive QA process. Remember how expensive downtime is? With DataStax Distribution of Apache Cassandra, hotfixes, bug escalation, and upgrades are included—which accelerates time to resolution and reduces maintenance costs.

DataStax Distribution of Apache Cassandra also comes with 8x5 support—with the option of 24x7x365 support—from the folks responsible for writing a majority of the Cassandra codebase.

DataStax Distribution of Apache Cassandra allows you to avoid the maintenance, support, and compliance issues many enterprises that deploy the open source version of Cassandra eventually run into.

Get started with DataStax Distribution of Apache Cassandra here.

About DataStax

DataStax delivers the always-on, active-everywhere distributed hybrid cloud database built on Apache Cassandra™. The foundation for personalized, real-time applications at scale, DataStax Enterprise makes it easy for enterprises to exploit hybrid and multi-cloud environments via a seamless data layer that eliminates the issues that typically come with deploying applications across multiple on-premises data centers and/or multiple public clouds.

Our product also gives businesses full data visibility, portability, and control, allowing them to retain strategic ownership of their most valuable asset in a hybrid/multi cloud world. We help many of the world’s leading brands across industries transform their businesses through an enterprise data layer that eliminates data silos and cloud vendor lock-in while powering modern, mission-critical applications. For more information, visit www.DataStax.com and follow us on Twitter @DataStax.

© 2019 DataStax, All Rights Reserved. DataStax, Titan, and TitanDB are registered trademarks of DataStax, Inc. and its subsidiaries in the United States and/or other countries.

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka, and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States, and/or other countries.

Last Update: FEB2019