Data Storage in an Open Source World | Pure Storage · 2021. 7. 23. · data mobility. •...

DATA STORAGE IN AN OPEN SOURCE WORLD Sponsored by


TABLE OF CONTENTS

Executive Summary

Examining the Open Source Database Landscape

Staying Agile with Ever-Expanding Data

Achieving a Modern Storage Environment

Optimizing Your Open Source Database Environment with Pure

AUTHORED BY:

Joe McKendrick, Lead Analyst, Unisphere Research, a Division of Information Today, Inc.

August 2020

1. Executive Summary

Open source databases have been on the scene for a number of years as rapidly deployable databases at the peripheries of enterprises, serving as testing environments and website back-ends. Lately, however, they have been moving into mission-critical production environments in a big way, from Software-as-a-Service (SaaS) providers that use the technology to stay on the cutting edge to more traditional enterprises in industries such as finance, healthcare and education, where they support customer interactions and analytics. Today, an open source database is just as likely to be found behind a bank’s customer relationship management system as it is under the hood of a university’s research center.

However, maintaining open source databases such as MySQL and MongoDB can also lead to growing pains for enterprises struggling to keep up with exploding data volumes and performance requirements. Storage, often an afterthought in smaller-footprint open source projects, needs to be addressed at the enterprise level. Performance tuning is a must-have, but it requires special expertise, and its benefits tend to be limited. Increases in processing power may also help boost performance, but constantly upgrading processors and hardware stacks can be expensive. Another option is to adopt clustering solutions to support larger workloads and improve high availability. Ultimately, however, the key to achieving maximum performance and scalability in an open source database environment is a modern storage environment designed to efficiently deliver the data used by today’s proliferation of advanced applications.

A modern storage environment that is simple, scalable, adaptable and resilient is essential in today’s open source world, especially as open source databases take on large enterprise workloads. The growing size and complexity of today’s database environments create challenges not only in maintaining the performance and availability of mission-critical applications and systems, but also in the day-to-day management of these environments by time-strapped database teams. The ability to automate and simplify routine database maintenance tasks through the cloud and rich data services is becoming increasingly important. A modern storage environment should play a central role in simplifying the management of database environments and enabling greater data mobility.

2. Welcome to the Open-Source World

The days when enterprises maintained data environments tied to a single vendor or platform are gone. Even among enterprise shops that were originally built around Oracle relational database management systems, open source databases are proliferating, as shown in a survey of members of the Independent Oracle Users Group conducted by Unisphere Research, a division of Information Today. Respondents report adopting MySQL (44%), PostgreSQL (22%), MongoDB (18%), MariaDB (11%) and CouchDB (3%).[1]

Additional surveys confirm that today’s enterprise data shops are best characterized by “polyglot persistence”: the use of different databases for different needs, based on the strengths of each particular database. The average company now uses more than three database types for its applications, according to a survey from DZone, with some reporting as many as nine.[2] New applications, for example, can be stood up more readily with open source databases, which are lighter weight and quicker to deploy. Many open source databases are NoSQL databases that more readily support unstructured data, which makes up a large share of the data now flowing through enterprises and sought for advanced analytics and artificial intelligence. Licensing, whether per user or per processor, is also less expensive and more open than for commercial databases.

The reasons for adopting open source databases vary, but the two main factors driving adoption are cost savings and avoiding vendor lock-in. A survey published by Percona finds that 77% of respondents see cost savings as a benefit, followed by 56% who cite avoiding vendor lock-in. Users are also attracted by the community support available with open source environments.[3] At the same time, the expansion of open source databases into mission-critical environments, where they support large enterprise workloads, can expose limitations in the database and in traditional storage environments in areas such as performance, availability, capacity and time to market.

Database Adoption by Brand at Oracle Enterprise Sites

Oracle (all versions): 81%
Microsoft SQL Server: 64%
MySQL: 44%
PostgreSQL: 22%
IBM DB2: 21%
MongoDB: 18%
SAP HANA: 12%
MariaDB: 11%
Amazon DynamoDB: 10%
CouchDB: 3%

Source: Unisphere Research/Information Today Inc.

3. Where Does All That Data Come From, and Where Does It Go?

The key challenge for today’s enterprises is supporting and enhancing the mobility of data—data that comes from an ever-expanding array of sources, including machines and sensors, business users, external and social media sites, and transaction systems. This data ends up in data warehouses, data lakes and databases across the enterprise.

With most applications considered mission-critical in today’s enterprises, there’s almost unanimous agreement that availability and performance are the most important services data shops can deliver. Businesses are online and depend upon many elements to keep delivering. The user base is no longer limited to employees who will just sit and patiently wait until things are restored—it involves customers and partners who depend upon and expect information to be delivered, and transactions to be completed and updated, as soon as they happen.

In today’s data centers, everything that affects the business matters. Close to two-thirds (63%) of data managers responding to a Unisphere Research survey say that much of what they manage is mission-critical to their businesses, with “much” defined as more than 25% of total database installations. The survey also shows the pace of data growth accelerating. At the time of the survey, about 15% of respondents were in the fastest-growing segment, with data growing 50% a year. Within the next three years, however, 31% expect to reach this pace of growth.[4]
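To make the 50%-a-year figure concrete, the following Python sketch compounds a hypothetical 10 TB database at that rate and computes its doubling time. The 10 TB starting size is an illustrative assumption, not a survey result. At 50% annual growth, capacity needs multiply by 1.5³ = 3.375 over three years, and the data volume doubles roughly every 1.7 years.

```python
import math


def projected_size_tb(current_tb: float, annual_growth: float, years: int) -> float:
    """Compound growth: size after `years` at a fixed annual growth rate."""
    return current_tb * (1.0 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years until the data volume doubles at a fixed annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    # Hypothetical starting point: a 10 TB database growing 50% a year.
    for year in range(4):
        print(f"year {year}: {projected_size_tb(10.0, 0.50, year):.2f} TB")
    print(f"doubling time: {doubling_time_years(0.50):.2f} years")
```

Under these assumptions the 10 TB database reaches 33.75 TB by year three, which is why storage capacity planning cannot simply extrapolate last year’s growth linearly.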

Data is pouring in from all corners of the enterprise: from ERP, financial systems, production systems, customer relationship management systems, human capital management systems, and everywhere else inside and close to the enterprise. Data is also streaming in from devices, sensors and systems across the Internet of Things, which requires 24x7 monitoring and management. Add to this the data generated by transactions, social media and other customer engagements, all of which has value for advanced analytics, artificial intelligence and machine learning.

This data growth, of course, is translating into very large databases and data sites that need to be managed. While a multi-terabyte database was seen as exclusive to large enterprises just a few years ago, close to half of the data managers in the Unisphere Research study report having databases exceeding 10 TB today, and close to one-third manage databases exceeding 50 TB. Such massive amounts of data mean greater care must be taken to keep information highly available, without losing current data in the event of incidents or performance slowdowns.

In fact, availability is the data issue most likely to keep respondents up at night, the survey shows. Close to two-thirds (63%) say the availability of applications is an “extremely critical” concern for them, according to Unisphere Research. About half also see database and application performance as an extremely critical concern. Overall, data managers and administrators are working actively to assure constant, uninterrupted delivery of data to their users and customers. The majority of companies in the survey are taking steps to boost database performance or are even upgrading to new versions.

The pervasiveness of data, combined with users executing queries from so many different domains, calls for a high-performing, resilient storage solution similar to those relied on by relational database management systems. With this growth in database activity and the proliferation of open source databases, the challenge for data managers is assuring the performance and availability of a wide range of solutions from varying providers. Data managers, including database administrators, developers and analysts, are rarely experts in all of these environments. Oracle databases, for example, use different protocols than MariaDB databases. This complicates important areas such as overall performance, availability, backup and recovery, and disaster recovery.

4. Recommendations for Achieving a Modern Storage Environment

The increasing complexity of managing open source databases heightens the urgency of maintaining a modern storage environment that serves all platforms across the enterprise. Such a storage platform can enhance performance and simplify operations to the point where more expensive up-front measures, such as processor upgrades or performance tuning, can be avoided. Traditional data storage systems such as direct-attached storage (DAS) lack the ability to meet these modern data demands: they are prone to disruptions, add complexity and lack scalability, all of which leads to administrative overhead. The following recommendations can help achieve a modern storage environment that incorporates open source databases:

• Consider a “software-defined storage” strategy. Software-defined storage (SDS) abstracts storage configuration away from the underlying physical hardware and database-dependent features into a standardized, accessible service layer.

• Build in an intelligent storage layer. Such an architecture—structured as a multicloud data plane that eliminates the complexity of operating siloed private and public cloud environments—provides rich data services and data mobility.

• Transform storage infrastructure incrementally. Many organizations have existing storage area networks (SANs) and network-attached storage (NAS) arrays. Because migrating everything to newer, more intelligent storage layers at once can be cost-prohibitive and disruptive, such capabilities can instead be built into new applications and configurations as they are deployed.

• Look to faster, more efficient storage technologies. The traditional magnetic disk paradigm has proven too slow for today’s always-on, real-time application requirements, as data needs to make round trips between storage and system memory. Flash storage arrays and in-memory computing promise data access the instant it is required by applications and users. Make sure your vendor supports NVMe (Non-Volatile Memory Express) and NVMe-oF (NVMe over Fabrics) storage interconnect technologies. NVMe is a protocol designed for flash and other non-volatile memory that runs over PCIe, a fast-evolving interface standard; NVMe-oF extends it across network fabrics such as Ethernet and Fibre Channel.

• Look to the cloud. Today’s cloud providers—particularly Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Database as a Service (DBaaS) vendors—offer almost unlimited capacity, available on a subscription or usage basis. Even data warehouses and data lakes can be efficiently managed in the cloud.
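The abstraction behind software-defined storage can be sketched in a few lines of Python. In this minimal illustration, every class, method and tier name (`StorageBackend`, `SimulatedBackend`, `StorageService`, `provision`, “performance”, “capacity”) is hypothetical and does not correspond to any vendor’s API. Database teams request storage by service tier, and a policy map, rather than the requester, decides which physical or cloud backend fulfills the request.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    size_gb: int
    backend: str


class StorageBackend(ABC):
    """Hypothetical driver interface that each physical or cloud platform implements."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def create_volume(self, name: str, size_gb: int) -> Volume: ...


class SimulatedBackend(StorageBackend):
    """In-memory stand-in for a real array or cloud block service (illustration only)."""

    def __init__(self, name: str) -> None:
        self._name = name
        self.volumes: dict[str, Volume] = {}

    @property
    def name(self) -> str:
        return self._name

    def create_volume(self, name: str, size_gb: int) -> Volume:
        if size_gb <= 0:
            raise ValueError("size_gb must be positive")
        volume = Volume(name=name, size_gb=size_gb, backend=self._name)
        self.volumes[name] = volume
        return volume


class StorageService:
    """Service layer: callers ask for a tier; the policy map decides the backend."""

    def __init__(self, policies: dict[str, StorageBackend]) -> None:
        self._policies = policies

    def provision(self, db_name: str, size_gb: int, tier: str) -> Volume:
        backend = self._policies.get(tier)
        if backend is None:
            raise ValueError(f"unknown storage tier: {tier!r}")
        return backend.create_volume(f"{db_name}-data", size_gb)


if __name__ == "__main__":
    service = StorageService({
        "performance": SimulatedBackend("nvme-flash-array"),
        "capacity": SimulatedBackend("cloud-block-volume"),
    })
    vol = service.provision("mysql-orders", 500, "performance")
    print(vol)
```

Because the database team depends only on the tier name, a backend can be swapped (for example, moving the “capacity” tier from an on-premises array to a cloud volume service) by changing the policy map, without touching the code that provisions databases.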

Today’s environment offers many choices for applying the right database for the right application. However, with so many choices comes more responsibility for the proper and efficient storage of data. A modern storage environment provides the foundation for a high-performing data-driven enterprise.

5. Optimize Your Open Source Database Environment with Pure Storage

Whether your workload is transactional, data warehousing or analytics, Pure Storage® can optimize your open source database deployment and improve your application performance. Pure delivers a modern data experience that empowers organizations to run their operations as a true, automated, storage-as-a-service model seamlessly across multiple clouds. Pure helps companies use more of their data while reducing the complexity and expense of managing the infrastructure behind it. Pure’s flash storage solutions are purpose-built to support the modern data experience and deliver simplicity, flexibility and reliability. FlashArray™ is the industry’s first all-flash, 100 percent NVMe shared accelerated storage designed for mainstream enterprise deployments. FlashBlade™ is the industry’s first unified fast file and object (UFFO) storage platform for modern data and applications. Pure as-a-Service™ provides storage as a service for on-premises and public cloud, unifying hybrid clouds with a single subscription. Pure1® enables self-driving storage with full-stack, AI-powered data-storage management and monitoring.

Modern Storage Platform Checklist

When evaluating a modern storage platform for your open source database management system, look for these key features and capabilities:

• Rapid response time and low latency for database queries

• Speedy cloning of production databases

• Non-stop database availability, with high storage uptime to help meet SLA targets

• Effective data reduction that makes efficient use of capacity and lowers TCO

• Consistent experience and data mobility across on-premises and public cloud

• Open and efficient APIs

• Space-efficient snapshots for database recovery and cloning

• Easy setup and maintenance that reduces administrative overhead

• Non-disruptive upgrades

• Pay-as-you-go billing flexibility that allows buying storage based on actual consumption

• End-to-end encryption and ransomware protection
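Data reduction appears both in the checklist and in Pure’s feature list that follows. As a generic illustration only, not Pure’s implementation (which the text says also includes pattern removal, deep reduction and copy reduction), the following Python sketch combines two of the named techniques: fixed-size block deduplication keyed by SHA-256 digests, followed by zlib compression of each unique block. The 4 KiB block size and the function names are assumptions chosen for the example.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size for this illustration


def reduce_data(data: bytes, block_size: int = BLOCK_SIZE) -> tuple[dict[str, bytes], list[str]]:
    """Deduplicate fixed-size blocks by SHA-256 digest, then compress each unique block.

    Returns (store, recipe): `store` maps digest -> compressed unique block, and
    `recipe` is the ordered list of digests needed to rebuild `data`.
    """
    store: dict[str, bytes] = {}
    recipe: list[str] = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:  # only the first copy of each block is stored
            store[digest] = zlib.compress(block)
        recipe.append(digest)
    return store, recipe


def restore(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Rebuild the original bytes by decompressing blocks in recipe order."""
    return b"".join(zlib.decompress(store[digest]) for digest in recipe)


def reduction_ratio(data: bytes, store: dict[str, bytes]) -> float:
    """Logical bytes divided by physically stored (compressed, deduplicated) bytes."""
    stored = sum(len(block) for block in store.values())
    return len(data) / stored if stored else float("inf")


if __name__ == "__main__":
    # 50 identical 4 KiB blocks, standing in for repeated data such as cloned copies.
    data = bytes(range(256)) * 16 * 50
    store, recipe = reduce_data(data)
    assert restore(store, recipe) == data
    print(f"{len(recipe)} blocks, {len(store)} unique, ratio {reduction_ratio(data, store):.1f}:1")
```

Fixed-size blocks keep the sketch short but miss duplicates that are shifted by a few bytes; production systems generally use more elaborate chunking and metadata handling.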

• Faster applications with rapid response times can build customer value and deliver an amazing user experience; fast data is agile data. Pure can help speed up databases with low latency to support business applications. Reduce the time and cost of database activities, including copy, clone and refresh, and provide quick provisioning via APIs, with data copies for dev/test so that your teams always work from the latest copies of data. Pure offers granular data reduction suited to virtually any application: pattern removal, deduplication, compression, deep reduction and copy reduction. This embedded data-reduction capability minimizes both capacity needs and capacity costs.

• Pure solutions are easy to set up and smart enough to manage themselves, which minimizes administrative overhead and eliminates risk. Pure’s cloud-based management tool, Pure1, makes it quick and easy to monitor your storage wherever you are. Pure1 Meta®, an AI-driven workload planner, can provide an optimal outlook by right-sizing capacity allocation.

• Pure solutions provide scalability and uninterrupted uptime, which is critical to meeting customer SLAs. Pure FlashArray provides six-nines availability (99.9999% uptime), inclusive of upgrades and maintenance, across both hardware and software. Additionally, Pure’s Evergreen™ storage program allows for non-disruptive upgrades while supporting long-term compatibility, IT agility and peace of mind. With Pure as-a-Service, you can purchase storage in a cloud-like fashion to adapt to fluctuating capacity requirements. For data protection, options range from instant snapshot copies to synchronous replication with ActiveCluster™, along with replication resiliency for rapid-restore backup and recovery. Pure’s asynchronous replication with ActiveDR™ provides a “near-zero” recovery point objective (RPO). Combined, these offerings can help limit downtime, data loss and risk.

• Pure’s cloud data services, together with on-premises cloud data infrastructure, enable hybrid applications that run seamlessly across clouds. Take advantage of the agility and innovation of multiple clouds by building applications once and then running them seamlessly on-premises and in the public cloud. Pure1 enables cloud-based fleet management, advanced capacity planning and workload simulation to deliver resources faster, control costs and forecast IT needs. Pure as-a-Service delivers pay-as-you-go billing with the flexibility to scale up and down, competitive on-demand rates and unified subscriptions for both on-premises and cloud. The Modern Data Experience from Pure Storage leverages hybrid mobility alongside consistent storage services, resiliency and APIs across your hybrid environment to give you the most flexibility possible with your database deployments.
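The six-nines figure translates into a concrete downtime budget. The following Python sketch computes the maximum downtime allowed per year at a given availability level; it assumes an average 365.25-day year, and the three-nines and five-nines levels are added only for comparison. At 99.9999%, the budget works out to roughly 31.6 seconds per year, versus about 8.8 hours at 99.9%.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumes an average (Julian) year


def downtime_budget_seconds(availability: float, period_s: float = SECONDS_PER_YEAR) -> float:
    """Maximum downtime permitted over `period_s` seconds at the given availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_s


if __name__ == "__main__":
    levels = {"three nines": 0.999, "five nines": 0.99999, "six nines": 0.999999}
    for label, availability in levels.items():
        seconds = downtime_budget_seconds(availability)
        print(f"{label} ({availability:.4%}): {seconds:,.1f} s/year ({seconds / 60:,.2f} min)")
```

Because the availability figure is stated as inclusive of upgrades and maintenance, planned maintenance would also have to fit within this budget rather than being excluded from it.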

For more information, visit Pure Storage for Open Source Databases today.

Additional Resources:

• FlashArray
• Cloud Block Store
• Pure1
• Pure as-a-Service
• Pure Solutions for MySQL, MongoDB, PostgreSQL and Cassandra

[1] 2019 IOUG Databases in the Cloud Survey, prepared for Unisphere Research/Information Today, Inc. in cooperation with Amazon Web Services, January 2019. http://www.ioug.org/d/do/8551

[2] 2019 Open Source Database Report, Kristi Anderson, DZone, January 17, 2020. https://dzone.com/articles/2019-open-source-database-report-top-databases-pub

[3] 2019 Open Source Data Management Software Survey, Percona, 2019. https://learn.percona.com/hubfs/Percona_Open_Source_DataManagement_Software_Survey_2019.pdf

[4] Achieving Your 2018 Database Goals Through Replication: Real-World Market Insights and Best Practices, Unisphere Research, a division of Information Today, Inc., March 2018. https://www.dbta.com/DBTA-Downloads/ResearchReports/Achieving-Your-Database-Goals-Through-Replication-Real-World-Market-Insights-and-Best-Practices-8555.aspx

About Pure: Get more from your data. Pure Storage empowers innovators to build a better world with data by delivering a simple, evergreen platform that enables organizations of all kinds to turn data into intelligence and advantage.