Hb_Where Cloud Meets Big Data_final

Where Cloud Meets Big Data Managing big data in the cloud can be an overwhelming responsibility for IT departments. It’s important they put the right framework in place—or pawn off that work on the right provider.

EDITOR’S NOTE KEEPING UP WITH BIG DATA AS A SERVICE

FIND YOUR CLOUD BIG DATA PLATFORM MATCH

AND HADOOP FOR ALL—OR NOT

EDITOR’S NOTE

Big Data Decision Time

Today, an application without data analytics is like a car without a steering wheel. It will go, but there’s no controlling its direction. This handbook explores the many emerging and evolving Web- and cloud-based technologies for controlling and using data inside applications. The articles look at full-featured cloud suites, the popular Hadoop programming framework and other business intelligence tools that can be embedded into applications.

Our lead story, by news writer Joel Shore, shares advice on how software pros can use big data as a service (BDaaS) to accommodate the expectations of executives who see the capabilities of cloud-based analytics but can’t understand the challenge of integrating with enterprise systems. BDaaS delivers a platform and suite of tools that can speed up builds of analytics applications.

Is BDaaS the right data analytics development platform for your organization? Find expert guidance in our second story, in which consultant Tom Nolle lays out approaches ranging from BDaaS to do-it-yourself options and covers the role databases play in these decisions.

In the final story, Nolle questions a key assumption about big data tools: that Hadoop fits all situations. To Hadoop or not to Hadoop, he advises, depends on such variables as whether data access is centralized, the level of data distribution, performance needs, database practices and more.

Are you evaluating solutions for embedding data analytics into applications that we didn’t cover in this handbook? Tell us about your search and projects, and our resident experts can help.

Jan Stafford, Executive Editor

SearchCloudApplications

ANALYTICS

Keeping Up With Big Data as a Service

If there’s any agreement about big data, it’s that so much is coming in so quickly, from so many sources and in so many formats. The speed at which it all needs to be processed, stored and analyzed is simply more than most corporate IT budgets, staffs and infrastructures are able or willing to handle. We are drowning in data, yet often find ourselves starved for information. For an increasing number of companies, getting a grip on the situation means making it someone else’s problem. That someone is a big data as a service (BDaaS) or data as a service (DaaS) provider.

Regardless of how a service is configured and delivered, discussion of DaaS focuses as much on analytics as it does on data collection, presenting opportunities and challenges to the development side.

“For architects and developers, cloud-based big data offerings are a way to accelerate the time to build analytics applications,” said Nik Rouda, an analyst focused on big data and analytics at market research company Enterprise Strategy Group. “Without having to wait for IT infrastructure and operations teams to provision resources, developers can start immediately on prototyping and then easily roll the new tools into production when ready.”

With the rise in cloud and mobility, business priorities have become crystal-clear: Grow revenue and transform the customer experience while reducing costs. Each requires architects and developers to balance traditional values, such as security and cost effectiveness, with the need for speed and agility.

“Architects must figure out how to accommodate the sky-high expectations of digital executives who have seen the capabilities of analytics in the cloud and yet do not understand why it is so difficult to integrate with enterprise systems,” said Brian Hopkins, an analyst for Forrester Research. “Excuses and finger pointing won’t work; those that fail will become irrelevant. This makes emerging Agile, DevOps and data science practices … a critical part of the emerging digital architecture.”

Every company is looking to do more with its data to stay ahead of competitors. Moving data to the cloud makes access and analysis easier for everyone. “Your customers, employees and apps all live in the cloud, so that’s where your data needs to be,” Rouda said. “It’s natural to bring the analytics to the data; you don’t bring the data to the analytics. BDaaS, or DaaS [is] particularly good at doing this.”

Jim Comfort, general manager of cloud services at IBM, agreed. Analytics is a key driver for turning to DaaS. “It’s one thing to simply store data in the cloud, but it’s the analytics in the cloud that make data useful. If you need one, two or 20 different analytic approaches, you can easily do all of that with the flexibility and agility that a cloud services environment offers,” he said.

The numbers back up Rouda’s and Comfort’s assertions. Data management, typified by the migration of databases from on-premises storage into the cloud, is the most important IT priority this year for 26% of organizations polled by the Enterprise Strategy Group for its 2015 IT Spending Intentions Survey. That ranks second only to security initiatives, which were cited by 34%. In that same group, 66% plan to boost spending on cloud services in 2015 compared with last year.

Forrester’s Hopkins takes an alternative view. “The truth is that the data on which you do your analytics is usually reasonably sized, only a small subset.” Move just that to the cloud and use DaaS to do the analytics there, he said. “Big data in the cloud is not yet affordable; it’s better to keep years and years of historical data on-premises in a hybrid configuration.”

Regardless of where data resides, there is little doubt that IT is finding it increasingly difficult to keep up with demand. That’s not surprising, given the pace at which data is created. In 2013, Norway’s Foundation for Scientific and Industrial Research (SINTEF) published a widely quoted study that found 90% of the world’s data had been created in the previous two years. In 2015, it’s not unreasonable to surmise the percentage has edged higher. IBM itself says that 2.5 quintillion bytes of data are created every day.

Data warehouse providers can’t live up to their former promises anymore. “Those infrastructures lacked agility, and you needed to declare everything upfront, including the [database] schema and the amount of data to be stored,” Comfort said. That’s no longer good enough. “An instantly scalable, quickly implemented, cost-effective DaaS is the answer,” he added.

By Joel Shore

DATABASES

Find Your Cloud Big Data Platform Match

Users and cloud providers alike are focusing on the intersection of big data and the cloud, planning applications and service offerings to exploit the technologies. To choose the best cloud big data platform, developers need to decide on a database model, select cloud database services or cloud database platforms, and review the features of each platform against their company’s needs.

There are three popular models for big data: distributed MapReduce, popularized by Hadoop; NoSQL, used for nonrelational, nontabular storage; and SQL relational systems for relational tabular storage of structured data. You can use all three in the cloud, so in most cases database design and usage concerns will determine the model choice. After identifying a database model, you can explore cloud options for the model selected.

Most business transactions are best stored and accessed in a relational database management system, where SQL queries and tabular summarization can be easily supported. Enterprise users and database architects are most likely to be familiar with this model, and a good rule of thumb is to go for SQL and relational until you can prove another option is better.

The most common deterrent to using SQL is that the data is object-structured rather than tabular. Object data collects information as a set of properties that may be free-form in the object. If you can’t visualize data as a set of tables with fixed fields and valuable field-to-field relationships, then a SQL and relational system may be difficult to adopt, and other options may be better.
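The distinction between tabular and object-structured data can be made concrete with a minimal Python sketch. The sample records, field names and the `is_tabular` helper are all hypothetical, invented here for illustration. The check is deliberately crude: it treats data as tabular only when every record has the same fields and holds plain scalar values.

```python
from typing import Any


def is_tabular(records: list[dict[str, Any]]) -> bool:
    """Return True when every record has the same fields and only scalar values."""
    if not records:
        return True
    columns = set(records[0])
    for record in records:
        # A record with extra or missing fields does not fit a fixed table layout.
        if set(record) != columns:
            return False
        # Nested objects or lists are free-form properties, not table cells.
        if any(isinstance(value, (dict, list)) for value in record.values()):
            return False
    return True


# Hypothetical transaction data: fixed fields, scalar values.
orders = [
    {"order_id": 1, "customer": "acme", "total": 120.0},
    {"order_id": 2, "customer": "globex", "total": 75.5},
]

# Hypothetical object data: properties vary from one object to the next.
devices = [
    {"device_id": "a1", "tags": ["lobby"], "firmware": {"major": 2, "minor": 1}},
    {"device_id": "b7", "location": "roof"},
]

print(is_tabular(orders))   # True
print(is_tabular(devices))  # False
```

A result of `False` does not rule out a relational system, since free-form data can sometimes be normalized into tables, but it is a reasonable prompt to look at the other models.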

HADOOP AND NOSQL

Both the Hadoop and NoSQL options are easier to adapt to nonstructured data. Hadoop and NoSQL can be used for applications where unstructured data is stored in clusters distributed on a network, so the choice between the two comes down to object structure. If the database stores information about specific, identified things, then NoSQL is likely best. Data that has no natural structure, like free-form text, is better stored using Hadoop.

Note that you generally can query SQL, NoSQL and Hadoop databases using SQL. The latter two may require an overlay product, and the lack of tabular organization may make query processing more time-consuming. If you expect most database activity to be in SQL form, you probably have tabular data and should be considering a relational model.
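The rules of thumb in this section can be summarized as a small decision function. This is only a sketch of the heuristics described in the text; the function name and its two parameters are hypothetical, and a real selection would also weigh cost, skills and performance testing.

```python
def suggest_model(fits_fixed_tables: bool, describes_identified_things: bool) -> str:
    """Map the article's rules of thumb to one of the three big data models."""
    # Default to SQL and relational whenever the data fits tables with fixed fields.
    if fits_fixed_tables:
        return "SQL relational"
    # Object data about specific, identified things points to NoSQL.
    if describes_identified_things:
        return "NoSQL"
    # Data with no natural structure, such as free-form text, points to Hadoop.
    return "Hadoop (MapReduce)"


print(suggest_model(True, True))    # SQL relational
print(suggest_model(False, True))   # NoSQL
print(suggest_model(False, False))  # Hadoop (MapReduce)
```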

The second point to consider is whether to use a database package from a cloud provider or host your own database in the cloud. Most people are familiar with Amazon, Rackspace, Microsoft and Google, but lesser-known providers Joyent and Qubole also have strong big data credentials. Additionally, Hadoop is usually available from major cloud providers.

DO IT YOURSELF

Another option is to host your own big data application in the cloud using big data software and infrastructure as a service or platform as a service.

A do-it-yourself approach can offer advantages. It widens your options for cloud hosting because not all cloud providers will support big data as a service. You can use multiple public clouds or switch between cloud providers with greater ease. And often you can create hybrid big data applications more easily if you adopt the same big data software in the cloud and on-premises. The disadvantage, according to cloud buyers, is that creating in-cloud big data with your own platform tools is more complicated and sometimes more costly.

Obviously, the best platforms for cloud big data depend on your database model. Top-rated Hadoop options include Apache Hadoop, SAP’s HANA and Hadoop combination, Hortonworks, Hadapt and VMware’s Cloud Foundry, as well as services provided by IBM, Microsoft and Oracle. For NoSQL, consider Apache Cassandra, Apache HBase or MongoDB. IBM also offers NoSQL for the cloud. Make sure that your final choice supports the level of big data cloud scaling you expect.

SQL big data in the cloud is most often supported by extending your on-premises SQL vendor offering. IBM, Oracle and Microsoft all offer SQL that’s suitable, with some tuning, for big data cloud deployment. HP’s Haven is a general big data architecture for the cloud that embraces both structured and unstructured data and supports SQL queries.

CRITICAL NEEDS

It’s important to understand your needs and evaluate how each platform supports those needs. You may need to run tests to determine whether a given big data option is efficient for your specific mix of update and access. Be particularly careful about SQL queries against non-SQL databases. Analytics that require extensive use of SQL can create major performance issues even with relational systems, and more so with other database models. Creative database design and careful use of JOINed databases may make things more efficient.
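A test of your own update-and-access mix might look like the following sketch. It uses Python’s built-in `sqlite3` module with an in-memory database purely as a stand-in for whichever platform you are evaluating. The table, the row counts and the `time_workload` function are assumptions made for illustration, and the timings will vary by machine.

```python
import random
import sqlite3
import time


def time_workload(read_fraction: float, operations: int = 500, use_index: bool = True) -> float:
    """Time a mix of point lookups and updates; returns elapsed seconds."""
    rng = random.Random(42)  # fixed seed so each run issues the same operations
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [(i, f"region-{i % 10}", float(i)) for i in range(2000)],
    )
    if use_index:
        conn.execute("CREATE INDEX idx_events_id ON events (id)")

    start = time.perf_counter()
    for _ in range(operations):
        key = rng.randrange(2000)
        if rng.random() < read_fraction:
            # Access: a single-row lookup.
            conn.execute("SELECT amount FROM events WHERE id = ?", (key,)).fetchone()
        else:
            # Update: modify a single row.
            conn.execute("UPDATE events SET amount = amount + 1 WHERE id = ?", (key,))
    conn.commit()
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed


# Compare read-heavy, balanced and write-heavy mixes, with and without an index.
for mix in (0.9, 0.5, 0.1):
    indexed = time_workload(mix)
    unindexed = time_workload(mix, use_index=False)
    print(f"reads={mix:.0%} indexed={indexed:.4f}s unindexed={unindexed:.4f}s")
```

The point of the harness is the shape of the comparison, not the numbers: replace the stand-in database and the synthetic operations with your candidate platform and a sample of your real queries.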

You also should ensure that distributed big data clusters can be accessed efficiently for combined queries. This can be complicated with cloud-hosted data because users have only limited control over how the data is distributed. Testing to determine optimum data distribution strategies, and a contract to ensure that data generally stays within those guidelines, are critical.

Cloud big data hosting involves many variables, so be prepared to gather a lot of operating data on quality of experience to ensure that workers are getting what they need and that costs are managed. Otherwise you’ll end up with something too costly to fix and too slow to accept.

By Tom Nolle

TOOLS

And Hadoop for All—or Not

Apache Hadoop has long been the focus of cloud big data thinking, but there are plenty of refinements to consider in Hadoop planning, and many big data cloud applications aren’t suitable for Hadoop. Developers should ask what the big data storage paradigm will be and whether it matches Hadoop’s capabilities, optimize their database planning for cloud access, and track changes in data storage or access policies that could indicate a change is needed.

Hadoop is an open source implementation of a Google concept called MapReduce. It is designed to support the storage and querying of databases distributed across multiple network-connected compute clusters. The basic notion is to allow a single query to find and collect results from all the cluster members. This model is suitable for Google’s model of search support.
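The MapReduce idea can be illustrated with a single-process Python sketch. This is not Hadoop’s API; it only mimics the map, shuffle and reduce phases that Hadoop would run across cluster members. The function names and the sample text are hypothetical, and each inner list stands in for the data held by one cluster member.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group the emitted values by key, as happens between map and reduce."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(partitions: list[list[str]]) -> dict[str, int]:
    """Run one query over every partition and collect a combined result."""
    pairs = [pair for docs in partitions for doc in docs for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


# Two hypothetical cluster members, each holding its own documents.
cluster = [
    ["big data in the cloud"],
    ["the cloud meets big data", "data as a service"],
]
counts = word_count(cluster)
print(counts["data"], counts["cloud"])  # 3 2
```

In a real deployment the map work runs where each partition is stored and only the intermediate pairs travel over the network, which is why the model suits data that is already distributed.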

The value of Hadoop is that distributed data is subject to collective inquiry. Most enterprises collect information in centralized databases and also create separate abstractions or aggregations of this data for better access. Many vendors, including IBM, recognize this trend and don’t lead their cloud big data initiatives with the assumption that Hadoop is the choice technology. CIOs also agree that it’s rarely wise to use Hadoop on centralized data or to distribute data in the cloud simply to be Hadoop compatible.

Hadoop is ideal where data is naturally separated, not just within a data center but across multiple data centers. If that’s not the case for your data, then Hadoop isn’t likely the best option, even if you’re moving applications to the cloud.

Other data storage considerations include:

■ Do you routinely query distributed data as though it were centralized? If your data access tends to be directed toward specific data clusters, providing for overall query capability may have limited value.

■ Are any or all of your query applications performance sensitive? Hadoop querying is not as fast as other options for big data. This is particularly true if you’re using Hadoop’s optional SQL capability.

■ Do you create aggregate databases with summary data to support high-level analytics? If so, these databases will likely combine data from multiple data clusters and reduce your need to look at the cluster data directly. However, Hadoop might be helpful here to support the aggregation of information.

The “ideal” Hadoop environment is one where large data volumes are collected and used locally but must also be accessed by analytics applications that deal with raw data rather than summary-level information. If this isn’t your situation, other options may be better.

Hadoop is good at confining mass data access to the clusters, but you can accomplish something similar by sending queries to local relational systems at each location and then “joining” the results. Another strategy for avoiding data access issues is to create summary databases for analytics that don’t require real-time information and are specialized and small enough to be hosted in the cloud at modest cost or moved into the cloud ad hoc as needed.
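The alternative of querying local relational systems and then combining the results can be sketched as a simple scatter-gather routine. The example below uses in-memory `sqlite3` databases to stand in for the relational system at each location. The `sales` table, the region names and both helper functions are hypothetical.

```python
import sqlite3


def make_site(rows: list[tuple[str, float]]) -> sqlite3.Connection:
    """Create a stand-in for one location's local relational database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def scatter_gather(sites: list[sqlite3.Connection]) -> dict[str, float]:
    """Send the same aggregate query to every site and merge the partial results."""
    totals: dict[str, float] = {}
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    for conn in sites:
        # Each site returns only its per-region subtotals, not its raw rows.
        for region, subtotal in conn.execute(query):
            totals[region] = totals.get(region, 0.0) + subtotal
    return totals


sites = [
    make_site([("east", 100.0), ("west", 50.0)]),
    make_site([("east", 25.0), ("north", 10.0)]),
]
print(sorted(scatter_gather(sites).items()))
# [('east', 125.0), ('north', 10.0), ('west', 50.0)]
```

This works cleanly for aggregates that can be merged from subtotals, such as sums and counts. Queries that need rows from several sites at once, such as cross-site joins, require more data movement and more care.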

The second point in planning for cloud and big data is to remember that true cloud applications are very different from legacy applications. This must be the primary design consideration. Your cloud usage, present and planned, will have a major effect on your big data design, enough to create major problems if you make the wrong choice.

Users vary significantly in how they plan to use the cloud. Some expect to host everything there, some to share or hybridize, and some to use the cloud for failover or cloud bursting. Data access is a part of every application. The primary issue is to avoid passing large volumes of data across the cloud boundary. Store data in the cloud or on-premises and try to pass query results and summarized databases, not large quantities of raw data.
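A rough sense of why summaries travel better than raw data comes from comparing payload sizes. The sketch below builds 1,000 hypothetical event records, summarizes them where they are stored and measures both as JSON. The record layout and the `summarize` helper are invented for illustration, and real savings depend entirely on your data.

```python
import json
from collections import Counter


def summarize(raw_events: list[dict]) -> dict[str, int]:
    """Collapse raw events into per-status counts before they leave the site."""
    return dict(Counter(event["status"] for event in raw_events))


# Hypothetical raw data: every tenth event is an error, each with a 200-byte payload.
raw_events = [
    {"id": i, "status": "ok" if i % 10 else "error", "payload": "x" * 200}
    for i in range(1000)
]

summary = summarize(raw_events)
raw_bytes = len(json.dumps(raw_events).encode("utf-8"))
summary_bytes = len(json.dumps(summary).encode("utf-8"))

print(summary)                    # {'error': 100, 'ok': 900}
print(raw_bytes > summary_bytes)  # True
```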

The final issue with big data in the cloud is the increased risk that the combination presents. Cloud computing is evolving, and so is big data. Application design is changing to optimize cloud utility, and the notion of data and databases is transforming with mass-collection networks like the Internet of Things. These changes will affect both cloud and big data plans.

By Tom Nolle

ABOUT THE AUTHORS

TOM NOLLE is the president of CIMI Corp., a consultancy specializing in telecommunications and data communications since 1982. He writes for many TechTarget websites. Read his blog or email him at [email protected].

JOEL SHORE is a technology journalist, author and editor with nearly 30 years of experience. He is the co-founder and longtime director of the Computer Reseller News Test Center. He is a news writer for SearchCloudApplications and SearchAWS. Email him at [email protected].

Where Cloud Meets Big Data is a SearchCloudApplications.com e-publication.

Jason Sparapani | Managing Editor

Moriah Sargent | Associate Managing Editor

Jan Stafford | Executive Editor

Brein Matturro | Site Managing Editor

Linda Koury | Director of Online Design

Neva Maniscalco | Graphic Designer

Doug Olender | Publisher [email protected]

Annie Matthews | Director of Sales [email protected]

TechTarget 275 Grove Street, Newton, MA 02466

www.techtarget.com

© 2015 TechTarget Inc. No part of this publication may be transmitted or reproduced in any form or by any means without written permission from the publisher. TechTarget reprints are available through The YGS Group.

About TechTarget: TechTarget publishes media for information technology professionals. More than 100 focused websites enable quick access to a deep store of news, advice and analysis about the technologies, products and processes crucial to your job. Our live and virtual events give you direct access to independent expert commentary and advice. At IT Knowledge Exchange, our social community, you can get advice and share solutions with peers and experts.

COVER ART: FOTOLIA
