Got Hadoop? - Exasol · Hadoop is an open source framework that was created to store massive...

Got Hadoop? Whitepaper: Hadoop and EXASOL - a perfect combination for processing, storing and analyzing big data volumes



Contents

Introduction
Hadoop’s humble beginnings
The benefits of Hadoop
The limitations of Hadoop
In-memory to the rescue
Introducing fast analytics from EXASOL
Case Study: Crushing candy with Hadoop and EXASOL
Summary


Introduction

What happens when your files get too large for your computer? These days, the most common answer is probably to buy a new computer. However, what happens when your business collects so much data it can no longer be stored on a single server effectively? With humanity creating 2.5 quintillion bytes of data daily, this is a question that many companies have answered with Hadoop.

Hadoop is an open source framework that was created to store massive amounts of data on cost-effective, commodity hardware. Created over 10 years ago, Hadoop has a mature market that is still growing. IDC predicts that the Hadoop software market will be worth $813 million by 2016 (and this estimate may be low).


Hadoop’s humble beginnings

As the amount of data has soared this century, many organizations have had to adapt to storing and processing petabytes of information (a single petabyte is roughly equivalent to 200,000 DVDs). To help solve this problem, Doug Cutting and Mike Cafarella created Hadoop in 2006, naming the project after a toy elephant that belonged to Cutting’s son.
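The DVD comparison above can be checked with a little arithmetic. The sketch below assumes decimal units (1 petabyte = 10^15 bytes) and a single-layer DVD capacity of 4.7 GB; both figures are assumptions for illustration and are not stated in this whitepaper.

```python
# Back-of-the-envelope check: how many DVDs are needed to hold one petabyte?
# Assumptions (not from the whitepaper): decimal units and 4.7 GB single-layer DVDs.
PETABYTE_BYTES = 10**15
DVD_BYTES = 4_700_000_000


def dvds_per_petabyte() -> int:
    """Return the number of DVDs needed to hold one petabyte, rounded up."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-PETABYTE_BYTES // DVD_BYTES)


if __name__ == "__main__":
    print(f"One petabyte fills about {dvds_per_petabyte():,} DVDs")
```

Under these assumptions the result lands slightly above 200,000, which is consistent with the rough figure quoted in the text.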

Hadoop version 1 was a great first effort, but users began to notice some scalability limitations. These stemmed primarily from the MapReduce engine, which limited cluster size and the number of tasks that could run concurrently. In addition, if certain processes failed, all running jobs across the cluster could be killed, stopping work instantly.
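For readers unfamiliar with the MapReduce model mentioned above, the following is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to a word count. It is plain Python and does not use Hadoop's API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big elephant"]))
```

In a real cluster the map and reduce steps run on many machines and the whole job is scheduled as a batch, which is where the limits on cluster size and concurrent tasks described above come into play.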

To fix these issues, Hadoop was redesigned in version 2: resource management was moved out of MapReduce and into a new resource scheduler called YARN (Yet Another Resource Negotiator). The YARN scheduler takes a more dynamic approach, which allows Hadoop to scale more easily.

Because Hadoop 2.0 is no longer constrained by these MapReduce limitations, it can hold practically unlimited amounts of raw, unstructured data, which makes the data lake concept possible.

“Separating HDFS from MapReduce with YARN makes the Hadoop environment more suitable for operational applications that can’t wait for batch jobs to finish,” says Margaret Rouse with TechTarget.


The benefits of Hadoop

Several tech organizations, including Facebook, Google, Yahoo, Twitter, and Amazon, have used Hadoop successfully. Other industries have also found uses for Hadoop.

Hadoop is currently being used in many areas, including:

• business intelligence and analytics
• e-commerce
• energy discovery and savings
• fraud detection
• healthcare
• insurance
• online travel
• marketing and advertising
• security
• education and training

Hadoop is also used by the automotive and manufacturing industries, allowing them to capture the massive amounts of data generated by supply chains, R&D activities, enterprise resource planning, internal business operations, and much more.

Hadoop has been used to enhance product quality by allowing manufacturers to better monitor process data and generate insights during manufacturing. It enables organizations to collect information from call centers, service calls, product surveys, social media, and more to improve customer satisfaction. It is also used to monitor processes throughout the supply chain so that businesses can better optimize it.

There are many reasons to use Hadoop:

• Hadoop is free for anyone to download, install, and use
• It helps organizations store, process, and manipulate big data sets in a cost-effective way
• It has built-in fault tolerance


The limitations of Hadoop

If you listen to the buzz about Hadoop, you’re likely to think that it can conquer any and all Big Data problems. While it excels at storing massive amounts of data across commodity hardware, it does have drawbacks.

Real-time analytics can be slow

Hadoop was built as a batch processing system, and because of this it has trouble with real-time data analysis. When you need answers from analytics quickly, Hadoop needs some help.

Multiple copies of Big Data can hinder performance

Because HDFS was built for fault tolerance, it stores three or more copies of data across the cluster (sometimes six or more). This can make retrieving the data slow when it is used for real-time analytics.

Inadequate SQL support

While additional components attempt to give Hadoop some SQL support, they lack certain high-level SQL functions that are frequently used in analytics.

These limitations in Hadoop’s framework can make it ill-equipped to handle a growing organization’s analytics needs. Some businesses store more than 1 petabyte of data per year. Hadoop does a great job of storing that data, but it cannot keep up with the real-time analytics these growing companies need to make decisions.
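The storage cost of keeping multiple copies, as described above, is simple multiplication. The sketch below is an illustration only: the helper name `raw_storage_tb` is hypothetical, and the default of three copies follows the figure given in the text.

```python
def raw_storage_tb(logical_tb: float, replication_factor: int = 3) -> float:
    """Return the raw disk capacity (in TB) consumed by replicated data.

    logical_tb is the size of the data as the user sees it, and
    replication_factor is the number of copies kept across the cluster.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return logical_tb * replication_factor


if __name__ == "__main__":
    # 1 petabyte of logical data, expressed as 1,000 TB, with three copies.
    print(raw_storage_tb(1000))
    # The same data with six copies.
    print(raw_storage_tb(1000, replication_factor=6))
```

With three copies, one petabyte of business data occupies three petabytes of raw disk, and six copies double that again.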


In-memory to the rescue

What would it mean for your business if you could process queries 100 times faster than with a traditional disk-based system? What if you could do this and still keep your existing Hadoop investment?

One way to resolve speed issues with disk-intensive systems like Hadoop is to use an in-memory database. In-memory databases allow for the fastest data retrieval speeds, and the decreasing price of RAM makes them a practical option. In-memory systems replace slower mechanical spinning disks with memory-intensive architectures.

Adding an in-memory system can increase the speed of analytics while complementing an existing Hadoop deployment. Databases that use in-memory processing deliver results faster. For organizations that need to scale real-time data analytics, in-memory processing is a powerful solution.


Introducing fast analytics from EXASOL

EXASOL is the world’s fastest high-performance, in-memory database analytics engine. EXASOL is designed specifically for analytics. When combined with Hadoop, it offers:

Faster insights

EXASOL’s in-memory algorithms allow it to process large data sets with increased speed. This is how EXASOL maintains the number one position for speed, which is proven by tests carried out by the independent Transaction Processing Performance Council (TPC) on data volumes from 100 GB up to 100 TB.

EXASOL allows you to analyze your Hadoop data more quickly and easily, turning it into actionable insights that will help you drive business. When combined with Hadoop, EXASOL can handle the large-scale analytic workloads that would be unthinkable using Hadoop alone.

EXASOL and Hadoop combined are frequently used to:

• accelerate standard reporting
• run multi-user ad hoc analytics
• perform complex modeling

Better scaling

EXASOL automatically compresses data, reducing the number of I/O operations and allowing more data to be processed in memory. It was built from the ground up to support massively parallel processing (MPP). This allows queries to be distributed and optimized across the cluster.

EXASOL is scalable, meaning it can grow as your company data grows. You do not need to replace existing Hadoop systems to bring EXASOL in. It can be implemented alongside Hadoop to speed up analytics, which allows you to enhance your systems while maintaining your existing investment.
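The general idea behind massively parallel processing, as mentioned under “Better scaling”, can be sketched in a few lines: the data is split into partitions, each worker computes a partial result, and the partial results are combined. This is a toy illustration using Python threads, not a description of how EXASOL is implemented; `partition`, `local_aggregate`, and `distributed_average` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_nodes):
    """Split the rows round-robin into one partition per simulated node."""
    return [rows[i::n_nodes] for i in range(n_nodes)]


def local_aggregate(rows):
    """Each node computes a partial (sum, count) over its own partition."""
    return sum(rows), len(rows)


def distributed_average(rows, n_nodes=4):
    """Compute an average by combining per-node partial aggregates."""
    parts = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_aggregate, parts))
    total = sum(part_sum for part_sum, _ in partials)
    count = sum(part_count for _, part_count in partials)
    if count == 0:
        raise ValueError("cannot average an empty data set")
    return total / count


if __name__ == "__main__":
    print(distributed_average(list(range(1, 101))))
```

Because only the small partial results travel between workers, the same pattern keeps working as the number of partitions grows, which is the property that makes this style of processing scale.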

Easy to implement

EXASOL is designed to run on low-cost commodity hardware that is easy to implement and delivers extreme performance without extreme cost and complexity.

Once EXASOL is integrated with Hadoop, you will begin to see the real-world benefits of combining the Big Data storage power of Hadoop with the analytics speed of EXASOL.

Easy mobile testing brought to your door

Most data analysts cringe at the idea of replacing an analytics engine; however, EXASOL makes it simple. If internal server resources are not available for testing, the EXASOL team will bring in a mobile testing trolley. This small, portable server trolley can be rolled around and plugged in instantly. This means your data never has to leave the building and testing is totally transparent.


Case Study: Crushing candy with Hadoop and EXASOL

Many of us are familiar with the popular game Candy Crush, created by King Digital Entertainment. King currently has more than 195 games in over 200 countries. These games are played 1.5 billion times per day.

To maintain an edge over its competitors and provide a great gaming experience, King has to constantly monitor its analytics to keep its 149 million active users coming back on a daily basis.

To manage its Big Data, King started using Hadoop to store the 1 petabyte of data it collects per year (which is equivalent to over 200,000 DVDs).

At this point, King realized it couldn’t grow its business with Hadoop alone. “Hadoop was working for us to an extent, but it became clear very quickly that it was not the whole answer. Hadoop is very good at some things like cost-effective storage of vast quantities of data, but not so good at rapid analysis. So we went looking for something that could complement Hadoop and address its weaknesses,” says Andy Done, Data Platform Lead at King.

After adding EXASOL to its Hadoop infrastructure, King saw massive improvements in its real-time analytics. Done explains: “Having the right data available at the right time is vital for our users. Jobs that previously ran late into the afternoon are now finished and ready before anyone is even in the office. That’s made a huge difference for our users and freed my team up to tackle even harder problems, handle more requests, and to be more responsive to the business.”

Adding EXASOL to Hadoop allows real-time analytics to become a reality.

For more information on how King Digital Entertainment complements their Hadoop cluster with EXASOL, go to: www.exasol.com/en/customers/king-case-study/

Together, EXASOL and Hadoop help you to:

• Make faster decisions for your organization with the right information at the right time
• Accelerate standard reporting
• Run multi-user ad hoc analytics
• Perform complex modeling
• Enhance customer loyalty
• Drive new revenue streams
• Extend your Hadoop investment with more speed for real-time analytics


Summary

EXASOL turns Hadoop into an analytic repository.

If you are thinking of using Hadoop for data storage and processing, EXASOL is the perfect match when it comes to data analytics and deriving value from your Hadoop data farms. Indeed, the in-memory, analytic database offers the most powerful engine for Hadoop, helping you to transform your HDFS clusters from data lakes to high-powered analytic reservoirs.

As a result, you will dramatically accelerate the speed of reporting, analysis, and data exploration of all your Hadoop-based data.

For more information on EXASOL and Hadoop, go to: www.exasol.com/hadoop

About this whitepaper: Information listed here may change after the data sheet has gone to print (November 2016). EXASOL is a registered trademark. All trademarks named are protected and the property of their respective owner. © 2016, EXASOL AG | All rights reserved
