To Be or Not to Be a BI Appliance Embracer by Haranath Gnana


Originally published 23 September 2009


I was at a business intelligence (BI) presentation recently, and a professor from Berkeley characterized the current data explosion using the phrase “Industrial Revolution of data.” This resonated nicely because it highlighted a key factor behind the growing data volumes we are challenged with: data produced by automated systems such as self-service teller machines, the Internet, cell phones, etc. Given the continued growth of all these systems, the rate at which data grows is itself expected to keep increasing. Enterprises, like it or not, have to brace themselves for this data onslaught.

Most traditional databases, such as Oracle, DB2 and Microsoft SQL Server, have managed to deliver BI value with data volumes of up to 4 or 5 TB, and only with expensive high-end iron. Successfully managing more than 5 TB on these traditional database platforms has been almost unimaginable for the average enterprise IT shop.

The continued success of Teradata and Netezza’s very successful initial public offering (IPO) just a couple of years ago are clear indications of the value of innovation in this space. Teradata’s success has come primarily from targeting Fortune 500 customers who have deep pockets to invest in its proprietary hardware/software/services solution. Netezza, on the other hand, attempted to expand the set of customers that could leverage such BI technologies by significantly lowering the entry-level price of its solution, but it still requires a proprietary hardware/software platform. Both players pushed the limits of “shared nothing” massively parallel processing (MPP) architectures to scale to many tens of terabytes, an approach that beat the shared architectures of the traditional database players hands down. However, both solutions are still relatively expensive, and the proprietary nature of their hardware platforms has not been well received by many enterprises.
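To make the “shared nothing” idea concrete, here is a minimal Python sketch of how an MPP system might spread a group-by sum across segments. It is a generic illustration, not Teradata’s or Netezza’s actual implementation: the row shape, the CRC32 hash distribution and the names `distribute`, `local_sum` and `parallel_group_sum` are all assumptions for this example, and threads in one process merely stand in for independent nodes that each own their own disk and memory.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple

# Hypothetical fact-row shape: (group key, measure value).
Row = Tuple[str, float]


def distribute(rows: Iterable[Row], n_segments: int) -> List[List[Row]]:
    """Hash-distribute rows so each segment owns a disjoint slice of the data."""
    segments: List[List[Row]] = [[] for _ in range(n_segments)]
    for key, value in rows:
        # CRC32 is a stable hash, so the same key always lands on the same segment.
        segments[zlib.crc32(key.encode("utf-8")) % n_segments].append((key, value))
    return segments


def local_sum(segment: List[Row]) -> Dict[str, float]:
    """Each segment aggregates only its own rows; nothing is shared with other segments."""
    partial: Dict[str, float] = defaultdict(float)
    for key, value in segment:
        partial[key] += value
    return dict(partial)


def parallel_group_sum(rows: Iterable[Row], n_segments: int) -> Dict[str, float]:
    """Scatter rows, aggregate per segment in parallel, then merge the partial results."""
    segments = distribute(rows, n_segments)
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        partials = list(pool.map(local_sum, segments))
    result: Dict[str, float] = defaultdict(float)
    for partial in partials:
        for key, subtotal in partial.items():
            # Rows were distributed on the group key, so each key comes from one segment;
            # adding subtotals keeps the merge correct for other distribution keys too.
            result[key] += subtotal
    return dict(result)


sales = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("north", 1.0)]
print(parallel_group_sum(sales, n_segments=3))
```

This prints the per-key totals (east 12.5, west 5.0, north 1.0), in an order that depends on which segment each key hashed to. Adding a node adds both storage and compute for its slice of the data, which is the intuition behind linear scale-out; a real system also has to cope with skewed keys and with redistributing data for joins, which this sketch ignores.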

Google has proven that it is possible to leverage commodity hardware extremely effectively and still deal with data volumes that are orders of magnitude greater than those of the largest enterprise datasets. The big upside of working with commodity hardware is that you benefit from the billions of dollars that hardware companies are pumping into their products, constantly lowering prices and improving performance.

An architecture that leverages commodity hardware can gain the benefits of this ever-evolving platform.

This commodity hardware-based architecture has become the foundation for several BI appliance start-ups attempting to bring to enterprise “structured” data environments what Google has done for the unstructured world. Players like Greenplum, Dataupia, Kognitio and Aster Data have all pioneered this approach, with some variations. These new players have also based their solutions on shared-nothing MPP architectures. As expected of start-ups, they have been extremely aggressive in highlighting and proving their key value proposition, i.e., price/performance ratio.

I’ve been involved in two BI appliance bake-offs over the last year, and in both cases these new players had a very significant upper hand on price/performance. Their ability to scale out linearly on commodity hardware has also been a huge value proposition for enterprises.

Most of these players offer a choice between a “software-only” solution on recommended hardware platforms (restricted mainly for support reasons) and a packaged solution that includes both hardware and software. This choice gives IT organizations additional flexibility in the type of hardware they acquire.

The BI appliance players make big claims of performance gains not just for querying data but also for the loading process. In one of the proofs of concept (PoCs), I put this loading process to the test. A particular load process, built with an established ETL vendor’s solution, was taking about 33 hours to complete. This 33-hour process loaded data from flat files into the staging area, then into a star schema, and then built a set of aggregates. It included data inserts, deletes and updates, exercising all of the load operations. Each PoC run had to start with the same set of flat files and, at the end of the run, have data in all of the final tables, including the aggregates. We did a table-by-table differential at the end of each run, comparing the results with the baseline tables to ensure that the run produced the same results.
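The table-by-table differential can be sketched with set differences in SQL. The minimal Python example below uses the standard-library `sqlite3` module purely for illustration; it assumes the baseline and run tables have been brought into one database with identical column layouts, and the table names (`base_fact_sales`, `run_fact_sales`) and helper names are hypothetical. Because `EXCEPT` compares distinct rows, the sketch also compares row counts so that extra or missing duplicate rows are not overlooked.

```python
import sqlite3
from typing import Dict, Iterable, Tuple


def count(conn: sqlite3.Connection, sql: str) -> int:
    """Run a single-value COUNT query and return the number."""
    return conn.execute(sql).fetchone()[0]


def compare_run(conn: sqlite3.Connection,
                pairs: Iterable[Tuple[str, str]]) -> Dict[str, dict]:
    """Return mismatch details for every (baseline, run) table pair that differs."""
    mismatches: Dict[str, dict] = {}
    for base, run in pairs:
        # Table names are interpolated directly, so they must come from a trusted list.
        missing = count(conn, f"SELECT COUNT(*) FROM (SELECT * FROM {base} EXCEPT SELECT * FROM {run})")
        extra = count(conn, f"SELECT COUNT(*) FROM (SELECT * FROM {run} EXCEPT SELECT * FROM {base})")
        rows = (count(conn, f"SELECT COUNT(*) FROM {base}"),
                count(conn, f"SELECT COUNT(*) FROM {run}"))
        if missing or extra or rows[0] != rows[1]:
            mismatches[run] = {"missing": missing, "extra": extra, "rows": rows}
    return mismatches


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base_fact_sales (sale_id INTEGER, amount REAL);
    CREATE TABLE run_fact_sales  (sale_id INTEGER, amount REAL);
    INSERT INTO base_fact_sales VALUES (1, 10.0), (2, 5.0);
    INSERT INTO run_fact_sales  VALUES (1, 10.0), (2, 7.5);
""")
print(compare_run(conn, [("base_fact_sales", "run_fact_sales")]))
```

Here the run has a different amount for `sale_id` 2, so the sketch reports one missing and one extra row for `run_fact_sales`; an empty dictionary means every compared table matched its baseline.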

Even though these appliances claimed the ability to deliver significant performance gains without aggregates, we ensured that they built all of the aggregates. This was primarily for two reasons:

1. To ensure that we had a clear baseline for comparison on the load process performance.
2. These aggregates could not be eliminated, as that would require the rewrite of a bunch of reporting and analytical applications that had been built on top of these aggregates.
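For readers less familiar with the pattern, an aggregate here is a summary table derived from the star schema, which reporting applications query directly. A minimal sketch with Python’s `sqlite3`, assuming a hypothetical `fact_sales` fact table, `dim_product` dimension and `agg_sales_by_category` aggregate (none of these names come from the PoC), shows the kind of rebuild step a load process has to run after each fact load:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical star schema: one dimension, one fact table, one aggregate.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales  (product_key INTEGER REFERENCES dim_product, amount REAL);
    CREATE TABLE agg_sales_by_category (category TEXT PRIMARY KEY, total_amount REAL);

    INSERT INTO dim_product VALUES (1, 'Hardware'), (2, 'Software');
    INSERT INTO fact_sales  VALUES (1, 100.0), (1, 50.0), (2, 75.0);
""")


def rebuild_aggregate(conn: sqlite3.Connection) -> None:
    """Fully refresh the aggregate from the fact table in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM agg_sales_by_category")
        conn.execute("""
            INSERT INTO agg_sales_by_category (category, total_amount)
            SELECT p.category, SUM(f.amount)
            FROM fact_sales AS f
            JOIN dim_product AS p ON p.product_key = f.product_key
            GROUP BY p.category
        """)


rebuild_aggregate(conn)
# Reports read the small aggregate instead of scanning the fact table.
print(conn.execute(
    "SELECT category, total_amount FROM agg_sales_by_category ORDER BY category"
).fetchall())
# prints [('Hardware', 150.0), ('Software', 75.0)]
```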

It took each of the appliance players less than a week to build the scripts that mimicked the 33-hour load process. Two of the players in this bake-off demonstrated performance gains that were, to say the least, mind-blowing: both brought the load time down from 33 hours to about 30 minutes, roughly a 66-fold improvement. Just incredible! I should note that the process did not involve very complex transformations, but a gain of this size was still far too significant to ignore.

These BI appliances showcased significant performance gains on the query side as well, but the IT management was so impressed by the load performance gains that those alone were enough to make a business case.


I also included simultaneous load-and-query tests to see how effectively the appliances minimized downtime in these BI environments. Both players had architected their systems to let queries and data loads run simultaneously, eliminating the traditional bottlenecks and periods of unavailability. They were able to prove that there was no degradation in performance with these mixed workloads either.
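A simultaneous load-and-query test boils down to measuring query latency with and without a concurrent load. The Python sketch below shows one way a test harness might do this; `run_query` and `run_load` are hypothetical callables standing in for whatever client calls the appliance under test exposes, and the median-of-N timing is an assumption of this sketch rather than the method used in the PoC.

```python
import statistics
import threading
import time
from typing import Callable, Tuple


def median_latency(run_query: Callable[[], object], runs: int) -> float:
    """Time `runs` executions of the query and return the median latency in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def mixed_workload_test(run_query: Callable[[], object],
                        run_load: Callable[[], object],
                        runs: int = 5) -> Tuple[float, float, float]:
    """Return (baseline latency, latency during a load, slowdown ratio)."""
    baseline = median_latency(run_query, runs)
    loader = threading.Thread(target=run_load)
    loader.start()
    try:
        # The load should outlast these queries; otherwise some are timed on an idle system.
        under_load = median_latency(run_query, runs)
    finally:
        loader.join()
    return baseline, under_load, under_load / baseline


# Hypothetical stand-ins for real client calls: a 10 ms query and a 200 ms load.
baseline, under_load, ratio = mixed_workload_test(
    run_query=lambda: time.sleep(0.01),
    run_load=lambda: time.sleep(0.2),
)
print(f"baseline={baseline:.3f}s under_load={under_load:.3f}s ratio={ratio:.2f}")
```

Against a real appliance, a ratio close to 1.0 would correspond to the “no degradation” result described above. With the sleep-based stand-ins the ratio is close to 1.0 simply because sleeping threads do not compete for resources.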

Most projects in this customer’s environment had to plan for separate development, testing and preproduction environments during the project life cycle. Creating these environments was often on the project’s critical path, and each environment setup took anywhere from 3 to 5 business days. With the new BI appliance platform, this task could be cut down to under an hour, which significantly lowered project costs. This was one of the key selling points for the appliance business case.

To conclude, I would strongly encourage every enterprise dealing with growing data volumes, even volumes as small as a terabyte, to explore the appliance options and leverage the huge value they can provide. BI appliances are here to stay, and the sooner enterprises embrace them, the sooner they will be able to use these performance gains to deliver incredible value to their business users at a price point that does not force them to file for Chapter 11.


Haranath Gnana

Haranath Gnana is a Senior Principal at Saama Technologies with more than 15 years of IT experience. He has spent more than 10 years focusing on enterprise business intelligence and data warehousing services. He has consulted for many clients in multiple industry verticals, such as high tech, biotechnology and finance. His expertise ranges from helping define an enterprise’s BI roadmap and strategy to translating it into an operational reality, and he has been instrumental in evangelizing BI at many of his engagements. He has led several cutting-edge initiatives involving BI appliances and BI software-as-a-service (SaaS) models for enterprises. He can be reached at [email protected].


Copyright 2004 — 2011. Powell Media, LLC. All rights reserved. BeyeNETWORK™ is a trademark of Powell Media, LLC