Big Datas Big Impact on Data Warehousing_hb_final

download Big Datas Big Impact on Data Warehousing_hb_final

of 16

Transcript of Big Datas Big Impact on Data Warehousing_hb_final

  • 5/20/2018 Big Data s Big Impact on Data Warehousing_hb_final

    1/16

    EDITORS NOTE BIG DATA TODAY IS A

    TALE OF TWO MARKETS

    THE DIVERSE WORLD

    OF BIG DATA PROCESSING

    SYSTEMS

    FIT IN AND FUNCTION

    IN THE NEW ANALYTICAL

    ECOSYSTEM

    Big Datas Big Impact on Data WarehousingBig data has taken the IT world by storm, with innovative technologies offering a cost-effective

    way for organizations to process and store large volumes of any type of data. BY WAYNE ECKERSON

    SECOND EDITION

  • 5/20/2018 Big Data s Big Impact on Data Warehousing_hb_final

    2/16

    HOME

    EDITORS NOTE

    BIG DATA TODAY IS A

    TALE OF TWO MARKETS

    THE DIVERSE WORLD

    OF BIG DATA

    PROCESSING SYSTEMS

    FIT IN AND FUNCTION

    IN THE NEW

    ANALYTICAL ECOSYSTEM

    BIG DATAS BIG IMPACT ON DATA WAREHOUSING2

    EDITORS

    NOTE

    Big Data Makes It Huge

    The hype and reality of big data is reaching a

    crescendo. Its clear that The Apache Software

    Foundations Hadoop and NoSQL technologies

    are gaining a foothold in corporate computing

    environments. Most leading big data vendors

    now count hundreds of customers. Big data isno longer the province of Internet and media

    companies with large Web properties; compa-

    nies in nearly every industry are jumping on

    the bandwagon.

    For example, Vestas, a leading maker of wind

    turbines, uses IBMs InfoSphere BigInsights

    to model larger volumes of weather data so it

    can pinpoint the optimal placement of wind

    turbines. In the financial services industry, one

    company is using Hadoop to improve the accu-

    racy of its fraud models by addressing much

    larger volumes of transaction data.

    Technology buyers are already asking

    whether data warehouses and BI tools will

    eventually be folded into Hadoop environments

    or the reverse. Why spend millions of dollars

    on a new analytical database if you can do that

    processing without paying a dime in license

    costs using Hadoop?

    Right now, its too early to predict winners

    and losers. Its possible that all data manage-ment and analysis will run entirely on open

    source platforms and tools. But its just as

    likely that commercial vendors will co-opt

    open source products and functionality and

    use them as pipelines to magnify sales of their

    commercial products.

    More than likely, well get a mlange of open

    source and commercial capabilities. After all,

    30 years after the PC revolution, mainframes

    are still a mainstay at many corporations. In IT,

    nothing ever dies; it just finds its niche in an

    evolutionary ecosystem. n

    Wayne Eckerson

    BI Experts Panel

  • 5/20/2018 Big Data s Big Impact on Data Warehousing_hb_final

    3/16

    HOME

    EDITORS NOTE

    BIG DATA TODAY IS A

    TALE OF TWO MARKETS

    THE DIVERSE WORLD

    OF BIG DATA

    PROCESSING SYSTEMS

    FIT IN AND FUNCTION

    IN THE NEW

    ANALYTICAL ECOSYSTEM

    BIG DATAS BIG IMPACT ON DATA WAREHOUSING3

    OVERVIEW

    Big Data Today Is a Tale of Two Markets

    There are two types of big data tech-

    nologies out there today. There is open source

    software, centered largely on Hadoop, which

    eliminates up-front licensing costs for manag-

    ing and processing large volumes of data. And

    then there are new analytical engines, includ-ing appliances and columnar data stores, which

    provide significantly higher price-performance

    than general-purpose relational databases.

    Both sets of big data software deliver higher

    return on investment than previous genera-

    tions of data management technology but in

    vastly different ways.

    nThe Hadoop hoopla.Hadoop is anopen source

    distributed file systemfrom The Apache

    Software Foundation that is capable of stor-

    ing and processing large volumes of data in

    parallel across a grid of commodity servers.

    Hadoop emanated from large Internet provid-

    ers, such as Google and Yahoo, which needed

    a cost-effective way to build search indexes.

    They knew that traditional relational databases

    would be prohibitively expensive and techni-

    cally unwieldy, so they came up with a low-cost

    alternative they built themselves and eventu-

    ally gave to Apache so others could benefitfrom their innovations.

    Today many companies are implementing

    Hadoop software from Apache as well as com-

    mercial distribution vendors such as Cloudera,

    Hortonworks, IBM, Intel, MapR and Pivotal.

    Developers see Hadoop as a cost-effective way

    to get their arms around large volumes of data

    that theyve never been able to do much with

    before.

    Many companies use Hadoop to store, pro-

    cess and analyze large volumes of Web server

    log data so they can get a better feel for the

    browsing and shopping behavior of their online

    customers. Before, companies outsourced the

    analysis of their clickstream data or simply let

    http://searchbusinessanalytics.techtarget.com/feature/Apache-Hadoop-spurs-hopes-but-creates-hardships-for-analytics-usershttp://searchbusinessanalytics.techtarget.com/feature/Apache-Hadoop-spurs-hopes-but-creates-hardships-for-analytics-usershttp://searchbusinessanalytics.techtarget.com/feature/Apache-Hadoop-spurs-hopes-but-creates-hardships-for-analytics-usershttp://searchbusinessanalytics.techtarget.com/feature/Apache-Hadoop-spurs-hopes-but-creates-hardships-for-analytics-users
  • 5/20/2018 Big Data s Big Impact on Data Warehousing_hb_final

    4/16

    HOME

    EDITORS NOTE

    BIG DATA TODAY IS A

    TALE OF TWO MARKETS

    THE DIVERSE WORLD

    OF BIG DATA

    PROCESSING SYSTEMS

    FIT IN AND FUNCTION

    IN THE NEW

    ANALYTICAL ECOSYSTEM

    BIG DATAS BIG IMPACT ON DATA WAREHOUSING4

    OVERVIEW

    it fall on the floor since they couldnt process it

    in a timely and cost-effective way. Companies

    are also turning to Hadoop to process morestructured data to improve analytical models.

    Besides being free to implement, the other

    major advantage of Hadoop software is that

    it is data agnostic. It can handle any type of

    data. Unlike a data warehouse or traditional

    relational database, Hadoop doesnt require

    administrators to model or transform data

    before they load it. You dont define a struc-ture for the data; you simply load and go. This

    significantly reduces the cost of preparing data

    for analysis compared with what happens in a

    data warehouse. Most experts assert that 60%

    to 80% of the cost of building a data ware-

    housewhich can run into the tens of millions

    of dollarsinvolves extracting, transform-

    ing and loading data. Hadoop virtually elimi-

    nates this cost. As a result, many companies

    are using Hadoop as a general-purpose stag-

    ing area and archive for all their data. So, for

    example, a telecommunications company can

    store 12 months of call detail records instead

    of aggregating that data in the data warehouse

    and rolling the details to offline storage. With

    Hadoop, organizations can keep all their data

    online and eliminate the cost of data archival

    systems. They can also let power users queryHadoop data directly if analysts want to access

    the raw data or cant wait for the aggregates to

    be loaded into the data warehouse.

    Of course, nothing in technology is ever free.

    When it comes to processing data, you either

    pay the piper up front, as in data warehousing,

    or at query time, as in Hadoop. Before querying

    Hadoop data, a developer needs to understand

    the structure of the data and all of its anoma-

    lies. With a clean, well-understood and homog-

    enous data set, this is not difficult. But most

    corporate data doesnt fit that description.

    So a Hadoop developer ends up playing the

    role of a data warehousing developer at query

    Experts assert that 60% to

    80% of the cost of building

    a data warehouse involves

    extracting, transforming andloading data. Hadoop virtually

    eliminates this cost.

    http://searchbusinessanalytics.techtarget.com/feature/Handling-the-hoopla-When-to-use-Hadoop-and-when-not-tohttp://searchbusinessanalytics.techtarget.com/feature/Handling-the-hoopla-When-to-use-Hadoop-and-when-not-to
  • 5/20/2018 Big Data s Big Impact on Data Warehousing_hb_final

    5/16

    HOME

    EDITORS NOTE

    BIG DATA TODAY IS A

    TALE OF TWO MARKETS

    THE DIVERSE WORLD

    OF BIG DATA

    PROCESSING SYSTEMS

    FIT IN AND FUNCTION

    IN THE NEW

    ANALYTICAL ECOSYSTEM

    BIG DATAS BIG IMPACT ON DATA WAREHOUSING5

    OVERVIEW

    time, interrogating the data and making sure

    its format and content match expectations.

    Querying Hadoop today is a buyer-bewareenvironment.

    Moreover, to run big data software, you still

    need to purchase, install and manage com-

    modity servers (unless you run your big data

    environment in the cloud, say, through Ama-

    zon Web Services). While each server may not

    cost a lot, the price adds up. But whats more

    costly is the expertise and software required toadminister Hadoop and manage grids of com-

    modity servers. Hadoop is still bleeding-edge

    technology and few people have the skills or

    experience to run it efficiently in a production

    environment. These folks are hard to find, and

    they dont come cheap.

    nThe platform approach.The other type of big

    data technology predates Hadoop and NoSQL

    variants. This version of big data is really an

    extension of existing relational database tech-

    nology optimized for query processing. These

    analytical platforms span a range of technol-

    ogy, from appliances and columnar databases to

    massively parallel processing (MPP) databases.

    The common thread among them is that

    most are read-only environments that deliver

    exceptional price-performance compared withgeneral-purpose relational databases origi-

    nally designed to run transaction processing

    applications.

    Teradata laid the groundwork for the analyti-

    cal platform market when it launched the first

    data warehouse appliance in the early 1980s.

    Sybase was also an early forerunner, shipping

    the first columnar database in the mid-1990s.Netezza kicked the market into high gear in

    2003 when it unveiled a popular analytical

    appliance and was soon followed by dozens of

    other startups. Recognizing the opportunity,

    all the big names in software and hardware

    Oracle, IBM, Hewlett-Packard and SAPsub-

    sequently jumped into the market, either by

    building or buying technology, to provide pur-

    pose-built analytical systems to new and exist-

    ing customers.

    Although the price tag of these systems

    often exceeds $1 million, customers find that

    the exceptional price-performance deliv-

    ers significant business value. For example,

    Virginia-based XO Communications recovered

    http://searchoracle.techtarget.com/feature/The-Oracle-big-data-strategy-Weighing-appliance-vs-commodityhttp://searchoracle.techtarget.com/feature/The-Oracle-big-data-strategy-Weighing-appliance-vs-commodity
  • 5/20/2018 Big Data s Big Impact on Data Warehousing_hb_final

    6/16

    HOME

    EDITORS NOTE

    BIG DATA TODAY IS A

    TALE OF TWO MARKETS

    THE DIVERSE WORLD

    OF BIG DATA

    PROCESSING SYSTEMS

    FIT IN AND FUNCTION

    IN THE NEW

    ANALYTICAL ECOSYSTEM

    BIG DATAS BIG IMPACT ON DATA WAREHOUSING6

    OVERVIEW

    $3 million in lost revenue from a new revenue-

    assurance application it built on an analytical

    appliance, even before it had paid for the sys-tem. It subsequently built or migrated a dozen

    applications to run on the new system, testify-

    ing to its value.

    Kelley Blue Book in Irvine, Calif., bought an

    analytical appliance to run its data warehouse,

    which was experiencing performance issues,

    giving the provider of online automobile valu-

    ations a competitive edge. For instance, thenew system reduces the time needed to process

    hundreds of millions of automobile valuations

    from one week to one day. Kelley Blue Book

    now uses the system to analyze its Web adver-

    tising businessand deliver dynamic pricing for

    its Web ads.

    Given the up-front costs of analytical plat-

    forms, organizations usually undertake a

    thorough evaluation of these systems before

    jumping aboard. First, a company must deter-

    mine whether an analytical platform outper-

    forms its existing data warehouse to a degree

    that it warrants migration and retraining costs.

    This requires a proof of concept in which the

    customer tests the systems in its own data

    center using its own data across a range of

    queries.

    The good news is that the new analyticalplatforms usually deliver jaw-dropping perfor-

    mance for most queries tested. In fact, many

    customers dont believe the initial results and

    rerun the queries to make sure that the results

    are valid.

    Second, companies must choose from more

    than two dozen analytical platforms on the

    market. For instance, they must decide whether

    to buy an appliance or a software-only system,

    a columnar database or an MPP database, or

    an on-premises system or a cloud-based ser-

    vice. Evaluating these options takes time, and

    many companies create a short list that doesnt

    always contain comparable products.

    Finally, companies must decide what role

    Given the up-front costs of ana-

    lytical platforms, organizationsusually undertake a thorough

    evaluation of these systems

    before jumping aboard.

    http://searchbusinessanalytics.techtarget.com/video/Eckerson-Big-data-analytics-weighs-a-mix-of-the-old-and-newhttp://searchbusinessanalytics.techtarget.com/video/Eckerson-Big-data-analytics-weighs-a-mix-of-the-old-and-newhttp://searchbusinessanalytics.techtarget.com/video/Eckerson-Big-data-analytics-weighs-a-mix-of-the-old-and-newhttp://searchbusinessanalytics.techtarget.com/video/Eckerson-Big-data-analytics-weighs-a-mix-of-the-old-and-new
  • 5/20/2018 Big Data s Big Impact on Data Warehousing_hb_final

    7/16

    HOME

    EDITORS NOTE

    BIG DATA TODAY IS A

    TALE OF TWO MARKETS

    THE DIVERSE WORLD

    OF BIG DATA

    PROCESSING SYSTEMS

    FIT IN AND FUNCTION

    IN THE NEW

    ANALYTICAL ECOSYSTEM

    BIG DATAS BIG IMPACT ON DATA WAREHOUSING7

    OVERVIEW

    an analytical platform will play in their data

    warehousing architectures. Should it serve as

    the data warehousing platform? If so, does ithandle multiple workloads easily or is it a one-

    trick pony? Organizations that have tapped out

    their SQL Server or MySQL data warehouses

    often replace them with analytical platforms

    to get better performance. But companies that

    have implemented an enterprise data ware-

    house on Oracle, Teradata or IBM often find

    that analytical platforms are best used when

    they sit alongside the data warehouse so they

    can handle new applications or existing ana-

    lytical workloads offloaded to them. This

    architecture helps them avoid a costly upgrade

    to a data warehousing platform, which might

    easily exceed the cost of purchasing an analyti-cal platform.

    The big data market is really two, separate

    but interrelated markets: one for Hadoop and

    open source data management software and the

    other for purpose-built SQL databases opti-

    mized for query processing. Hadoop avoids

    most of the up-front licensing and loadingcosts endemic to traditional relational data-

    base systems. But since the technology is

    still relatively new, there are hidden costs that

    have thus far kept many Hadoop implemen-

    tations experimental in nature. On the other

    hand, analytical platforms are a more proven

    technology but impose significant up-front

    licensing fees and potential migration costs.

    Companies wading into the waters of the big

    data stream need to carefully evaluate their

    options. n

    Evaluating analytical plat-

    forms takes time, and many

    companies create a short list

    that doesnt always contain

    comparable products.

  • 5/20/2018 Big Data s Big Impact on Data Warehousing_hb_final

    8/16

    HOME

    EDITORS NOTE

    BIG DATA TODAY IS A

    TALE OF TWO MARKETS

    THE DIVERSE WORLD

    OF BIG DATA

    PROCESSING SYSTEMS

    FIT IN AND FUNCTION

    IN THE NEW

    ANALYTICAL ECOSYSTEM

    BIG DATAS BIG IMPACT ON DATA WAREHOUSING8

    TECHNOLOGY

    The Diverse World of Big Data Processing Systems

    Faced with an expanding analytical eco-

    system, business intelligence managers need

    to make many technology choices. Perhaps the

    most difficult involves selecting a data pro-

    cessing system to power a variety of analytical

    applications.In the past, these types of decisions revolved

    around selecting one of a handful of leading

    relational database management systems to

    power a data warehouse or data mart. Often,

    the choice boiled down to internal politics as

    much as technical functionality.

    Today the options arent quite as straight-

    forward, although politics may still play a role.

    Instead of selecting a single data management

    product, business intelligence managers may

    need to select multiple platforms to outfit an

    expanding analytical ecosystem. And rather

    than evaluating four or five alternatives for

    each platform, BI managers are faced with doz-

    ens of options in each category. The once lazy

    database market is now a beehive of activity.

    Staying abreast of all the new products, part-

    nerships and technological advances is now

    a full-time job. Industry analysts who make

    a living sifting through products in emerg-

    ing markets are needed now more than ever.Most analysts will tell you that the first step

    in selecting an analytical platformis to under-

    stand the broad categories of products in the

    marketplace and then make finer distinctions

    from there.

    At a high level, there are four categories of

    analytical processing systems today. The fol-

    lowing describes those categories and can be

    used as a starting point when creating a short

    list of products during a product evaluation

    process:

    1. TRANSACTIONAL DATABASE SYSTEMS

    Transactional relational database management

    http://searchdatamanagement.techtarget.com/feature/Supermarket-co-op-stocks-up-on-big-data-platform-to-help-spur-saleshttp://searchdatamanagement.techtarget.com/feature/Supermarket-co-op-stocks-up-on-big-data-platform-to-help-spur-sales
  • 5/20/2018 Big Data s Big Impact on Data Warehousing_hb_final

    9/16

    HOME

    EDITORS NOTE

    BIG DATA TODAY IS A

    TALE OF TWO MARKETS

    THE DIVERSE WORLD

    OF BIG DATA

    PROCESSING SYSTEMS

    FIT IN AND FUNCTION

    IN THE NEW

    ANALYTICAL ECOSYSTEM

    BIG DATAS BIG IMPACT ON DATA WAREHOUSING9

    TECHNOLOGY

    systems (RDBMSes) were originally designed

    to support transaction processing applications,

    although most have been retrofitted with vari-ous types of indexes, join paths and custom

    SQL bolt-ons to make them more palatable

    to analytical processing. There are two types

    of transactional RDBMSes: enterprise and

    departmental.

    nEnterprise hubs.The traditional enterprise

    RDBMSes, such as those from IBM, Micro-

    soft, Oracle and Sybase, are best-suited as data

    warehousing hubs that feed a variety of down-

    stream, user-facing systems but dont handle

    query traffic directly. Although retrofitted with

    analytical capabilities, these systems often hit

    the wall when used for query processing along

    with other workloads and are expensive to

    upgrade and replace. Many customers now use

    these data warehousing systems as hubs to feed

    operational data stores, data marts, enterprise

    reporting systems, data sandboxes and various

    analytical and transactional applications.

    nDepartmental marts.A number of companies

    use Microsoft SQL Server or MySQL as data

    marts fed by an enterprise data warehouse or

    as standalone data warehouses for a business

    unit or small business. Like their enterprisebrethren, these systems also often max out

    when usage, data volumes or query complexity

    increases rapidly. A fast-growing business unit

    or small business often replaces these trans-

    actional RDBMSes with analytical appliances,

    which provide the same or greater level of sim-

    plicity and ease of management as SQL Server

    or MySQL.

    2. ANALYTICAL PLATFORMS

    Analytical platforms represent the first wave

    of big data systems. These are purpose-built,

    SQL-based systems designed to provide bet-

    ter price performance for analytical workloads

    than transactional RDBMSes. There are many

    types of analytical platforms. Most are being

    used as data warehousing replacements or

    standalone analytical systems.

    nMPP database.Massively parallel processing

    databases with strong mixed workload utili-

    ties make good enterprise data warehouses for

  • 5/20/2018 Big Data s Big Impact on Data Warehousing_hb_final

    10/16

    HOME

    EDITORS NOTE

    BIG DATA TODAY IS A

    TALE OF TWO MARKETS

    THE DIVERSE WORLD

    OF BIG DATA

    PROCESSING SYSTEMS

    FIT IN AND FUNCTION

    IN THE NEW

    ANALYTICAL ECOSYSTEM

    BIG DATAS BIG IMPACT ON DATA WAREHOUSING10

    TECHNOLOGY

    analytically minded organizations. Teradata

    was the first on the block with such a system,

    but it now has many competitors, includingPivotals Greenplum and Microsofts Paral-

    lel Data Warehousing option, which are rela-

    tive upstarts compared with the 30-year-old

    Teradata.

    nAnalytical appliance.These purpose-built

    analytical systems come as integrated hard-

    ware-software combinations tuned for analyti-

    cal workloads. Analytical appliances come in

    many shapes, sizes and configurations. Some,

    like IBM Netezza, Greenplum and Oracle Exa-

    data, are more general-purpose analytical

    machines that can serve as replacements for

    most data warehouses. Others, such as those

    from Teradata, are geared to specific analytical

    workloads and can deliver extremely fast per-

    formance or manage mammoth data volumes.

    nIn-memory systems.If you are looking for

    raw performance, there is nothing better than

    a system that lets you put all your data into

    memory. These systems are becoming more

    commonplace, thanks to SAP, which is betting

    its business on HANA, the in-memory data-

    base for transactional and analytical process-

    ing, and is evangelizing the need for in-memorysystems. Another contender is Kognitio. Many

    RDBMSes are better exploiting memory for

    caching results and processing queries.

    nColumnar databases.SAPs Sybase IQ,

    Hewlett-Packards Vertica, Actians ParAc-

    cel, Infobright, Exasol, InfiniDB and Sand are

    columnar databases that offer fast performance

    for many types of queries because of the way

    these systems store and compress databy

    columns instead of rows. Column storage and

    processing is fast becoming a RDBMS feature

    rather than a distinct subcategory of products.

    3. HADOOP DISTRIBUTIONS

    Hadoop is an open source software project run

    by The Apache Software Foundation for pro-

    cessing data-intensive applications in a dis-

    tributed environment with built-in parallelism

    and failover. The most important parts of the

    initial version of Hadoop were the Hadoop Dis-

    tributed File System (HDFS), which stores data

    http://searchdatacenter.techtarget.com/tip/Whats-available-in-columnar-databaseshttp://searchdatacenter.techtarget.com/tip/Whats-available-in-columnar-databaseshttp://searchdatacenter.techtarget.com/tip/Whats-available-in-columnar-databaseshttp://searchdatacenter.techtarget.com/tip/Whats-available-in-columnar-databases
  • 5/20/2018 Big Data s Big Impact on Data Warehousing_hb_final

    11/16

    HOME

    EDITORS NOTE

    BIG DATA TODAY IS A

    TALE OF TWO MARKETS

    THE DIVERSE WORLD

    OF BIG DATA

    PROCESSING SYSTEMS

    FIT IN AND FUNCTION

    IN THE NEW

    ANALYTICAL ECOSYSTEM

    BIG DATAS BIG IMPACT ON DATA WAREHOUSING11

    TECHNOLOGY

    in files on a cluster of servers, and MapReduce,

    a programming framework for building parallel

    processing applications that run on HDFS. AHadoop 2 update added a new key component

    called YARN, which took over cluster resource

    management from MapReduce and enabled

    users to run non-MapReduce applications on

    Hadoop systems. The open source community

    is building numerous additional components

    to turn Hadoop into an enterprise-caliber data

    processing environment. The collection of

    these components is called a Hadoop distribu-

    tion. Leading providers of Hadoop distribu-

    tions include Amazon, Cloudera, IBM, Intel,

    Hortonworks, MapR and Pivotal.

    In most customer installations, Hadoop

    serves as a staging area and online archive

    for unstructured and semi-structured data,

    as well as a sandbox for data scientists who

    query Hadoop files directly before the data is

    aggregated or loaded into the data warehouse.

    Hadoop will play an increasingly important role

    in the analytical ecosystem at most companies,

    either working in concert with an enterprise

    data warehouse or assuming most duties of one.

    4. NOSQL DATABASES

    NoSQLalso shorthand for not only SQL

    is the name given to a broad set of databaseswhose only common thread is that they dont

    require the SQL programming language to

    process data, although some support both SQL

    and non-SQL forms of data processing. There

    are many types of NoSQL databases, and the

    list grows longer all the time. These special-

    ized systems are built using either proprietary

    and open source components or a mix of

    both. In most cases, they are designed to over-

    come the limitations of traditional RDBMSes

    to handle unstructured and semi-structured

    data.

    These four categories represent just the start

    of a broader categorization of data processing

    systems geared to analytical workloads. This

    is a fast-changing field. With the multiplic-

    ity of choices available today, BI professionals

    need to understand the differences between

    data management offerings so they can posi-

    tion themselves properly in the new analytical

    ecosystem. n

    http://searchcio.techtarget.com/news/2240178592/Hadoop-framework-breathes-new-life-into-Ancestrycom-legacy-toolshttp://searchdatamanagement.techtarget.com/opinion/Slew-of-disparate-NoSQL-databases-vie-to-displace-RDBMSs-fit-by-fithttp://searchdatamanagement.techtarget.com/opinion/Slew-of-disparate-NoSQL-databases-vie-to-displace-RDBMSs-fit-by-fithttp://searchcio.techtarget.com/news/2240178592/Hadoop-framework-breathes-new-life-into-Ancestrycom-legacy-tools
  • 5/20/2018 Big Data s Big Impact on Data Warehousing_hb_final

    12/16

    HOME

    EDITORS NOTE

    BIG DATA TODAY IS A

    TALE OF TWO MARKETS

    THE DIVERSE WORLD

    OF BIG DATA

    PROCESSING SYSTEMS

    FIT IN AND FUNCTION

    IN THE NEW

    ANALYTICAL ECOSYSTEM

    BIG DATAS BIG IMPACT ON DATA WAREHOUSING12

    MANAGEMENT

    Fit In and Function in the New Analytical Ecosystem

    The big data revolution has arrived, and its

    transforming long-established data warehous-

    ing architectures into vibrant, multifaceted

    analytical ecosystems.

    Gone are the days when all analytical pro-

    cessing first passed through a data warehouse

    or data mart. Now data winds its way to users

    through a plethora of corporate data structures,

    each tailored to the type of content it contains

    and the type of user who wants to use it.

    The figure on page 13 depicts a reference

    architecture for the new analytical ecosystem

    that has big datas fingerprints all over it. The

    objects in blue represent the traditional data

    warehousing environment, while those in pink

    represent new architectural elements made

    possible by big data technologiesnamely,

    Hadoop, NoSQL databases, high-performance

    analytical engines and interactive, in-memory

    visualization tools.

    Most source data now flows through Hadoop,

    which primarily acts as a staging area and

    online archive. This is especially true for semi-

    structured data, such as log files and machine-

    generated data, but also for some structured

    data that companies cant cost-effectively store

    and process in SQL enginesfor example, call

    detail records in a telecommunications com-

    pany. From Hadoop, data is fed into a data

    warehousing hub, which often distributes data

    to downstream systems, such as data marts,

    operational data stores and analytical sand-

    boxes of various types, where users can query

    the data using familiar SQL-based reporting

    and analysis tools.

    Today data scientists analyze raw data inside

    Hadoop by writing MapReduce programs in Java

    and other languages. In the future, users will be

    able to query and process Hadoop data using

    SQL-based data integration and query tools.

    The big data revolution is not only about

    (Continued on page 14)

  • 5/20/2018 Big Data s Big Impact on Data Warehousing_hb_final

    13/16

    HOME

    EDITORS NOTE

    BIG DATA TODAY IS A

    TALE OF TWO MARKETS

    THE DIVERSE WORLD

    OF BIG DATA

    PROCESSING SYSTEMS

    FIT IN AND FUNCTION

    IN THE NEW

    ANALYTICAL ECOSYSTEM

    BIG DATAS BIG IMPACT ON DATA WAREHOUSING13

    MANAGEMENT

    pera ona

    sys ems

    (Structureddata)

    CasualuserOperationalExtract,transform and load

    (Batch,nearreal time,orreal time)

    BI

    Operational

    system

    CEPEngine

    Machine

    dataHadoopcluster

    server

    Dept.

    data

    mart

    Datawarehouse

    Topdownarchitecture

    Web data

    V rtua san oxes Bottomuparchitecture

    Inmemory

    BIsandbox

    Free

    standingsandbox

    Audio/videodata

    Analyticalplatform

    ornonrelationaldatabase

    Power

    user

    Documents

    and

    text

    External

    data

    BI Architecture 2020

  • 5/20/2018 Big Data s Big Impact on Data Warehousing_hb_final

    14/16

    HOME

    EDITORS NOTE

    BIG DATA TODAY IS A

    TALE OF TWO MARKETS

    THE DIVERSE WORLD

    OF BIG DATA

    PROCESSING SYSTEMS

    FIT IN AND FUNCTION

    IN THE NEW

    ANALYTICAL ECOSYSTEM

    BIG DATAS BIG IMPACT ON DATA WAREHOUSING14

    MANAGEMENT

    analyzing large volumes and new sources of

    data; its also about balancing data alignmentand consistency with flexible, ad hoc explora-

    tion. As such, the new analytical ecosystem

    features top-down and bottom-up data flows

    that meet all business requirements for report-

    ing and analysis.

    nThe top-down world.Here source data is

    processed, refined and stamped with a pre-

    defined data structuretypically a dimen-

    sional modeland then used by casual users

    with SQL-based reporting and analysis

    tools. In this domain, IT developers create

    data and semantic models so business users

    can get answers to known questions and

    executives can track performance of predefined

    metrics. Design precedes access. The top-down

    world also takes great pains to align data along

    conformed dimensions and deliver clean,

    accurate data. The goal is to deliver a consis-

    tent view of the business entities so users

    can spend their time making decisions instead

    of arguing about the origins and validity of data

    artifacts.

    nThe underworld.Creating a uniform view of

    the business from heterogeneous sets of data

    is not easy. It takes time, money and patience.Many departmental heads and business ana-

    lysts abandon the top-down world for the

    underworld of spreadmarts and data shadow

    systems. Using whatever tools are readily avail-

    able and cheap, these data-hungry users create

    their own views of the business. Eventually,

    they spend more time collecting and integrat-

    ing data than analyzing it, undermining their

    productivity and creating an inconsistent view

    of business information.

    nThe bottom-up world.The new analytical eco-

    system brings these prodigal data users back

    into the fold. It carves out space within the

    enterprise environment for ad hoc exploration

    and promotes the rapid development of analyt-

    ical applications using in-memory departmen-

    tal tools. In a bottom-up environment, users

    cant anticipate the questions they will ask on

    a daily or weekly basis or the data theyll need

    to answer those questions. Often, the data they

    need doesnt yet exist in the data warehouse.

    The new analytical ecosystem supports three

    (Continued from page 12)

    http://searchbusinessanalytics.techtarget.com/ebook/Unlocking-the-business-benefits-in-big-datahttp://searchbusinessanalytics.techtarget.com/ebook/Unlocking-the-business-benefits-in-big-datahttp://searchbusinessanalytics.techtarget.com/ebook/Unlocking-the-business-benefits-in-big-datahttp://searchbusinessanalytics.techtarget.com/ebook/Unlocking-the-business-benefits-in-big-data
  • 5/20/2018 Big Data s Big Impact on Data Warehousing_hb_final

    15/16

    HOME

    EDITORS NOTE

    BIG DATA TODAY IS A

    TALE OF TWO MARKETS

    THE DIVERSE WORLD

    OF BIG DATA

    PROCESSING SYSTEMS

    FIT IN AND FUNCTION

    IN THE NEW

    ANALYTICAL ECOSYSTEM

    BIG DATAS BIG IMPACT ON DATA WAREHOUSING15

    MANAGEMENT

    data sandboxesthat enable power users to

    explore corporate and local data on their own

    terms: (1) Hadoop, (2) virtual partitions insidea data warehouse and (3) specialized analytical

    databases that offload data or analytical pro-

    cessing from the data warehouse or handle new

    untapped sources of data, such as Web server

    logs or machine data. The new environment

    also gives department heads the ability to cre-

    ate and use dashboards built with in-memory

    visualization tools that point to a corporate

    data warehouse and other independent sources.

    Combining the top-down and bottom-

    up worlds is not easy. BI professionals need

    to assiduously guard data semantics while

    opening access to data. Business users need to

    commit to adhering to corporate data standards

    in exchange for getting the keys to the king-dom. To succeed, organizations need robust

    data governance programs and lots of commu-

    nication among all parties.

    The big data revolution brings major

    enhancements to the BI landscape. First and

    foremost, it introduces new technologies that

    make it possible for organizations to cost-

    effectively capture and analyze large volumes of

    semi-structured data. Second, it complements

    traditional, top-down data-delivery methods

    with more flexible, bottom-up approaches that

    promote ad hoc exploration and rapid applica-

    tion development. n

    http://searchbusinessanalytics.techtarget.com/feature/Data-sandboxes-help-analysts-dig-deep-into-corporate-infohttp://searchbusinessanalytics.techtarget.com/feature/Data-sandboxes-help-analysts-dig-deep-into-corporate-info
  • 5/20/2018 Big Data s Big Impact on Data Warehousing_hb_final

    16/16

    HOME

    EDITORS NOTE

    BIG DATA TODAY IS A

    TALE OF TWO MARKETS

    THE DIVERSE WORLD

    OF BIG DATA

    PROCESSING SYSTEMS

    FIT IN AND FUNCTION

    IN THE NEW

    ANALYTICAL ECOSYSTEM

    BIG DATAS BIG IMPACT ON DATA WAREHOUSING16

    ABOUT

    THE

    AUTHOR

    WAYNE ECKERSON is a member of TechTargets BI

    Experts Panel. He has more than 15 years experience in

    data warehousing, business intelligence and performance

    management. He has conducted numerous in-depth

    research studies and is the author of Performance

    Dashboards: Measuring, Monitoring, and Managing

    Your Business. Email him at [email protected].

    Big Datas Big Impact on Data Warehousing

    is a SearchDataManagement.come-publication.

    Scot Petersen | Editorial Director

    Jason Sparapani | Managing Editor, E-Publications

    Joe Hebert | Associate Managing Editor, E-Publications

    Craig Stedman | Executive Editor

    Melanie Luna | Managing Editor

    Linda Koury | Director of Online Design

    Neva Maniscalco | Graphic Designer

    Doug Olender | Publisher| [email protected]

    Annie Matthews | Director of Sales

    [email protected]

    TechTarget

    275 Grove Street, Newton, MA 02466

    www.techtarget.com 2014 TechTarget Inc. No part of this publication may be transmitted or re-produced in any form or by any means without written permission from thepublisher. TechTarget reprints are available throughThe YGS Group.

    About TechTarget:TechTarget publishes media for information technologyprofessionals. More than 100 focused websites enable quick access to a deepstore of news, advice and analysis about the technologies, products and pro-cesses crucial to your job. Our live and virtual events give you direct access toindependent expert commentary and advice. At IT Knowledge Exchange, oursocial community, you can get advice and share solutions with peers and experts.

    COVER ART: ALXR/THINKSTOCK

    STAY CONNECTED!

    Follow @SearchDataManagement on Twitter

    mailto:[email protected]://searchdatamanagement.techtarget.com/http://reprints.ygsgroup.com/m/techtargethttp://reprints.ygsgroup.com/m/techtargethttps://twitter.com/sDataManagementhttps://twitter.com/sDataManagementhttps://twitter.com/sDataManagementhttps://twitter.com/sDataManagementhttp://reprints.ygsgroup.com/m/techtargethttp://searchdatamanagement.techtarget.com/mailto:[email protected]