White Paper
IBM Software

The value of IBM InfoSphere BigInsights


The IBM® InfoSphere® BigInsights™ software platform helps firms discover and analyze business insights hidden in large volumes of a diverse range of data. This data (including log records, clickstreams, social media data, news feeds, email, electronic sensor output and even some transactional data) is often ignored or discarded because it is impractical or too difficult to process using traditional means.

IBM designed InfoSphere BigInsights to help firms analyze such data, basing it on open source Apache Hadoop. But what makes this IBM platform unique—that is, what value does it provide? Let’s take a closer look.

An introduction to InfoSphere BigInsights

InfoSphere BigInsights includes a variety of IBM technologies that enhance and extend the value of open source Hadoop software. As Figure 1 shows, these technologies range from application accelerators to analytical facilities, development tools, platform improvements and enterprise software integration.

Applications and accelerators

With more than 20 sample applications and two accelerators, InfoSphere BigInsights helps firms quickly benefit from their big data platform. Users can easily launch these software components from the InfoSphere BigInsights web console as well as customize them using graphical tools.

Some of the sample applications include web crawling, data import/export, data sampling, social media data collection and analysis, machine data processing and analysis, ad hoc queries and more. Accelerators—extensive toolkits with dozens of pre-built software artifacts—enable firms to quickly deploy solutions for analyzing social media and machine data (such as log records, sensor data and more).

Figure 1: IBM technologies included with InfoSphere BigInsights.

InfoSphere BigInsights helps customers analyze large volumes of documents and messages with its built-in text processing engine and library of context-sensitive extractors. Developers can quickly query and identify items of interest, such as persons, email addresses, phone numbers, URLs and business alliances, to understand the context and content of relevant business information hidden in unstructured text.

In addition, programmers can use the included Eclipse tools to create their own text analytic functions. Built-in pattern discovery, expression builders and a test environment promote rapid prototyping and validation of custom text annotators.
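To make the idea of an extractor concrete, the following is a minimal sketch in plain Python. It is not the InfoSphere BigInsights text processing engine or its extractor library; the patterns, the `extract_entities` helper and the sample sentence are all hypothetical and deliberately simple.

```python
import re

# Illustrative patterns only. These are assumptions made for this sketch and
# are far simpler than the context-sensitive extractors the product ships.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "url": re.compile(r"https?://[^\s,;]+"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def extract_entities(text):
    """Return a dict mapping each entity type to the matches found in text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


sample = (
    "Contact Ana at ana.diaz@example.com or 555-010-4477; "
    "see http://example.com/docs for details."
)
entities = extract_entities(sample)
print(entities)
```

Pattern matching of this kind only finds items with a regular surface form. Recognizing persons or business alliances depends on context, which is the gap a library of context-sensitive extractors is meant to fill.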

[Figure 1 labels: InfoSphere BigInsights comprises applications and accelerators, analysis and visualization, rapid-development tooling, administration and platform enhancements, and enterprise software integration, running on an IBM-supplied and tested Hadoop distribution or a select third-party Hadoop distribution.]


To help business analysts and non-programmers benefit from big data, InfoSphere BigInsights offers a spreadsheet-like discovery and visualization facility. Through the InfoSphere BigInsights web console, users can collect and integrate a variety of data into a spreadsheet structure, explore and manipulate that data using built-in functions and macros, create charts and export the results, if desired.

IBM Cognos® Business Intelligence users can apply their familiar query, reporting and analytical tools to data managed by InfoSphere BigInsights as well as to other supported data sources.

Administrative and platform enhancements

The web console provides a real-time view of the InfoSphere BigInsights environment. Through this console, you can start and stop nodes, inspect the status of jobs (applications), review log records, assess the overall health of your platform, start and stop optional components, navigate your distributed file system, launch monitoring facilities and more.

In addition, InfoSphere BigInsights provides enhanced security by supporting Lightweight Directory Access Protocol (LDAP) authentication to its web console. LDAP and reverse proxy support enable administrators to restrict access to users with appropriate authorization. With secure REST-based access, programmers can invoke web console services easily. InfoSphere BigInsights also provides a credentials store in its distributed file system, enabling firms to encode and maintain sensitive data (such as passwords) for application use.

Furthermore, IBM InfoSphere Guardium® can monitor and audit data activities for InfoSphere BigInsights, enabling administrators to take prompt action when inappropriate activities and potential security breaches are detected.

Firms concerned about workload management can use a flexible job scheduling mechanism to fine-tune resource allocation among long-running and short-running jobs. For example, the InfoSphere BigInsights scheduler can be directed to allocate maximum resources to small jobs to help ensure they complete quickly (and thereby fulfill average response time objectives). This job-scheduling option is available in addition to Hadoop’s first in/first out (FIFO) and “fair” scheduling approaches.
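The effect of favoring small jobs on average response time can be seen with a little arithmetic. The sketch below is a simplified model that assumes a single execution slot and made-up job durations; it does not reproduce the InfoSphere BigInsights scheduler, and the function name is hypothetical.

```python
def average_completion_time(durations):
    """Average finish time when jobs run one after another in the given order."""
    elapsed = 0
    total = 0
    for duration in durations:
        elapsed += duration   # this job finishes once all earlier jobs are done
        total += elapsed
    return total / len(durations)


# Hypothetical job durations in minutes, listed in arrival order.
jobs = [30, 2, 1, 3]

fifo_avg = average_completion_time(jobs)                 # run in arrival order
small_first_avg = average_completion_time(sorted(jobs))  # run shortest jobs first

print(f"FIFO average completion time: {fifo_avg} minutes")
print(f"Small-jobs-first average completion time: {small_first_avg} minutes")
```

In this model the long job finishes at the same time either way (36 minutes), but the three short jobs no longer wait behind it, which is what lowers the average.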

Other InfoSphere BigInsights performance features include efficient processing of text-based compressed data as well as techniques for processing certain application tasks in a way that adapts to the runtime environment and workload. In addition, developers working with large volumes of text data can take advantage of the sophisticated enterprise search and indexing support offered by IBM InfoSphere Data Explorer. Finally, IBM Platform™ Symphony features a high-performance distributed runtime engine for MapReduce applications.

Rapid application development

Application developers can leverage several InfoSphere BigInsights technologies to quickly design, develop, test, deploy and publish their applications in the web catalog. Eclipse tools feature wizards and graphical editors for developing Java MapReduce, Jaql, Hive, Pig and text analytic applications. Plus, programmers and analysts can use graphical tools to chain together published applications, quickly creating workflows for complex scenarios.
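For readers new to the MapReduce model that these tools target, the sketch below shows the general map, group and reduce pattern as a word count. It runs in a single Python process and uses no Hadoop or InfoSphere BigInsights APIs; the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_phase(word, counts):
    """Combine all counts emitted for one word."""
    return word, sum(counts)


def run_word_count(lines):
    """Group mapper output by key, then reduce each group."""
    grouped = defaultdict(list)
    for line in lines:
        for word, count in map_phase(line):
            grouped[word].append(count)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


word_counts = run_word_count(["big data big insights", "data platform"])
print(word_counts)
```

On a cluster, the map and reduce steps run in parallel across nodes and the grouping step is handled by the framework; the division of work between the two functions is the same.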


Enterprise software integration

Integrating InfoSphere BigInsights, and its analysis of big data, with existing enterprise software is a key IBM initiative. This is why InfoSphere BigInsights provides connectors for popular data warehouse platforms, including IBM DB2®, IBM Netezza® and non-IBM offerings. The connectors enable developers to join reference data from a relational database with data managed by InfoSphere BigInsights to refine and expand their analysis. In particular, InfoSphere BigInsights provides developers with specialized JDBC connectors for Netezza and DB2 so they can transfer data to and from these sources in a way that exploits native database parallel processing for efficiency and scalability. To access other relational data sources, InfoSphere BigInsights provides generic JDBC connectivity.
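The pattern of joining relational reference data with big data output can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the data warehouse, a Python list stands in for records produced by a big data job, and the table, column and variable names are hypothetical. It does not use the InfoSphere BigInsights connectors.

```python
import sqlite3

# Stand-in for a relational warehouse table holding reference data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "EMEA"), (2, "APAC")])

# Stand-in for records produced by a big data job, such as parsed clickstream events.
click_events = [
    {"customer_id": 1, "page": "/pricing"},
    {"customer_id": 2, "page": "/docs"},
    {"customer_id": 1, "page": "/signup"},
]

# Load the reference data once, then enrich each event with its region.
region_by_customer = dict(conn.execute("SELECT customer_id, region FROM customers"))
conn.close()

clicks_per_region = {}
for event in click_events:
    region = region_by_customer.get(event["customer_id"], "unknown")
    clicks_per_region[region] = clicks_per_region.get(region, 0) + 1

print(clicks_per_region)
```

The reference table is small, so the sketch loads it whole and looks up each event in memory. With large tables on both sides, moving data in parallel matters far more, which is the case the specialized connectors address.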

In addition, InfoSphere BigInsights provides sample user-defined functions for Netezza and DB2 that enable users of those offerings to launch queries in InfoSphere BigInsights, join the output with data in their relational databases, and present the results to database users and applications. For sophisticated extract-transform-load (ETL) needs, IBM InfoSphere DataStage® supports InfoSphere BigInsights as both a source and target for data.

For streaming data, InfoSphere Streams can continuously analyze massive amounts of data with very low latency, enabling firms to quickly react to trends and events as they unfold. Programmers can instruct InfoSphere Streams to write data as needed to InfoSphere BigInsights for deep analysis of trends over time. The lessons learned from such analysis can be captured and fed back to InfoSphere Streams to fine-tune application logic and actions.

The IBM approach addresses the full range of big data needs and ensures that big data will not be isolated from traditional enterprise data. You can use your existing enterprise software platforms to do what they do best, and leverage InfoSphere BigInsights for analytical workloads that are not appropriate or practical for these platforms. Enabling each resource to perform its intended duties allows you to broaden your business analysis capabilities in an integrated fashion.

Quick start, reduced risk

To help organizations initiate their big data projects quickly, InfoSphere BigInsights provides a pretested and preconfigured platform that features a compelling collection of popular open source and IBM technologies.

Installation via a web-based tool

Instead of iteratively downloading, configuring and testing the individual open source projects required for a comprehensive software platform, firms can invoke the InfoSphere BigInsights web-based installation tool to quickly obtain a working environment. This integrated installation process saves considerable time, as many open source projects have specific software prerequisites that may be incompatible with certain versions of other desired components. Further, some open source offerings require code compilation and other post-installation efforts that can lengthen the time necessary to get up and running.


The flexible InfoSphere BigInsights installation tool lets you specify which optional components to install and how to configure your platform. Details about the installation progress are reported in real time, and a built-in “health check” automatically verifies the success of the installation. Cloudera users who want to take advantage of unique InfoSphere BigInsights features can use the IBM installation tool to quickly deploy InfoSphere BigInsights on that Hadoop distribution. Alternatively, if you do not want to install InfoSphere BigInsights on your own hardware, IBM and its business partners offer cloud services for which you pay only for the resources you use.

Support via IBM standard agreements

Deploying mission-critical applications on a platform comprising multiple open source projects often requires extra maintenance. While community-based open source support groups and forums can be very helpful, they are not obligated to fulfill any service-level agreements or respond to urgent inquiries, so many firms find that they must maintain deep in-house expertise in the code base to fix any bugs or shortcomings that may surface. Hence, diagnosing and resolving platform problems, as well as upgrading the platform to maintain state-of-the-art technology, becomes each company’s responsibility.

InfoSphere BigInsights helps organizations avoid this complexity and operational risk. The standard IBM software licensing and support agreements apply to InfoSphere BigInsights, eliminating legal concerns about the terms and conditions of certain open source projects and minimizing the operational risk of deploying software that does not include a technical support contact.

The InfoSphere BigInsights Enterprise Edition includes many useful technologies such as Hadoop, Pig, Hive, HBase, Jaql, Lucene, Oozie, Avro, Flume, HCatalog, Sqoop and Zookeeper—as well as IBM-unique software that offers text analytics, a web console, a spreadsheet-like analysis tool, pre-built applications, application accelerators for social and machine data, performance features and more. Refer to the InfoSphere BigInsights wiki (ibm.com/developerworks/wiki/biginsights) and the product InfoCenter (http://publib.boulder.ibm.com/infocenter/bigins/v1r1/index.jsp) for details.

IBM: Your source for big data software and expertise

The depth and breadth of IBM expertise spans a wide range of enterprise software, hardware and services, enabling firms that partner with IBM to approach their big data projects with confidence and clarity. To learn more about what IBM can do for you and your big data projects, visit: ibm.com/bigdata

For details about InfoSphere BigInsights and to connect with its community, visit: ibm.com/developerworks/wiki/biginsights


© Copyright IBM Corporation 2013

IBM Corporation Software Group Route 100 Somers, NY 10589

Produced in the United States of America January 2013

IBM, the IBM logo, ibm.com, Cognos, DataStage, DB2, Guardium, InfoSphere and Platform are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at ibm.com/legal/copytrade.shtml

Netezza is a trademark or registered trademark of IBM International Group B.V., an IBM Company.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.


IMW14684-USEN-00