Business Intelligence and Big Data Analytics with Pentaho


Welcome to the webinar on Business Intelligence and Big Data Analytics with Pentaho

Presented by www.compulinkacademy.com & www.ellicium.com

Contents

1. An Introduction to Pentaho
2. Overview of Pentaho technology stack
3. Pentaho ETL
4. Data Exploration using Pentaho
5. Big Data with Pentaho
6. Getting started with Pentaho

Welcome to the open source world

Reporting
* Actuate BIRT
* Jasper Reports
* Pentaho
* Open Reports

Analysis
* JPivot
* Mondrian / Pentaho
* PALO

ETL Tools
* Clover ETL
* Enhydra Octopus
* Talend
* Kettle / Pentaho

BI Platforms
* Jasper
* Pentaho
* SpagoBI

Databases
* Derby
* Ingres
* MySQL
* PostgreSQL

Data Mining / Statistics
* Weka / Pentaho
* R

A report by the Standish Group states that adoption of open-source software models has resulted in savings of about $60 billion per year to consumers.

What it means for BI and analytics

Open-source software is computer software whose source code is made available under a license in which the copyright holder grants anyone the rights to study, change and distribute the software for any purpose. Open-source software is very often developed in a public, collaborative manner.

You already use it!
* Linux
* Napster
* Amazon reviews
* YouTube


Welcome to Pentaho!

* Commercial open source alternative for business intelligence (BI)
* Founded in 2004 by five founders
* Management: proven BI and open source veterans from Business Objects, Cognos, Hyperion, JBoss, Oracle, Red Hat, SAS
* Pioneer in commercial open source BI
* Large referenceable customer base and a wide range of BI/DW deployments
* Offers a suite of open source Business Intelligence (BI) products called Pentaho Business Analytics, providing data integration, OLAP services, reporting, dashboarding, data mining and ETL capabilities


Pentaho customers

What analysts are saying about Pentaho

* Ovum: Pentaho is the only open source company featured in the Ovum Decision Matrix for Business Intelligence. "Pentaho is one of the few vendors that provide a direct integration into Hadoop and NoSQL databases, allowing users to analyse and visualize NoSQL data alongside traditional data sources."
* Forrester: Forrester recognized Pentaho as the sole "Strong Performer": "Pentaho provides an impressive Hadoop data integration tool." Pentaho was cited for its rich functionality and extensive integration with Apache Hadoop, and for providing certified integration with distributions from Cloudera, EMC Greenplum and Hortonworks.
* Passionned: Passionned's Business Intelligence Tools Survey highlighted the completeness of the Pentaho product suite compared to other vendors, as well as Pentaho's significant cost savings from pricing products per deployment rather than per user. Pentaho earned a recommendation as a complete enterprise solution.
* Gartner: Pentaho was included in Gartner's Magic Quadrant for Business Intelligence Platforms. The report offers the analyst firm's insights on business intelligence vendors that meet an inclusion threshold based on annual sales, capabilities, and customer survey responses.


Pentaho Licensing

The current version of the Pentaho BI Platform will be distributed under the terms of the GNU General Public License (GPL).

Under the GPL, if you intend to distribute GPL-licensed code to your customers as part of other software you have created, you may, depending on the software you have created, be required to release that software under the GPL as well.

Companies that wish to distribute the Pentaho BI Platform have the option of purchasing a commercial license from Pentaho Corporation. A commercial license would exempt you from GPL obligations.

The GNU General Public License (GPL) is the most widely used free software license, which guarantees end users the freedoms to use, study, share and modify the software. Derived works can only be distributed under the same license terms.


Pentaho BI Enterprise Edition

Overview of Pentaho Stack

Pentaho BI Stack

Delivering Value in Different Deployment Models

Coexistence with traditional proprietary BI
* Minimize risk/exposure with consolidated vendors
* Prove technology and services internally
* Explore the relationship benefits of a transparent model without software lock-in

Co-deployment with traditional proprietary BI
* Leverage existing investments
* Pragmatically use what works
* Reduce overall TCO by incorporating commercial open source

Replacement of traditional proprietary BI
* Upgrade BI capabilities
* Reduce TCO
* Capitalize on the opportunity of a disruption (software upgrade, license change, etc.) in your BI environment

Pentaho ETL

Pentaho Kettle ETL

Pentaho Data Integration (PDI, also called Kettle) is the component of Pentaho responsible for Extract, Transform and Load (ETL) processes. Though ETL tools are most frequently used in data warehouse environments, PDI can also be used for other purposes:
* Migrating data between applications or databases
* Exporting data from databases to flat files
* Loading large volumes of data into databases
* Data cleansing
* Integrating applications
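To make the extract, transform and load stages concrete, here is a minimal sketch in plain Python. It is not PDI code and does not use any Pentaho API. The `customers` table, its columns and the sample rows are hypothetical, chosen only to illustrate two of the uses listed above (data cleansing and exporting from a database to a flat file).

```python
import csv
import io
import sqlite3


def extract(conn):
    """Extract: read raw rows from the (hypothetical) source table."""
    query = "SELECT id, name, country FROM customers ORDER BY id"
    return conn.execute(query).fetchall()


def transform(rows):
    """Transform: cleanse the data (trim whitespace, normalise case)."""
    cleaned = []
    for row_id, name, country in rows:
        name = (name or "").strip()
        if not name:
            continue  # data cleansing: drop rows without a usable name
        cleaned.append((row_id, name.title(), (country or "").strip().upper()))
    return cleaned


def load(rows, out):
    """Load: write the cleansed rows to a flat (CSV) file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "name", "country"])
    writer.writerows(rows)


def run_demo():
    """Run the three stages against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "  alice smith ", "in"), (2, "", "us"), (3, "BOB JONES", " uk ")],
    )
    out = io.StringIO()
    load(transform(extract(conn)), out)
    conn.close()
    return out.getvalue()


if __name__ == "__main__":
    print(run_demo())
```

In PDI each of these functions would roughly correspond to a step in a transformation, wired together graphically rather than in code.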

Pentaho Kettle ETL

* Input Step
* Output Step
* Lookup Step
* Transformation Step
* Join Step
* DW Step
* Mapping Step
* Job Step
* Big Data Step

Pentaho Kettle ETL

Spoon
* GUI that allows you to design transformations and jobs
* Transformations and jobs are saved as XML files or stored in a Kettle database repository
* Spoon is available as a shell script and a batch file, so the tool can be used in heterogeneous environments

Pan
* A program that executes transformations designed in Spoon, from an XML file or a database repository
* Transformations can be scheduled in batch mode to run automatically at regular intervals

Carte
* Simple web server that executes transformations and jobs remotely
* Accepts an XML document containing the transformation to execute and the execution configuration
* Allows transformations and jobs to be monitored, started and stopped remotely
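Since Pan is a command-line program, a scheduler (cron, for example) typically just invokes it with the path to a transformation file. The sketch below only builds such a command line. The install path, the transformation file and the `RUN_DATE` parameter are hypothetical. The option spellings (`-file=`, `-level=`, `-param:NAME=value`) are the Linux-style forms and should be checked against the Pan documentation for the installed PDI version.

```python
import shlex


def build_pan_command(pan_script, ktr_file, log_level="Basic", params=None):
    """Build the argument list for running one transformation with Pan.

    Assumption: Linux-style option syntax; verify it for your PDI version.
    """
    cmd = [pan_script, f"-file={ktr_file}", f"-level={log_level}"]
    # Named parameters are passed one per argument, sorted for repeatability.
    for name, value in sorted((params or {}).items()):
        cmd.append(f"-param:{name}={value}")
    return cmd


if __name__ == "__main__":
    cmd = build_pan_command(
        "/opt/pentaho/data-integration/pan.sh",  # hypothetical install path
        "/etl/load_sales.ktr",                   # hypothetical transformation
        params={"RUN_DATE": "2014-01-31"},       # hypothetical parameter
    )
    # Print the shell-quoted command; to actually run it, pass `cmd` to
    # subprocess.run(cmd, check=True) on a machine with PDI installed.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when file names or parameter values contain spaces.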


Data Exploration using Pentaho

Pentaho Dashboards

Many ways to design Pentaho dashboards

Pentaho Dashboards

What is CDE?
* CDE is one of the plug-ins for the Pentaho BI Server, contributed and maintained by Pentaho partner Webdetails.
* We create dashboards using this tool.
* Community Dashboard Editor (CDE) was created to simplify the creation, editing and rendering of dashboards.
* CDE is a very powerful and complete tool, combining the front end with data sources and custom components in a seamless way.

CDE has 3 major components:
* Layout
* Components
* Data Sources

CDE was developed based on the MVC-2 architecture of Advanced Java.

Overview of Pentaho CDE

Exploring Big data with Pentaho

Main Big Data Technologies

Hadoop

NoSQL Databases

Analytic Databases

Hadoop
* Low cost, reliable scale-out architecture
* Distributed computing
* Proven success in Fortune 500 companies
* Exploding interest

NoSQL Databases
* Huge horizontal scaling and high availability
* Highly optimized for retrieval and appending
* Types: document stores, key-value stores, graph databases

Analytic RDBMS
* Optimized for bulk-load and fast aggregate query workloads
* Types: column-oriented, MPP, in-memory
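One reason column-oriented analytic databases suit aggregate queries is that a query such as "total revenue" only needs to read one column, not every field of every row. The following is a minimal conceptual sketch in plain Python, not tied to any particular database. The `region`, `product` and `revenue` fields and their values are hypothetical.

```python
def to_columns(rows):
    """Pivot row-oriented records into a column-oriented layout."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def total_revenue_row_store(rows):
    """Row layout: every record is visited to pick out one field."""
    return sum(row["revenue"] for row in rows)


def total_revenue_column_store(columns):
    """Column layout: only the 'revenue' column is read."""
    return sum(columns["revenue"])


if __name__ == "__main__":
    rows = [
        {"region": "North", "product": "A", "revenue": 120.0},
        {"region": "South", "product": "B", "revenue": 80.0},
        {"region": "North", "product": "B", "revenue": 200.0},
    ]
    columns = to_columns(rows)
    print(total_revenue_row_store(rows))
    print(total_revenue_column_store(columns))
```

Both functions return the same total. The difference in a real system is how much data must be read from disk to compute it.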

What makes Pentaho different for big data

Would you rather do this?
* Scheduling
* Modeling
* Ingestion / Manipulation / Integration

or this?

Pentaho Big Data Integration

Pentaho is integrated with Hadoop at many levels.

Traditional ETL - Graphical designer to visually build transformations that read and write data in Hadoop from/to anywhere and transform the data on the way. No coding required.
* HBase read/write
* Hive, Hive2 SQL query and write
* Impala SQL query and write
* Support for Avro file format and Snappy compression

Data Orchestration - Graphical designer to visually build and schedule jobs that orchestrate processing, data movement and most aspects of operationalizing your data preparation.
* HDFS copy files
* MapReduce job execution
* Pig script execution
* Amazon EMR job execution
* Oozie integration
* Sqoop import/export
* Pentaho MapReduce execution

Pentaho Big Data Integration

Pentaho MapReduce - Graphical designer to visually build MapReduce jobs and run them in the cluster. As a simple, point-and-click alternative to writing Hadoop MapReduce programs in Java or Pig, Pentaho exposes a familiar ETL-style user interface.

Traditional Reporting - All data sources supported above can be used directly or blended with other data to drive our pixel-perfect reporting engine. The reports can be secured, parameterized and published to the web. The reports can be mashed up with other Pentaho visualizations to create dashboards.

Web Based Interactive Reporting - Pentaho's Metadata layer leverages data stored in Hive, Hive2 and Impala for WYSIWYG, interactive, self-service reporting.

Pentaho Analyzer - Leverage your data stored in Impala or Hive2 for interactive visual analysis with drill through, lasso filtering, zooming, and attribute highlighting for greater insight.
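For readers new to MapReduce, the kind of program that Pentaho MapReduce lets you design graphically has three parts: a map phase that emits key/value pairs, a shuffle that groups values by key, and a reduce phase that combines each group. The sketch below is a plain-Python word count run in a single process. It is only an illustration of the model, not Hadoop or Pentaho code, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data with Pentaho", "Big Data integration"]))
```

On a real cluster the map and reduce functions run in parallel on many nodes and the framework performs the shuffle across the network.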

Getting started with Pentaho

* Download Pentaho from http://community.pentaho.com/
* Download MySQL from http://dev.mysql.com/downloads/mysql/
* Download CDE from www.webdetails.pt/ctools/cde.html
* Read the installation instructions on the following blog: http://pentaho-bi-suite.blogspot.in/2013/04/installation-of-pentaho-bi-server.html
* We have a Pentaho installation guide available. Please request the guide at: [email protected]

Thank you!

Contact us for customized Pentaho training at [email protected] or [email protected], or call Sameer on +91-8793334411.