Calligraph overview for emc 09.09.2011

Calligraph Overview

Common Issues of Traditional BI

High reliance on pre-processing of information (cubes and views) limits the user's ability to explore the data beyond pre-programmed reports

Explosion of data stored in the DB: less than 10% is productive data, the rest is derived data

It is difficult for users to create new reports, which in practice prevents managers and other decision makers from using the tool in day-to-day work for real-time decision support

Expensive consultants are needed to re-program reports, costing both money and time

Short overview of Calligraph terminology and design principles

Calligraph delivers integrated OLAP + Query & Reporting functionality for decision making support: On Line, On The Fly, On Demand

Calligraph is aimed at end users without any programming skills

The user interacts with Calligraph in his or her native language and in the terms of his or her subject matter. We call this the User Semantic Layer. The User Semantic Layer is created according to the user's rights to access the data

Tables of any complexity and size, in any theoretically possible representation, are translated into a linear set of queries with automatic parallelization. This is a key characteristic: it prevents an “information blast” and keeps processing time linearly dependent on the volume of information being processed.
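The idea of translating a table into a linear set of independent queries can be illustrated with a small sketch. This is not Calligraph's actual implementation: it assumes, for illustration only, that each cell of a cross table is computed by its own query, and it uses in-memory rows in place of a database. The names ROWS, cell_query and build_cross_table are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample rows standing in for a database table.
ROWS = [
    {"region": "Europe", "year": 2010, "revenue": 120},
    {"region": "Europe", "year": 2011, "revenue": 150},
    {"region": "Asia", "year": 2010, "revenue": 90},
    {"region": "Asia", "year": 2011, "revenue": 130},
    {"region": "Asia", "year": 2011, "revenue": 20},
]


def cell_query(region, year):
    """One independent 'query': total revenue for a single table cell."""
    return sum(r["revenue"] for r in ROWS
               if r["region"] == region and r["year"] == year)


def build_cross_table(regions, years):
    """Flatten the table into a linear list of cell queries and run them concurrently."""
    cells = [(reg, yr) for reg in regions for yr in years]
    with ThreadPoolExecutor(max_workers=4) as pool:
        totals = list(pool.map(lambda c: cell_query(*c), cells))
    return dict(zip(cells, totals))


table = build_cross_table(["Europe", "Asia"], [2010, 2011])
```

Because no cell query depends on another, the list of queries can be handed to any pool of workers. In a real deployment the work would presumably be pushed down to the database rather than done in client threads.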

Calligraph technology benefits

Flexible multidimensional on-line analysis of recent data for decision making support

The user generates queries and reports via direct interaction with the system, in the terms of his or her subject domain

The user's semantic layer is formed in strict conformity with his or her rights to access the data

Queries are highly parallelizable and their structure allows optimal execution

Supports connection to any relational DB via OLE DB

Full conformity to all 12 classical OLAP rules

Types of tables formed by Calligraph

Listing table example

Analytic (or cross) table example

Important notes to the previous slide

There are only two principal types of homogeneous tables:

Listing tables

Analytical tables (cross tables)

Any other table is non-homogeneous and can be decomposed into components of either listing or analytic tables

Ergonomics asserts that human perception takes in only homogeneous tables easily. Any non-homogeneous (composite) table is perceived in parts, by picking out and analyzing its homogeneous components

Definition of “Task”

A user (such as a manager) can have access to different types of information (for example commercial, HR, logistics and warehouse, finance, etc.) from different DBs deployed by the enterprise

To ease perception, the User Semantic Layer can be logically split into linked fragments, which we call Tasks: “Commerce”, “HR”, “Warehouse”, “Finance”, etc.

Technically, a Task is a set of fields from different DB tables with all the necessary connections between them. Each field has its own user-friendly alias. This creates an environment that is clear and convenient for the end user.
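A Task as described here (aliased fields plus the connections between their tables) can be modelled roughly as follows. This is a minimal sketch, not Calligraph's internal format: the class names Field and Task, the from_clause attribute and the table and column names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    alias: str    # user-friendly name shown to the end user
    table: str    # physical table name (hypothetical)
    column: str   # physical column name (hypothetical)


@dataclass
class Task:
    name: str
    fields: List[Field]
    from_clause: str  # tables plus the joins that connect them

    def resolve(self, alias: str) -> str:
        """Translate a user-facing alias into a qualified column name."""
        for f in self.fields:
            if f.alias == alias:
                return f"{f.table}.{f.column}"
        raise KeyError(alias)

    def select_sql(self, aliases: List[str]) -> str:
        """Build a SELECT that exposes only the aliases the user picked."""
        cols = ", ".join(f'{self.resolve(a)} AS "{a}"' for a in aliases)
        return f"SELECT {cols} FROM {self.from_clause}"


hr = Task(
    name="HR",
    fields=[
        Field("Employee", "employees", "last_name"),
        Field("Department", "departments", "name"),
    ],
    from_clause="employees JOIN departments ON employees.dept_id = departments.id",
)
sql = hr.select_sql(["Employee", "Department"])
```

The user only ever sees "Employee" and "Department". The physical names and the join stay inside the Task definition.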

Calligraph configuration for the “Company” DB, converted by EMC into Greenplum format

Configuration of the user semantic layer (field names and mapping)

De facto, this is an example of manual creation of the User Semantic Layer (automatic creation is also possible)

Semantic User Layer

Is a list of field names accessible to the user, in user language and in user subject domain terms

Definition of “Gradation”

Gradation is any field from the user semantic layer with a set of boundary conditions

Boundary conditions for the gradation are connected by logical “OR”

Conditions can be grouped into simple or extended totals

A gradation is used to create a dimension in an analytic table, in a filter or in the “master-detail” section
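The definition of a gradation (a field plus boundary conditions joined by “OR”) can be sketched as a small data structure. This is an illustration under stated assumptions, not Calligraph's implementation: the classes Condition and Gradation, the sample employees and the age bands are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Condition:
    label: str                        # caption shown in the table header
    test: Callable[[object], bool]    # boundary condition on the field value


@dataclass
class Gradation:
    field: str                        # field name from the user semantic layer
    conditions: List[Condition]

    def matches(self, row: dict) -> bool:
        """True if ANY boundary condition holds (conditions are joined by OR)."""
        return any(c.test(row[self.field]) for c in self.conditions)

    def label_for(self, row: dict) -> Optional[str]:
        """Label of the first condition the row satisfies, or None."""
        for c in self.conditions:
            if c.test(row[self.field]):
                return c.label
        return None


# Hypothetical data and gradation, for illustration only.
employees = [{"name": "Ann", "age": 27}, {"name": "Bob", "age": 41},
             {"name": "Eve", "age": 63}, {"name": "Dan", "age": 35}]

age_bands = Gradation("age", [
    Condition("under 30", lambda v: v < 30),
    Condition("30 to 49", lambda v: 30 <= v < 50),
])

# Using the gradation as a dimension: count rows per boundary condition.
counts = {c.label: 0 for c in age_bands.conditions}
for row in employees:
    label = age_bands.label_for(row)
    if label is not None:
        counts[label] += 1
```

The same object serves two purposes. Used through matches it acts as a filter element, and used through label_for it yields the headers of a table dimension.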

Difference from OLAP using cubes

Any field from the user semantic layer can form a dimension for an analytical table

All DB fields are “equal”, without separating them into “dimensions” and “facts”

Boundary conditions for the gradations can also be described as a range, a mask or a formula

The user can create a “virtual” gradation (i.e. a gradation which is calculated by applying a formula), enabling “what-if” analysis on the fly

There is no need to perform pre-processing and create (and then continually increment) cubes, which limits the user's ability to analyze the data in the way he or she needs; the user can specify any dimension through direct interaction with Calligraph

The user creates a table template in any theoretically possible view on-line

All queries are performed on-line and can be parallelized
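Range and mask conditions, and a “virtual” gradation computed by a formula, can be sketched as follows. Everything here is an assumption made for illustration: the product rows, the margin formula, the band limits and the price_factor parameter used for the what-if scenario are hypothetical, and the mask syntax is simply Python's shell-style wildcard matching.

```python
from fnmatch import fnmatchcase

# Hypothetical product rows.
ROWS = [
    {"sku": "A-100", "price": 10.0, "cost": 8.0},
    {"sku": "A-200", "price": 20.0, "cost": 10.0},
    {"sku": "B-100", "price": 5.0, "cost": 4.5},
]


def in_range(low, high):
    """Range condition: low <= value < high."""
    return lambda value: low <= value < high


def matches_mask(pattern):
    """Mask condition using shell-style wildcards, e.g. 'A-*'."""
    return lambda value: fnmatchcase(str(value), pattern)


def margin(row, price_factor=1.0):
    """Virtual field: margin computed by a formula, not stored in the DB."""
    price = row["price"] * price_factor
    return (price - row["cost"]) / price


# Virtual gradation over the computed margin.
MARGIN_BANDS = {"low": in_range(0.0, 0.3), "high": in_range(0.3, 1.0)}


def count_by_margin_band(rows, price_factor=1.0):
    """Count rows per margin band for a given what-if price factor."""
    counts = {label: 0 for label in MARGIN_BANDS}
    for row in rows:
        m = margin(row, price_factor)
        for label, test in MARGIN_BANDS.items():
            if test(m):
                counts[label] += 1
    return counts


a_series = [r["sku"] for r in ROWS if matches_mask("A-*")(r["sku"])]
today = count_by_margin_band(ROWS)
what_if = count_by_margin_band(ROWS, price_factor=1.25)
```

Changing price_factor re-evaluates the formula on the fly, with no stored aggregate to rebuild. That is the property the slide contrasts with pre-computed cubes.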

Definition of “Filter”

We use a filter when we need to apply certain conditions to all data in a particular query

A gradation is the minimal element from which a filter is formed

Several gradations connected by a logical “AND” are called an aggregate

A filter is a set of aggregates connected by logical “AND” or “OR” (in any order)

Calligraph sets no limits on the “depth” of the filter or its length

Filters give the user a very easy and visual way to create data filtering rules on the fly
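The composition rules for filters (gradations joined by “AND” into aggregates, and aggregates joined by “AND” or “OR” into a filter) map naturally onto composable predicates. The sketch below assumes rows are plain dictionaries. The helper names gradation, aggregate, any_of and all_of are hypothetical and not part of Calligraph.

```python
def gradation(field, *tests):
    """Row predicate: the field value satisfies ANY of the tests (OR)."""
    return lambda row: any(t(row[field]) for t in tests)


def aggregate(*gradations):
    """Row predicate: ALL gradations hold (AND)."""
    return lambda row: all(g(row) for g in gradations)


def any_of(*parts):
    """Combine aggregates (or nested filters) with OR."""
    return lambda row: any(p(row) for p in parts)


def all_of(*parts):
    """Combine aggregates (or nested filters) with AND."""
    return lambda row: all(p(row) for p in parts)


# Hypothetical employee rows.
ROWS = [
    {"name": "Ann", "dept": "Sales", "salary": 60000},
    {"name": "Bob", "dept": "Sales", "salary": 40000},
    {"name": "Eve", "dept": "HR", "salary": 45000},
    {"name": "Dan", "dept": "IT", "salary": 70000},
]

well_paid_sales = aggregate(
    gradation("dept", lambda v: v == "Sales"),
    gradation("salary", lambda v: v >= 50000),
)
anyone_in_hr = aggregate(gradation("dept", lambda v: v == "HR"))

# Filter: (Sales AND salary >= 50000) OR (HR)
row_filter = any_of(well_paid_sales, anyone_in_hr)
selected = [r["name"] for r in ROWS if row_filter(r)]

# Filters nest to any depth: the filter above AND salary < 50000.
nested = all_of(row_filter, gradation("salary", lambda v: v < 50000))
nested_selected = [r["name"] for r in ROWS if nested(r)]
```

Since any_of and all_of accept other filters as parts, there is no built-in limit on nesting depth or length, which mirrors the property claimed for Calligraph filters.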

Definition of “Master-Detail”

Any complex table can be automatically split into a set of simpler tables by dragging and dropping any gradation into the “master-detail” section of the query

The simpler tables are formed by using the boundary conditions of the dropped gradation to select the information

Example: an analytic report on EMC business around the world can contain the gradation “Continents”. If the user moves this gradation into “master-detail”, the complex table is split into several simpler tables, each containing only information about business on one continent. If the user then also moves the gradation “Country”, every continent table is further split into several tables, one per country.
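The continent and country example can be sketched as a recursive split of the rows. This is a simplification under stated assumptions: each distinct field value is treated as one boundary condition of the gradation, rows are in-memory dictionaries, and the names split and master_detail are hypothetical.

```python
# Hypothetical sales rows.
ROWS = [
    {"continent": "Europe", "country": "France", "revenue": 10},
    {"continent": "Europe", "country": "Germany", "revenue": 20},
    {"continent": "Europe", "country": "France", "revenue": 5},
    {"continent": "Asia", "country": "Japan", "revenue": 30},
]


def split(rows, field):
    """Group rows by the value of one field (one group per boundary condition)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[field], []).append(row)
    return groups


def master_detail(rows, fields):
    """Split rows by each master-detail field in turn, yielding nested sub-tables."""
    if not fields:
        return rows
    first, rest = fields[0], fields[1:]
    return {key: master_detail(group, rest)
            for key, group in split(rows, first).items()}


by_continent = master_detail(ROWS, ["continent"])
by_country = master_detail(ROWS, ["continent", "country"])
```

Adding a second field to the list corresponds to dropping a second gradation into “master-detail”: every continent table is split again by country.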

Definition of “Drill-Down”

Any cell of an analytical table contains data which was filtered based on the boundary conditions set for its column and row, as well as those defined by the “master-detail”.

Decision making often requires a detailed understanding of the information in the analytical table, for example to understand the reasons behind unsatisfactory results.

Calligraph provides an easy way to achieve this, with the maximum level of detail allowed by the user's data access rights.

The user can select a cell (or cells) of the table, press “Drill-Down”, and get an automatically generated listing table with all fields from the analytical table.
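Drill-down can be read as re-applying the conditions that define a cell (row, column and master-detail) and returning the matching rows as a listing table. The sketch below illustrates that reading and is not Calligraph's implementation. The functions cross_table and drill_down, the sample rows and the visible_fields parameter (standing in for user access rights) are hypothetical.

```python
# Hypothetical sales rows.
ROWS = [
    {"employee": "Ann", "region": "Europe", "year": 2010, "revenue": 120},
    {"employee": "Bob", "region": "Europe", "year": 2011, "revenue": 150},
    {"employee": "Eve", "region": "Asia", "year": 2011, "revenue": 130},
    {"employee": "Dan", "region": "Asia", "year": 2011, "revenue": 20},
    {"employee": "Kim", "region": "Asia", "year": 2010, "revenue": 90},
]

regions = {
    "Europe": lambda r: r["region"] == "Europe",
    "Asia": lambda r: r["region"] == "Asia",
}
years = {
    2010: lambda r: r["year"] == 2010,
    2011: lambda r: r["year"] == 2011,
}


def cross_table(rows, row_conds, col_conds, measure):
    """Analytic table: one total per (row label, column label) cell."""
    return {
        (r_label, c_label): sum(
            row[measure] for row in rows if r_test(row) and c_test(row)
        )
        for r_label, r_test in row_conds.items()
        for c_label, c_test in col_conds.items()
    }


def drill_down(rows, row_test, col_test, master_test=None, visible_fields=None):
    """Listing table of the rows behind one cell, limited to permitted fields."""
    selected = [
        r for r in rows
        if row_test(r) and col_test(r)
        and (master_test is None or master_test(r))
    ]
    if visible_fields is None:
        return selected
    return [{k: r[k] for k in visible_fields} for r in selected]


table = cross_table(ROWS, regions, years, "revenue")
detail = drill_down(ROWS, regions["Asia"], years[2011],
                    visible_fields=["employee", "revenue"])
```

The revenues in detail add up to the value of the ("Asia", 2011) cell, which is what makes the listing an explanation of the cell rather than a separate query.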

Demo block diagram

[Diagram components: Greenplum Master Server; four Segments; Windows Client Machine; Calligraph; Remote Desktop; Sample Database]

DEMO

Let's go to a live Calligraph demo

Current status of Calligraph

Version 5.2 is available as a standalone Windows application

Hundreds of copies have been sold and are being used within big and small enterprises

Calligraph customers include enterprises such as Atommash and Novorosiyskiy port, big medical institutes and hospitals, government bodies (Republican Statistical Service, Russian State Parliament, tax authorities, police departments, etc.) and small and medium businesses.

Calligraph is registered in the Russian agency of patents and trademarks

Possible ways to further develop Calligraph technology

Cloud service

External reporting unit for CASE system

Automatic configuration

Support for Hadoop

Voice input

etc.

Benefits of Calligraph to EMC/Greenplum

Full alignment with Greenplum data analytics and exploration focus

On demand, on the fly analysis

Parallelization and speed of query execution

Calligraph can be developed as a cloud service, giving access to data analytics to every user in the enterprise

Loyalty of users through ease of use and convenience

Highly competitive offer in terms of functionality and price

Easier to demonstrate the business value of Greenplum DB and data analytics to customers
