Calligraph overview for emc 09.09.2011
vladimir-losev
Category: Data & Analytics
Calligraph Overview
Common Issues of Traditional BI
High reliance on pre-processing of information (cubes and views) limits the user's ability to explore the data beyond pre-programmed reports
Explosion of data stored in the DB: less than 10% is productive data, the rest is derived data
Difficult for users to create new reports, which basically prevents managers and other decision makers from using the tool in day-to-day work for real-time decision making support
Expensive consultants are needed to re-program reports, costing both money and time
Short overview of Calligraph terminology and design principles
Calligraph delivers integrated OLAP + Query & Reporting functionality for decision making support - On Line, On The Fly, On Demand;
Calligraph is aimed at the end user without any programming skills
The user interacts with Calligraph in his native language and in the terms of his subject matter. We call this the User Semantic Layer. The User Semantic Layer is created according to the user's rights to access the data
Tables of any complexity and size, and in any theoretical representation, are translated into a linear set of queries with automatic parallelization. This key characteristic prevents an “information blast” and keeps information processing time linearly dependent on the volume of information being processed.
Calligraph technology benefits
Flexible multidimensional on-line analysis of recent data for decision making support
User generates queries and reports via direct interaction with the system in terms of his subject domain
Semantic layer of the user is formed in strict conformity with his rights to access the data
Queries are highly parallelizable and their structure allows optimal execution
Supports connection to any relational DB via OLE DB
Full conformity to all 12 classical OLAP rules
Types of tables formed by Calligraph
Listing table example
Analytic (or cross) table example
Important notes to the previous slide
There are only two principal types of homogeneous tables: listing tables and analytical tables (cross tables)
Any other table is non-homogeneous and can be decomposed into components that are either listing or analytic tables
Ergonomics asserts that human perception easily grasps only homogeneous tables. Any non-homogeneous (composed) table is perceived piece by piece, by picking out and analyzing its homogeneous components
Definition of “Task”
User (such as manager) can have access to different types of information – for example, commercial, HR, logistics and warehouse, finance, etc – from different DBs deployed by the enterprise
To ease perception, the User Semantic Layer can be logically split into linked fragments, which we call Tasks: “Commerce”, “HR”, “Warehouse”, “Finance” etc.
Technically, Task is a set of fields from different DB tables with all necessary connections between them. Each field has its own user-friendly alias. Thus we create an environment which is clear and convenient to the end user.
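To make the Task definition above more concrete, here is a minimal sketch that models a Task as a set of aliased fields plus the join conditions connecting their tables. The class names (`Field`, `Task`) and the `employees`/`departments` tables are hypothetical and are not taken from Calligraph itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Field:
    alias: str   # user-friendly name shown to the end user
    table: str   # physical DB table (hypothetical)
    column: str  # physical DB column (hypothetical)


@dataclass
class Task:
    name: str
    fields: list[Field]
    # Join conditions linking the tables that the fields come from.
    joins: list[str] = field(default_factory=list)

    def select_sql(self, aliases: list[str]) -> str:
        """Build a SELECT statement for the requested user-facing aliases."""
        by_alias = {f.alias: f for f in self.fields}
        chosen = [by_alias[a] for a in aliases]  # KeyError if alias is not in the Task
        columns = ", ".join(f'{f.table}.{f.column} AS "{f.alias}"' for f in chosen)
        tables = ", ".join(sorted({f.table for f in self.fields}))
        where = f" WHERE {' AND '.join(self.joins)}" if self.joins else ""
        return f"SELECT {columns} FROM {tables}{where}"


hr = Task(
    name="HR",
    fields=[
        Field("Employee name", "employees", "full_name"),
        Field("Department", "departments", "title"),
    ],
    joins=["employees.dept_id = departments.id"],
)
sql = hr.select_sql(["Employee name", "Department"])
```

The user only ever sees the aliases; the table names, column names and join conditions stay hidden inside the Task.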
Calligraph configuration for the “Company” DB, converted by EMC into Greenplum format
Configuration of the user semantic layer (field names and mapping)
De facto, this is an example of manual creation of the User Semantic Layer (automatic creation is also possible)
The User Semantic Layer is a list of field names accessible to the user, in the user's language and in the terms of the user's subject domain
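As an illustration of a semantic layer restricted by access rights, the sketch below filters a catalogue of aliases down to what a given user may see. The catalogue contents, the right names and the `semantic_layer` function are assumptions made for this example, not Calligraph's actual configuration format.

```python
# Hypothetical catalogue: user-facing alias -> (physical column, required right)
CATALOGUE = {
    "Employee name": ("employees.full_name", "hr"),
    "Salary": ("employees.salary", "hr_confidential"),
    "Order total": ("orders.total", "commerce"),
}


def semantic_layer(user_rights: set[str]) -> dict[str, str]:
    """Return only the aliases that the user's rights allow."""
    return {
        alias: column
        for alias, (column, right) in CATALOGUE.items()
        if right in user_rights
    }


manager_view = semantic_layer({"hr", "commerce"})
```

A user without the `hr_confidential` right never sees the "Salary" field, so it cannot appear in any query that user builds.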
Definition of “Gradation”
Gradation is any field from the user semantic layer with a set of boundary conditions
Boundary conditions for the gradation are connected by logical “OR”
Conditions can be grouped into simple or extended totals
Gradation is used to create a dimension in an analytic table, in the filter or in the “master-detail” section
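The gradation definition above can be sketched as a small data structure: one field plus boundary conditions that are combined with OR. The `Gradation` class and the `age` field are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gradation:
    field: str                   # alias of a field from the user semantic layer
    conditions: tuple[str, ...]  # boundary conditions, each a predicate on the field

    def to_sql(self) -> str:
        """Join the boundary conditions with OR, as the definition describes."""
        return "(" + " OR ".join(f"({c})" for c in self.conditions) + ")"


age_groups = Gradation(
    field="age",
    conditions=("age < 30", "age BETWEEN 30 AND 50", "age > 50"),
)
predicate = age_groups.to_sql()
```

Used as a dimension, each boundary condition would label one row or column; used in a filter, the OR-joined predicate selects every record that matches any of the conditions.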
Difference from OLAP using cubes
Any field from the user semantic layer can form a dimension for an analytical table
All DB fields are “equal”, without separating them into “dimensions” and “facts”
Boundary conditions for the gradations can also be described as a range, a mask or a formula
User can create a “virtual” gradation (i.e. a gradation which is calculated by applying a formula), enabling “what-if” analysis on the fly
No need to perform pre-processing or to create (and then continually increment) cubes, which limit the user's ability to perform analysis the way he/she needs. The user can specify any dimension through direct interaction with Calligraph
User creates table template in any theoretically possible view on-line
All queries are performed on-line and can be parallelized
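A minimal sketch of the cube-free approach described above: an analytic (cross) table is filled by running one independent query per cell, with the row and column boundary conditions combined in the WHERE clause. The `sales` table, its data and the use of SQLite are assumptions for the sake of a runnable example; how Calligraph actually generates and schedules its queries is not shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("Europe", 50.0),
        ("Europe", 250.0),
        ("Europe", 300.0),
        ("Asia", 120.0),
        ("Asia", 80.0),
    ],
)

# Two gradations: boundary conditions on "region" (rows) and on "amount" (columns).
rows = {"Europe": "region = 'Europe'", "Asia": "region = 'Asia'"}
cols = {"small": "amount < 100", "large": "amount >= 100"}

# One independent COUNT query per cell; no pre-computed cube is involved.
cross = {
    r: {
        c: conn.execute(
            f"SELECT COUNT(*) FROM sales WHERE ({rc}) AND ({cc})"
        ).fetchone()[0]
        for c, cc in cols.items()
    }
    for r, rc in rows.items()
}
conn.close()
```

Because no cell depends on any other, the queries could in principle be dispatched in parallel, and either dimension can be swapped for any other field without rebuilding anything.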
Definition of “Filter”
A filter is used when certain conditions need to be applied to all data in a particular query
Gradation is the minimal element from which a filter is formed
Several gradations connected by a logical “AND” are called an aggregate
Filter is a set of aggregates which are connected by logical “AND” or “OR” (in any order)
Calligraph sets no limits on the “depth” of the filter and its length
Filters give the user a very easy and visual way to create data filtering rules on the fly
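The filter hierarchy above (conditions joined by OR into a gradation, gradations joined by AND into an aggregate, aggregates joined by AND or OR into a filter) can be sketched with three small helper functions. The function names and the sample predicates are hypothetical.

```python
def gradation(*conditions: str) -> str:
    """A gradation: boundary conditions on one field, joined by OR."""
    return "(" + " OR ".join(conditions) + ")"


def aggregate(*gradations: str) -> str:
    """An aggregate: several gradations joined by AND."""
    return "(" + " AND ".join(gradations) + ")"


def combine(op: str, *parts: str) -> str:
    """A filter: aggregates (or nested filters) joined by AND or OR."""
    if op not in ("AND", "OR"):
        raise ValueError(f"unsupported operator: {op}")
    return "(" + f" {op} ".join(parts) + ")"


europe_large = aggregate(
    gradation("region = 'Europe'"),
    gradation("amount >= 100", "priority = 'high'"),
)
asia_any = aggregate(gradation("region = 'Asia'"))
where_clause = combine("OR", europe_large, asia_any)
```

Since `combine` accepts the output of another `combine` call, filters can be nested to any depth, which matches the statement that there is no limit on filter depth or length.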
Definition of “Master-Detail”
Any complex table can be automatically split into a set of simpler tables by dragging and dropping any gradation into the “master-detail” section of the query
Simpler tables are formed by using the boundary conditions of the dropped gradation to select the information
Example: an analytic report on EMC business around the world can contain the gradation “Continents”. If the user moves this gradation into “master-detail”, the complex table will be split into several simpler tables, each containing only information about business on one continent. If the user then also moves the gradation “Country”, every continent table will be further split into several tables, one per country.
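The continent and country example above can be sketched as a recursive split of a record set, one level per gradation. The records and the `master_detail` function are invented for illustration, and each boundary condition is simplified to an equality test on one field.

```python
records = [
    {"continent": "Europe", "country": "France", "revenue": 10},
    {"continent": "Europe", "country": "Spain", "revenue": 7},
    {"continent": "Asia", "country": "Japan", "revenue": 12},
]


def master_detail(rows, gradations):
    """Recursively split rows by each gradation's boundary conditions.

    gradations is a list of (field, values) pairs; each value acts as the
    boundary condition "field == value".
    """
    if not gradations:
        return rows
    (field, values), rest = gradations[0], gradations[1:]
    result = {}
    for value in values:
        subset = [r for r in rows if r[field] == value]
        if subset:  # skip detail tables that would be empty
            result[value] = master_detail(subset, rest)
    return result


by_continent = master_detail(records, [("continent", ["Europe", "Asia"])])
by_country = master_detail(
    records,
    [("continent", ["Europe", "Asia"]), ("country", ["France", "Spain", "Japan"])],
)
```

Here `by_continent` holds one table per continent, and `by_country` nests one table per country inside each continent.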
Definition of “Drill-Down”
Any cell of the analytical table contains data which was filtered based on the boundary conditions set for its column and row, as well as those defined by the “master-detail”.
Decision making often requires detailed understanding of the information in the analytical table – such as to understand the reasons behind unsatisfactory results.
Calligraph provides an easy way to achieve this, with maximum allowed detailing according to user rights for data access.
User can select a cell (or cells) of the table and press “Drill-Down”, and get automatically generated listing table with all fields from the analytical table.
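A minimal sketch of drill-down as described above: the listing behind a cell is obtained by combining the cell's row condition, its column condition and any master-detail conditions with AND. The `sales` table, its data, the `drill_down` function and the use of SQLite are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Europe", "disk", 50.0),
        ("Europe", "disk", 250.0),
        ("Europe", "tape", 400.0),
        ("Asia", "disk", 120.0),
    ],
)


def drill_down(conn, row_cond, col_cond, master_conds=()):
    """Return the detail rows behind one cell of the analytic table."""
    predicates = [row_cond, col_cond, *master_conds]
    where = " AND ".join(f"({p})" for p in predicates)
    sql = f"SELECT region, product, amount FROM sales WHERE {where} ORDER BY amount"
    return conn.execute(sql).fetchall()


# Cell: row "disk", column "amount >= 100", inside the master-detail table for Europe.
detail = drill_down(
    conn,
    row_cond="product = 'disk'",
    col_cond="amount >= 100",
    master_conds=["region = 'Europe'"],
)
conn.close()
```

The result is a listing table containing exactly the records that were counted in the selected cell.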
Demo block diagram
[Diagram components: Greenplum Master Server, four Segments, Windows Client Machine, Calligraph, Remote Desktop, Sample Database]
DEMO
Let's go to a live Calligraph demo
Current status of Calligraph
Version 5.2 is available as a standalone Windows application
Hundreds of copies have been sold and are being used within big and small enterprises
Calligraph customers include Atommash, Novorosiyskiy port, big medical institutes and hospitals, government bodies (Republican Statistical Service, Russian State Parliament, tax authorities, police departments, etc.), and small and medium businesses.
Calligraph is registered in the Russian agency of patents and trademarks
Possible ways to further develop Calligraph technology
Cloud service
External reporting unit for CASE system
Automatic configuration
Support for Hadoop
Voice input
etc.
Benefits of Calligraph to EMC/Greenplum
Full alignment with Greenplum data analytics and exploration focus
On demand, on the fly analysis
Parallelization and speed of query execution
Calligraph can be developed as a cloud service, giving access to data analytics to every user in the enterprise
Loyalty of users through ease of use and convenience
Highly competitive offer in terms of functionality and price
Easier to demonstrate business value of Greenplum DB and data analytics to the customers