1 Jumbune Data Analyzer. 2 Agenda Enterprise Data Lake Data Analyzer Data Analysis Challenges ?

Jumbune Data Analyzer

Agenda

Enterprise Data Lake

Data Analyzer

Data Analysis Challenges

?

One Unified System: An Enterprise Data Lake

• Data ETLing from all possible sources to Enterprise Data Lake through:

  • Real time ingestion

  • Micro batch ingestion

  • Batch ingestion

• A unified hub makes analysis, management and access of data easier.

• Enterprise data lake enables ecosystem tools to collaboratively manage data.

• A place to store all data in its original fidelity, with the flexibility to run a variety of Enterprise workloads.

Key elements of an Enterprise Data Lake

• Data Quality – data values as per business KPI

• Data Profiling – statistical assessment of data

• Data Governance – management of data

• Data Lineage – define data lifecycle

• Data Security – protecting data from unauthorized users

Major challenges in Data Analysis

• Incremental imports may ingest bad data

• Analyzing anomalies in HDFS data

• Tracking data quality over time

• Tracing bad data out of billions of rows

• Displaying concise meaningful results

Jumbune’s Data Analyzer

Gain better control over Data Analysis

• Quality

• Profile

• Control

• Analyse

Timelines

Violations

Business Rules

Anomalies

• Gives a centralized dashboard for profiling data quality to gain better control

• Leverages Jumbune’s infrastructure to provide remote profiling capabilities

• No data movement required for performing data profiling

• No specialized MapReduce or coding skills are required to validate data.

Offering Data Quality and Data Profiling to Enterprise Data Lake

Data Quality Timeline

• Traces how well data quality is preserved over time, even in environments with massive data offloading.

• Real-time data quality monitoring, tracked against customizable KPIs.

Data Profiling

• Statistical assessment of data values within a data set for consistency, uniqueness and logic.

• Gauges the data profiles against the business rules.
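
The idea of a data quality timeline tracked against a KPI can be illustrated with a minimal sketch. This is not Jumbune's implementation or API: the function name `quality_timeline`, the run labels, and the record counts are all hypothetical, and the KPI is assumed to be a simple "minimum percentage of clean records" threshold.

```python
def quality_timeline(runs, kpi_percent):
    """Build a quality timeline from (label, total_records, bad_records) tuples.

    Hypothetical helper: each entry reports the share of clean records for
    one import run and whether that share meets the assumed KPI threshold.
    """
    timeline = []
    for label, total, bad in runs:
        # Guard against empty runs so the division below never fails.
        clean = 100.0 * (total - bad) / total if total else 0.0
        timeline.append({
            "run": label,
            "clean_percent": clean,
            "meets_kpi": clean >= kpi_percent,
        })
    return timeline


# Made-up numbers for two incremental imports, checked against a 95% KPI.
timeline = quality_timeline([("run-1", 1000, 10), ("run-2", 1000, 100)], 95.0)
for entry in timeline:
    print(entry)
```

Plotting `clean_percent` per run over time gives the kind of trend a quality timeline is meant to expose; the `meets_kpi` flag marks the runs that would need attention.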

Data Analysis Component

Data Analysis Process

HDFS/NFS

Records Analysis

Data Profiling & Quality Reports

Data Quality: Provides a generic way of identifying anomalies

• Checks data for inconsistencies in the form of:

  • Null checks

  • Data type checks

  • Regular expressions

• In-depth, record-level data violation reports that can be drilled down to line and field level.

• Lets users specify data quality requirements generically, according to their own data lake.

• Makes seemingly impossible quality checks on a big data lake possible.

• Doesn’t require data to be moved out of Hadoop to check for anomalies.

• Currently, Jumbune supports HDFS and NFS as the data lake.
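
The three kinds of checks named on this slide (null, data type, regular expression), reported at line and field level, can be sketched as follows. This is an illustrative stand-alone example, not Jumbune's code or configuration format: the `Violation` class, the `validate` function, the rule dictionary layout, and the sample records are all assumptions, and the data is read from an in-memory list rather than from HDFS or NFS.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    line: int    # 1-based line number of the offending record
    field: int   # 0-based index of the offending field
    check: str   # "null", "type" or "regex"
    value: str   # the raw value that failed the check


def _is_int(text):
    try:
        int(text)
        return True
    except ValueError:
        return False


def validate(lines, rules, delimiter=","):
    """Apply per-field rules to delimited records.

    `rules` maps a field index to a dict with the optional keys
    'not_null' (bool), 'type' (only "int" is handled here) and
    'regex' (a pattern the whole value must match).
    """
    violations = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        for idx, rule in rules.items():
            # A missing field is treated the same as an empty one.
            value = fields[idx].strip() if idx < len(fields) else ""
            if value == "":
                if rule.get("not_null"):
                    violations.append(Violation(line_no, idx, "null", value))
                continue  # type and regex checks do not apply to empty values
            if rule.get("type") == "int" and not _is_int(value):
                violations.append(Violation(line_no, idx, "type", value))
            pattern = rule.get("regex")
            if pattern and not re.fullmatch(pattern, value):
                violations.append(Violation(line_no, idx, "regex", value))
    return violations


# Made-up sample records: id, name, e-mail.
records = [
    "1,alice,alice@example.com",
    ",bob,bob@example",
    "x,carol,carol@example.com",
]
rules = {
    0: {"not_null": True, "type": "int"},
    2: {"regex": r"[^@\s]+@[^@\s]+\.[a-z]+"},
}
for v in validate(records, rules):
    print(v)
```

Because every violation carries its line and field position, the results can be aggregated per check type for a summary and still be drilled down to the individual record.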

Data Profiling: Provides lake insights

Remote

Centralized

Integrate

Generic

• Statistical analysis of data values present in the enterprise data lake.

• Computes various profiles that help you become familiar with data.

• Evaluates the structure of the data set in the enterprise data lake against a set of business rules.

• Helps to know whether existing data can be used for more analytics.
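
As a rough illustration of what a statistical profile of one field might contain, the sketch below counts totals, nulls, distinct values and the most frequent value. The function `profile_field` and the sample column are hypothetical and are not taken from Jumbune; a real profile over a data lake would be computed in a distributed fashion rather than in memory.

```python
from collections import Counter


def profile_field(values):
    """Return a small profile for one column of values.

    Empty strings and None are counted as nulls (an assumption made
    for this sketch).
    """
    non_null = [v for v in values if v not in ("", None)]
    counts = Counter(non_null)
    return {
        "total": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        # (value, frequency) of the most frequent value, or None if all null.
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Made-up column with two null entries.
profile = profile_field(["a", "b", "a", "", None])
print(profile)
```

A high null count or an unexpectedly low number of distinct values is the kind of signal that helps decide whether existing data is fit for further analytics.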

Let’s provision a clean Enterprise Data Lake

Website

• http://jumbune.org

Contribute

• http://github.com/impetus-opensource/jumbune

• http://jumbune.org/jira/JUM

Social

• Follow @jumbune

• Use #jumbune

• Jumbune Group: http://linkd.in/1mUmcYm

Forums

• Users: [email protected]

• Dev: [email protected]

• Issues: [email protected]

Downloads

• http://jumbune.org

• https://bintray.com/jumbune/downloads/jumbune