Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivio & Trifacta

Transcript of Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivio & Trifacta

1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivio & Trifacta

June 30, 2016

Apache Atlas

Disclaimer

This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed.

Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery.

This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.

Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.

Vision - Enterprise Data Governance Across Platforms

[Diagram labels: STRUCTURED, UNSTRUCTURED, TRADITIONAL RDBMS, MPP APPLIANCES, DATA LAKE, METADATA, Projects 1, 3, 4, 5, 6]

Atlas: Metadata Truth in Hadoop

Data Management along the entire data lifecycle with integrated provenance and lineage capability

Modeling with Metadata enables comprehensive data lineage through a hybrid approach with enhanced tagging and attribute capabilities

Interoperable Solutions across the Hadoop ecosystem, through a common metadata store
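To make the ideas on this slide concrete, here is a minimal sketch of a shared metadata store that holds tags and lineage for several tools. It is only an illustration: the `MetadataStore` class, its method names, and the dataset names are hypothetical and are not the Apache Atlas API. The sketch also assumes that tags on upstream data should be visible on anything derived from it.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Toy in-memory stand-in for a shared metadata store (not the Atlas API)."""
    tags: dict = field(default_factory=dict)      # dataset name -> set of tags
    upstream: dict = field(default_factory=dict)  # dataset name -> set of direct inputs

    def tag(self, dataset, label):
        # Attach a tag to a dataset.
        self.tags.setdefault(dataset, set()).add(label)

    def record_lineage(self, source, target):
        # Record that `target` was derived from `source`.
        self.upstream.setdefault(target, set()).add(source)

    def lineage(self, dataset):
        """Return every dataset that feeds `dataset`, directly or indirectly."""
        seen = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for parent in self.upstream.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def inherited_tags(self, dataset):
        """Tags on the dataset plus tags on anything upstream of it."""
        result = set(self.tags.get(dataset, ()))
        for parent in self.lineage(dataset):
            result |= self.tags.get(parent, set())
        return result


# Hypothetical example: one tool tags raw data, another records a prep step.
store = MetadataStore()
store.tag("raw_customers", "PII")
store.record_lineage("raw_customers", "clean_customers")
store.record_lineage("clean_customers", "customer_report")
```

Because every tool reads and writes the same store, a tag applied by a discovery tool on `raw_customers` can be found by a reporting tool through `store.inherited_tags("customer_report")`.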

Governance Ready Certification Program

Discovery Tagging

Prep / Cleanse

ETL

Governance BPM

Self Service

Visualization

Choice: Customers choose the features they want to deploy, a la carte, rather than accepting vendor lock-in

Curated & Fast: Selected group of vendor partners to provide rich, complementary and complete features ready to deploy

Agile: Low switching costs, faster deployment and innovation

Centralized: Common SLA & common open metadata store

Flexibility: Interoperability of products through Atlas metadata

Safe: HDP at core to provide stability and interoperability

Governance Ready Certification Program

The Apache open source community is committed to collaboration, which is critical for proper data governance. Partners have adopted this commitment and are extending governance capabilities by integrating their products with Atlas, which gives this innovative community a common metadata store. This session will showcase 3 vendors:

– Waterline

– Attivio

– Trifacta

Additional Atlas Sessions

• BOF: Apache Knox and Apache Ranger provide Hadoop security while Atlas provides a Hadoop metadata store and enterprise compliance. Come learn and discuss security & governance innovations and future directions.

Thursday 5-7 PM @ Room 210A

Learn More:

• Hortonworks links: http://hortonworks.com/solutions/security-and-governance/

• Tutorials: https://github.com/hortonworks/tutorials/tree/atlas-ranger-tp/tutorials/hortonworks/atlas-ranger-preview

Waterline Data

The Smart Data Catalog Company

Unlock The Value Of The Data Lake With Waterline Data’s Smart Data Catalog

Time To Value

AUTOMATICALLY catalog data assets across ALL the data AND enable SELF-SERVICE access

Tribal Knowledge Sharing

AUGMENT semantic discovery by CROWDSOURCING tribal data knowledge

Trust

Enable AGILE GOVERNANCE with automated tagging, data stewardship, and SECURE SELF-SERVICE access to data based on role and policy

Shopping Metaphor For “Managed” Self-Service: Amazon.com

Catalog | Find, Understand And Collaborate | Provision

Waterline Data Is Like Amazon For The Data Lake

Catalog | Find, Understand And Collaborate | Provision

Workflow Of Enabling Self-Service Analytics With Hortonworks

Hortonworks Atlas And Ranger

Data Prep Analytics & Visualization

Smart Data Discovery

Profiling, Sensitive Data & Data Lineage Discovery, Automated Tagging

Data Stewardship

Curate Tags

Self-Service Data Catalog

Find, Collaborate And Take Action

Metadata, Tags, Data Lineage

Metadata, Tags, Roles & Access Control

Roles & Access Control

Demo

Waterline Data

The Smart Data Catalog Company

UNIFY YOUR DATA ACROSS SILOS

Joe Lichtman

Vice President, [email protected]

1 © 2016 ATTIVIO | PROPRIETARY AND CONFIDENTIAL

WHO IS ATTIVIO?

Attivio unifies your data across silos to provide a 360° view of your business

Gartner Magic Quadrant For Enterprise Search, Q3 2015

Forrester Wave: Big Data Search and Knowledge Discovery Solutions, Q3 2015

Forrester Wave: Big Data Text Analytics Platforms, Q2 2016

LEADER IN SEARCH, DATA DISCOVERY AND TEXT ANALYTICS

SEMANTIC DATA CATALOG

Attivio radically reduces time spent finding and understanding data sources to speed time-to-analytics.

• Catalogs all your enterprise information

• Identifies what’s most relevant

• Unifies all structured and semi-structured sources in a visual model

• Provisions leading BI and predictive analytics tools such as Qlik, R, RapidMiner, Spotfire, and Tableau

58% of the effort for BI initiatives is wasted on data exploration and integration

33% of businesses cite big data discovery as a challenge they are facing

50%: Businesses use less than half of their available data for BI

CATALOG ALL YOUR ENTERPRISE INFORMATION

• Spiders and extracts metadata for all information types

• Automatically catalogs data and content with semantic meaning

• Applies human expertise to fine-tune tagging and align with business rules

IDENTIFY THE RIGHT INFORMATION

• Delivers natural language and keyword search

• Provides an eCommerce-like shopping cart for data

• Recommends the most relevant data for your context

UNIFY THE INFORMATION FOR YOUR ANALYTIC CONTEXT

• Automatically generates data models

• Correlates all structured data and unstructured content

• Simplifies provisioning to BI and advanced analytic tools

PROVISION & OPERATIONALIZE DATA AS A STRATEGIC ASSET

• Provision directly to agile BI and analytics tools

OR

• Rationalize the data warehouse for greater simplicity and lower cost

• Power domain-specific apps

THANK YOU!

Trifacta + Hortonworks: Apache Atlas Integration

DATA WRANGLING

What is Data Wrangling?

QUESTION ANALYZE INSIGHT

DISCOVER STRUCTURE CLEANSE ENRICH VALIDATE PUBLISH

INGESTION

ACCESS

DATA SOURCES

Transactional Data: banking, credit cards, lending, wealth, mortgages, ledgers, trades, payments

Interaction Data: social, web, chat

Analytics

Reporting

Data Product Models

BUSINESS OPERATIONS

Data Wrangling within the Hortonworks Data Lake

Discovery Zone

Shared Zone

Raw Data Zone

Trifacta + Hortonworks & Apache Atlas

Governance

Ingestion

Metadata & APIs

Data Wrangling

Analysis & Consumption