The Data Driven University - Automating Data Governance and Stewardship in Autonomous and...

The Data Driven University Automating Data Governance & Stewardship in Autonomous & Decentralized Environments Pieter De Leenheer, PhD Cofounder and VP Innovation



What we talk about when we talk about no Data Governance

The Problem

• Who approved this?

• I wish these guys spoke our language

• I can’t understand this report!

• I’ve never seen this funding code! Who introduced this?

• Are we sure this definition of ‘professor’ is correct?

• This rule is different on our campus!

• Are we allowed to share this student data with IR?

Glossary Search

• How frequently do you look up a word for your business?

• To what purpose?

Clarification

Differentiation

• What are your main sources?

• Hierarchy-based navigation or keyword-based search?

• Authoritative Truth or trust?

Overview

• Data Governance Operating Framework

Data Governance

Data Stewardship

Data Management

• Implementations

Stanford University Data Stewardship (SUDS)

George Washington University

Brigham Young University

• The Bigger Picture

Inter-university Data Governance in the Flanders Research Information Space

Data Governance Framework

[Figure: Data Governance Operating Model]

• Data Governance Council: Governance Operating Model

Roles & Responsibilities

Processes & Workflow

Asset Types & Traceability

Data Governance Organization

• Data Stewardship Activities

• IT / Operational Data Management Activities

• Relationships between the layers: Establishes & drives; Aligns & Coordinates; Reports & Escalates; Monitors & Remediates

• Capabilities shown: Data Quality Development; Data Modeling; Metadata Lineage; Metadata Scanning; Reference Data Authoring; Data Integration; Hierarchy Management; Business & Data Definitions; Business Traceability; Semantic Modeling; Mapping Specifications; Policy Management; Business Rules; Data Quality Rules; Data Quality Reporting; Issue Management; Reference Data Crosswalks; Master Data Stewardship; Data Quality Profiling; DQ Defect Resolution

• Products shown: Collibra Business Semantics Glossary (BSG); Collibra Reference Data Accelerator (RDA); Collibra Data Stewardship Manager (DSM); Collibra Platform; Other Data Management Vendor products; ...

https://compass.collibra.com/display/COOK/Data+Governance+Operating+Model

Stanford University Data Stewardship (SUDS)

• All materials available at dg.stanford.edu

• Establish foundation for Institutional Research

• Data Quality

How many faculty do we have?

• Context and Meaning

What does faculty mean in which context?

How is faculty data structured and where is it stored?

• Data Usage Request

Am I allowed to use faculty or student name and age for external reporting?

SUDS: Approach

• Decentralized

1 DG coordinator (also show vacancy)

Project staff

cross-functional working groups: natural scope and resources

focus on BI reporting, with input from above projects

sign off by DG coordinator and end user through usage (full cycle)

• Step by step; success by success

SUDS: First Success in OBIEE reporting

REST / JSON / CSV / Excel

DG Operating Model

• What do we want to capture?

Asset Type: Business Terms, Policies, Rules, Code Values

Attribute/Relation Type: Name, Definition, Example, Derivations, Specializations

• Who should be involved in this process?

Communities: Finance, HR, Student, Research

Domains / subject areas: Task Management

Users and User groups

• How to execute and Monitor the process?

Key events and workflow chains

Validation rules

Roles and Responsibilities: RACI
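
The three questions above (what to capture, who is involved, how to monitor) can be pictured as a small data structure. The following is a minimal sketch only: every class, field and value name is hypothetical and chosen for illustration, and none of it reflects the actual Collibra data model or the SUDS configuration.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout; this is not the Collibra data model.

@dataclass
class Asset:
    name: str
    asset_type: str                                   # e.g. "Business Term", "Policy", "Rule", "Code Value"
    attributes: dict = field(default_factory=dict)    # e.g. Definition, Example
    raci: dict = field(default_factory=dict)          # RACI role -> user or user group

@dataclass
class Domain:
    name: str                                         # subject area inside a community
    assets: list = field(default_factory=list)

@dataclass
class Community:
    name: str                                         # e.g. Finance, HR, Student, Research
    domains: list = field(default_factory=list)

def missing_roles(asset, required=("Responsible", "Accountable")):
    """Return the required RACI roles that nobody holds for this asset."""
    return [role for role in required if role not in asset.raci]

# Made-up example data.
faculty = Asset(
    name="Faculty",
    asset_type="Business Term",
    attributes={"Definition": "Placeholder definition for illustration."},
    raci={"Responsible": "hr-stewards"},
)
hr = Community("HR", [Domain("Glossary", [faculty])])

print(missing_roles(faculty))
```

A check such as `missing_roles` is one way a monitoring dashboard could flag assets that nobody is accountable for yet.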

SUDS Data Dictionary Example

More than 4000 data elements

Community context: Finance, HR, Research and Student

Custom attribute types and relation types

What attribute- and relation-types do we want to capture?

Out of the box but also custom attribute types and relation types


• https://stanford.app.box.com/CollibraQuickReference

• https://stanford.box.com/UsingCollibraFields

Who is involved in the process?

• https://compass.collibra.com/display/COOK/Role+Types

Responsible, Accountable, Informed, Consulted

Who? User groups and Dashboards

How to execute and monitor?

From Best Practice to Auto-Validation Rules

http://web.stanford.edu/dept/pres-provost/cgi-bin/dg/wordpress/?p=577

(generic example – not from SUDS)
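
The rules shown on this slide are not part of the transcript. As a rough illustration of turning a best practice into an auto-validation rule, the sketch below checks two hypothetical practices for glossary terms: a definition must be present, and it must not simply repeat the term name. The rule names, the dictionary layout and the sample terms are all assumptions made for this example.

```python
# Hypothetical best-practice rules; the actual rules on the slide are not in the transcript.

def definition_present(term):
    return bool(term.get("definition", "").strip())

def definition_not_circular(term):
    # A definition that repeats the term name explains nothing.
    return term["name"].lower() not in term.get("definition", "").lower()

RULES = {
    "definition is present": definition_present,
    "definition does not repeat the term name": definition_not_circular,
}

def validate(term):
    """Return the descriptions of all rules the term violates."""
    return [label for label, rule in RULES.items() if not rule(term)]

# Made-up sample terms.
good = {"name": "Professor", "definition": "A member of the academic staff holding a chair."}
bad = {"name": "Professor", "definition": "A professor is a professor."}
empty = {"name": "Funding Code"}

for term in (good, bad, empty):
    print(term["name"], validate(term))
```

Running such checks automatically whenever a term is edited replaces a manual review step with an immediate, repeatable one.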

How to execute and monitor?

• Status Types and Workflows

E.g., for Domains, Terms, Users, and later for Issues and Data Sharing Agreements, we first define a “finite state machine” and then a set of workflows that each define a transition between states. This means workflows can trigger each other and form a complex chain.

BUSINESS SEMANTICS GLOSSARY

[Figure: status diagram for business terms. Transitions in the diagram are labelled with workflow numbers 1 to 5.]

Statuses: Candidate, In Progress, Under Review, Accepted, In Revision, Rejected, Deprecated

Entry event: Term requested on the domain page

Workflows

1 Propose Business Term

2 Edit Business Term

3 Onboarding Business Term

4 Deprecate Business Term

5 Reactivate Business Term
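
The idea of statuses plus workflows that each perform a transition can be sketched as a small finite state machine. The status and workflow names below come from the slide; the transition table itself is an assumption, because the arrows of the diagram did not survive in this transcript, and the Rejected status is left out for brevity. The class name `BusinessTerm` is hypothetical.

```python
# Status and workflow names come from the slide. The transition table is an
# ASSUMPTION made for illustration; the real arrows are not in the transcript.
TRANSITIONS = {
    ("Candidate", "Edit Business Term"): "In Progress",
    ("In Progress", "Onboarding Business Term"): "Under Review",
    ("Under Review", "Onboarding Business Term"): "Accepted",
    ("Accepted", "Edit Business Term"): "In Revision",
    ("In Revision", "Onboarding Business Term"): "Under Review",
    ("Accepted", "Deprecate Business Term"): "Deprecated",
    ("Deprecated", "Reactivate Business Term"): "Candidate",
}
INITIAL_WORKFLOW = "Propose Business Term"   # assumed to create the term as a Candidate

class BusinessTerm:
    def __init__(self, name):
        self.name = name
        self.status = "Candidate"
        self.history = [INITIAL_WORKFLOW]

    def run(self, workflow):
        """Apply a workflow; refuse it if it is not allowed in the current status."""
        key = (self.status, workflow)
        if key not in TRANSITIONS:
            raise ValueError(f"{workflow!r} is not allowed in status {self.status!r}")
        self.status = TRANSITIONS[key]
        self.history.append(workflow)
        return self.status

term = BusinessTerm("Professor")
for workflow in ["Edit Business Term", "Onboarding Business Term", "Onboarding Business Term"]:
    term.run(workflow)
print(term.status, term.history)
```

Because each workflow is only valid in certain statuses, a chain of workflows emerges naturally: finishing one workflow leaves the term in a status where the next one becomes available.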

How is it to be governed? Onboarding Workflow

(Not Stanford content - illustrative example only)

How is it to be governed? Approval Workflow

(not Stanford content - illustrative example only)

Stanford DG Program Key Results (from http://web.stanford.edu/dept/pres-provost/cgi-bin/dg/wordpress/wp-content/uploads/2014/11/Stanford_DS_CAIR_v2.pdf)

• Understand data from multiple perspectives

• Central repository of verified information (and better data infrastructure)

• Easier access to information; less reliance on ‘oral tradition’

• Improved data quality, consistency

• Increased understanding; thoughtful decision-making around data

SUDS Future Directions

• Continue building engagement around data governance (define policy), in addition to data stewardship (enforce policy)

• Continue building engagement, especially by executive-level leadership

• Continue increasing visibility and consumption of definitions and other metadata

George Washington University (by courtesy of Ron Layne, GWU)

• centralized

• run by the DG Office division of IT

• mapping data dictionaries, rules and metrics and data sharing agreements

• Integration with Informatica Data Quality

Flanders Research Information Space

• Providing Scientific Research Information and Services

• Easy

• Transparent

• Open

• Timely

• Unambiguous

• Supported by Data Governance

• Qualitative metadata: e.g., definition for project, funding codes, mappings, classifications, etc.

• Roles and responsibilities for Information Providers and Stiweto

• Collaborative workflows between Information Providers and Stiweto

By courtesy of G. Van Grootel, EWI

FRIS’ Data-driven Innovation Engine

By courtesy of G. Van Grootel, EWI

The Data providers landscape


Universities

Research Institutes

Funders

Others

Strategic Research Centers

University Colleges

By courtesy of G. Van Grootel, EWI

FRIS Metamodel: an example

By courtesy of G. Van Grootel, EWI

Traceability diagram

Node | Description

JRC (Joint Research Centre) | The Business Term representing the Funding Source

Zevende Kader Programma.. | The Business Term representing the parent Funding Source

3723 | Generation 1 Funding Code Value

258 | Generation 2 Funding Code Value

G3 | The Funding Stream Code Value

By courtesy of G. Van Grootel, EWI
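
A traceability diagram of this kind is essentially a small labelled graph. The sketch below uses the node names from the table; the edges and their labels are assumptions made purely for illustration, since the diagram itself is not part of this transcript, and the function name `trace` is hypothetical.

```python
# Node names are taken from the slide's table. The edges and their labels are
# ASSUMPTIONS for illustration; the diagram is not part of the transcript.
EDGES = [
    ("Zevende Kader Programma..", "is parent of", "JRC (Joint Research Centre)"),
    ("JRC (Joint Research Centre)", "has generation 1 code", "3723"),
    ("3723", "maps to generation 2 code", "258"),
    ("258", "belongs to funding stream", "G3"),
]

def trace(start):
    """Follow outgoing edges from start and return every node reached, in visiting order."""
    outgoing = {}
    for source, _label, target in EDGES:
        outgoing.setdefault(source, []).append(target)
    seen, queue = [], [start]
    while queue:
        node = queue.pop(0)
        if node in seen:
            continue
        seen.append(node)
        queue.extend(outgoing.get(node, []))
    return seen

print(trace("JRC (Joint Research Centre)"))
```

Walking the graph from a business term answers questions such as "which code values stand for this funding source?" without anyone having to remember the mapping.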

Conclusions

• Case by Case, success by success

• Identify key events and design workflow ‘chains’ to automate governance

• To support your specific use case and the growing DG platform you need to extend asset, relation, and attribute types

• Collaboration and business user friendliness

• BOK http://compass.collibra.com

Questions For Audience

• What percentage of data users need to look up the definition of a term?

• What percentage want to know where the data around a term is stored?

• How many business terms do you have?

• Who is in charge of data quality / governance?

• What percentage of data definition decisions depend on the business?