Zen and the Art of Datanauting

Exploration of large and complex data estates to gain an accurate understanding of the data structures and data quality

Zen, and the art of Datanauting

Carl Bray, Product Manager, Ontology Systems

Matt Clark, Design Authority, BSkyB

description

Exploration of large and complex data estates to gain an accurate understanding of the data structures and data quality. Presentation given by Ontology Systems and BSkyB at SemTechBiz - The Semantic Technology & Business Conference on October 2nd 2013

Transcript of Zen and the Art of Datanauting

Page 1: Zen and the Art of Datanauting

Exploration of large and complex data estates to gain an accurate understanding of the data structures and data quality

Zen, and the art of Datanauting

Carl Bray, Product Manager, Ontology Systems

Matt Clark, Design Authority, BSkyB

Page 2: Zen and the Art of Datanauting

Datanauting

Boldly going where no data integrator has gone before…

Page 3: Zen and the Art of Datanauting


15 years of transaction data

10 million+ customers

900 engineers making changes

30 TB of data

20+ Applications

Q) How do you start to understand this data estate?

Page 4: Zen and the Art of Datanauting

The company

• UK subsidiary of a global media organisation

• Provides fixed line telephone, Internet and television entertainment services to UK residents

• 10 million+ customers, trading for 15 years

Business drivers:

• Driven by marketing innovation

• Extend and upsell to customer base

• React to competitive threats

• Technical infrastructure impacting commercial agility

The motivation behind the project

Background and Business Drivers

Page 5: Zen and the Art of Datanauting

Objective

• Significantly reduce the time to capture new business strategies in IT systems

Significant change in IT delivery

• Embrace Agile delivery of new functionality

• Develop new payment and sales systems

• Access and extend existing data

• Multiple SCRUM teams using test-driven development

• Phased delivery

Short-term technical drivers

• Quickly understand the structure, nature and consistency of the existing data

Longer term technical drivers

• Introduce a service-based semantic agent to access software services

Fundamentally changing the way IT functionality is delivered

A new IT Strategy

Page 6: Zen and the Art of Datanauting

Subject matter experts (SMEs)

• Understanding the data means interfacing with SMEs

• Multiple SCRUM teams need access to SMEs

• Knowledge is held in silos and is not co-located with SCRUM teams

• SMEs may not know the answers

Bottleneck / Choke point

• SCRUM teams need quick answers to data / process questions

• SME bandwidth stifles SCRUM agility

• Introduces a single project bottleneck/choke point

Overwhelming the SMEs

• Free and unfettered access to the SMEs would create chaos

• Need to filter questions to the SMEs

Challenges

Many technical challenges stood in their way

[Diagram: four SCRUM teams, eight SMEs and eight systems (CRM, Billing, Ref Data, Debt, Orders, Ticketing, Content, Product)]

Page 7: Zen and the Art of Datanauting

Many systems with complex interdependencies

• CRM

• Billing

• Reference Data

• Debt processing

• Order handling

• Trouble ticketing systems

• Subscriber card management systems

• Content access entitlements

• Product catalogue

Fragmentation

• Business entities fragmented

• “Customer” properties in many systems

The Scope and Scale of the Problem

Payments and sales system involving 20+ systems and legacy data

Page 8: Zen and the Art of Datanauting

Data estate problems

• Data quality isn’t consistent

• Data fragmentation is high

• Understanding the data is complex

• How are business entities stored in different applications and data sources?

• What impact should processes have on the data (flags, statuses, etc.)?

• When data is duplicated, which data sources should take preference?

• Scale of data

• 30+ TB of historic trading data

• 3 Vs - The Variety and Volume of data are very high

The Data

30 TB of transactional data over 15 years of system changes

Page 9: Zen and the Art of Datanauting

Non-semantic alternatives

• Train more SMEs

• Work around SMEs’ other priorities

• Educational workshops

• Take time to document systems

Data-profiling alternatives

• Reverse engineering schemas

• ETL Tooling

• Didn’t want to create yet another data warehouse

Chose a datanauting approach

• Supports their commitment to Agile development

• Allows SCRUM teams to explore and ask questions of the data without overloading SMEs

Alternatives

Alternative approaches to solving the problem were considered

Page 10: Zen and the Art of Datanauting

What we do, and why we’re different

• Ontology leverages graph and semantic search technologies to address enterprise data issues

• We address complex data integration problems

• Data Acquisition

• Data Correlation

• Data Migration

• We produce fully fledged operational applications that use semantic search in:

• Telecommunications

• Media

• Financial services

• The Ontology Difference

• Inherently agile – no schema

• Datanauting: data-first, structure later

• Just enough modelling

• Structured and unstructured data

How we approached the problem

The Ontology Approach

Page 11: Zen and the Art of Datanauting

Exploration of data sources…

The Ontology Approach - Datanauting

Identify sources

Connect to sources

• Index source

Search for entities

• Refactor entities

• Create URI pattern matching

• Map entities to RDF

Search for linked entities

• Add references

Search for equivalent entities

• Create matching URIs

• Map entities to RDF

Page 12: Zen and the Art of Datanauting

• DBs

• SPARQL Endpoints

• Structured files

• MS Excel, CSV, XML, RDF

• Cisco and other device configurations

• Proprietary formats

• Unstructured files

• MS Word, PDFs, etc.

The Ontology Approach - Datanauting

Identify sources


Page 13: Zen and the Art of Datanauting

• Set up the connection

• Index sources

• Add search facets

• Tokenise compound values e.g.

• Service names are concatenated “Service-LON/01”

• Product names use “CamelCase”
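The tokenising step above can be illustrated with a small sketch. This is not Ontology's implementation; the function name `tokenise` and the splitting rules are assumptions chosen to handle the two cases the slide mentions, separator-joined values such as “Service-LON/01” and “CamelCase” names.

```python
import re


def tokenise(value: str) -> list[str]:
    """Split a compound value into lower-case search tokens (illustrative only)."""
    # First split on any non-alphanumeric separator such as "-" or "/".
    parts = re.split(r"[^A-Za-z0-9]+", value)
    tokens = []
    for part in parts:
        # Then break CamelCase words apart, keeping runs of capitals
        # (acronyms) and runs of digits together as single tokens.
        tokens.extend(
            re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part)
        )
    return [token.lower() for token in tokens]


if __name__ == "__main__":
    print(tokenise("Service-LON/01"))
    print(tokenise("CamelCase"))
```

Indexing the individual tokens alongside the original value would let a search for “LON” find the concatenated service name, which is presumably the point of this step.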

The Ontology Approach - Datanauting

Connect to sources


Page 14: Zen and the Art of Datanauting

• Search for business entities

• Refactor “denormalised” data

• Choose a URI pattern to represent instances

• Set a type for the entity

• Map properties to owl:DatatypeProperty
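As a rough illustration of the mapping steps above, the sketch below turns one source row into N-Triples using a URI pattern, a type, and one datatype property per column. The namespace `http://example.org/`, the function names, and the sample column names are all hypothetical; column names are assumed to be safe to use as URI local names. It shows the general technique only, not the product's mapping engine.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace for the sketch
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def entity_uri(entity_type: str, key: str) -> str:
    """URI pattern: <BASE><lower-case type>/<percent-encoded key>."""
    return f"{BASE}{entity_type.lower()}/{quote(str(key), safe='')}"


def row_to_triples(row: dict, entity_type: str, key_column: str) -> list[str]:
    """Map one source row to N-Triples lines."""
    subject = entity_uri(entity_type, row[key_column])
    # Set a type for the entity.
    triples = [f"<{subject}> <{RDF_TYPE}> <{BASE}{entity_type}> ."]
    for column, value in row.items():
        if column == key_column or value is None:
            continue
        # Each remaining column becomes a datatype property with a
        # plain string literal (backslashes and quotes escaped).
        literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'<{subject}> <{BASE}{column}> "{literal}" .')
    return triples


if __name__ == "__main__":
    for line in row_to_triples(
        {"customer_id": "C 42", "name": "Ann"}, "Customer", "customer_id"
    ):
        print(line)
```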

The Ontology Approach - Datanauting

Search for entities


Page 15: Zen and the Art of Datanauting

• Search for entities that should be linked

• Add references (owl:ObjectProperty) between entities that are to be linked
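A minimal sketch of adding such references, assuming the link can be derived from a foreign-key column in the source rows. The namespace, the helper names, and the `heldBy` predicate are hypothetical and stand in for whatever object property a modeller would choose.

```python
BASE = "http://example.org/"  # hypothetical namespace for the sketch


def uri(entity_type: str, key: str) -> str:
    """Build an entity URI from a type and a key (keys assumed URI-safe)."""
    return f"{BASE}{entity_type}/{key}"


def link_rows(rows, source_type, source_key, target_type, foreign_key, predicate):
    """Emit one object-property triple per row that carries a foreign key."""
    triples = []
    for row in rows:
        target = row.get(foreign_key)
        if target is None:
            continue  # nothing to link for this row
        subject = uri(source_type, row[source_key])
        obj = uri(target_type, target)
        triples.append(f"<{subject}> <{BASE}{predicate}> <{obj}> .")
    return triples


if __name__ == "__main__":
    accounts = [
        {"account_id": "A1", "customer_id": "C42"},
        {"account_id": "A2", "customer_id": None},
    ]
    for line in link_rows(
        accounts, "account", "account_id", "customer", "customer_id", "heldBy"
    ):
        print(line)
```

Unlike a datatype property, the object of each triple here is another entity's URI rather than a literal, so the two entities become connected nodes in the graph.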

The Ontology Approach - Datanauting

Search for linked entities


Page 16: Zen and the Art of Datanauting

• Search for semantically equivalent entities in other data sources

• Search based on property names

• Search based on strict value matching/weighting

• Search based on sub-string matching/weighting

• Reuse the URI pattern

• Create references
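The matching criteria above could be combined into a simple weighted score, sketched below. The weights, the normalisation (trimming and lower-casing), and the function name `match_score` are assumptions for illustration; the slides do not say how the product weights strict and sub-string matches.

```python
def match_score(a: dict, b: dict,
                exact_weight: float = 1.0,
                substring_weight: float = 0.5) -> float:
    """Score how likely two records describe the same entity."""
    score = 0.0
    # Only compare properties whose names appear in both records.
    for name in a.keys() & b.keys():
        left = str(a[name]).strip().lower()
        right = str(b[name]).strip().lower()
        if not left or not right:
            continue  # empty values carry no evidence
        if left == right:
            score += exact_weight        # strict value match
        elif left in right or right in left:
            score += substring_weight    # sub-string match
    return score


if __name__ == "__main__":
    crm = {"name": "Ann Smith", "postcode": "N1 9LW"}
    billing = {"name": "Smith", "postcode": "n1 9lw", "plan": "TV"}
    print(match_score(crm, billing))
```

Record pairs scoring above a chosen threshold would then be given the same URI, so that properties from both sources attach to a single entity.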

The Ontology Approach - Datanauting

Search for equivalent entities


Page 17: Zen and the Art of Datanauting

High-level solution to the problems the organisation faced

• Removed the SME bottleneck - a key enabler for the Agile / SCRUM approach

• Creates a searchable domain model, breaking the data into discrete “chunks”

• Ontology allows the SCRUM teams to understand the legacy data through ad-hoc queries

• SCRUM teams can understand how business concepts are mapped across multiple contradictory data repositories

• The quality and suitability of data can more easily be assessed

• Provides a definitive view of the commercial position for a given subscriber or set of subscribers

• Backlog and sprint priorities are based on a complete understanding of the complexity of the task

• Provides data to facilitate mock-ups and test harnesses

Ontology provides SCRUM members with insight into the data

Project Results

Page 18: Zen and the Art of Datanauting

Project Results

SCRUM teams gain insight into data

[Diagram: four SCRUM teams, eight SMEs and eight systems (CRM, Billing, Ref Data, Debt, Orders, Ticketing, Content, Product), with a brace grouping]

Page 19: Zen and the Art of Datanauting

Project Results

Product Architecture

[Architecture diagram. Labels: Ontology 4 Modeller; Ontology 4 Runtime; Modeller; Web UI; Ontology Intelligent 360; Ontology Integrity Manager; Semantic Graph Store; Query API; Universal Search Core; Semantic Processing Core; Authentication and Notification; External Event Sources; RTIA; LDAP Server (optional); Mail Server (optional); Fully Modelled Data Sources (CSV, RDBMS, XML, JDBC, XLS); Other Data Sources (DOC, PDF, XLS, MAIL, XML); End Users (Browser Access) over HTTPS]

Page 20: Zen and the Art of Datanauting

Variety

• Ability to access data in a variety of formats

• Avoid integration to live systems

• Possible to work from database dumps, which avoids politics

• Embracing change – inherently agile

Volume

• Ontology techniques for managing data scale

• Partial index of data

• Partial modelling

• Semantic search with SQL query to live systems

[Graphic: the three Vs (Velocity, Variety, Volume)]

Project Results

Dealing with two large Vees

Page 21: Zen and the Art of Datanauting

Why Ontology?

• Agile response through inherently agile technology

• Datanauting provides agile response to SCRUM teams

• SME time can now be used for valuable queries

Technical advantages

• No Schema, No Integration, No Big Bang, No Search Restrictions, No Upfront Risk

Benefits delivered

• Speed – Greatly accelerated the analysis phase of the project

• Risk – Project is not viable without an understanding of the data

Zen, and the art of Datanauting

Advantages of the Ontology approach to Data Integration

Page 22: Zen and the Art of Datanauting

Learn More

To learn more about Ontology Systems, or to access more detailed information about our products and services, please either:

Call +44 20 7239 4949

Visit ontology.com

Email [email protected]

Subject to change. All rights reserved. © 2013

No part of this document may be reproduced in any form or by any means for any purpose without our written permission. All other trademarks appearing in this document are acknowledged as the trademarks of their respective owners.

Ontology-Partners Limited trading as Ontology Systems

Ontology Systems

Phoenix Yard,

65 Kings Cross Road,

London WC1X 9LW,

UNITED KINGDOM

Registered in England No. 5794201.

Registered Office.

Dalton House,

60 Windsor Avenue,

London SW19 2RR

UNITED KINGDOM