New Upstream Data Management in the...

Post on 12-Oct-2020

1 views 0 download

Transcript of New Upstream Data Management in the...

Upstream Data Management in the 2020s:New uses for old skills in the world of Big Data

Dr Duncan Irving

DEJ Aberdeen – September 29th, 2015

2

Our workflows haven’t really changed much since the first data started coming back to shore with the oil…

3

©2015 Google

How do you optimise logistics

and maintenance to minimise

shut-ins?

©2015 Google

… but now we must run our workflows at the pace of the business processes to ensure maximum value…

4

… but now we must run our workflows at the pace of the business processes to ensure maximum value…

©2015 Google

Data-driven operations

Since 2004, UPS has eliminated

millions of miles off delivery

routes through advanced

analytics. They’ve:

• Saved 45 million litres of fuel

($3.5M/yr)• Reduced CO2 emissions by

100,000 tons, equivalent to

5,300 passenger cars off the

road for an entire year.

5

©2015 Google

Where do you site a well to

maximise profitability?

...from an operational viewpoint…

6

©2015 Google

Data-driven development

Walmart, and many other

retailers, routinely integrate

functional domains and value

chains. New stores are sited

based on:

• Potential revenues

of product mix• Cost of supply to store

from one of Walmart’s 3200

Supercenters• Daily/Monthly/Seasonal

effects• Long-term demographics

...from an operational viewpoint…

7

How do you optimise production

and development on mature fields?

…and from a strategic planning viewpoint

8

…and from a strategic planning viewpoint

Data-driven planning

Since 2007, Daimler has

relied on an integrated view

of all production line,

diagnostic and dealership

data for cars and trucks.

They can:

• Identify emerging

problems in vehicles

within weeks, and

correct production

processes accordingly.

• Reduce long-term

warranty cost and

development risk.

9 © 2014 Teradata

Industry 4.0 is changing traditional manufacturing relationships

From isolated optimized cells…

Oil & Gas isn’t the only industry where everything is more joined up and going faster

10

Oil & Gas isn’t the only industry where everything is more joined up and going faster

© 2014 Teradata

Industry 4.0 is changing traditional manufacturing relationships

…to fully integrated data and product flows across borders

Source: BCG

11

Key Activities

© 2014 Teradata

Field

Reservoir

Production

Condition

Technical Domains

DEVELOP

MAINTAIN

PRODUCE

MANAGE

How does this look in E&P? Currently…

12 © 2014 Teradata

Where we need to get to – the Connected Well

Key Activities

Field

Reservoir

Production

Condition

Technical Domains

DEVELOP

MAINTAIN

PRODUCE

MANAGE

COTS

Systems

Integrators

Drilling

Companies

Service

Companies

Service

Companies

Systems

Integrators

Engineering

ConsultanciesIn-house

Expertise

13

The amount of data captured and the speed at which it must be contextualised is our Big Data problem

What does this mean for data management?

Upstream IT is the way it is for a reason – (my six S’s)

• Size

• Science

• Spatial

• Speed

• Sustainability

• Sikkerhet

14

Why is it so difficult?

Source: xkcd.com

15

Our data managers are highly skilled “librarians” who want to deploy their domain expertise much more than they do

16

How can data managers enable this?

• Use your understanding of the data to enable better re-use for analytics

• Use your knowledge of the business processes to build-in discovery and scalability

• Don’t fear the unknown – other industries have travelled down this path too

You are the answer!

What will the new data manager look like?

© 2014 Teradata

17

How to experiment with mixing data together

18

How to experiment with mixing data together

Business

TechnicalScientific

UnstructuredMulti-

structuredStructured

FORTRAN

SQL

Matlab

19

How does a company get started?

Source: xkcd.com

20

6 week project undertaken by aMSc student at the University ofManchester with public accessdata from New Zealand:

• Clean data

• Establish context

• Define targets

• Establish reliability

Supported by Teradata Oil & Gas and Analytics teams

What can we learn from a basin’s worth of subsurface data in six weeks?

Basin-scale prospectivity analytics

© 2015 Teradata

21

Created pragmatic data model from:

• LAS files

– 521 wells

– 25,081 curves

– 81,141,281 indexed data points

• Well headers

• Mud logs

• Well summary

• Completion Report

A well constrained vocabulary was fundamental to enabling numerical analysis

Establishing logical, spatial and stratigraphic relationships across wells

Technical goal: Integrate data

© 2015 Teradata

UnstructuredMulti-

structuredStructured

22

1/ Take wells apart and add geological context

ID

ID

ID

ID

ID ID

Wells Table

Log fact Table

ASCII log Table

Formation and Member Tables

Geological elements

Header

Metadata

Log curves

Well summary

Logs taken

Well parameter

ASCII Logs

Other

File version

Calcareous sandstone interbedded

with calcareous siltstone.

Interbedded mudstone, siltstone

and sandstone.

Interbedded sandstone, siltstone

and mudstone with rare limestone

stringers.

Calcareous mudstone.

Limestone.

Argillaceous siltstone and

mudstone.

Sandstone, calcareous cement.

Glauconitic siltstone and

sandstone.

Calcareous sandstone interbedded

with calcareous siltstone.

Interbedded mudstone, siltstone

and sandstone.

Interbedded sandstone, siltstone

and mudstone with rare limestone

stringers.

Calcareous mudstone.

Limestone.

Argillaceous siltstone and

mudstone.

LAS files

Geological data

Cle

an

sin

g

23

Cleaning the text data

Data problem

Words instead of numbers

Both formation and members

were missing

Repeated names signifying

the same formation/

member

Quality control

Manual replacement of strings

with integer 0.

The cell (s) was titled

‘unnamed’ and not used.

Manual vocabulary reduction

Description

‘surface’ and ‘seabed’

Entire geological

information missing

E.g. Moki A sand, Moki A and

Moki A sandstone.

24

Finding distinct peaks – e.g. thermal “spikes” around hot shales

Symbolic Aggregate approXimation

Uw

i

De

pth

Cu

rve

na

me

cu

rve

va

lue

base

dataset

analytical

dataset

Uw

i

De

pth

FTE

MP

cu

rve

va

lue

pa

rtition

ID

data

demographics

Calculate mean and standard

deviations for each

temperature curve

SAXify

Apply symbolic notation based on

data demograp

hics

match similar

text

σ, Tmean

Match with standard regular

expression notation e.g.

dd*ggdd*[d-h]

…ddddddiggddddddd…,

…dddddfhfdddddd…,

…bbbbbcddbbccbcbebb…

locate

Project matches back to original

depths in core data

set and validate

ddddddiggddddddd,

dddddfdehfdddddd,

bbbbbbbbcbbccbbbbb

Saxification

Symbol size = 3

BAABCBBBAfter

conversion

25

Finding distinct peaks – e.g. thermal “spikes” around hot shales

Symbolic Aggregate approXimation

Uw

i

De

pth

Cu

rve

na

me

cu

rve

va

lue

base

dataset

analytical

dataset

Uw

i

De

pth

FTE

MP

cu

rve

va

lue

pa

rtition

ID

data

demographics

Calculate mean and standard

deviations for each

temperature curve

SAXify

Apply symbolic notation based on

data demograp

hics

match similar

text

σ, Tmean

Match with standard regular

expression notation e.g.

dd*ggdd*[d-h]

…ddddddiggddddddd…,

…dddddfhfdddddd…,

…bbbbbcddbbccbcbebb…

locate

Project matches back to original

depths in core data

set and validate

ddddddiggddddddd,

dddddfdehfdddddd,

bbbbbbbbcbbccbbbbb

26

Workflow for classification of interbedded sandstone/mudstone and sandstone/siltstone facies:

Dynamic Time Warping

base

dataset

analytical

datasetnPath

Rebuild and

pivot

Apply time

warpinglocate

Uw

i

De

pth

Cu

rve

na

m

e cu

rve

va

lu

e

Uw

i

De

pth

GR

cu

rve

va

lu

e pa

rtition

ID

create

multiple

windows

(paths) along

datasets

Flip paths of

points into

separate

key-value

pairs and

rebuild as

table

Create

template and

map onto

partitions in

pre-built table

then calculate

goodness of fit

using DTW

Take top centile of

matches and validate

Exploit

parallelism to

match

multiple

partitionsNot performed as

a closed loop or

with any degree of

automation or

learning

Uw

i

De

pth

GR

cu

rve

va

lu

e pa

rtition

ID

Massive

storage and

rapid access

enables

novel access

paths by

spreading

partitions

across

compute

nodes

27

Workflow for classification of interbedded sandstone/mudstone and sandstone/siltstone facies:

Dynamic Time Warping

base

dataset

analytical

datasetnPath

Rebuild and

pivot

Apply time

warpinglocate

Uw

i

De

pth

Cu

rve

na

m

e cu

rve

va

lu

e

Uw

i

De

pth

GR

cu

rve

va

lu

e pa

rtition

ID

create

multiple

windows

(paths) along

datasets

Flip paths of

points into

separate

key-value

pairs and

rebuild as

table

Create

template and

map onto

partitions in

pre-built table

then calculate

goodness of fit

using DTW

Take top centile of

matches and validate

Exploit

parallelism to

match

multiple

partitionsNot performed as

a closed loop or

with any degree of

automation or

learning

Uw

i

De

pth

GR

cu

rve

va

lu

e pa

rtition

ID

Massive

storage and

rapid access

enables

novel access

paths by

spreading

partitions

across

compute

nodes

28

• A much clearer, simpler model of the reservoir with 62 members in 17 formations

• Identified un-noticed features (hot shales) and re-classified others (interbedded facies)

• Ask any question of the data with spatial, chronological and logical relationships – at scale

• An open-ended model to incorporate other data – (next step: production histories)

New insights

© 2015 Teradata

29

Conclusions

3030 © 2014 Teradata