Setting the Scene for Big Data in Europe, Looking Ahead to the Case Studies


BYTE: Setting the scene for Big Data in Europe, looking ahead to the case studies

Guillermo Vega-Gorgojo – Universitetet i Oslo

Big data roadmap and cross-disciplinary community for addressing societal Externalities

@BYTE_EU www.byte-project.eu

What have we learned so far in BYTE?
◦ Big data, more than “the 3Vs”
◦ Definition, dimensions, activities, applications, data flows, policies
◦ Big data initiatives
◦ Technologies and infrastructures for big data
◦ Positive and negative societal externalities
◦ Economic, legal, social, ethical, political…

 


What do we expect to learn through the case studies?

1. Which positive and negative societal externalities organizations create through the use of big data
2. How they have worked to amplify positive externalities
3. How they have addressed the negative externalities they have encountered


A template for the case studies in BYTE

CASE STUDY OVERVIEW
1. Organization
2. Sector
3. Case study motto
4. Executive summary
5. Business processes
6. Relation to big data initiatives
7. Illustrative user stories

SOURCES OF INFORMATION
◦ Semi-structured interviews
◦ Organization documents

TECHNICAL PERSPECTIVE
8. Data sources
9. Data flows
10. Relevant big data policies
11. Main technical challenges
12. Big data dimensions

SOCIETAL EXTERNALITIES
13. Positive societal externalities
14. Negative societal externalities
15. Amplifying positive externalities
16. Addressing negative externalities
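
The 16-item template can be read as a simple record with three groups of fields. The sketch below is only an illustration of that reading: the class name, the field names, and the missing_items helper are hypothetical and are not part of the BYTE template itself. The example values (Statoil, Energy, OPTIQUE) come from the preliminary Statoil analysis.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CaseStudy:
    """One case study following the 16-item template (field names are assumptions)."""
    # Case study overview (items 1-7)
    organization: str = ""
    sector: str = ""
    motto: str = ""
    executive_summary: str = ""
    business_processes: List[str] = field(default_factory=list)
    big_data_initiatives: List[str] = field(default_factory=list)
    user_stories: List[str] = field(default_factory=list)
    # Technical perspective (items 8-12)
    data_sources: List[str] = field(default_factory=list)
    data_flows: List[str] = field(default_factory=list)
    big_data_policies: List[str] = field(default_factory=list)
    technical_challenges: List[str] = field(default_factory=list)
    big_data_dimensions: Dict[str, bool] = field(default_factory=dict)
    # Societal externalities (items 13-16)
    positive_externalities: List[str] = field(default_factory=list)
    negative_externalities: List[str] = field(default_factory=list)
    amplifying_positive: List[str] = field(default_factory=list)
    addressing_negative: List[str] = field(default_factory=list)

    def missing_items(self) -> List[str]:
        """Return the names of template items that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Partially filled record for the Statoil case study
statoil = CaseStudy(organization="Statoil", sector="Energy")
statoil.big_data_initiatives.append("OPTIQUE")
print(statoil.missing_items())
```

A helper such as missing_items could be used to check which parts of the template still need input from interviews or organization documents.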


A model for the societal externalities

[Diagram of three actors: Citizens, Public Sector, Private Sector]


Examples of positive and negative societal externalities

[Diagram of three actors: Citizens, Public Sector, Private Sector, annotated with the examples below]

+ support communities
- continuous and invisible surveillance

+ innovative business models
- inequalities to data access

- need to reconcile different laws and agreements

+ economic growth through community building
- competitive disadvantage of newer businesses and SMEs

+ commercialization of new goods and services
+ data-driven employment offerings
- private data misuse
- invasive use of information

+ accelerate scientific progress
+ transparency and accountability
- distrust of government data-based activities


The case studies

Case study     Organization                   Contact partner
Environment    ESA and others                 CNR
Crime          XXX                            TRI
Smart cities   Siemens                        Siemens
Culture        Europeana                      TRI
Energy         Statoil                        UiO
Health         Institute of Child Health      TRI
Transport      Rolls Royce/Farstad shipping   DNV


Preliminary case study analysis for Statoil: Case study overview

1. Organization: Statoil

2. Sector: Energy

3. Case study motto: Improve decision making in oil & gas exploration in the presence of partial information and limited time.

4. Executive summary: In the early phases of the oil and gas exploration process, many prospects, i.e. potential projects, are under evaluation at any time in order to select just a few of them for further investigation. These decisions are often of critical importance for Statoil. However, in most cases prospects have to be selected at short notice and on the basis of only partial information. Typically, exploration experts in these very early phases of an exploration project spend just a few days collecting relevant information before they embark on further analyses; the data that is not found within this time frame is then simply ignored, and will hence not influence the important selection of prospects. If the geophysics and geology (G&G) experts utilize all the data available, this will reduce the risk factor in the selection process, and hence also increase the chances that the ‘right’ prospects are selected. In the end this will in all likelihood increase the number of successful exploration projects for Statoil.

5. Business processes: Oil & gas exploration decision-making

6. Relation to big data initiatives: Research projects: OPTIQUE


Preliminary case study analysis for Statoil: Technical description

8. Data sources

   Name: Subsurface

   Short description:
   ◦ Seismic survey
   ◦ Seismic & geophysical data
   ◦ Well and wellbore data
   ◦ Acquisition reports

   Domain: geophysics and geology

   How it is collected:
   ◦ Seismic shots
   ◦ Well data from drilling operations
   ◦ Reports from value-adding analysis

   Size: ~8 PB

   …

 

11. Main technical challenges

   Data storage and access: VERY CHALLENGING
   ◦ G&G experts in exploration spend 16% of their time on finding the relevant data sets and documents (internal survey of Statoil in 2005)
   ◦ There is a plethora of tools to access and process the different kinds of data, amplified by the segregation into silos

   Data integration: CHALLENGING
   ◦ There is a clear need to integrate the data scattered across different repositories and databases from multiple vendors. For instance, the provided user story reflects that the Subsurface database was not up to date due to limited integration with the OpenWorks project databases

   …

 

12. Big data dimensions

   Volume: YES
   ◦ Some datasets are at a scale of PBs
   ◦ Extremely complex queries that can involve more than 30 joins

   Velocity: NO
   ◦ No streaming data processing

   Variety: YES
   ◦ Need for different data models to reflect the views of Drilling Engineers, Petrophysicists, Geophysicists, Geologists and Reservoir Engineers
   ◦ Very complex data models: ~K of tables and ~10K columns

   Veracity: YES
   ◦ Some of the employed data sources are more trustworthy than others

 


Preliminary case study analysis for Statoil: Societal externalities

Statoil – Citizens
+ Reduced risk for environment
+ Demand for hiring big data analysts

Statoil – Other corporations
+ New work processes and vendor ecosystems
- Data lock-in, contracts prohibit access to data for third parties
- Increased risk of exposing confidential data

Statoil – Public sector
+ Better informed decisions for drilling operations based on open government data (FactPages)
- Competitive advantage of the private sector w.r.t. open data (Statoil doesn’t have to open its data, while it has access to public data)
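
One possible way to record these externalities for later comparison across case studies is as a list of signed entries per pair of actors. The sketch below is only an illustration: the Externality type, its field names, and the tally function are hypothetical, and the descriptions are shortened from the Statoil entries above.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Externality(NamedTuple):
    """A single societal externality (structure is an assumption, not BYTE's)."""
    actors: Tuple[str, str]  # the pair of actors the externality arises between
    positive: bool           # True for "+", False for "-"
    description: str


STATOIL: List[Externality] = [
    Externality(("Statoil", "Citizens"), True, "Reduced risk for environment"),
    Externality(("Statoil", "Citizens"), True, "Demand for hiring big data analysts"),
    Externality(("Statoil", "Other corporations"), True,
                "New work processes and vendor ecosystems"),
    Externality(("Statoil", "Other corporations"), False, "Data lock-in"),
    Externality(("Statoil", "Other corporations"), False,
                "Increased risk of exposing confidential data"),
    Externality(("Statoil", "Public sector"), True,
                "Better informed decisions based on open government data"),
    Externality(("Statoil", "Public sector"), False,
                "Competitive advantage of the private sector w.r.t. open data"),
]


def tally(items: List[Externality]) -> Counter:
    """Count externalities per (second actor, sign)."""
    counts: Counter = Counter()
    for item in items:
        sign = "+" if item.positive else "-"
        counts[(item.actors[1], sign)] += 1
    return counts


for (actor, sign), n in sorted(tally(STATOIL).items()):
    print(f"Statoil / {actor}: {sign}{n}")
```

Keeping the actor pair explicit on every entry mirrors the way the slides group externalities by relationship rather than by a single stakeholder.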


Societal externalities (1 of 3)

Public sector – Citizens
+ Gather public insight by identifying social trends and statistics
+ Accelerate scientific progress
+ Tracking environmental challenges
+ Transparency and accountability of the public sector
+ Increased citizen participation
+ Foster innovation, e.g. new applications, from government data
+ Better services, e.g. health care and education, through data sharing and analysis
+ More targeted services for citizens, through profiling populations
+ Cost-effectiveness of services
+ Crime prevention and detection, including fraud
- Distrust of government data-based activities
- Unnecessary surveillance
- Compromise to government security and privacy
- Private data misuse, especially sharing with third parties without consent
- Threats to data protection and personal privacy
- Threats to intellectual property rights (including scholars' rights and contributions)
- Public reluctance to provide information (especially personal data)


Societal externalities (2 of 3)

Private sector – Citizens
+ Rapid commercialization of new goods and services
+ Free use of services, e.g. email, search engines
+ Enhancements in data-driven R&D
+ Making society energy efficient
+ Optimization of utilities through data analytics
+ Data-driven employment offerings
+ Marketing improvement
+ Increased insight into goods (more transparency)
+ Increased transparency in commercial decision making
+ Fostering innovation from opening data
+ Increased awareness about privacy violations and ethical issues of big data
+ Time-saving in transactions if personal data were already held
- Employment losses for certain job categories
- Invasive use of information
- Risk of informational rent-seeking
- Discriminatory practices and targeted advertising
- Distrust of commercial data-based activities
- Unethical exploitation of data
- Reduced market competition
- Consumer manipulation
- Creation of data-based monopolies (platforms and services)
- Private data accumulation and ownership
- Private data leakage
- Private data misuse, especially sharing with third parties without consent
- Privacy threats even with anonymized data and with data mining
- Threats to intellectual property rights
- Public reluctance to provide information (especially personal data)
- "Sabotaged" data practices


Societal externalities (3 of 3)

Citizens – Citizens
+ Support communities
- Continuous and invisible surveillance

Private sector – Private sector
+ Opportunities for economic growth
+ Innovative business models
- Barriers to market entry
- Inequalities to data access
- Market manipulation
- Challenge of traditional non-digital services
- Dependency on external data sources, platforms and services
- Competitive disadvantage of newer businesses and SMEs
- Reduced growth and profit among all businesses
- Threats to commercially valuable information

Public sector – Private sector
+ Opportunities for economic growth
+ Innovative business models
+ Support communities
- Open data puts the private sector at a competitive advantage
- Inequalities to data access, especially in research
- Taxation leakages
- Lack of norms for data storage and processing

Public sector – Public sector
- Geopolitical tensions due to surveillance beyond the boundaries of states
- Need to reconcile different laws and agreements, e.g. "right to be forgotten"