Combining non-sensitive data revealing sensitive information


10.09.2015

COMBINING NON-SENSITIVE DATA REVEALING SENSITIVE INFORMATION

A customer-driven student project at NTNU

Seminar at AFIN (Avdeling for forvaltningsinformatikk), UiO, 10 September 2015

ACANDO IS ONE OF THE LEADING NORDIC MANAGEMENT AND IT CONSULTING COMPANIES

2014

• More than 1850 employees in 19 locations in eight countries
• Four areas of business:
  − IT Management Consulting
  − Management Consulting
  − Enterprise Solutions
  − Digital Consulting & Solutions

Timeline (axis labels: 1990, 2000, 2010):

• 1981: Frontec is founded
• 1982: Resco is founded
• 1999: Acando is founded by Custos, Kinnevik, Orkla & Bank of America
• 2003: Merger of Acando and Frontec
• 2006: Merger with Resco and change of name to Acando
• 2006: Acquisition of IQ Consultancy and formation of Acando UK
• 2007: Acquisition of Abeo A/S and formation of Acando Norge

Acando in Norway:

• 2014: Acquisition of Connecta in Sweden
• 2014: Acquisition of e-vita in Norway

WE HAVE A STRONG PRESENCE IN NORTHERN EUROPE

Countries: NORWAY, SWEDEN, FINLAND, LATVIA, INDIA

Locations: Stockholm, Hamburg, Falun, Riga, Kristiansand, Västerås, München, Pori, Ålesund, Bangalore, Malmö, Ludvika, Göteborg, Stuttgart, Düsseldorf, Helsinki, Vantaa, Oslo, Frankfurt am Main, Trondheim, Braunschweig

Johan H. Gedde-Dahl, Advisor (Rådgiver), [email protected], Mobile: +47 97 13 86 46

Acando Norge, Tordenskioldsgate 8-10, 0160 Oslo, +47 93 00 10 00, www.acando.no


Johan is an experienced enterprise architect with more than 25 years of experience from SpareBank 1, Nordea, Forsvaret and NETS. He has expertise in transforming business strategies into a working enterprise architecture, Master Data Management and IT infrastructure. He has more than 10 years of experience as a project leader.

Johan has a real commitment and the power to succeed. He has extensive experience with complex decision models and negotiations from large IT contracts.

Johan is part of a strategic focus area in Acando involving Semantics, Analytics and Big/Smart Data.


SOME AREAS OF INTEREST:

SEMANTIC SOLUTIONS

Is your data getting sufficient attention?

DATA MANAGEMENT: Ensure access to correct, understood data of known quality, in the right place and at the right time

DATA DOCUMENTATION: An overview of the data, and correctly interpreted data

COMMUNICATION: A uniform, shared language in verbal and digital interaction. Concept management.

RETRIEVAL, ANALYSIS & PREDICTION: Finding the «needle in the haystack». Knowledge-based decision support.

SEMANTIC SOLUTIONS

Concept management (begrepsforvaltning)
− Delivering a concept management solution (Skattedirektoratet)
− Method and concept work (content) for several government agencies

Partner on the solution Knowledge Integration

Information architecture: from business level to solutions

Information management, incl. architecture management
− Management and governance, incl. making data available
− Metadata
− Concepts
− Data quality
− Information security

Analysis & predictions («Big data»)
− source material with or without known/predefined meaning
− structured, unstructured and/or semi-structured information

Partner with various actors based on need/customer «case»

System development incl. «semantic web» solutions

The customer segments for these focus areas are both the public and the private sector.

The solution strategy is needs-based and not locked to a «technology stack».


STUDENT DRIVEN PROJECT AT NTNU

12 CHOSEN PROJECTS FOR TDT4290, AUTUMN 2015, OUT OF 48 APPLICATIONS

TDT4290 KPRO 2015 - project groups, customers and supervisors

Group 1 - room 054 (floor 0 IT-building). Akuttmottaket, St. Olav: Decision support tool for managing non-cooperative patients. Contact: Florentin Moser

Group 2 - room 454 (floor 4 IT-building). Oracle: GIS Query Tool. Contact: Norvald H. Ryeng

Group 3 - room 454 (floor 4 IT-building). iMaxfocus/Olympiatoppen: iMaxfocus Olympic. Contact: Sten Nilsen

Group 4 - room 454 (floor 4 IT-building). DataDrivenFinance AS: Data-Driven Claims Handling. Contact: Jan-Martin Hunderi

Group 5 - room 354 (floor 3 IT-building). Netlight AS: CrowdShelf. Contact: Peder Kongelf

Group 6 - room 242 (floor 2 IT-building). Nidaros Domkirkes Restaureringsarbeider: Cultural digital experiences in Nidaros Cathedral and the Archbishop's Palace. Contact: Inge Sørgård

Group 7 - room Applab (floor 0 IT-building). Seniornett Trondheim: Support for senior citizen participation in the information society. Contact: Arne Sølvberg

Group 8 - room Oasen (floor 1 IT-building). Thales Norway AS: Automated XMPP conformance testing. Contact: Christian Tellefsen

Group 9 - room 260 (floor 2 IT-building). Acando AS: Creating sensitive data from open sources. Contact: Lars Bjørnar Listhaug, Hjørdis Hoff, Johan H. Gedde-Dahl, John Arne Øye

Group 10 - room Oasen (floor 3 IT-building). Sintef Fisheries & Aquaculture: Virtual Reality 3D-Visualization. Contact: Jørgen Haavind Jensen

Group 11 - room Oasen (floor 4 IT-building). UNINETT, NTNU, Dokkhuset: User-friendly management of distributed networked musical and dance collaborations. Contact: Otto J Wittner, Leif Arne Rønningen

Group 12 - room Oasen (floor 2 IT-building). Syklistenes Landsforening: CountMe, a mobile app for counting cyclists and pedestrians. Contact: Richard Sanders


PURPOSE OF PROJECT: INTEGRATING NON-SENSITIVE DATA TO REVEAL SENSITIVE INFORMATION

Background:

More and more government offices, companies and private individuals are publishing data that is not sensitive in nature.

• Open data is available through service interfaces and as datasets published on websites such as DIFI's http://data.norge.no/.
• When data is made available to “anyone” as open data, the data owner has little or no control over how the data is used.
• Public data can be misused.

Goal of project:

• Verify the hypothesis that integrated datasets, and analysis based on combined data, may result in sensitive and/or “unwanted”/revealing information.
• Determine how much (and what) open, non-sensitive data can be combined before it changes character from being data to being sensitive information.
• Can open, non-sensitive, semantically enriched data become sensitive when combined/integrated with other non-sensitive data and analyzed?
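
The project hypothesis can be illustrated with a minimal sketch. It is not the project's method: the two record sets, their field names and the choice of postcode and birth year as linking keys are all made-up assumptions for illustration. Each source is harmless on its own, but joining them on shared attributes ties a named person to a topic that touches on health.

```python
from collections import defaultdict

# Hypothetical, made-up records: neither source is sensitive on its own.
business_register = [
    {"name": "Kari Example", "postcode": "7010", "birth_year": 1980, "role": "board member"},
    {"name": "Ola Sample", "postcode": "0160", "birth_year": 1975, "role": "general manager"},
]
forum_posts = [
    {"alias": "trondheim_runner", "postcode": "7010", "birth_year": 1980, "topic": "patient support group"},
    {"alias": "oslo_chess", "postcode": "0160", "birth_year": 1990, "topic": "chess club"},
]

# Assumed linking attributes shared by both sources.
QUASI_IDENTIFIERS = ("postcode", "birth_year")


def link_records(left, right, keys=QUASI_IDENTIFIERS):
    """Join two record lists on the given keys and return the merged matches."""
    index = defaultdict(list)
    for row in right:
        index[tuple(row[k] for k in keys)].append(row)
    matches = []
    for row in left:
        for other in index.get(tuple(row[k] for k in keys), []):
            matches.append({**row, **other})
    return matches


linked = link_records(business_register, forum_posts)
for row in linked:
    print(row["name"], "may be", row["alias"], "who posts about", row["topic"])
```

Only one pair shares both postcode and birth year, so only that person is linked. In real data a match on so few attributes would be a candidate, not proof, which is why the number of sources and attributes needed is itself one of the questions for the project.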

SOME MORE HYPOTHESES…

• Individuals in small communities are more easily identifiable than people in larger towns
• Celebrities and well-known individuals are more easily identifiable in large datasets than ordinary people
• You need fewer than four different sources to identify individuals
• The intention of “PersonOpplysningsLoven” (POL) is met only if you look at a single source
• The definition of “sensitive” information is not sufficient. It needs a context.
• Other???
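
One way to make the first and third hypotheses measurable is to count how many people share the same combination of attributes. The sketch below is only an illustration under stated assumptions: the populations are invented, and the helper `unique_fraction` and the attribute names are hypothetical, not taken from the project.

```python
from collections import Counter


def unique_fraction(records, keys):
    """Return the share of records whose combination of `keys` occurs exactly once."""
    if not records:
        return 0.0
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Made-up populations for illustration only.
small_village = [
    {"birth_year": 1970, "occupation": "teacher"},
    {"birth_year": 1982, "occupation": "farmer"},
    {"birth_year": 1990, "occupation": "nurse"},
]
large_town = [
    {"birth_year": 1970, "occupation": "teacher"},
    {"birth_year": 1970, "occupation": "teacher"},
    {"birth_year": 1982, "occupation": "farmer"},
    {"birth_year": 1982, "occupation": "farmer"},
    {"birth_year": 1990, "occupation": "nurse"},
    {"birth_year": 1990, "occupation": "engineer"},
]

both = ("birth_year", "occupation")
village_share = unique_fraction(small_village, both)
town_share = unique_fraction(large_town, both)
town_share_one_attribute = unique_fraction(large_town, ("birth_year",))

print(f"village: {village_share:.2f}, town: {town_share:.2f}, "
      f"town with birth year only: {town_share_one_attribute:.2f}")
```

In this toy data every villager is unique on two attributes, while in the town only a third are, and nobody is unique on birth year alone. Each attribute contributed by a further source can only keep or raise the unique share, which is the effect the hypothesis about the number of sources is probing.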

RESOURCES AVAILABLE TO THE PROJECT - IN AND FROM OTHER ORGANIZATIONS

• Brønnøysund Register Centre is an example of a large government office that holds large amounts of information covering a wide range of aspects of public and private life. Some of it is sensitive and protected, and some is open and public, like data from the register for business enterprises / Enhetsregister.
  − Data from Brønnøysundregistrene for the last ten years is available on an FTP server, ready to use
  − An update was made available on 7 September, including final information from 2014
• IBM has made a portion of its "Bluemix" cloud service available, ready to use
  − Including the analysis platform Watson
• Norge.no
  − A site with a lot of public information
• Twitter
• Blogs
• Other sources on the net
  − … be creative ….

WHAT IS SENSITIVE INFORMATION?

According to Norwegian law (personopplysningsloven):

§ 2. Definitions, item 8

- sensitive personal data is information about:

a) racial or ethnic origin, or political, philosophical or religious beliefs,
b) the fact that a person has been suspected of, charged with, indicted for or convicted of a criminal act,
c) health,
d) sex life,
e) trade union membership

• Other possible viewpoints on revealing «unwanted» truths about someone:
  − Privileged or proprietary information which, if compromised through alteration, corruption, loss, misuse, or unauthorized disclosure, could cause serious harm to the organization owning it.
  − Sensitive information is defined as information that is protected against unwarranted disclosure
  − Information that, if revealed, gives a person/party an advantage over an individual
  − Knowledge that might result in loss of an advantage or level of security if disclosed to others

Economic circumstances, relationships and friends, preferences, hobbies, etc. are all non-sensitive according to Norwegian law…


WHAT'S GOING ON IN THE MARKET

• Government authorities are measured on making data available
  − Kartverket
  − Brønnøysundregistrene
  − And all the others…
• The term “sensitive information” is not well defined in the context of the Internet
  − The term is being (mis)used
  − Our experience: government regulations are not sufficient when different datasets are combined
    − Example: "Personopplysningsloven"
  − If person A can extract information about person B that gives person A an advantage:
    − What kind of information is this, and how should it be exposed?
• Apparently there is no common, holistic government plan for the information provided by government authorities

STAKEHOLDERS AND INCREASED FOCUS IN THE MARKET

• Who does this topic concern?
  − Everyone: this should concern each and every one of us…
• Who can gain an advantage from information made available by government authorities?
• Data owners
  − Can the data owner be held liable for making data available without validating its potential use?

PROJECT DELIVERABLES AND DATES

November 18th: Final presentation day.

Recipe:

• 7 students
• 350 hours each
• One interesting topic

= Interesting findings!!!


WHY ACANDO?

COMPETENT: We are experts within our chosen industries and offerings

COMPREHENSIVE: Our offering covers all aspects of management and IT consulting

CAPACITY: We are one of the largest consultancy firms in the Nordics

CLOSE: We work closely with our clients and have a strong presence in northern Europe

CULTURE: We value the Nordic business culture and its heritage, and strongly believe that our clients, business partners and employees thrive/excel in an environment where work is not work but fun

COMBINING NON-SENSITIVE DATA REVEALING SENSITIVE INFORMATION

A student project at NTNU

25 August 2015

Additional slides

EXAMPLES


CONCEPTS (BEGREPER)

WHAT IS A CONCEPT

• A concept is a unit of thought

• Corresponds to an object

• An object can be anything

• Concepts can have a definition

• Usually have a name (term)

• Optional alternative names

• Each term can have translations – one per language

• May be associated with symbols
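
The properties listed above can be sketched as a small data structure. This is one possible reading and not a schema from the presentation: the class names `Term` and `Concept`, their field names and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    """A name for a concept, with at most one translation per language."""
    text: str
    # Keyed by language code, so a language can hold only one translation.
    translations: Dict[str, str] = field(default_factory=dict)

    def in_language(self, language: str) -> str:
        """Return the translation for `language`, or the original text if none exists."""
        return self.translations.get(language, self.text)


@dataclass
class Concept:
    """A unit of thought that corresponds to an object."""
    preferred_term: Term
    definition: Optional[str] = None                           # concepts can have a definition
    alternative_terms: List[Term] = field(default_factory=list)  # optional alternative names
    symbols: List[str] = field(default_factory=list)            # optional associated symbols


# Illustrative values only.
concept = Concept(
    preferred_term=Term("begrep", translations={"en": "concept"}),
    definition="A unit of thought",
    alternative_terms=[Term("konsept")],
)

print(concept.preferred_term.in_language("en"))
print(concept.preferred_term.in_language("de"))
```

Storing translations in a dictionary keyed by language is a simple way to respect the "one per language" rule; a language without a translation falls back to the original term.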

PROPERTIES Example:

EXAMPLES: CONCEPTS FROM LÅNEKASSEN

Multilingual concepts.

English concepts do not seem to be available on the Lånekassen web site.

WHY IS A KNOWN AND WELL-DEFINED LANGUAGE IMPORTANT?

Examples of what a well-defined and known language can contribute to:

• communication becomes more efficient because we speak the same language: the same name for the same thing
• the same message is conveyed and interpreted in the same way every time
• it can give access to the definitions behind the words we use (openness)
• it supports multiple languages
• fewer misunderstandings and wrong decisions
• uniform metadata about digital content (structured, semi-structured, unstructured)
• a shorter path to relevant knowledge: accurate search and navigation
• linking relevant topics/interests to roles and/or users

http://www.vg.no/nyheter/innenriks/norsk-politikk/byraakratspraak-koster-300-millioner-i-aaret/a/23402276/


COMMUNICATION IS DIFFICULT! From NAV (Norwegian Labour and Welfare Administration)

Snik: “person som skaffer seg fordeler ved å smiske og holde seg frampå” (Språkrådet)

My free translation: A person who acquires advantages by bootlicking and being pushy

(Snik = to sneak, or a «sneaker»)

Correct message:

Noen ganger er det ikke nødvendig å vente på oss. Vi hjelper så fort vi kan, men iblant kan du raskt og enkelt hjelpe deg selv. På nett.

My translation:

You don’t always have to wait for us. We will help you as fast as we can, but sometimes you can quickly and easily help yourself. Online.

EXAMPLES OF CONCEPT USAGE

[Figure labels: Home page; Digital schema / Electronic dialog; Digital information (look up, «tagging»); Common dictionary; Data; Data exchange; Search and discovery; Analysis and prediction; Common language; Enterprise architecture]

DOCUMENTED DATA

METADATA

Example value: 20060701

Data type: Date
Format: YYYYMMDD
Label: Creation date
Class: Business organisation / Entity
Entity identifier: 990 983 291
Entity name: NAV ICT
Registration date: 20110912
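
The point of the slide is that a raw value such as 20060701 only becomes usable once its metadata is known. The sketch below shows this under stated assumptions: the dictionary keys, the `FORMAT_TO_STRPTIME` mapping and the `interpret` helper are hypothetical names introduced here, and only the values themselves come from the slide.

```python
from datetime import date, datetime
from typing import Dict, Tuple

# Assumed mapping from the slide's format notation to Python's strptime codes.
FORMAT_TO_STRPTIME = {"YYYYMMDD": "%Y%m%d"}

# Metadata for the example value, using the values shown on the slide.
metadata = {"data_type": "Date", "format": "YYYYMMDD", "label": "Creation date"}


def interpret(raw: str, meta: Dict[str, str]) -> Tuple[str, date]:
    """Turn a raw string into a labelled date, guided by its metadata."""
    if meta["data_type"] != "Date":
        raise ValueError(f"unsupported data type: {meta['data_type']}")
    pattern = FORMAT_TO_STRPTIME[meta["format"]]
    return meta["label"], datetime.strptime(raw, pattern).date()


label, value = interpret("20060701", metadata)
print(f"{label}: {value.isoformat()}")  # prints "Creation date: 2006-07-01"
```

Without the format entry the same eight digits could just as well be read as a number or an identifier, which is why the metadata travels with the data.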


WHEN CORRECT DATA IS NOT ENOUGH

NEEDLE IN THE HAYSTACK

ANALYSIS, SEARCH AND DISCOVERY


FIND KNOWLEDGE/PATTERNS IN DATA SETS WHERE DATASETS MAY BE INTEGRATED


SEARCH AND ANALYSIS: UNCOVER INFORMATION
