Taking the mystery out of Big Data - Berlin - Feb 2014


Taking (some of) the mystery out of Big Data

Claus Stie Kallesøe

7th Berlin Conference on IP in Life Sciences

Focus on Big Data

February 7, 2014


Introducing myself

Current roles:

Board of Directors, Pistoia Alliance

Head of Global Research Informatics

Background:

MSc. Pharm, Uni of Pharma Sciences, Copenhagen, 1997

Diploma Software Development, School of Engineering, Copenhagen, 2002

E-MBA, INSEAD, France, 2007

LinkedIn: http://www.linkedin.com/in/clausstiekallesoe

Introduction


NOT FOR PROMOTIONAL USE

Big Data: either VERY large datasets AND/OR other complexities

Characteristics of big data

Source: IBM methodology

A couple of words about scale

100s of megabytes

This should not be a problem. It can be handled with Matlab, R, or Ruby.

10s of gigabytes

This can all be loaded into the RAM of a laptop.

100 to 500 gigabytes, up to 1 terabyte

2-terabyte hard drives can be bought in the local shop for €100.

Connect one to your laptop and install PostgreSQL or a NoSQL database on it.

> 5 terabytes

Now you might have a size issue.

Inspired by: http://www.chrisstucchio.com/blog/2013/hadoop_hatred.html
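The size bands above can be sketched as a small lookup. This is only an illustration: the slide leaves gaps between its bands (for example between 1 and 5 terabytes), so the exact cut-offs below are assumptions, and the function name `suggest_approach` is made up for this example.

```python
# Decimal units. The exact cut-offs are assumptions chosen to bridge the
# gaps between the slide's size bands; they are not from the talk.
MB, GB, TB = 10**6, 10**9, 10**12


def suggest_approach(size_bytes: int) -> str:
    """Return the slide's rough advice for a dataset of the given size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < 1 * GB:
        return "scripting tools (Matlab, R, Ruby)"
    if size_bytes < 100 * GB:
        return "load it into RAM"
    if size_bytes <= 5 * TB:
        return "external hard drive plus PostgreSQL or a NoSQL database"
    return "possible size issue"


if __name__ == "__main__":
    for size in (300 * MB, 40 * GB, 500 * GB, 8 * TB):
        print(f"{size / GB:>8.1f} GB -> {suggest_approach(size)}")
```

Only the last band is where the talk suggests size itself may become the problem.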


Big Data - Definition

"Big Data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."

Cool, but remember where we are!

Gartner Hype Cycle 2013

Big Data in Pharma R&D


What is Big Data in Pharma R&D?

Many ideas/possibilities across Pharma R&D and market access

But many of them are likely NOT real Big Data problems!

Are they relevant and can they bring insights? Yes, very much so

Should we then find a way to handle them? Absolutely


Linking R&D data

Semantic technologies, text indexes and search tools

Purpose: Build text indexes that enable fast searches across large data sets of linked data, both internal and external

[Diagram: Research Databases; Clinical Databases; External databases; ClinicalTrials.gov; Clinicaltrialsregister.eu; numbered links 1) to 4); label "Today"]

What about patents?

Text mining, linking and indexing

Text mining of patent databases and other sources…

Including chemical name => structure

…followed by:

1. Convert to RDF => link with Semantic technologies

2. Enrich and load into a text index like Solr or similar
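Step 1 above can be illustrated with a minimal sketch that writes text-mined records out as N-Triples, one of the plain-text RDF serialisations. The `http://example.org/` namespace, the field names, and the record id are all made up for this example; the talk does not say which vocabulary or tooling was actually used.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not from the talk


def escape_literal(value: str) -> str:
    """Escape the characters that must be escaped in an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(record_id: str, fields: dict) -> list:
    """Turn one text-mined record into N-Triples lines, one per field."""
    subject = f"<{BASE}doc/{quote(record_id, safe='')}>"
    lines = []
    for key in sorted(fields):
        predicate = f"<{BASE}vocab/{quote(key, safe='')}>"
        literal = escape_literal(str(fields[key]))
        lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


if __name__ == "__main__":
    # Made-up record standing in for the output of a text-mining step.
    for line in record_to_ntriples("DOC-1", {"compound": "aspirin", "source": "patent"}):
        print(line)
```

Once records share identifiers like this, the same triples can be linked to other sources and also flattened into documents for a text index in step 2.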


Pharmaceutical R&D – Future Big Data Opportunities

Online social networks and health records offer a huge repository of real-world patient data that can be used to:

identify undiagnosed patients and serious adverse events

improve understanding of health outcomes and comparative effectiveness

Technologies

Can we do anything on our own?

For many people/companies "Big data technology" is a black box

"A lot of stuff"

And then the vendors go:

If {box = magic or money} then {box = expensive}

Working within a community

A lot of tools available

From: http://people10.com/blog/ruby-on-rails-the-popular-platform-for-web-development/

New visualisations – easy and free

http://philogb.github.io/jit/demos.html

Automated calculations

LSP Front End

Job submitted to async calculation server

[Diagram: workflow steps 1, 2, 3, 4, 5, 5a, 5b, 5c, etc.]

https://circleci.com/

Also a lot of great tools to handle data


Elasticsearch text indexes

All research assay metadata => Google-like search to find the relevant assay

All research project SharePoint workspaces => Enable easy, fast cross-project queries to find trends
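To show that there is nothing magic in a text index, here is a toy inverted index in plain Python. It is not Elasticsearch's API and leaves out ranking, stemming and everything else a real engine adds; the assay descriptions and function names are made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict) -> dict:
    """Map every token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return ids of documents containing every token of the query (AND search)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Made-up assay metadata standing in for real records.
ASSAYS = {
    "A1": "Kinase inhibition assay, human",
    "A2": "Receptor binding assay, rat",
    "A3": "Kinase binding assay, rat",
}

if __name__ == "__main__":
    idx = build_index(ASSAYS)
    print(sorted(search(idx, "kinase assay")))
```

The lookup cost depends on the number of query terms rather than the number of documents scanned, which is what makes a "Google-like" search over metadata fast.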


Conclusion – Big data in Pharma R&D

Many opportunities across R&D and market access

More data linking and data analytics than Big Data

You can use freely available tools on "normal" hardware

No magic "under the hood": it's just data

BUT you still need to define the questions you want to answer – before diving into technology!


Please go home and read….

http://blog.mongohq.com/you-dont-have-big-data/

http://ask.debian.net/