Taking the mystery out of Big Data - Berlin - Feb 2014
Taking (some of) the mystery out of Big Data
Claus Stie Kallesøe
7th Berlin Conference on IP in Life Sciences
Focus on Big Data
February 7, 2014
Introducing myself
Current roles:
Board of Directors, Pistoia Alliance
Head of Global Research Informatics
Background:
MSc. Pharm, Uni of Pharma Sciences, Copenhagen, 1997
Diploma Software Development, School of Engineering, Copenhagen, 2002
E-MBA, INSEAD, France, 2007
LinkedIn: http://www.linkedin.com/in/clausstiekallesoe
NOT FOR PROMOTIONAL USE
Big Data –
Either VERY large datasets AND/OR other complexities
Characteristics of big data
Source: IBM methodology
A couple of words about scale
100’s of Megabytes
This should not be a problem. Can be handled with Matlab, R, Ruby
10’s of Gigabytes
This can all be loaded into the RAM of a laptop
100/500 Gigabytes – 1 Terabyte
2 Terabyte hard drives can be bought in the local shop for €100
Connect one to your laptop and install PostgreSQL or a NoSQL database on it
> 5 Terabytes
Now you might have a size issue
Inspired by: http://www.chrisstucchio.com/blog/2013/hadoop_hatred.html
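The size tiers above can be written down as a small lookup. This is only a sketch: the slide gives rough orders of magnitude, so the exact cut-off values below (1 GB, 100 GB, 5 TB) are assumptions, and the function name `suggest_approach` is hypothetical.

```python
MB = 10**6
GB = 10**9
TB = 10**12

def suggest_approach(size_bytes: int) -> str:
    """Map a dataset size to the rough tiers listed on the slide.

    The boundaries are assumed; the slide only names orders of magnitude
    and says nothing specific about the range between 1 and 5 Terabytes.
    """
    if size_bytes < 1 * GB:
        # 100's of Megabytes
        return "not a problem: Matlab, R, Ruby"
    if size_bytes < 100 * GB:
        # 10's of Gigabytes
        return "load it into the RAM of a laptop"
    if size_bytes <= 5 * TB:
        # 100/500 Gigabytes up to a few Terabytes
        return "external hard drive plus PostgreSQL or a NoSQL database"
    # more than 5 Terabytes
    return "now you might have a size issue"

for size in (300 * MB, 20 * GB, 500 * GB, 8 * TB):
    print(size, "->", suggest_approach(size))
```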
Big Data - Definition
"Big Data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."
What is Big Data in Pharma R&D?
Many ideas/possibilities across Pharma R&D and market access
But many of them are likely NOT real Big Data problems!
Are they relevant and can they bring insights? Yes, very much so
Should we then find a way to handle them? Absolutely
Linking R&D data
Semantic, Text indexes and search tools
Purpose: Build text indexes that enable fast searches across large sets of linked data, both internal and external
[Diagram: Research Databases, Clinical Databases and External databases (ClinicalTrials.gov, Clinicaltrialsregister.eu) linked in steps 1) to 4); step marked "Today"]
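What a text index does across linked sources can be illustrated with a tiny inverted index. This is a minimal sketch in plain Python, not the tooling used in the talk; the record IDs and texts are made up for illustration, and the function names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Build an inverted index: token -> set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Made-up records standing in for internal and external sources.
documents = {
    "research:assay-1": "Kinase inhibition assay for compound A",
    "clinical:trial-7": "Phase II trial of compound A in depression",
    "external:record-3": "Registry entry for a phase II depression study",
}

index = build_index(documents)
print(sorted(search(index, "phase II depression")))
```

Because every source is indexed under the same tokens, one query reaches research, clinical and external records at once, which is the point of linking the data before indexing it.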
What about patents?
Text mining, linking and indexing
Text mining of patent databases and other sources…
Including chemical name => structure
…followed by:
1. Convert to RDF => link with Semantic technologies
2. Enrich and load into a text index like Solr or similar
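The two follow-up steps can be sketched for a single text-mined record. This is an illustration under stated assumptions: the `http://example.org/` namespace, the predicate names, the record fields and both function names are hypothetical, and the flat dictionary is only the kind of document a text index such as Solr could ingest, not a specific schema.

```python
import json
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not from the talk

def to_ntriples(record):
    """Step 1: express one mined patent record as N-Triples lines."""
    subject = f"<{BASE}patent/{quote(record['patent_id'])}>"
    # json.dumps yields a quoted, escaped string usable as a simple literal.
    lines = [f"{subject} <{BASE}title> {json.dumps(record['title'])} ."]
    for compound in record["compounds"]:
        # Percent-encode the name so it is safe inside an IRI.
        target = f"<{BASE}compound/{quote(compound)}>"
        lines.append(f"{subject} <{BASE}mentionsCompound> {target} .")
    return lines

def to_index_doc(record):
    """Step 2: flatten the same record into a document for a text index."""
    return {
        "id": record["patent_id"],
        "title": record["title"],
        "compounds": list(record["compounds"]),
    }

# Made-up record standing in for the output of text mining.
record = {
    "patent_id": "EX-0001",
    "title": "Example patent title",
    "compounds": ["acetylsalicylic acid"],
}

for line in to_ntriples(record):
    print(line)
print(json.dumps(to_index_doc(record)))
```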
Pharmaceutical R&D – Future Big Data Opportunities
Online social networks and health records offer a huge repository of real-world patient data that can be used to:
identify undiagnosed patients and serious adverse events
improve understanding of health outcomes and comparative effectiveness
For many people/companies, ”Big data technology” is a black box
”A lot of stuff”
And then the vendors go:
If {box = magic or money} then {box = expensive}
Working within a community
A lot of tools available
From: http://people10.com/blog/ruby-on-rails-the-popular-platform-for-web-development/
New visualisations – easy and free
http://philogb.github.io/jit/demos.html
Automated calculations
LSP Front End
Job submitted to async calculation server
[Diagram: workflow steps 1 to 5, with sub-steps 5a, 5b, 5c, etc.]
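Submitting a job to an asynchronous calculation server follows a submit-then-collect pattern. The sketch below imitates it inside one process with Python's standard `concurrent.futures`; the thread pool stands in for the server, and `calculate` and `submit_job` are hypothetical names, not the setup shown in the talk.

```python
from concurrent.futures import ThreadPoolExecutor

def calculate(values):
    """Placeholder calculation: the mean of a list of numbers."""
    return sum(values) / len(values)

def submit_job(executor, values):
    """Hand the job to the executor and return immediately with a Future."""
    return executor.submit(calculate, values)

with ThreadPoolExecutor(max_workers=2) as executor:
    future = submit_job(executor, [1.0, 2.0, 3.0])
    # The front end is free to do other work here while the job runs.
    result = future.result()  # blocks only when the result is needed

print(result)
```

The front end never waits on the calculation itself; it only holds a handle to the job and asks for the result when it needs it.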
Elasticsearch text indexes
All research assay metadata => Google-like search to find the relevant assay
All research project SharePoint workspaces => Enable easy, fast cross-project queries to find trends
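A "Google-like" search over assay metadata could be expressed as an Elasticsearch `multi_match` query. The sketch only builds the JSON request body and sends nothing; the field names `assay_name`, `description` and `target` are assumptions, as are the function name and the example search text.

```python
import json

def build_assay_query(text, size=10):
    """Build a full-text query body searching several metadata fields."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # Hypothetical field names for assay metadata.
                "fields": ["assay_name", "description", "target"],
            }
        },
    }

body = build_assay_query("kinase inhibition")
print(json.dumps(body, indent=2))
```

The body would be sent to the `_search` endpoint of whichever index holds the assay metadata.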
Conclusion – Big data in Pharma R&D
Many opportunities across R&D and market access
More data linking and data analytics than Big Data
You can use freely available tools on ”normal” hardware
No magic ”Under the hood” – it’s just data
BUT you still need to define the questions you want to answer – before diving into technology!