Data: The Good, The Bad & The Ugly
![Page 1: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/1.jpg)
Data: The Good, The Bad & The Ugly
Lee Harland @SciBitely
http://www.scibite.com
http://www.slideshare.net/scibitely
Lee Harland
Lilly Global IT Meeting, November 2016
![Page 2: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/2.jpg)
Context
• This is an invited talk I gave at Lilly’s internal Global IT meeting on the subject of “data”.
![Page 3: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/3.jpg)
The Good
![Page 4: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/4.jpg)
http://www.nejm.org/doi/full/10.1056/NEJMp1606181
![Page 5: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/5.jpg)
![Page 6: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/6.jpg)
![Page 7: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/7.jpg)
![Page 8: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/8.jpg)
What matters to me!
![Page 9: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/9.jpg)
The Bad
![Page 10: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/10.jpg)
… (Promotion of) the nutritional importance of spinach over other foods led to an increase of over 30 per cent in its consumption during the 1920s and 30s.
The action of S. oleracea on cardiovascular output and muscular tone
![Page 11: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/11.jpg)
Bad, Bad Data Point
1870: 35.2 mg Fe/100 g
1937: 3.52 mg Fe/100 g
The mythical strength-giving properties of spinach are ... credited to a simple mistake concerning the iron content of the vegetable.
In 1870, Dr E von Wolf published figures which were accepted until the 1930s, when they were rechecked.
This revealed that a decimal point had been placed wrongly and that the real figure was only one-tenth of Dr von Wolf's claim.
![Page 12: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/12.jpg)
Still Making Headlines After 140 Years (2013)
![Page 13: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/13.jpg)
There Is No Decimal Point Error
![Page 14: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/14.jpg)
![Page 15: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/15.jpg)
![Page 16: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/16.jpg)
Spinach: One Small Data Point, One Huge Mess
1870: 35.2 mg Fe/100 g ✓
1937: 3.52 mg Fe/100 g ✓
Both values are correct: the difference is down to the assay conditions.
![Page 17: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/17.jpg)
http://www.merriam-webster.com/dictionary/provenance
![Page 18: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/18.jpg)
Figure: the value 35.2 shown twice, as “what people saw” (the bare number) and as “the data point + its provenance (experimental context)”.
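To make the point concrete, here is a minimal Python sketch of a data point that carries its provenance with it. The `DataPoint` class, the strict `comparable` rule and the condition labels are all hypothetical. The slides only say that the assay conditions differed. They do not say what those conditions were.

```python
from dataclasses import dataclass, field


@dataclass
class DataPoint:
    """A value plus the experimental context (provenance) it came from."""
    value: float
    unit: str
    year: int
    provenance: dict = field(default_factory=dict)


def comparable(a: DataPoint, b: DataPoint) -> bool:
    """Treat two values as directly comparable only if the unit and the
    recorded assay conditions match (a deliberately strict, assumed rule)."""
    return (a.unit == b.unit
            and a.provenance.get("assay_conditions") == b.provenance.get("assay_conditions"))


# Placeholder condition labels: the real conditions are not given on the slides.
von_wolf = DataPoint(35.2, "mg Fe/100 g", 1870, {"assay_conditions": "condition A"})
recheck = DataPoint(3.52, "mg Fe/100 g", 1937, {"assay_conditions": "condition B"})

print(round(von_wolf.value / recheck.value, 6))  # the tenfold gap that looks like a decimal slip
print(comparable(von_wolf, recheck))             # False: the contexts differ
```

If you strip the `provenance` field from either record, the two numbers look like a plain contradiction. That is exactly the trap the spinach story illustrates.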
![Page 19: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/19.jpg)
So What?
![Page 20: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/20.jpg)
…estimates for the irreproducibility of preclinical research range from 51 percent to 89 percent. They estimate that at least half of all U.S. preclinical biomedical research funding, about $28 billion annually, is therefore squandered…
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165
![Page 21: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/21.jpg)
http://www.merriam-webster.com/dictionary/provenance
![Page 22: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/22.jpg)
Provenance Is A Critical Component of Reproducibility
What L cells, where from, how old, epigenetic profile, etc. etc.?
When, how often, in what way, using what system?????
What, when, how?
Could you accurately reproduce this experiment from this method?
* I was responsible for this paragraph
![Page 23: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/23.jpg)
http://www.nature.com/nrd/journal/v10/n9/full/nrd3439-c1.html
A first-of-a-kind analysis of Bayer's internal efforts to validate 'new drug target' claims now not only supports this view but suggests that 50% may be an underestimate; the company's in-house experimental data do not match literature claims in 65% of target-validation projects, leading to project discontinuation.
![Page 24: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/24.jpg)
This is where Informatics & Data Science can add real value to Drug Discovery
![Page 25: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/25.jpg)
Open PHACTS https://www.openphacts.org/
![Page 26: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/26.jpg)
Open PHACTS: Adding Provenance To Data
http://nanopub.org/
![Page 27: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/27.jpg)
```trig
sub:Head {
  this: np:hasAssertion sub:assertion ;
    np:hasProvenance sub:provenance ;
    np:hasPublicationInfo sub:pubinfo ;
    a np:Nanopublication .
}

sub:assertion {
  nx:NX_P35712 bfo:BFO_0000066 ts:TS-0276 ;  # Protein NX_P35712 is localized in tissue TS-0276
    ro:has_quality "positive" .
}

sub:provenance {
  <http://www.nextprot.org/help/quality_criteria/silver> a eco:ECO_0000205 ;
    rdfs:label "neXtProt silver"^^xsd:string .
  sub:_1 a efo:EFO_00027688 .
  sub:_10 a eco:ECO_0000218 .
  sub:_2 a eco:ECO_0000218 .
  sub:_9 a efo:EFO_00027688 .
  sub:assertion prv:usedData
      <http://bgee.unil.ch/bgee/bgee?page=expression&action=data&stage_id=HsapDO:0000087&organ_id=EV:0100115&gene_id=ENSG00000110693> ,
      <http://bgee.unil.ch/bgee/bgee?page=expression&action=data&stage_id=HsapDO:0000088&organ_id=EV:0100115&gene_id=ENSG00000110693> ,
      <http://bgee.unil.ch/bgee/bgee?page=expression&action=data&stage_id=HsapDO:0000090&organ_id=EV:0100115&gene_id=ENSG00000110693&stage_children=on> ,
      <http://bgee.unil.ch/bgee/bgee?page=expression&action=data&stage_id=HsapDO:0000092&organ_id=EV:0100115&gene_id=ENSG00000110693&stage_children=on> ,
      <http://bgee.unil.ch/bgee/bgee?page=expression&action=data&stage_id=HsapDO:0000094&organ_id=EV:0100115&gene_id=ENSG00000110693&stage_children=on> ;
    wi:evidence <http://www.nextprot.org/help/quality_criteria/silver> ;
    a eco:ECO_0000220 ;
    rdfs:comment " data, NX_P35712 is expressed in Endometrium"^^xsd:string ;
    prov:wasDerivedFrom sub:_1 , sub:_3 , sub:_5 , sub:_7 , sub:_9 ;
    prov:wasGeneratedBy sub:_10 , sub:_2 , sub:_4 , sub:_6 , sub:_8 .
}

sub:pubinfo {
  sub:_11 a eco:ECO_0000205 .
  sub:_12 a eco:ECO_0000205 .
  sub:_15 a eco:ECO_0000205 .
  this: dcterms:created "2014-09-19T00:00:00.0Z"^^xsd:dateTime ;
    dcterms:rights <http://creativecommons.org/licenses/by/3.0/> ;
    dcterms:rightsHolder <http://nextprot.org> ;
    prv:usedData "neXtProt database" ;
    pav:authoredBy "CALIPHO project" ,
      <http://orcid.org/0000-0001-6710-1373> ,
      <http://orcid.org/0000-0001-6818-334X> ,
      <http://orcid.org/0000-0002-1303-2189> ,
      <http://orcid.org/0000-0003-1813-6857> ;
    pav:versionNumber "3" ;
    prov:wasGeneratedBy sub:_11 , sub:_12 , sub:_13 , sub:_14 , sub:_15 .
}
```
http://nanopub.org
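The key idea in a nanopublication is that the provenance statements describe the assertion graph as a whole. Below is a minimal, library-free Python sketch of that structure. It is not a TriG parser. It copies only a few triples from the nanopublication above and writes `a` as `rdf:type`. The helper names `part` and `provenance_of_assertion` are hypothetical. A real pipeline would load the TriG with an RDF library and query the named graphs.

```python
# Abridged copy of the nanopublication above: graph name -> list of
# (subject, predicate, object) triples; prefixed names are kept as plain strings
# and the Turtle keyword "a" is spelled out as rdf:type.
NANOPUB = {
    "sub:Head": [
        ("this:", "np:hasAssertion", "sub:assertion"),
        ("this:", "np:hasProvenance", "sub:provenance"),
        ("this:", "np:hasPublicationInfo", "sub:pubinfo"),
        ("this:", "rdf:type", "np:Nanopublication"),
    ],
    "sub:assertion": [
        ("nx:NX_P35712", "bfo:BFO_0000066", "ts:TS-0276"),
        ("nx:NX_P35712", "ro:has_quality", '"positive"'),
    ],
    "sub:provenance": [
        ("sub:assertion", "wi:evidence",
         "<http://www.nextprot.org/help/quality_criteria/silver>"),
        *[("sub:assertion", "prov:wasDerivedFrom", f"sub:_{i}") for i in (1, 3, 5, 7, 9)],
    ],
    "sub:pubinfo": [
        ("this:", "dcterms:rightsHolder", "<http://nextprot.org>"),
        ("this:", "pav:versionNumber", '"3"'),
    ],
}


def part(nanopub: dict, role: str) -> str:
    """Return the graph name the head links to via np:<role>, e.g. 'hasAssertion'."""
    head = next(g for g, triples in nanopub.items()
                if ("this:", "rdf:type", "np:Nanopublication") in triples)
    return next(o for s, p, o in nanopub[head] if p == f"np:{role}")


def provenance_of_assertion(nanopub: dict) -> list:
    """Provenance (predicate, object) pairs whose subject is the assertion graph."""
    assertion = part(nanopub, "hasAssertion")
    provenance = part(nanopub, "hasProvenance")
    return [(p, o) for s, p, o in nanopub[provenance] if s == assertion]


for predicate, obj in provenance_of_assertion(NANOPUB):
    print(predicate, obj)
```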
![Page 28: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/28.jpg)
https://explorer.openphacts.org
![Page 29: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/29.jpg)
One of the few user interfaces where provenance is intrinsically “there”
![Page 30: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/30.jpg)
The Ugly
![Page 31: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/31.jpg)
80-90% of all potentially usable business information may originate in unstructured form
https://en.wikipedia.org/wiki/Unstructured_data
![Page 32: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/32.jpg)
“Carboxypeptidase B2”, “Thrombin-Activatable Fibrinolysis Inhibitor”, “Plasma CPU”
The True Picture (they are the same thing)
![Page 33: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/33.jpg)
It hasn’t just got 3 names, it’s got LOTS:
carboxypeptidase B-like protein OR thrombin-activatable fibrinolysis inhibitor OR CPB type 2 OR Carboxypeptidase type B2 OR plasma carboxypeptidase type B OR carboxypeptidase type B2 OR CPB2 OR Plasma carboxypeptidase type B OR CPB-2 OR carboxypeptidase B2 (plasma),carboxypeptidase U OR Carboxypeptidase type U OR carboxypeptidase type U OR plasma carboxypeptidase B2 OR carboxy-peptidylase U OR thrombin-activable fibrinolysis inhibitor OR plasma carboxypeptidase type B2 OR carboxypeptidase B2 (plasma OR CPU OR carboxypeptidase B2 OR PCPB OR pCPB OR Carboxypeptidase U OR plasma carboxypeptidase B OR TAFI OR Carboxypeptidase B2 OR Plasma carboxypeptidase B OR Thrombin-activable fibrinolysis inhibitor OR carboxypeptidase B2 plasma OR carboxypeptidase R
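A minimal Python sketch of collapsing such a synonym list onto one entity. It assumes a hand-built synonym table. Choosing "CPB2" from the list above as the canonical label is an assumption, and `normalise` and `resolve` are hypothetical helpers. Normalisation removes only trivial variation: case, hyphens and extra spaces.

```python
import re

# A few of the names listed above, all mapped to one canonical label.
# Using "CPB2" as the canonical label is an assumption made for this sketch.
SYNONYMS = {
    "CPB2": [
        "Carboxypeptidase B2", "Thrombin-Activatable Fibrinolysis Inhibitor",
        "thrombin-activable fibrinolysis inhibitor", "TAFI", "CPB-2",
        "Plasma carboxypeptidase B", "pCPB", "Carboxypeptidase U",
        "carboxypeptidase R", "CPB type 2",
    ],
}


def normalise(name: str) -> str:
    """Case-fold and collapse runs of hyphens/whitespace into single spaces."""
    return re.sub(r"[\s\-]+", " ", name).strip().casefold()


LOOKUP = {normalise(s): canonical
          for canonical, names in SYNONYMS.items()
          for s in [canonical, *names]}


def resolve(name: str):
    """Return the canonical label for a name, or None if it is unknown."""
    return LOOKUP.get(normalise(name))


print(resolve("tafi"), resolve("CPB 2"), resolve("Plasma  carboxypeptidase-B"))
```

"activable" and "activatable" still need separate entries. Short names such as "CPU" are deliberately left out here because they are ambiguous without context.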
![Page 34: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/34.jpg)
“We also manually standardized data related to lab measurement units and terminology related to patient race and ethnicity, geographical study regions, and names of drugs and drug families.”
Yet Another Issue
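Much of that manual unit standardisation is mechanical. Here is a minimal Python sketch, assuming values arrive as (number, unit string) pairs. The factor table and the function names are hypothetical, and they cover only a few mass-concentration units.

```python
# Factors converting common mass-concentration units to mg/L.
# The unit list is illustrative, not exhaustive.
TO_MG_PER_L = {
    "mg/l": 1.0,
    "mg/dl": 10.0,   # 1 L = 10 dL
    "g/l": 1000.0,   # 1 g = 1000 mg
    "ug/ml": 1.0,    # 1 µg/mL = 1000 µg/L = 1 mg/L
}


def unit_key(unit: str) -> str:
    """Canonicalise spelling: map both micro signs to 'u', drop spaces, lower-case."""
    return (unit.replace("\u00b5", "u").replace("\u03bc", "u")
                .replace(" ", "").lower())


def to_mg_per_l(value: float, unit: str) -> float:
    """Convert a value to mg/L, refusing units the table does not know."""
    try:
        return value * TO_MG_PER_L[unit_key(unit)]
    except KeyError:
        raise ValueError(f"unrecognised unit: {unit!r}") from None


print(to_mg_per_l(95, "mg/dL"), to_mg_per_l(0.95, "g/L"), to_mg_per_l(5, "µg/mL"))
```

Molar units such as mmol/L need the analyte's molar mass, so a fixed factor table cannot convert them. The sketch rejects them rather than guessing.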
![Page 35: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/35.jpg)
(an accident waiting to happen)
![Page 36: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/36.jpg)
Databases: Where Knowledge Goes To Die
VARCHAR2 columns: PROJ_TITLE, EXPERIMENT_INFO, ASSAY_DESCRIPTION, KEYWORDS, USER_PROFILE SUMMARY, EXPT_METADATA, SETTINGS_INFO, REPORT_TEXT, EXPT_NAME, MEETING_MINUTES, PROJ_ACTIONS, ASSAY_CONLCUSION, COHORT_DESC, INCLUSION_CRITERIA, POLICY_DETAILS, PROJECT_OVERVIEW, RATIONALE, JUSTIFICATION
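A first step in rescuing that knowledge is simply finding where it lives. This minimal Python sketch uses SQLite from the standard library as a stand-in for the Oracle-style `VARCHAR2` schema above. The table and the `free_text_columns` helper are hypothetical. It follows SQLite's type-affinity rules: a declared type containing INT gets integer affinity. Otherwise, a type containing CHAR, CLOB or TEXT gets text affinity.

```python
import sqlite3


def free_text_columns(conn: sqlite3.Connection):
    """Yield (table, column) for each column whose declared type gives it
    TEXT affinity under SQLite's rules (INT checked first, then CHAR/CLOB/TEXT)."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        quoted = '"' + table.replace('"', '""') + '"'
        for _cid, name, decl_type, *_rest in conn.execute(f"PRAGMA table_info({quoted})"):
            t = (decl_type or "").upper()
            if "INT" not in t and any(k in t for k in ("CHAR", "CLOB", "TEXT")):
                yield table, name


conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE experiment (
    id INTEGER PRIMARY KEY,
    EXPT_NAME VARCHAR2(200),
    ASSAY_DESCRIPTION VARCHAR2(4000),
    N_SAMPLES INTEGER)""")
print(list(free_text_columns(conn)))
```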
![Page 37: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/37.jpg)
Text2Data MicroService
TEXT (supports basic keyword search only) → TERMite → DATA (rich substrate for search, discovery & insight)
![Page 38: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/38.jpg)
Just What Is “The Data”?
• Mentions of all:
  • Genes, Diseases, Drugs, Tissues, Cells, Techniques, Assays, Measures, Protocols, Compounds, Regimens, Companies, People, Locations, Pathologies, Adverse Events, Pathways, Metabolism, Manufacturing Concepts, QC/QA, Pathogens, Strains, Animals … and so on
• … And their relationships to each other
• … And their locations (section, database column)
• … Inferring relationships between documents/entries
• … Regardless of the actual keyword used
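To make these bullets concrete, here is a minimal dictionary-based tagger in Python. It turns text into mentions (type, canonical ID, location) regardless of which synonym appears. It then adds a naive co-occurrence step as a stand-in for "relationships". This is a toy under stated assumptions and does not show how TERMite works internally. The vocabulary and function names are hypothetical.

```python
import re
from itertools import combinations

# Hypothetical toy vocabulary: lower-cased synonym -> (entity type, canonical label).
VOCAB = {
    "tafi": ("GENE", "CPB2"),
    "carboxypeptidase b2": ("GENE", "CPB2"),
    "plasma cpu": ("GENE", "CPB2"),
    "hepg2": ("CELL_LINE", "HepG2"),
}

# Longest synonyms first so that longer names win over their substrings.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(s) for s in sorted(VOCAB, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def annotate(text: str, location: str) -> list:
    """Turn free text into structured mentions, keeping where each was found."""
    mentions = []
    for m in PATTERN.finditer(text):
        etype, canonical = VOCAB[m.group(0).lower()]
        mentions.append({"type": etype, "id": canonical, "text": m.group(0),
                         "location": location, "start": m.start()})
    return mentions


def cooccurrences(mentions: list) -> list:
    """Naive relationship candidates: pairs of distinct entities seen together."""
    entities = sorted({(m["type"], m["id"]) for m in mentions})
    return list(combinations(entities, 2))


found = annotate("TAFI (carboxypeptidase B2) activity was measured in HepG2 cells.",
                 location="ASSAY_DESCRIPTION")
print([(m["text"], m["id"]) for m in found])
print(cooccurrences(found))
```

Both "TAFI" and "carboxypeptidase B2" resolve to the same ID, which is the "regardless of actual keyword" point. Co-occurrence is only a candidate relationship, not evidence of one.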
![Page 39: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/39.jpg)
Systems Integration Guide
http://yourcompany.com/termite?text=<content>&app=<application name>&index=<e.g. page, table or column name>
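The integration contract is just a URL, so any system can call it. Here is a minimal Python sketch of building that request with the standard library. It assumes the parameters go in a GET query string exactly as written above. The endpoint is the slide's placeholder and `termite_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Placeholder endpoint copied from the slide; yourcompany.com is not a real service.
BASE_URL = "http://yourcompany.com/termite"


def termite_url(text: str, app: str, index: str) -> str:
    """Build the GET URL, percent-encoding each parameter value."""
    return BASE_URL + "?" + urlencode({"text": text, "app": app, "index": index})


print(termite_url("TAFI activity in HepG2 cells", app="ELN", index="ASSAY_DESCRIPTION"))
```

Encoding each value keeps characters such as `&` and `%` in the text from breaking the query string. Very long documents may exceed practical URL length limits. Whether the service also accepts a POST body is an assumption you would need to check.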
![Page 40: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/40.jpg)
ELN Screening Registry
PDM Registry
Project Management SharePoint
What’s going on, right now
![Page 41: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/41.jpg)
Trending Today
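Here is one plausible way to compute a "Trending Today" view, sketched in Python. It assumes each source has already been annotated into (source, entity) pairs. The ranking puts entities mentioned in the most distinct sources first and breaks ties by raw count. That ranking is a design choice of this sketch, not something the slides specify. The sample data are made up for illustration.

```python
from collections import Counter, defaultdict


def trending(mentions, top=3):
    """Rank entities by how many distinct sources mention them, then by total
    mentions; returns (entity, n_sources, n_mentions) tuples."""
    counts = Counter()
    sources = defaultdict(set)
    for source, entity in mentions:
        counts[entity] += 1
        sources[entity].add(source)
    ranked = sorted(counts, key=lambda e: (-len(sources[e]), -counts[e], e))
    return [(e, len(sources[e]), counts[e]) for e in ranked[:top]]


# Illustrative annotations only; entity labels are placeholders.
today = [
    ("ELN", "CPB2"), ("SharePoint", "CPB2"), ("ELN", "CPB2"),
    ("Screening Registry", "HepG2"), ("ELN", "HepG2"),
    ("Project Management", "EGFR"),
]
print(trending(today))
```

Favouring breadth across sources over raw volume stops one chatty system from dominating the list.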
![Page 42: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/42.jpg)
![Page 43: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/43.jpg)
![Page 44: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/44.jpg)
Why Give Ugly Data A Makeover?
• ELN annotation using the BioAssay Ontology
  • “Find all experiments using any Cell Fluorescence technique”
• Pharmacovigilance
  • Monitoring newsfeeds & internal data for safety signals
• Automatic Process Notification
  • Alert groups based on content of CRO documents, etc.
• Synergise Both Semantic Technology & Information Professionals
  • Re-energise Therapeutic Area Literature Searching
• Build Knowledge Chains (Assertional Provenance)
  • Project Management → ELN Data → Screen SOP
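The "any Cell Fluorescence technique" query works because an ontology lets one term stand for everything beneath it. Here is a minimal Python sketch of that query expansion. The tiny hierarchy, the experiment IDs and the helper names are hypothetical. None of them come from the BioAssay Ontology itself.

```python
# Hypothetical fragment of a technique hierarchy: parent -> child terms.
CHILDREN = {
    "cell fluorescence technique": ["flow cytometry", "fluorescence microscopy"],
    "fluorescence microscopy": ["confocal microscopy"],
}

# Hypothetical ELN entries, each annotated with one technique term.
ELN = {
    "EXP-001": "confocal microscopy",
    "EXP-002": "luminescence assay",
    "EXP-003": "flow cytometry",
}


def descendants(term: str) -> set:
    """The term itself plus every term below it in the hierarchy."""
    found, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in found:
            found.add(t)
            stack.extend(CHILDREN.get(t, []))
    return found


def experiments_using(term: str) -> list:
    """ELN entries annotated with the term or any of its descendants."""
    wanted = descendants(term)
    return sorted(exp for exp, technique in ELN.items() if technique in wanted)


print(experiments_using("cell fluorescence technique"))
```

A keyword search for "cell fluorescence" would match none of these annotations. The expanded query finds both relevant experiments.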
![Page 45: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/45.jpg)
Before I go…
![Page 46: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/46.jpg)
Spinach: The Truth Is Out There!
Spinach is high in iron (!)
…oxalic acid in spinach prevents more than 90% of iron from being absorbed…
Acknowledgement
![Page 47: Data: The Good, The Bad & The Ugly](https://reader033.fdocuments.in/reader033/viewer/2022042722/589d37271a28abd17d8b5a57/html5/thumbnails/47.jpg)
Acknowledgements
IMI Open PHACTS Team (many more involved, I just don’t have a photo ☹) http://openphacts.org
SciBite Team http://scibite.com