Clinical Standards Workshop
Scope of today’s meeting
• Challenge:
– Understand why implementation, curation, and recording against ontologies/standards is so challenging
• Existing Solutions:
– Understand the existing standards, tools, and ontologies that can be leveraged and identify what specific improvements can be made
• Future Improvements:
– Define how current tools can be leveraged for greater uptake by stakeholders
Oct 2016
My COI Declarations
• I was an academic for 10 years, worked with various communities (OBO, W3C)
• Worked with pharma before and after, and I’m now commercial
• I founded EFO ~8 years ago
• I’m married to a former curator, now developer (she’s paid more as a developer – this is important because I’m going to talk about value)
The Challenge
• Two components I want to talk about:
• Technical – hard for users
• Institutional – evaluating the value proposition
Technical Challenges
• CDISC is a wonderful thing
• Has helped harmonise clinical templating in amazing ways
• But it’s complicated…
• Domain has a vast range of technical skills
Technical challenges
• There are 519 bio-ontologies – where do you start?
• Ontologies are directed graphs, and graphs are complex
• So much so that there’s a mathematical discipline, graph theory, so it must be hard
Seven Bridges of Königsberg
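The point that ontologies are directed graphs can be made concrete with a small sketch. It assumes a toy hierarchy of is_a edges; the term names and the `ancestors` helper are illustrative only and are not taken from any real ontology or tool.

```python
from collections import deque

# Hypothetical toy ontology: each term maps to its direct parents (is_a edges).
# Term names are made up for illustration.
IS_A = {
    "material entity": [],
    "cell": ["material entity"],
    "blood cell": ["cell"],
    "leukocyte": ["blood cell"],
    "lymphocyte": ["leukocyte"],
    "T cell": ["lymphocyte"],
}

def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following is_a edges upward."""
    seen = set()
    queue = deque(is_a.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(is_a.get(parent, []))
    return seen

if __name__ == "__main__":
    print(sorted(ancestors("T cell")))
```

Even in this tiny example, answering "what is a T cell a kind of?" needs a graph traversal rather than a lookup, which hints at why real ontologies with many relation types are hard for users to navigate.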
Ontologies reflect biology, so they’re complicated
Finding the right terms
• How does a user know if the term is right?
• How does a user know which ontologies to use?
• How does a user report an error or a missing term?
• On the web these things are easy to do, e.g. Wikipedia – search, edit, comment, etc.
• Users have come to expect this of most online resources
• And so they should
Language in ontologies
• Can be obtuse, opaque, mired in philosophy, and not always written in the user’s domain language
• Can be effective in capturing knowledge but not in communicating it to the user
Ontologies speak a different language
• Assay class
• Translation:
• an assay is a planned process which evaluates a material and produces a measurement on that material
• User should not see that…
• …but how then can they evaluate the fitness and correctness of the class?
• Accessibility barrier – also a barrier to engagement?
Institutional Challenges
• Harder than technical – about mindsets
• Curation is perceived as a ‘separate job’
• Process is seen as:
– Slow
– Manual
– Expensive
– Of lower value than pure “Big Data” approaches
• “I do not have budget to hire curators” is something I’ve heard recently from multiple sources
• Does ontology use/applying standards buy you anything?
Spectrum of skills & perceived value
[Figure: roles placed on two axes, Computational skills and Biological expertise. Roles shown: Sys Admin, Application Developer, UX, Medical/clinician, Bioinformatician, Curator, Bench biologist, Mythical Being, Undergrad]
• Curators are probably the best fit for skills required
• And arguably the least ‘in-demand’
• Who will play the role of curator?
The Value Proposition
• Not just about short-term data integration
• Curating data with ontologies/standards means you can:
– share it in ways that are widely understood
– help make it reproducible
– increase the possibility of spotting errors and inconsistencies
– for academics, add the possibility of your data being cited when someone reuses it
• Conflicts with institutional demands
• Organisations want results today, possibly tomorrow
• Next year is a lifetime away, cycles are mostly <1 year
• “For wider adoption, the value proposition has to be that curation with ontologies and other standards provides value immediately.”
Obvious examples: e.g. query “Male”
Existing Solutions
• Three main categories of solution:
– 1. Fully automated – pure computer science
– 2. Fully manual – pure biologist
– 3. Hybrid approach – bit of both
Big Data approaches
• Rely on the power of algorithms such as Map-Reduce, NLP or ‘machine learning’ features to align data in some way
• Can do a great job at de-duplication and alignment on text
…But challenges still remain
• They postpone, rather than address, the issue of applying standards
• Developing a local ‘schema’ resolves local integration only
• Ignores wider integration, even though pre-competitive collaboration is increasing
• Many islands of integration, disconnected
Avoiding Data ‘Parallel Play’
• Developmental stage (also called social coaction)
• Playing separately from others but close to them and sometimes mimicking their actions, but not interacting
• Data, integrated, together, separately from other efforts
Credit: Photo by Tup Wanders, CC BY 2.0
Modes of access are important
• In our own annotation software we tested various guises of editing
• Still ended up with spreadsheets
• Familiar modes of access are preferred
• The user should be the first thought, not an afterthought
• Software should help as much as possible
https://kusp.factbio.com
Future Improvements
Standards matter, semantics matter
• Aligning on ontologies is not the same as aligning on a word or a number
• The ontology class means something
• You can ask questions
• It refers to a model of knowledge which can be queried
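A minimal sketch of the difference between aligning on a word and aligning on an ontology class follows. Everything in it is an assumption made for illustration: the `EX:` class IDs, the sample records, and the helper names `by_text` and `by_class` do not come from any real ontology or product.

```python
# Hypothetical class hierarchy: child class ID -> parent class ID.
PARENT = {
    "EX:0001": "EX:0000",  # carcinoma is_a disease
    "EX:0002": "EX:0001",  # lung carcinoma is_a carcinoma
    "EX:0003": "EX:0001",  # breast carcinoma is_a carcinoma
}

# Hypothetical sample records: free text as entered, plus a curated class ID.
SAMPLES = [
    {"id": "s1", "text": "lung carcinoma", "class": "EX:0002"},
    {"id": "s2", "text": "breast carcinoma", "class": "EX:0003"},
    {"id": "s3", "text": "Lung Ca.", "class": "EX:0002"},
    {"id": "s4", "text": "healthy", "class": None},
]

def is_a(cls, target):
    """True if `cls` is `target` or one of its descendants in PARENT."""
    while cls is not None:
        if cls == target:
            return True
        cls = PARENT.get(cls)
    return False

def by_text(word):
    """Align on a word: case-insensitive substring match on the free text."""
    return [s["id"] for s in SAMPLES if word.lower() in s["text"].lower()]

def by_class(target):
    """Align on a class: match the annotation or any subclass of it."""
    return [s["id"] for s in SAMPLES if is_a(s["class"], target)]

if __name__ == "__main__":
    print(by_text("carcinoma"))   # text match misses the abbreviated record
    print(by_class("EX:0001"))    # class match follows the hierarchy
```

In this toy data the text query cannot see that “Lung Ca.” is a carcinoma, while the class query can, because the annotation points into a model that can be traversed.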
Standards matter, semantics matter
• Should not be just about supporting import of a format
• Should be about supporting the creation and export using standards
• Future proof: linked data is not going away
• COI: FactBio are trying to do this
Reward for the data generator
• ‘Citation of data’ is not a high enough reward
• Often the person generating data is not the beneficiary of post-experimental curation
• They should be:
– They should be able to ask new questions
– Gain new insights
– Be offered new recommendations
– Impact their understanding
– Save time in the short and long term
– Benefit now and in future (what I call data prosperity)
Applying standards at each step
• ‘Born semantic’ is best
• But otherwise enabling addition of metadata during each stage is also good
• Should become part of each production process step
[Figure: scale of when metadata is added, running from “Perfect” to “Worst case”: Born semantic; Added soon after generation; Added by someone else soon after; Added months later; Never added. An “Average?” marker is also shown.]
The Ontology Paradox: Access needs to be simple but we need ways of saying complex things
• Ontologies can describe detailed connections between things
• Annotation is only one step
• Context is important
• Timelines are important
• Editing ontologies is a dark art
• We need simpler ways to say complex things
• I call this my ‘professional life goal’
Semantic Wikis
• Seemed like a step in the right direction
• But are not widely used
• I’m not sure I understand why they have not prospered
• Wiki too informal?
• No centre of authority (vis-à-vis Wikipedia)?
• Wikidata gaining some momentum, though semantics are ‘loose’
Conclusion – my take homes
1. If you don’t use standards (e.g. ontologies) and focus on local integration, you only delay the problems of sharing and understanding
2. Encouraging standards application requires familiar modes of access…
3. …and a reward for the data generator to improve the value proposition
www.factbio.com
James Malone: [email protected]
Stephenson: [email protected]
Acknowledgements
Tony Stephenson
Richard Holland
Simon Jupp
Anna Farne
All our early testers
Curators everywhere