Seeing things differently: Innovation in Computational Mass Spectrometry
Rob Smith, Ph.D.
Associate Professor
Department of Computer Science
University of Montana
“Pain-free MS data processing”
Founder | CEO
“I undertook something that not everyone may undertake: I descended into the depths, I bored into the foundations.”
-Nietzsche, “Dawn of Morning”
Overview: Where is the innovation?
[Figure: a region labeled “Current Limits” surrounded on all sides by “Innovation”]
Why don’t we go there?
• Need to identify the limits.
• Need to take risks.
What does the journey look like?
You are on the right track when…
a) The old guard says, “why would you want to do that?”
Their world looks like this:
[Figure: “Current Limits” with nothing outside it]
Not this:
[Figure: “Current Limits” surrounded by “Innovation”]
Inside the Box
“There are no unsolved problems.” - A Developer
“Conversations with 100 scientists in the field reveal a bifurcated perception of the state of mass spectrometry software.” R. Smith, Journal of Proteome Research, 2018.
Inside the Box
“How could you possibly make significant improvements to the state of the art?!” - A Bigwig
Outside the Box
• “All scientific software sucks. It is idiosyncratic, it makes no sense, it has glitches, it is a pain in the ass!” - A User
• “[There are] a few mediocre ones, the rest are absolute crap.” - A User
• “They are complete trash.” - A User
“Conversations with 100 scientists in the field reveal a bifurcated perception of the state of mass spectrometry software.” R. Smith, Journal of Proteome Research, 2018.
b) You ask “why not,” and you find there isn’t a sufficiently good reason.
[Figure: a region labeled “Current Limits” surrounded on all sides by “Innovation”]
Why not?
The limits of the possible can only be defined by going beyond them into the impossible.
-Arthur C. Clarke
c) You need a new vocabulary to describe your solution.
Innovation occurs in the space between reality and the language we use to describe it.
d) You are able to see and measure limitations in the status quo.
Outline
• The old guard says, “why would you want to do that?”
• You ask “why not,” and you find there isn’t a sufficiently good reason.
• You need a new vocabulary to describe your solution.
• You are able to see and measure limitations in the status quo.
Part 1: Words and Concepts
“Many problems are caused by the difference between how things actually work, and the language / tools / paradigms / tropes we use to describe and engage with them.”
-Gregory Bateson
“Language allows you to have ideas otherwise un-haveable, and that by extension people who own different words live in different conceptual worlds.”
-Joshua Hartshorne
Innovation occurs in the space between reality and the language we use to describe it.
You can’t code what you can’t describe.
“Current controlled vocabularies are insufficient to uniquely map molecular entities to mass spectrometry signal.” Smith et al., BMC Bioinformatics 16(7), 2015.
Part 2: Asking different questions
• What we think we are asking
• What we are actually asking
• What we should be asking
Not the same!
What we think we are doing: p(x), what we want to measure.
What we are actually doing: p(x | a, b, c, …), an analog or estimate, where a, b, c, … are our assumptions. EASIER TO CALCULATE.
But what if a, b, c, … are wrong?
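A toy illustration of that gap, using made-up numbers: suppose a single assumption a holds only part of the time. The law of total probability then shows how far the quantity we compute, p(x | a), can sit from the quantity we wanted, p(x). Every probability below is hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical numbers, chosen only to illustrate the point.
p_a = 0.6              # probability that assumption a actually holds
p_x_given_a = 0.9      # p(x | a): what we compute when we assume a
p_x_given_not_a = 0.2  # p(x | not a): what is true when a fails

# Law of total probability: p(x) = p(x|a) p(a) + p(x|not a) p(not a)
p_x = p_x_given_a * p_a + p_x_given_not_a * (1 - p_a)

# How much the assumption-based estimate overstates the real quantity.
gap = p_x_given_a - p_x

print(f"p(x | a) = {p_x_given_a:.2f}")
print(f"p(x)     = {p_x:.2f}")
print(f"overestimate from assuming a: {gap:.2f}")
```

The less often the assumption holds, the larger the gap becomes.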
What we think we are doing: Given a spectrum and context, what do I have?
What we are actually doing: Given a spectrum and context, what matches best?
Assuming:
• a single species.
• the most abundant ions are from the same species.
• ion abundance = parent abundance.
• there are few to no modifications.
• the database contains the correct match.
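A minimal sketch of why “what matches best?” differs from “what do I have?”: a best-match search returns an answer even when the true species is missing from the database. The binned spectra, species names, and the choice of cosine similarity as the score are all assumptions made for illustration, not a description of any particular search engine.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def best_match(query, database):
    """Return (name, score) of the highest-scoring database entry."""
    scored = ((name, cosine(query, spec)) for name, spec in database.items())
    return max(scored, key=lambda pair: pair[1])

# Hypothetical binned reference spectra.
database = {
    "species_A": [1.0, 0.0, 0.5, 0.0],
    "species_B": [0.0, 1.0, 0.0, 0.5],
}

# Hypothetical query from a species that is NOT in the database.
query = [0.6, 0.1, 0.0, 0.9]

name, score = best_match(query, database)
print(f"best match: {name} (cosine = {score:.3f})")
```

The search still names a winner, although neither entry is correct. Nothing in the procedure itself signals that the last assumption in the list was violated.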
What we think we are doing: What is the likelihood that a match is correct (FDR)?
What we are actually doing: What is the similarity between matched spectra and shuffled or reversed spectra?
Assumes:
• Target/decoy accurately simulates the likelihood of a false positive match.
• Decoy sequences are dissimilar to target sequences.
• The database size is chosen such that the FDR is accurate.
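For concreteness, here is a sketch of one common and simple target-decoy estimator: the number of decoy matches above a score threshold divided by the number of target matches above it. Other variants exist, and this is not claimed to be the formula used by any specific tool. The function name and the scores are hypothetical.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a threshold as (#decoys passing) / (#targets passing)."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    if n_target == 0:
        return 0.0  # nothing accepted, so nothing to be wrong about
    return min(1.0, n_decoy / n_target)

# Hypothetical match scores against target and decoy databases.
target_scores = [0.95, 0.91, 0.88, 0.80, 0.72, 0.65, 0.51, 0.40]
decoy_scores = [0.70, 0.55, 0.42, 0.30, 0.25, 0.20, 0.15, 0.10]

fdr_at_050 = target_decoy_fdr(target_scores, decoy_scores, 0.50)
fdr_at_075 = target_decoy_fdr(target_scores, decoy_scores, 0.75)

print(f"estimated FDR at 0.50: {fdr_at_050:.3f}")
print(f"estimated FDR at 0.75: {fdr_at_075:.3f}")
```

The estimator only counts decoy hits. It equals the true false-positive rate only to the extent that the assumptions listed above hold for the data at hand.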
What we think we are doing: Correspondence
What we are actually doing: Alignment
Assumes:
• Elution order never changes
• MS/MS ID rates are high
• m/z doesn’t shift
• RT shifts are monotonic
“LC-MS alignment in theory and practice: a comprehensive algorithmic review.” Smith et al. Briefings in Bioinformatics 16(1), 2015.
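The monotonic-shift assumption can be checked directly when the same analytes are known in two runs. The sketch below uses hypothetical retention times and assumes the pairing between runs is already known, which in practice is the very thing correspondence is trying to find. A monotonic warping function cannot repair a swap in elution order.

```python
def elution_order_preserved(rt_run1, rt_run2):
    """True if analytes sorted by RT in run 1 are also in RT order in run 2."""
    order = sorted(range(len(rt_run1)), key=lambda i: rt_run1[i])
    rt2 = [rt_run2[i] for i in order]
    return all(a <= b for a, b in zip(rt2, rt2[1:]))

def count_order_swaps(rt_run1, rt_run2):
    """Count analyte pairs whose elution order differs between the runs."""
    swaps = 0
    n = len(rt_run1)
    for i in range(n):
        for j in range(i + 1, n):
            if (rt_run1[i] - rt_run1[j]) * (rt_run2[i] - rt_run2[j]) < 0:
                swaps += 1
    return swaps

# Hypothetical retention times (minutes) for the same four analytes.
run1 = [10.0, 12.5, 15.0, 20.0]
run2_shifted = [10.4, 13.1, 15.9, 21.2]  # drifted, but order is kept
run2_swapped = [10.4, 16.0, 15.2, 21.2]  # second and third analytes swap

print(elution_order_preserved(run1, run2_shifted))
print(elution_order_preserved(run1, run2_swapped))
print(count_order_swaps(run1, run2_swapped))
```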
What we think we are doing: Which PTMs are in this sample? At what index are these peptides modified?
What we are actually doing: Does this sample contain this particular PTM? At what index is this particular PTM found?
One modification at a time.
Only the modification we are looking for.
What we think we are doing: Validating accuracy w/ CV
What we are actually doing: Measuring consistency w/ CV
Sameness -> correctness
Correct peak integration
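A small numeric sketch of why sameness does not imply correctness: replicate measurements can have a very low coefficient of variation (CV) and still be far from the true value. The abundances below are invented, and a known true abundance is assumed only so that the error can be computed at all.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

def relative_error(values, truth):
    """Absolute error of the mean, relative to the true value."""
    return abs(statistics.mean(values) - truth) / truth

true_abundance = 100.0  # hypothetical ground truth

# Hypothetical replicate quantities from two workflows.
consistent_but_biased = [61.0, 60.0, 59.0, 60.5, 59.5]
noisy_but_unbiased = [92.0, 110.0, 97.0, 105.0, 96.0]

cv_biased = coefficient_of_variation(consistent_but_biased)
cv_noisy = coefficient_of_variation(noisy_but_unbiased)
err_biased = relative_error(consistent_but_biased, true_abundance)
err_noisy = relative_error(noisy_but_unbiased, true_abundance)

print(f"biased workflow: CV = {cv_biased:.3f}, error = {err_biased:.2f}")
print(f"noisy workflow:  CV = {cv_noisy:.3f}, error = {err_noisy:.2f}")
```

Ranked by CV alone, the biased workflow looks better, even though its mean is off by 40% in this made-up example.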
What we think we are doing: Validating algorithms
What we are actually doing: Measuring agreement between algorithms
Sameness -> correctness
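The same point for algorithm comparison, sketched with invented identifications: two algorithms can agree with each other more than either agrees with the truth. The peptide labels and the ground-truth set are hypothetical. Jaccard overlap and precision are used here as simple stand-ins for agreement and correctness.

```python
def jaccard(a, b):
    """Overlap between two sets of identifications (1.0 = identical)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def precision(reported, truth):
    """Fraction of reported identifications that are actually correct."""
    reported = set(reported)
    return len(reported & set(truth)) / len(reported) if reported else 0.0

# Hypothetical ground truth and outputs of two algorithms.
truth = {"PEPTIDEA", "PEPTIDEB", "PEPTIDEC", "PEPTIDED"}
algo_1 = {"PEPTIDEA", "PEPTIDEX", "PEPTIDEY", "PEPTIDEZ"}
algo_2 = {"PEPTIDEA", "PEPTIDEX", "PEPTIDEY", "PEPTIDEW"}

agreement = jaccard(algo_1, algo_2)
precision_1 = precision(algo_1, truth)
precision_2 = precision(algo_2, truth)

print(f"agreement between algorithms: {agreement:.2f}")
print(f"precision of algorithm 1:     {precision_1:.2f}")
print(f"precision of algorithm 2:     {precision_2:.2f}")
```

Shared assumptions can produce shared errors, so agreement on its own says little about whether either result is right.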
If we had more time…
• Signal to noise is meaningful.
• DIA >> DDA
• 2-dimensional signals should be used (XICs, TICs, etc.)
• Predicting spectra is hard; machine learning can make it easier.
Summarizing
• Analogs are not the same as equals.
• We ignore massive and often provably incorrect assumptions.
• Bad assumptions = incorrect results.
Summarizing
• What is the space between reality and the language we use to describe it?
• Are our estimates actually any good?
• Can estimates be improved?
• Can we actually measure what we are currently only estimating?
Acknowledgements
www.primelabs.ms
Smith Computational Mass Spectrometry Lab
Funding:
NSF Career Award 1552240
NSF SBIR 1819290
NSF I-Corps 1741270
MTBRCT 19-51-031
“Pain-free MS data processing”