Jack Snoeyink & Matt O’Meara Dept. Computer Science UNC Chapel Hill.

Scientific Benchmarks for Structure Prediction Codes

With thanks to:

Collaborators: Brian Kuhlman, UNC Biochem; many other members of the RosettaCommons; Richardson lab, Duke Biochem

Funding: NIH, NSF

Key Points: Scientific Models, esp. for Structural Molecular Biology

- Models are the lens through which we view data
- Models are predominantly geometric
- Computational models are complex
- Models evolve, so testing becomes crucial

Focus on statistical/computational models with a sample source, observable local features, a chosen functional form, fit parameters, and visualization/testing methods. Capture the assumptions and data used to build models in order to:

- Visualize them when making design decisions while building
- Fit parameters to ensure the best performance
- Record them as scientific benchmarks

Case Study: Rosetta protein structure prediction software [B]

Science views nature through models

Scientists view nature through models

People view the world through models

Geometric molecular models

Model complexity

Physical and conceptual models: kept simple to aid understanding

Statistical and computational models: evolve by combining simple models; even when complex, they can still be effective at validation (MolProbity) or prediction (Rosetta)


Computational model life cycle

Spiral development, much like software:

- Discover problematic features in some data
- Create an energy function to adjust them
- Fit parameters to improve results
- Check it into the software as a new option
- Make it the default option if everyone likes it
- Occasionally refactor and rewrite, removing outdated or unused models

But there is less support for testing…

Computational model testing

Our goal: Capture data and assumptions from model building for use in model visualization and testing.

Our computational models

Abstraction: a simple component of a complex computational model consists of one or more sample sources, giving observable local features, having a chosen functional form that depends on fitting parameters. For example:

- Sample sources: PDB files of native or decoy structures
- Observable local features: hydrogen bond distances and angles
- Chosen functional form: energy computed from distances and angles
- Fitting parameters: weights for combining terms

[KMB’03]
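The abstraction above can be sketched as a chain from a sample source, to observable features, to a weighted functional form whose weights are the fitting parameters. This is a minimal sketch under stated assumptions: the `HBondFeature` and `ModelComponent` names, the quadratic distance term, the `1 + cos` angle term, the 1.9 Å ideal distance, and the weights are all hypothetical illustrations, not the KMB’03 or Rosetta hydrogen bond functional form.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HBondFeature:
    """One observable local feature extracted from a structure in a sample source."""
    ah_distance: float  # acceptor-hydrogen distance in Angstroms
    ahd_angle: float    # acceptor-hydrogen-donor angle in degrees


@dataclass
class ModelComponent:
    """A chosen functional form: a weighted sum of terms.

    The weights are the fitting parameters; the terms are fixed functions of a feature.
    """
    terms: Dict[str, Callable[[HBondFeature], float]]
    weights: Dict[str, float]

    def energy(self, feature: HBondFeature) -> float:
        # Combine each term's value using its fitted weight.
        return sum(self.weights[name] * term(feature) for name, term in self.terms.items())


# Hypothetical terms for illustration only (not the KMB'03 or Rosetta form).
IDEAL_AH_DISTANCE = 1.9  # assumed ideal A-H distance, Angstroms


def distance_term(f: HBondFeature) -> float:
    # Quadratic penalty for deviating from the assumed ideal distance.
    return (f.ah_distance - IDEAL_AH_DISTANCE) ** 2


def angle_term(f: HBondFeature) -> float:
    # Zero for a linear (180 degree) A-H-D angle, rising to 2 at 0 degrees.
    return 1.0 + math.cos(math.radians(f.ahd_angle))


# A toy "sample source": features as they might be gathered from native PDB files.
native_sample: List[HBondFeature] = [
    HBondFeature(1.9, 180.0),
    HBondFeature(2.4, 180.0),
    HBondFeature(1.9, 120.0),
]

component = ModelComponent(
    terms={"distance": distance_term, "angle": angle_term},
    weights={"distance": 1.0, "angle": 0.5},  # hypothetical fitted weights
)

energies = [component.energy(f) for f in native_sample]
```

Keeping the weights separate from the terms mirrors the slide's split between a chosen functional form and the parameters fit to data: refitting changes only `weights`.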

Tool schematic (figure): data sets A, B, …, Z; gather features; SQL query; filter/transform; statistics; ggplot2 spec
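The tool schematic suggests a pipeline in which features from each data set are gathered into a database, selected with an SQL query, filtered, and summarized per sample source before plotting. A minimal sketch using Python's built-in `sqlite3`, assuming a hypothetical `hbond_features` table with one A-H distance per row and a hypothetical 2.5 Å cutoff as the filter; the ggplot2 plotting step is left out.

```python
import sqlite3
import statistics
from typing import Dict, Iterable, List


def gather_features(conn: sqlite3.Connection, sample_source: str,
                    ah_distances: Iterable[float]) -> None:
    """Store one sample source's features, tagged so sources can be compared later."""
    conn.executemany(
        "INSERT INTO hbond_features (sample_source, ah_distance) VALUES (?, ?)",
        [(sample_source, d) for d in ah_distances],
    )


def summarize(conn: sqlite3.Connection, max_distance: float) -> Dict[str, Dict[str, float]]:
    """SQL query with a filter, then per-source statistics for a later plotting step."""
    rows = conn.execute(
        "SELECT sample_source, ah_distance FROM hbond_features "
        "WHERE ah_distance < ? ORDER BY sample_source",
        (max_distance,),
    ).fetchall()
    by_source: Dict[str, List[float]] = {}
    for source, distance in rows:
        by_source.setdefault(source, []).append(distance)
    return {
        source: {"n": len(values), "mean": statistics.mean(values)}
        for source, values in by_source.items()
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hbond_features (sample_source TEXT, ah_distance REAL)")
# Toy values standing in for features gathered from data sets A, B, ...
gather_features(conn, "native", [1.8, 1.9, 2.0, 2.6])
gather_features(conn, "decoy", [2.1, 2.3, 2.5])
summary = summarize(conn, max_distance=2.5)
```

Tagging every row with its sample source is what lets one query feed comparisons across natives, decoys, or different code versions.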

Visualization

Implemented tools:

- Compare distributions from sample sources
- Tufte’s small multiples via ggplot
- Kernel density estimation
- Normalization

Opportunities for statistical analysis:

- Dimension reduction
- …
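Kernel density estimation turns each sample source's feature values into a smooth curve that can be drawn as one small-multiple panel per source. A minimal pure-Python sketch, assuming a Gaussian kernel and the normal-reference bandwidth h = 1.06 · sd · n^(-1/5); the slides do not say which kernel or bandwidth the tool uses, and `compare_sources` is a hypothetical helper.

```python
import math
import statistics
from typing import Callable, Dict, List, Optional, Sequence


def gaussian_kde(samples: Sequence[float],
                 bandwidth: Optional[float] = None) -> Callable[[float], float]:
    """Return a Gaussian kernel density estimate for one sample source."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb: h = 1.06 * sd * n^(-1/5).
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-0.2)
    scale = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        # Sum of one Gaussian bump per sample, normalized to integrate to 1.
        return scale * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def compare_sources(sources: Dict[str, Sequence[float]],
                    grid: List[float]) -> Dict[str, List[float]]:
    """Evaluate each source's density on a shared grid (one small-multiple panel each)."""
    curves: Dict[str, List[float]] = {}
    for name, values in sources.items():
        kde = gaussian_kde(values)
        curves[name] = [kde(x) for x in grid]
    return curves


grid = [1.5 + 0.1 * i for i in range(11)]
curves = compare_sources(
    {"native": [1.8, 1.9, 2.0, 2.1], "decoy": [2.0, 2.2, 2.4, 2.6]}, grid
)
```

Evaluating every source on the same grid keeps the panels on a common x-axis, which is what makes small multiples easy to compare by eye.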

Normalization

[KMB’03] Histogram of H-bond A-H distances in natives
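One common normalization for distance histograms such as this one is to divide each bin by the volume of its spherical shell, so that the larger number of possible positions at longer distances does not inflate those bins. A minimal sketch, assuming that this is the kind of normalization the slide refers to (the slide itself does not say); `histogram` and `normalize_by_shell_volume` are hypothetical helpers.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values in half-open bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


def normalize_by_shell_volume(counts: Sequence[int], edges: Sequence[float]) -> List[float]:
    """Divide each distance bin by its spherical-shell volume, then rescale to sum to 1.

    Without this step, longer distances look more common simply because a thin
    shell at radius r has volume growing roughly like r**2.
    """
    densities = []
    for count, lo, hi in zip(counts, edges[:-1], edges[1:]):
        shell_volume = 4.0 / 3.0 * math.pi * (hi ** 3 - lo ** 3)
        densities.append(count / shell_volume)
    total = sum(densities)
    return [d / total for d in densities] if total > 0 else densities
```

Counts that grow exactly in proportion to shell volume come out flat after normalization, which is the behavior expected of points scattered uniformly in space.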

Tool uses…

- Scientific unit tests: native, HEAD, ^HEAD; run on a continuous testing server
- Knowledge-based score term creation: native, release, experimental; turn exploration into living benchmarks
- Test design hypotheses: native, protocol, designs; how strange is this geometry?
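Two of these tool uses reduce to comparing distributions: a scientific unit test asks whether a feature distribution produced by HEAD has drifted from the one produced by ^HEAD (presumably the previous commit), and testing a design hypothesis asks how unusual a design's geometry is relative to natives. A minimal sketch, assuming a two-sample Kolmogorov-Smirnov statistic as the drift measure and an empirical percentile as the strangeness score; the slides name neither, and `flag_regression` and its 0.2 threshold are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    # The supremum is attained at one of the sample points, so checking them suffices.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        fb = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        gap = max(gap, abs(fa - fb))
    return gap


def flag_regression(head: Sequence[float], parent: Sequence[float],
                    threshold: float = 0.2) -> bool:
    """Fail a scientific unit test when HEAD's distribution drifts from its parent's."""
    return ks_statistic(head, parent) > threshold


def empirical_percentile(reference: Sequence[float], value: float) -> float:
    """Fraction of reference (e.g. native) values <= value; near 0 or 1 means unusual."""
    ordered = sorted(reference)
    return bisect_right(ordered, value) / len(ordered)
```

A continuous testing server could run `flag_regression` on each gathered feature after every commit, while `empirical_percentile` answers "how strange is this geometry?" for a single design.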

Rotamer recovery

