A blind search for patterns: Unravelling low replicate data
ExSpec Pipeline
Data: Structure and variability
Structure: between 500 and 10,000+ features
Each feature has an associated ion count for each aligned sample.
Data is not normally distributed.
Variability: up to 30% technical variability
Each feature is affected differently.
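One common way to quantify per-feature technical variability is the coefficient of variation (CV) across technical replicates. The sketch below assumes that definition; the slides do not state how the 30% figure was measured, and the feature names and ion counts are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(ion_counts):
    """Sample standard deviation divided by the mean, as a percentage."""
    m = mean(ion_counts)
    if m == 0:
        raise ValueError("mean ion count is zero; CV is undefined")
    return 100.0 * stdev(ion_counts) / m


# Hypothetical ion counts for two features across three technical replicates.
replicates = {
    "feature_A": [1000.0, 1100.0, 900.0],
    "feature_B": [200.0, 320.0, 260.0],
}

# One CV per feature, since each feature is affected differently.
cv_by_feature = {
    name: coefficient_of_variation(counts)
    for name, counts in replicates.items()
}

for name, cv in cv_by_feature.items():
    print(f"{name}: CV = {cv:.1f}%")
```

Computing the CV per feature rather than once for the whole data set reflects the point that variability differs from feature to feature.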
Data: Structure and variability
The majority of features that are detected are singletons.
Low Replicate data
“Suck it and see”: One-off project
Pump priming projects
Medical samples: Biopsy
Difficult to access: Ecological data
Resampling is difficult
Methods
Fingerprinting
PCA
Basic scoring
PDE model
Gradient search
Differential analysis
PCA
Very simple
Can be highly informative: depends on the data
Used in pipeline: data quality
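As a rough illustration of what PCA does with a samples-by-features table, the sketch below finds the first principal component by power iteration on the covariance matrix, using only the standard library. It is not the pipeline's implementation; the function name and the data are hypothetical, and in practice an established numerical library would normally be used.

```python
import math


def first_principal_component(samples, iterations=200):
    """samples: one row per sample, one column per feature.

    Returns (loadings, scores) for the first principal component.
    """
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Sample covariance matrix of the features.
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    # Power iteration: repeated multiplication converges to the leading
    # eigenvector, provided the start vector is not orthogonal to it.
    vec = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0:
            break
        vec = [v / norm for v in nxt]
    # Project each centered sample onto the component.
    scores = [sum(r[j] * vec[j] for j in range(p)) for r in centered]
    return vec, scores


# Hypothetical ion counts: 6 samples x 3 features, two groups of three.
samples = [
    [10.0, 1.0, 5.0],
    [11.0, 1.2, 5.1],
    [10.5, 0.9, 4.9],
    [20.0, 1.1, 5.0],
    [21.0, 1.0, 5.2],
    [19.5, 0.8, 4.8],
]

loadings, scores = first_principal_component(samples)
print("loadings:", [round(v, 3) for v in loadings])
print("scores:  ", [round(s, 2) for s in scores])
```

The scores show how samples group along the component, while the loadings indicate which features drive that grouping.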
Brno Project
Samples: Human biopsy
Replication: biopsy cut into equal parts
PCA Analysis
N group: Non-cancer biopsy
T group: Cancer biopsy
Using PCA clustering, we are able to distinguish between healthy and sick patients.
PCA Analysis
PCA revealed profile similarity which correlated with biological evidence.
PCA Analysis
Human Urine project
• 22 patients sampled
• 11 healthy and 11 sick patients
• Sample labels dropped
PCA Analysis
Ecological Data
Large number of samples without clear replication.
PCA Analysis
Cluster pattern: Find the features which hold the cluster pattern
PCA Analysis
Using PCA and profile similarity analysis, a subset of features of interest was found.
Basic Scoring
Use Z-score to sort data. Use this to pull out important features.
Control – Exp: with a two-class problem we can use PDE modelling.
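A minimal sketch of Z-score sorting for a two-class comparison follows. It assumes the score is taken over the per-feature difference between experiment and control, which is one reading of the slide rather than a confirmed detail; the feature names and ion counts are hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardise values: subtract the mean, divide by the sample std dev."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("all values are identical; Z-score is undefined")
    return [(v - m) / s for v in values]


# Hypothetical ion counts, one value per feature and class.
control = {"f1": 100.0, "f2": 250.0, "f3": 80.0, "f4": 500.0, "f5": 120.0}
experiment = {"f1": 105.0, "f2": 240.0, "f3": 400.0, "f4": 510.0, "f5": 115.0}

names = sorted(control)
# Per-feature difference between the two classes (assumed: Exp minus Control).
diffs = [experiment[n] - control[n] for n in names]

# Sort features so the largest absolute Z-scores come first.
ranked = sorted(
    zip(names, z_scores(diffs)),
    key=lambda pair: abs(pair[1]),
    reverse=True,
)

for name, z in ranked:
    print(f"{name}: z = {z:+.2f}")
```

Because the data are not normally distributed, the Z-score here serves only as a ranking device and not as a significance test.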
Basic Scoring: PDE modelling
Multi class problem
Plants: Wild type
act ko mutant
Treatments: Normal light
High light
Gradient Analysis
Use rate of change of abundance to mine data for specific trends
Find features of interest
Use PDE modelling of rates
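The rate-of-change idea can be sketched with simple finite differences, assuming each feature's abundance is measured at a series of time points. The time points, abundances, and threshold below are hypothetical, and the PDE modelling step is not covered.

```python
def rates_of_change(times, abundances):
    """Finite-difference slope between each pair of consecutive time points."""
    if len(times) != len(abundances) or len(times) < 2:
        raise ValueError("need matching series with at least two time points")
    return [
        (abundances[i + 1] - abundances[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]


# Hypothetical sampling times (hours) and ion counts per feature.
times = [0.0, 1.0, 2.0, 4.0]
profiles = {
    "f1": [100.0, 102.0, 101.0, 103.0],  # roughly flat
    "f2": [50.0, 150.0, 400.0, 420.0],   # rapid increase, then plateau
    "f3": [300.0, 250.0, 200.0, 100.0],  # steady decrease
}

# Largest rate of increase seen for each feature.
max_rate = {
    name: max(rates_of_change(times, abundances))
    for name, abundances in profiles.items()
}

# Hypothetical cut-off for a "rapid" increase, in ion counts per hour.
threshold = 100.0
rapid_risers = sorted(name for name, r in max_rate.items() if r >= threshold)
print("features with a rapid increase:", rapid_risers)
```

Dividing by the time step rather than taking raw differences keeps the rates comparable when sampling intervals are uneven.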
Gradient Analysis
Mining for features which showed rapid increase due to a specific treatment
Data provided by:
Brno: Ted Hupp, Rob O’Neill
Urine study: Steve Michell, John Mcgrath
Ecological data: Dave Hodgson, Nicole Goody
Gradient analysis: John Love
Data scoring: Nicholas Smirnoff, Mike Page
Metabolomics and Proteomics Mass Spectrometry Facility @ The University of Exeter
Nick Smirnoff (Director of Mass Spectrometry) [email protected]
Hannah Florance (MS Facility Manager) [email protected]
Venura Perera (Bioinformatics and Mathematical Support) [email protected]
http://biosciences.exeter.ac.uk/facilities/spectrometry/
http://bio-massspeclocal.ex.ac.uk/
About me
Background: Applied Maths
Untargeted metabolite profiling
Research interests: Data driven modelling
Small molecule profiling
Gene regulatory network modelling
Application of mathematical methods
Metabolite identification using LC-MS/MS