A blind search for patterns Unravelling low replicate data.

ExSpec Pipeline

Data: Structure and variability

Structure: between 500 and 10,000+ features

Each feature has an associated ion count for each aligned sample.

Data is not normally distributed.

Variability: up to 30% technical variability

Each feature is affected differently.
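One common way to express per-feature technical variability is the coefficient of variation (CV) across technical replicates. The sketch below is illustrative only: the ion counts are made up and `feature_cv` is a hypothetical helper, not part of the ExSpec pipeline.

```python
from statistics import mean, stdev

def feature_cv(counts):
    """Coefficient of variation (sample SD / mean) of one feature's ion counts."""
    m = mean(counts)
    if m == 0:
        return float("nan")  # CV is undefined when the mean count is zero
    return stdev(counts) / m

# Hypothetical ion counts: one entry per feature, one value per technical replicate.
replicates = {
    "feature_A": [1000.0, 1100.0, 900.0],
    "feature_B": [200.0, 300.0, 100.0],
}

cv_by_feature = {name: feature_cv(vals) for name, vals in replicates.items()}
for name, cv in cv_by_feature.items():
    print(f"{name}: CV = {cv:.0%}")
```

Here both features have the same absolute spread, but the low-abundance feature has a much higher CV, which is one way features can be affected differently.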

Data: Structure and variability

The majority of features that are detected are singletons.

Low Replicate data

“Suck it and see”: one-off projects

Pump-priming projects

Medical samples: biopsy

Difficult to access: ecological data

Resampling is difficult

Methods

Fingerprinting

PCA

Basic scoring

PDE model

Gradient search

Differential analysis

PCA

Very simple

Can be highly informative: depends on the data

Used in pipeline: data quality

Brno Project

Samples: human biopsy

Replication: biopsy cut into equal parts

PCA Analysis

N group: non-cancer biopsy

T group: cancer biopsy

Using PCA clustering we are able to distinguish between healthy and sick patients
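A minimal sketch of how samples can be placed on the first principal component, assuming a small samples-by-features table of ion counts. The data, the sample names and the helpers `center_columns` and `pc1_scores` are hypothetical; this is not the pipeline's implementation, and in practice a library routine such as scikit-learn's `PCA` would normally be used. Power iteration is run on the sample-by-sample Gram matrix, which stays small when there are few samples and many features.

```python
import math

def center_columns(rows):
    """Subtract each feature's mean so PCA describes variation, not absolute level."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def pc1_scores(rows, iterations=200):
    """Score of each sample on the first principal component (sign is arbitrary)."""
    centered = center_columns(rows)
    n = len(centered)
    # Gram matrix: inner products between every pair of centred samples.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered] for ri in centered]
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        nxt = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            return [0.0] * n  # no variation between samples
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the leading eigenvalue of the Gram matrix.
    eigenvalue = sum(v * sum(g * w for g, w in zip(row, vec)) for v, row in zip(vec, gram))
    return [v * math.sqrt(max(eigenvalue, 0.0)) for v in vec]

# Hypothetical ion counts: two non-cancer (N) and two cancer (T) samples, three features.
samples = {
    "N1": [10.0, 5.0, 1.0],
    "N2": [11.0, 5.0, 1.0],
    "T1": [20.0, 5.0, 9.0],
    "T2": [21.0, 5.0, 9.0],
}

scores = dict(zip(samples, pc1_scores(list(samples.values()))))
for name, score in scores.items():
    print(f"{name}: PC1 score = {score:+.2f}")
```

In this toy table the N samples fall on one side of PC1 and the T samples on the other, which is the kind of separation the clustering relies on.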

PCA Analysis

PCA revealed profile similarity which correlated with biological evidence

PCA Analysis

Human Urine project

• 22 patients sampled

• 11 healthy and 11 sick patients

• Sample labels dropped

PCA Analysis

Ecological Data

Large number of samples without clear replication.

PCA Analysis

Cluster pattern: Find the features which hold the cluster pattern

PCA Analysis

Using PCA and profile similarity analysis, a subset of features of interest was found

Basic Scoring

Use Z-scores to sort the data and pull out important features.
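The slide does not say exactly how the Z-score is computed. One possible reading, sketched below as an assumption, is to take the control minus experiment difference for each feature, standardise those differences across all features, and sort by absolute Z-score. The feature names, ion counts and the `zscores` helper are hypothetical.

```python
from statistics import mean, stdev

def zscores(values):
    """Standardise values to zero mean and unit sample standard deviation."""
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("all values are identical; Z-scores are undefined")
    return [(v - m) / s for v in values]

# Hypothetical ion counts for one control and one experimental sample.
control = {"f1": 100.0, "f2": 200.0, "f3": 150.0, "f4": 120.0, "f5": 300.0}
experiment = {"f1": 105.0, "f2": 195.0, "f3": 400.0, "f4": 118.0, "f5": 310.0}

names = list(control)
diffs = [control[n] - experiment[n] for n in names]  # Control - Exp, per feature

# Features with the largest |Z| differ most from the typical control/experiment gap.
ranked = sorted(zip(names, zscores(diffs)), key=lambda pair: abs(pair[1]), reverse=True)
for name, z in ranked:
    print(f"{name}: z = {z:+.2f}")
```

Ranking by absolute value keeps both strongly increased and strongly decreased features near the top of the list.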

Control – Exp: with a two-class problem we can use PDE modelling.

Basic Scoring: PDE modelling

Multi class problem

Plants: wild type and act ko mutant

Treatments: normal light and high light

Gradient Analysis

Use rate of change of abundance to mine data for specific trends

Find features of interest

Use PDE modelling of rates
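The rate-of-change idea can be sketched with simple finite differences between consecutive time points. Everything below is an illustrative assumption: the time points, abundances, feature names and the threshold of 50 counts per hour are made up, and the PDE modelling step is not shown.

```python
def rates_of_change(times, abundances):
    """Finite-difference slope between each pair of consecutive time points."""
    points = list(zip(times, abundances))
    return [(a1 - a0) / (t1 - t0) for (t0, a0), (t1, a1) in zip(points, points[1:])]

# Hypothetical time course (hours) and ion counts for three features.
times = [0, 1, 2, 4]
series = {
    "flat": [100, 102, 101, 103],
    "slow_rise": [100, 110, 120, 140],
    "rapid_rise": [100, 180, 400, 420],
}

threshold = 50  # hypothetical cut-off, in ion counts per hour

# Keep features whose steepest increase exceeds the threshold.
rapid = [
    name
    for name, abundances in series.items()
    if max(rates_of_change(times, abundances)) > threshold
]
print(rapid)
```

Using the maximum slope rather than the overall change picks up features that rise sharply over a short interval even if they level off afterwards.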

Gradient Analysis

Mining for features which showed rapid increase due to a specific treatment

Data provided by:

Brno: Ted Hupp, Rob O’Neill

Urine study: Steve Michell, John Mcgrath

Ecological data: Dave Hodgson, Nicole Goody

Gradient analysis: John Love

Data scoring: Nicholas Smirnoff, Mike Page

Metabolomics and Proteomics Mass Spectrometry Facility @ The University of Exeter

Nick Smirnoff (Director of Mass Spectrometry)

Hannah Florance (MS Facility Manager)

Venura Perera (Bioinformatics and Mathematical Support)


http://biosciences.exeter.ac.uk/facilities/spectrometry/

http://bio-massspeclocal.ex.ac.uk/

About me

Background: Applied Maths

Untargeted metabolite profiling

Research interests: Data-driven modelling

Small molecule profiling

Gene regulatory network modelling

Application of mathematical methods

Metabolite identification using LC-MS/MS
