Swiss Bio Grid: Proteomics Project (PP)

Patricia Hernandez, Geneva, 28th September 2006


Context: Proteomics

Definition:

Set of technologies and methodologies for large-scale studies of proteins: identification, characterization, and quantification.

Typical proteomic study:

Identify proteins that are differentially expressed between two samples (e.g. normal vs. disease state).

Technology:

Tandem mass spectrometry (MS/MS) = mass measurement of protein fragments.

Identification of proteins: principle

Many tools are available, and all of them work in the same way:

- a LIST OF MS/MS SPECTRA, processed sequentially;

- a LIST OF POSSIBLE SOLUTIONS, e.g. a list of known protein sequences;

- the solutions are (sequentially) evaluated against the spectra using a COMPARISON FUNCTION;

- some display (OUTPUT) of the identified proteins (with or without additional features such as statistics, result export, etc.).

Typical list sizes: tens to thousands, and thousands to millions of items.
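The generic loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code of any tool mentioned in this talk: spectra are plain lists of peak m/z values, candidates are peptide sequences rather than whole proteins, and the comparison function is a simple shared-peak count over singly charged b and y ions with a hypothetical 0.5 Da tolerance.

```python
# Hypothetical sketch of the generic identification loop.
# The shared-peak-count comparison function is an illustrative stand-in,
# NOT the scoring used by Phenyx, X!Tandem, Popitam or InsPecT.

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_masses(peptide):
    """Singly charged b- and y-ion m/z values of a candidate peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions + y_ions


def shared_peaks(peaks, peptide, tol=0.5):
    """Comparison function: count theoretical fragments matched by an observed peak."""
    return sum(
        any(abs(mz - peak) <= tol for peak in peaks)
        for mz in fragment_masses(peptide)
    )


def identify(spectra, candidates, score=shared_peaks):
    """Process spectra one by one; evaluate every candidate and keep the best."""
    results = {}
    for name, peaks in spectra.items():
        best = max(candidates, key=lambda seq: score(peaks, seq))
        results[name] = (best, score(peaks, best))
    return results


# Usage with a synthetic, noise-free spectrum built from PEPTIDE itself.
spectra = {"scan_1": fragment_masses("PEPTIDE")}
print(identify(spectra, ["ACDK", "PEPTIDE", "WWWW"]))  # {'scan_1': ('PEPTIDE', 12)}
```

Real engines differ mainly in the comparison function (probabilistic or statistical scores instead of a raw peak count) and in how they prune the candidate list, but the spectra × candidates structure is the same.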

Key idea and main objectives of the PP

Key idea:

Give access, through a single web portal, to several spectrum analysis tools within a workflow-oriented data analysis platform: the swissPIT platform.

Main objectives:

- increase the coverage of identified proteins

- automate analysis workflows

- provide an environment for parameter optimisation studies and for benchmarking
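One way to see why running several tools can increase coverage: each tool's identifications form a set, and the combined result is their union, so proteins found by only one tool are still kept. The sketch below assumes each tool's output has already been reduced to a set of protein accessions; the tool names and accessions are hypothetical placeholders, not real results.

```python
def coverage_report(results_by_tool):
    """Return the union of all identifications and per-tool counts.

    `results_by_tool` maps a (hypothetical) tool name to the set of
    protein accessions that tool identified.
    """
    union = set().union(*results_by_tool.values())
    report = {}
    for tool, proteins in results_by_tool.items():
        # Proteins reported by any *other* tool.
        others = set().union(*(p for t, p in results_by_tool.items() if t != tool))
        report[tool] = {
            "identified": len(proteins),
            "unique": len(proteins - others),  # found by this tool only
        }
    return union, report


union, report = coverage_report({
    "tool_1": {"P1", "P2", "P3"},
    "tool_2": {"P2", "P3", "P4"},
    "tool_3": {"P3", "P5"},
})
print(len(union))        # 5
print(report["tool_1"])  # {'identified': 3, 'unique': 1}
```

The "unique" count is also a simple benchmarking figure: it shows how much each tool contributes beyond the others on the same data.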

swissPIT overview: three distinct parts

Interaction with the user:

- MS/MS data upload

- choice of workflows and parameter configuration

- result visualisation

- data/result sharing

Execution of the analysis workflow selected by the user:

- data-exploitation-centred or high-throughput-centred workflows

- task-specific workflows (i.e. personalised for a given lab)


Easy parallelisation:

- In a workflow, several analysis tools may be called at the same time (and independently).

- For a given identification tool, the spectrum list and/or the database can be split into bundles, and each bundle analysed independently.

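The two kinds of parallelism described for workflows (independent tools, and independent bundles of spectra) can be sketched with Python's standard `concurrent.futures`. This is an illustration under assumptions: the tool callables are hypothetical wrappers around search engines, and local threads stand in for what would, on swissPIT, be separate jobs on a grid or cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_bundles(items, bundle_size):
    """Split a spectrum list (or a sequence database) into independent bundles."""
    return [items[i:i + bundle_size] for i in range(0, len(items), bundle_size)]


def run_parallel(tools, spectra, bundle_size, max_workers=4):
    """Run every tool on every bundle concurrently; merge results per tool.

    `tools` maps a tool name to a callable that takes a list of spectra and
    returns a list of results (hypothetical stand-ins for real wrappers).
    """
    bundles = split_into_bundles(spectra, bundle_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One independent job per (tool, bundle) pair.
        futures = {
            (name, i): pool.submit(tool, bundle)
            for name, tool in tools.items()
            for i, bundle in enumerate(bundles)
        }
        merged = {name: [] for name in tools}
        # Collect in (tool, bundle index) order so bundle order is preserved.
        for name, i in sorted(futures):
            merged[name].extend(futures[(name, i)].result())
    return merged


tools = {
    "upper": lambda bundle: [s.upper() for s in bundle],
    "length": lambda bundle: [len(s) for s in bundle],
}
print(run_parallel(tools, ["s1", "s2", "s3", "s4", "s5"], bundle_size=2))
```

Because bundles are independent, results can simply be concatenated once all jobs finish. Splitting the database instead of the spectrum list works the same way, except that per-bundle hits for the same spectrum would then have to be merged by score.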

Use of distributed resources:

- Each site decides which databases and tools to install and maintain.

- This corresponds to "reality": research groups and proteomics facilities are geographically scattered and need to collaborate.

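Placing a job on distributed resources can be viewed as a matching problem: a job needs a given tool and database, and only sites that chose to install both are eligible. The `Site` record, the least-loaded selection rule, and the site names below are hypothetical; the talk does not describe how swissPIT or the Swiss Bio Grid actually schedule jobs.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical grid site advertising the tools and databases it maintains."""
    name: str
    tools: set = field(default_factory=set)
    databases: set = field(default_factory=set)
    queued_jobs: int = 0


def choose_site(sites, tool, database):
    """Pick the least-loaded site that has both the requested tool and database."""
    eligible = [s for s in sites if tool in s.tools and database in s.databases]
    if not eligible:
        raise LookupError(f"no site offers {tool} with {database}")
    return min(eligible, key=lambda s: s.queued_jobs)


sites = [
    Site("site_a", {"X!Tandem"}, {"UniProtKB/Swiss-Prot"}, queued_jobs=3),
    Site("site_b", {"X!Tandem", "Phenyx"},
         {"UniProtKB/Swiss-Prot", "UniProtKB/TrEMBL"}, queued_jobs=5),
]
print(choose_site(sites, "X!Tandem", "UniProtKB/Swiss-Prot").name)  # site_a
print(choose_site(sites, "Phenyx", "UniProtKB/TrEMBL").name)        # site_b
```

Letting each site declare its own capabilities matches the point above: no central authority has to install everything everywhere.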

Current status of swissPIT

- Web-based interface

- 4 protein identification tools: Phenyx, X!Tandem, Popitam, InsPecT

- 2 protein sequence databases: UniProtKB/Swiss-Prot (>230’000 entries), UniProtKB/TrEMBL (>3’180’000 entries)

- swissBioGrid compatible (submission to a grid is transparent for the user)

Current status: swissPIT from inside

[Diagram: user layer and system layer; the user submits data in mgf format; jobs are executed on a grid/cluster.]
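The mgf files in the diagram are peak lists in Mascot Generic Format: plain text in which each spectrum sits between `BEGIN IONS` and `END IONS`, with `KEY=value` parameter lines (such as `TITLE`, `PEPMASS`, `CHARGE`) followed by one `m/z intensity` pair per line. A minimal parser might look like the sketch below; it is not swissPIT's reader, it skips global parameters outside spectrum blocks, and it treats lines starting with `#`, `;`, `!` or `/` as comments.

```python
def parse_mgf(text):
    """Parse MGF text into a list of spectra, each {"params": {...}, "peaks": [...]}."""
    spectra, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "!", "/")):
            continue  # blank line or comment
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is None:
            continue  # global parameter outside a spectrum block: ignored here
        elif "=" in line:
            key, _, value = line.partition("=")
            current["params"][key.upper()] = value  # values kept as raw strings
        else:
            fields = line.split()
            mz = float(fields[0])
            intensity = float(fields[1]) if len(fields) > 1 else None
            current["peaks"].append((mz, intensity))
    return spectra


example = """COM=example file
BEGIN IONS
TITLE=scan 1
PEPMASS=500.25 1200
CHARGE=2+
100.1 10
200.2 20
END IONS
"""
print(parse_mgf(example)[0]["params"]["TITLE"])  # scan 1
```

Parameter values are kept as strings on purpose: fields like `PEPMASS` may carry an optional intensity and `CHARGE` a sign, so interpreting them is left to the caller.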

Current status: swissPIT from outside

[Screenshot walkthrough of the web interface at http://swisspit.cscs.ch/]

- Log in with a username and password.

- Set some global parameters.

- List of installed software: check/uncheck the boxes to select the software to be run on the data.

- Click on a link to display and configure software-specific parameters.

- Press the button to run the software.

- Go to the user space to browse old/new projects.

- Results are visualised in their native format: raw text for Popitam and InsPecT, XML with a style sheet for X!Tandem, and an advanced Java interface for Phenyx.

Upcoming work

- Code improvement: improve the readability and maintainability of the code.

- Standardisation: unify parameters as much as possible; display results in one format.

- Workflows: find a way to implement workflows using XML configuration files.
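The standardisation goal (one set of parameters for all tools) could be approached with a translation table per tool. The sketch below is hypothetical throughout: the tool names `tool_a` and `tool_b`, their parameter keys, and the default values are placeholders, not the real parameter names or defaults of Phenyx, X!Tandem, Popitam or InsPecT.

```python
# Hypothetical unified parameters with illustrative default values.
UNIFIED_DEFAULTS = {
    "precursor_tolerance_da": 2.0,
    "fragment_tolerance_da": 0.5,
    "database": "UniProtKB/Swiss-Prot",
    "missed_cleavages": 1,
}

# Hypothetical per-tool translation tables: unified key -> tool-specific key.
TRANSLATIONS = {
    "tool_a": {
        "precursor_tolerance_da": "parent_tol",
        "fragment_tolerance_da": "frag_tol",
        "database": "db",
    },
    "tool_b": {
        "precursor_tolerance_da": "PRECURSOR_ERROR",
        "database": "DATABASE",
        "missed_cleavages": "MAX_MISSED",
    },
}


def render_parameters(tool, overrides=None):
    """Translate unified parameters to one tool's keys, dropping unsupported ones."""
    unified = {**UNIFIED_DEFAULTS, **(overrides or {})}
    table = TRANSLATIONS[tool]
    return {table[key]: value for key, value in unified.items() if key in table}


print(render_parameters("tool_b", {"missed_cleavages": 2}))
```

Unsupported parameters are silently dropped in this sketch; a real interface would probably warn the user that a setting has no effect for a given tool.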

[Diagram of an example workflow with the steps: screen for unsuspected modifications; screen for proteins; remove spectra with low peak statistics.]
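Describing a workflow such as this example in an XML configuration file could look like the following sketch. The XML element and attribute names (`workflow`, `step`, `id`, `action`, `after`) are assumptions, not an existing swissPIT schema, and so is the ordering of the three steps (filter first, then protein screen, then modification screen). The loader uses Python's `xml.etree.ElementTree` and `graphlib.TopologicalSorter` (Python 3.9+) to return the steps in dependency order.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter

# Hypothetical workflow file; schema and step order are assumptions.
WORKFLOW_XML = """
<workflow name="example">
  <step id="filter" action="remove spectra with low peak statistics"/>
  <step id="proteins" action="screen for proteins" after="filter"/>
  <step id="modifications" action="screen for unsuspected modifications" after="proteins"/>
</workflow>
"""


def load_workflow(xml_text):
    """Return (step id, action) pairs in an order that respects 'after' dependencies."""
    root = ET.fromstring(xml_text.strip())
    actions, graph = {}, {}
    for step in root.iter("step"):
        step_id = step.get("id")
        actions[step_id] = step.get("action")
        deps = step.get("after", "")
        # 'after' may list several space-separated step ids.
        graph[step_id] = set(deps.split()) if deps else set()
    order = TopologicalSorter(graph).static_order()
    return [(step_id, actions[step_id]) for step_id in order]


for step_id, action in load_workflow(WORKFLOW_XML):
    print(step_id, "->", action)
```

Declaring dependencies rather than a fixed sequence leaves room for the parallelisation discussed earlier: steps with no dependency between them could be dispatched at the same time.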

Ron Appel

Patricia Hernandez

Celine Hernandez

Andreas Quandt

Marc Tuloup

Pierre-Alain Binz

Markus Müller

Alexandre Masselot, Nicolas Budin

Vital-IT team + Bruno Nyffeler

Peter Kunszt, Sergio Maffioletti, Arthur Thomas

Involved people, acknowledgments

Thank you for your attention
