Patricia Hernandez, Geneva, 28th September 2006

Swiss Bio Grid: Proteomics Project (PP)

Page 4: Context: Proteomics

Definition:

Set of technologies and methodologies for the large-scale study of proteins: identification, characterization, quantification

Typical proteomic study:

Identify proteins that are differentially expressed between two samples (e.g. normal vs. disease state)

Technology:

Tandem mass spectrometry (MSMS) = mass measurement of protein fragments

Page 5: Identification of proteins: principle

Many tools are available; all work in the same way:

- a LIST OF MSMS SPECTRA (tens to thousands), processed sequentially
- a LIST OF POSSIBLE SOLUTIONS (thousands to millions), e.g. a list of known protein sequences

Page 6: Identification of proteins: principle

- solutions (thousands to millions) are (sequentially) evaluated against the spectra (tens to thousands) using a COMPARISON FUNCTION
- some display (OUTPUT) of the identified proteins (with or without additional features such as statistics, result export, etc.)
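The principle above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the comparison function (counting candidate masses that have an observed peak within a tolerance) is a toy stand-in, not the scoring used by any real identification tool, and the names `count_matches` and `identify` as well as all masses are made up.

```python
from bisect import bisect_left


def count_matches(observed, theoretical, tol=0.5):
    """Toy COMPARISON FUNCTION (assumption, not a real tool's score):
    number of theoretical masses with an observed peak within +/- tol."""
    observed = sorted(observed)
    hits = 0
    for mass in theoretical:
        i = bisect_left(observed, mass - tol)  # first peak >= mass - tol
        if i < len(observed) and observed[i] <= mass + tol:
            hits += 1
    return hits


def identify(spectra, candidates, score=count_matches):
    """Process spectra sequentially; evaluate every candidate solution
    against each spectrum and keep the best-scoring one (OUTPUT)."""
    results = {}
    for spectrum_id, peaks in spectra.items():
        best = max(candidates, key=lambda name: score(peaks, candidates[name]))
        results[spectrum_id] = (best, score(peaks, candidates[best]))
    return results


# Made-up input: spectra as lists of peak masses, candidate solutions as
# lists of expected fragment masses.
spectra = {"spectrum_1": [100.1, 200.2, 300.3], "spectrum_2": [150.0, 250.0]}
candidates = {
    "protein_A": [100.0, 200.0, 300.0],
    "protein_B": [150.2, 250.1, 400.0],
}

if __name__ == "__main__":
    for spectrum_id, (name, hits) in identify(spectra, candidates).items():
        print(spectrum_id, name, hits)
```

The nested loop (every spectrum against every candidate) is what makes the problem costly when the candidate list reaches millions of entries.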

Page 8: Key idea and main objectives of the PP

Key idea:

Give access, through a single web portal, to several spectrum analysis tools in a workflow-oriented data analysis platform: the swissPIT platform.

Main objectives:

- increase the coverage of identified proteins
- automate analysis workflows
- provide an environment for parameter optimisation studies and for benchmarking

Page 9: swissPIT overview: three distinct parts

Interaction with the user:

- MSMS data upload
- choice of workflows and parameter configuration
- result visualisation
- data/result sharing

Page 10: swissPIT overview: three distinct parts

Execution of the analysis workflow selected by the user:

- data exploitation or high-throughput centered workflows
- task-specific workflows (= personalized for a given lab)

Page 11: swissPIT overview: three distinct parts

Easy parallelisation:

- In a workflow, several analysis tools may be called at the same time (and independently)
- For a given identification tool, the spectrum list and/or the database can be split into bundles, and each bundle analysed independently
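The bundle idea can be sketched as follows. This is a minimal illustration, not swissPIT code: `analyse_bundle` is a hypothetical stand-in for one run of an identification tool, and a local thread pool stands in for submission to a grid or cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_bundles(items, bundle_size):
    """Cut a list (spectra or database entries) into consecutive bundles."""
    return [items[i:i + bundle_size] for i in range(0, len(items), bundle_size)]


def analyse_bundle(bundle):
    """Hypothetical stand-in for running one identification tool on one bundle."""
    return [(spectrum_id, f"result for {spectrum_id}") for spectrum_id in bundle]


def analyse_in_parallel(spectrum_ids, bundle_size=2, workers=4):
    """Analyse each bundle independently, then merge the per-bundle results."""
    bundles = split_into_bundles(spectrum_ids, bundle_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the order of the input bundles.
        per_bundle = list(pool.map(analyse_bundle, bundles))
    return [result for bundle_result in per_bundle for result in bundle_result]


if __name__ == "__main__":
    print(analyse_in_parallel(["s1", "s2", "s3", "s4", "s5"]))
```

Because bundles do not depend on each other, the only coordination needed is the final merge of the per-bundle results.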

Page 12: swissPIT overview: three distinct parts

Use of distributed resources:

- Each site decides which databases and tools to install and maintain.
- This corresponds to reality: research groups and proteomics facilities are geographically scattered and need to collaborate.
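One consequence of letting each site choose its own installation is that a job can only run where both the requested tool and the requested database are present. A minimal sketch of that lookup, assuming a simple in-memory registry (the site names, the `SITES` structure and `sites_for` are hypothetical; only the tool and database names come from this presentation):

```python
# Hypothetical registry: which tools and databases each site maintains.
SITES = {
    "site_A": {
        "tools": {"Phenyx", "X!Tandem"},
        "databases": {"UniProtKB/Swiss-Prot"},
    },
    "site_B": {
        "tools": {"X!Tandem", "Popitam", "InsPecT"},
        "databases": {"UniProtKB/Swiss-Prot", "UniProtKB/TrEMBL"},
    },
}


def sites_for(tool, database, sites=SITES):
    """Return the names of the sites able to run `tool` against `database`."""
    return sorted(
        name
        for name, installed in sites.items()
        if tool in installed["tools"] and database in installed["databases"]
    )


if __name__ == "__main__":
    print(sites_for("X!Tandem", "UniProtKB/Swiss-Prot"))
    print(sites_for("Phenyx", "UniProtKB/TrEMBL"))
```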

Page 13: Current status of swissPIT

- Web-based interface
- 4 protein identification tools: Phenyx, X!Tandem, Popitam, InsPecT
- 2 protein sequence databases: UniProtKB/Swiss-Prot (> 230’000 entries), UniProtKB/TrEMBL (> 3’180’000 entries)
- swissBioGrid compatible (submission to a grid is transparent for the user)

Page 19: Current status: swissPIT from inside

[Architecture diagram. Labels: User layer, System layer, submit, mgf, Grid/cluster]
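The diagram labels the submitted data as mgf. Assuming this refers to the Mascot Generic Format, a plain-text peak-list format in which each spectrum sits between `BEGIN IONS` and `END IONS`, with `KEY=VALUE` header lines followed by "m/z intensity" pairs, a minimal reader could look like the sketch below. The function name `parse_mgf` and the sample values are illustrative, and this is not the parser used by swissPIT.

```python
def parse_mgf(text):
    """Parse MGF text into a list of {"params": {...}, "peaks": [...]} dicts."""
    spectra = []
    current = None  # the spectrum being read, or None outside a block
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None and line:
            if "=" in line:
                # Header line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                current["params"][key] = value
            else:
                # Peak line: m/z and intensity separated by whitespace
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
    return spectra


SAMPLE_MGF = """BEGIN IONS
TITLE=example spectrum
PEPMASS=500.25
CHARGE=2+
100.1 10
200.2 25.5
END IONS
"""

if __name__ == "__main__":
    for spectrum in parse_mgf(SAMPLE_MGF):
        print(spectrum["params"]["TITLE"], len(spectrum["peaks"]), "peaks")
```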

Page 20: Current status: swissPIT from outside

http://swisspit.cscs.ch/

[Screenshot. Labels: username, pwd]

Page 21: Current status: swissPIT from outside

Some global parameters

Page 22: Current status: swissPIT from outside

List of installed software

Check/uncheck boxes to select the software to be run on the data

Page 24: Current status: swissPIT from outside

Click on the link to display and configure software-specific parameters

Press to run the software

Page 25: Current status: swissPIT from outside

Go to the user space

Browse old/new projects

Page 26: Current status: swissPIT from outside

Results are visualized in their native format:

- raw text for Popitam and InsPecT
- XML with style sheet for X!Tandem
- advanced Java interface for Phenyx

Page 29: Upcoming work

Code improvement:

- improve readability and maintainability of code

Standardisation:

- unify parameters as much as possible
- display results in one format

Workflows:

- find a way to implement workflows using XML configuration files
- workflow steps shown on the slide: screen for unsuspected modifications, screen for proteins, remove spectra with low peak statistics
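Since the XML workflow format is still to be defined, the following is only a hypothetical sketch of what such a configuration file and its loader might look like. The element and attribute names (`workflow`, `step`, `id`, `action`, `after`) and the ordering of the steps are assumptions; only the step descriptions come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow description; the schema and step order are assumptions.
WORKFLOW_XML = """
<workflow name="example">
  <step id="filter" action="remove spectra with low peak statistics"/>
  <step id="proteins" action="screen for proteins" after="filter"/>
  <step id="modifications" action="screen for unsuspected modifications" after="proteins"/>
</workflow>
"""


def load_steps(xml_text):
    """Return (id, action, after) for each step, in document order.
    `after` is None when a step has no predecessor."""
    root = ET.fromstring(xml_text.strip())
    return [
        (step.get("id"), step.get("action"), step.get("after"))
        for step in root.findall("step")
    ]


if __name__ == "__main__":
    for step_id, action, after in load_steps(WORKFLOW_XML):
        print(step_id, "|", action, "| runs after:", after)
```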

Page 30: Involved people, acknowledgments

Ron Appel

Patricia Hernandez

Celine Hernandez

Andreas Quandt

Marc Tuloup

Pierre-Alain Binz

Markus Müller

Alexandre Masselot, Nicolas Budin

Vital-IT team + Bruno Nyffeler

Peter Kunszt, Sergio Maffioletti, Arthur Thomas

Page 31: Involved people, acknowledgments

Thank you for your attention