PRIDE and ProteomeXchange: Training webinar
Dr. Juan Antonio Vizcaíno
PRIDE Group Coordinator, Proteomics Services Team
EMBL-EBI, Hinxton, Cambridge, [email protected]
Training webinar, 25 November 2015
Welcome - webinar instructions
• GoToTraining works best in Chrome or IE; avoid Firefox due to audio issues with Macs.
• To access the full features of GoToTraining, use the desktop version by clicking “switch to desktop version”.
• All microphones will be muted whilst the trainer is speaking.
• If you have a question during this time or at the end, please use the chat box at the bottom of the GoToTraining box.
• Please complete the feedback survey which will launch at the end of the webinar.
Data resources at EMBL-EBI
• Genes, genomes & variation: Ensembl, Ensembl Genomes, European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive, GWAS Catalog, Metagenomics portal
• Gene, protein & metabolite expression: RNA Central, ArrayExpress, Expression Atlas, MetaboLights, PRIDE
• Protein sequences, families & motifs: InterPro, Pfam, UniProt
• Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
• Chemical biology: ChEMBL, ChEBI
• Reactions, interactions & pathways: IntAct, Reactome, MetaboLights
• Systems: BioModels, Enzyme Portal, BioSamples
• Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology
Overview
• PRIDE Archive (in the context of ProteomeXchange and the PSI standards)
• How to submit data to PRIDE: PRIDE tools
• How to access data in PRIDE Archive
• A sneak peek at other PRIDE resources
Mass Spectrometry (MS)-based proteomics
• Many different workflows.
• Discovery mode:
  • Bottom-up proteomics
    • Data dependent acquisition
    • Data independent acquisition
  • Top down proteomics
• Targeted mode:
  • SRM (Selected Reaction Monitoring)
MS proteomics: tandem MS (bottom-up)
MS/MS matching identifies peptides, not proteins.
Proteins are inferred from the peptide sequences.
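The inference step can be illustrated with a minimal Python sketch. It assumes the simplest possible rule (a protein is supported by every identified peptide that occurs in its sequence); the accessions, sequences and peptides are hypothetical toy data, and real protein inference tools use far more elaborate scoring and grouping.

```python
from collections import defaultdict


def infer_proteins(identified_peptides, protein_sequences):
    """Map each protein accession to the identified peptides found in its sequence."""
    evidence = defaultdict(set)
    for peptide in identified_peptides:
        for accession, sequence in protein_sequences.items():
            if peptide in sequence:
                evidence[accession].add(peptide)
    return dict(evidence)


# Hypothetical toy data, not taken from any real dataset
proteins = {
    "PROT_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "PROT_B": "MSHFSRQLEERLGLIEVQ",
}
peptides = ["TAYIAK", "SHFSR", "LGLIEVQ"]

result = infer_proteins(peptides, proteins)
for accession, matched in sorted(result.items()):
    print(accession, sorted(matched))
```

The peptide SHFSR occurs in both toy proteins, so on its own it cannot tell them apart. This is why the slide stresses that the spectra identify peptides and the proteins are only inferred.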
PRIDE (PRoteomics IDEntifications) database
• PRIDE stores mass spectrometry (MS)-based proteomics data:
  • Peptide and protein expression data (identification and quantification)
  • Post-translational modifications
  • Mass spectra (raw data and peak lists)
  • Technical and biological metadata
  • Any other related information
• Full support for tandem MS approaches
http://www.ebi.ac.uk/pride/archive
Martens et al., Proteomics, 2005; Vizcaíno et al., NAR, 2013
PRIDE Mission
• To archive all types of proteomics mass spectrometry data for the purpose of supporting reproducible research, allowing the application of quality control metrics and enabling the reuse of these data by other researchers.
• To integrate MS-based data in a protein-centric manner to provide information on protein variants, modifications, and expression.
• To provide mass spectrometry based expression data to the Expression Atlas.
What is a proteomics publication in 2015?
• Proteomics studies generate potentially large amounts of data and results.
• Ideally, a proteomics publication needs to:
  • Summarize the results of the study
  • Provide supporting information for reliability of any results reported
• Information in a publication:
  • Manuscript
  • Supplementary material
  • Associated data submitted to a public repository
Journal Submission Recommendations
• Journal guidelines recommend submission to proteomics repositories: Proteomics, Nature Biotechnology, Nature Methods, Molecular and Cellular Proteomics
• Funding agencies are enforcing public deposition of data to maximize the value of the funds provided.
PRIDE: Source of MS proteomics data
• PRIDE Archive already provides or will soon provide MS proteomics data to other EMBL-EBI resources such as UniProt, Ensembl and the EBI Expression Atlas.
http://www.ebi.ac.uk/pride/archive
Data content in PRIDE Archive
• Dataset submission driven resource.
• PRIDE is organised in datasets (groups of assays).
• An assay represents one MS run (in most cases).
• No data reprocessing at present. PRIDE aims to represent the author’s view on the data.
• Main supported formats: PRIDE XML and mzIdentML.
• Raw data is also now stored.
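The dataset/assay organisation can be pictured with a small Python sketch. The class and field names below are illustrative assumptions and do not reflect the actual PRIDE schema; the accession, title and run count come from the example dataset used in this webinar, while the run labels are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """One assay, which in most cases corresponds to one MS run."""
    label: str


@dataclass
class Dataset:
    """A submitted dataset: a group of assays under a single accession."""
    accession: str
    title: str
    assays: List[Assay] = field(default_factory=list)


example = Dataset(
    accession="PXD000764",
    title="Discovery of new CSF biomarkers for meningitis in children",
    # Hypothetical run labels; the slides only state that there are 12 runs
    assays=[Assay(label=f"run_{i:02d}") for i in range(1, 13)],
)
print(example.accession, len(example.assays))
```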
ProteomeXchange Consortium
• Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.
• Includes PeptideAtlas (ISB, Seattle), PRIDE (Cambridge, UK) and (very recently) MassIVE (UCSD, San Diego).
• Common identifier space (PXD identifiers)
• Two supported data workflows: MS/MS and SRM.
• Main objective: Make life easier for researchers
http://www.proteomexchange.org
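Because all ProteomeXchange datasets share the PXD identifier space, identifiers can be recognised in free text such as a manuscript's data availability statement. The sketch below assumes that an identifier is the prefix PXD followed by exactly six digits, which matches the example PXD000764 used in this webinar but is an assumption about the general format.

```python
import re

# Assumption: "PXD" followed by exactly six digits, as in PXD000764
PXD_PATTERN = re.compile(r"PXD\d{6}")


def is_pxd_identifier(text):
    """Return True if the whole string looks like a PXD identifier."""
    return PXD_PATTERN.fullmatch(text.strip()) is not None


def find_pxd_identifiers(text):
    """Return all PXD-like identifiers mentioned in a piece of text."""
    return re.findall(r"\bPXD\d{6}\b", text)


sentence = "Data are available via ProteomeXchange with identifier PXD000764."
print(find_pxd_identifiers(sentence))
```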
ProteomeXchange data workflow
[Diagram: submitted metadata / manuscript, raw data* and results go to the receiving repositories: PRIDE (MS/MS data), MassIVE (MS/MS data), PASSEL (SRM data) and other DBs. Datasets are announced through ProteomeCentral. Researcher’s results, reprocessed results, raw data* and metadata reach journals, UniProt/neXtProt, PeptideAtlas, GPMDB and other DBs.]
Vizcaíno et al., Nat Biotechnol, 2014
PX Data workflow for MS/MS data
1. Mass spectrometer output files: raw data (binary files) or peak list spectra in a standardized format (mzML, mzXML).
2. Result files:
   a. Complete submissions: Result files can be converted to PRIDE XML or the mzIdentML data standard.
   b. Partial submissions: For workflows not yet supported by PRIDE, search engine output files will be stored and provided in their original form.
3. Metadata: Sufficiently detailed description of sample origin, workflow, instrumentation, submitter.
4. Other files: Optional files (the list can be extended):
   a. QUANT: Quantification related results
   b. PEAK: Peak list files
   c. GEL: Gel images
   d. OTHER: Any other file type
   e. FASTA
   f. SP_LIBRARY
Complete vs Partial submissions: processed results
For complete submissions, it is possible to connect the spectra with the identification processed results and they can be visualized.
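The complete/partial distinction can be expressed as a simple rule, sketched here in Python. The manifest layout (pairs of file name and declared format) and the function name are hypothetical and are not part of any PRIDE tool; the only fact taken from the slides is that result files in PRIDE XML or mzIdentML make a submission complete.

```python
# Result formats that allow a complete submission, as stated in the slides
SUPPORTED_RESULT_FORMATS = {"PRIDE XML", "mzIdentML"}


def submission_type(files):
    """Classify a submission from (file_name, declared_format) pairs.

    Hypothetical helper: "complete" if at least one result file is in a
    supported format, otherwise "partial".
    """
    has_supported_result = any(fmt in SUPPORTED_RESULT_FORMATS for _, fmt in files)
    return "complete" if has_supported_result else "partial"


# Hypothetical manifests
complete_example = [("run01.raw", "raw"), ("run01.mzid", "mzIdentML"), ("run01.mgf", "mgf")]
partial_example = [("run01.raw", "raw"), ("run01_search_output.txt", "search engine output")]

print(submission_type(complete_example))
print(submission_type(partial_example))
```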
PRIDE Components: Submission Process (step 1)
• PRIDE Converter 2
• PRIDE Inspector
• PX Submission Tool
File formats: mzIdentML, PRIDE XML
Before: only file conversion to PRIDE XML
[Diagram: original data files (search output files, spectra files) -> ‘RESULT’ file generation by file conversion (PRIDE Converter; other tools, e.g. hEIDI) -> final ‘RESULT’ file (PRIDE XML)]
PX Data workflow for MS/MS data
Search engine results + MS files -> PRIDE Converter 2 -> PRIDE XML
PRIDE Converter 2
Coté & Griss et al., MCP, 2012
https://github.com/PRIDE-Toolsuite/pride-converter-2
- ‘Bulk’ conversion possible: Command Line mode
- Virtually no limit in file sizes.
Other tools available:
- PRIDE Converter
- PLGS (Waters)
- Proteios
- EasyProt
- hEIDI
- OmicsHub (Integromics)
- PeptideShaker (Compomics)
Now: native file export to mzIdentML
[Diagram: tools (Mascot, ProteinPilot, Scaffold, PEAKS, MSGF+, others) -> ‘RESULT’ file generation by native file export -> final ‘RESULT’ file (mzIdentML), plus spectra files (mzML, mzXML, mzData, mgf, pkl, ms2, dta, apl)]
Complete submissions
Search engine results + MS files -> search engines -> mzIdentML
An increasing number of tools support export to mzIdentML 1.1:
- Mascot
- MSGF+
- MyriMatch and related tools from D. Tabb’s lab
- OpenMS
- PEAKS
- PeptideShaker
- ProCon (ProteomeDiscoverer, Sequest)
- Scaffold
- TPP via the idConvert tool (ProteoWizard)
- ProteinPilot (from version 5.0)
- X!Tandem native conversion (Beta, PILEDRIVER)
- Crux
- Others: library for X!Tandem conversion, lab internal pipelines, …
Referenced spectral files need to be submitted as well (all open formats are supported).
Updated list: http://www.psidev.info/tools-implementing-mzIdentML#.
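Since the spectra files referenced by an mzIdentML file have to be submitted with it, it can help to list those references before starting a submission. The Python sketch below reads the location attribute of each SpectraData element using only the standard library. The embedded sample is a heavily trimmed, hypothetical fragment (not schema-valid), and the function name is an illustration, not part of the PRIDE tools.

```python
import io
import xml.etree.ElementTree as ET


def referenced_spectra_files(mzid_source):
    """Return the 'location' of every SpectraData element in an mzIdentML file.

    mzid_source may be a file path or a binary file-like object.
    """
    locations = []
    for _, element in ET.iterparse(mzid_source, events=("end",)):
        # Drop the XML namespace so the check does not depend on the schema version
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "SpectraData":
            locations.append(element.get("location"))
    return locations


# Hypothetical, trimmed fragment used only to exercise the function
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1" id="example" version="1.1.0">
  <DataCollection>
    <Inputs>
      <SpectraData id="SD_1" location="file:///data/run01.mgf"/>
      <SpectraData id="SD_2" location="file:///data/run02.mgf"/>
    </Inputs>
  </DataCollection>
</MzIdentML>"""

spectra = referenced_spectra_files(io.BytesIO(SAMPLE))
print(spectra)
```

For a real file, the same function could be called with the path to the .mzid file, and each returned location compared against the spectra files prepared for upload.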
PRIDE Components: Submission Process (step 2)
• PRIDE Converter 2
• PRIDE Inspector
• PX Submission Tool
File formats: mzIdentML, PRIDE XML
PRIDE Inspector Toolsuite
Wang et al., Nat. Biotechnology, 2012; Perez-Riverol et al., MCP, 2016, in press
PRIDE Inspector Toolsuite supports:
- PRIDE XML
- mzIdentML + all types of spectra files
- mzML
- mzTab identification and quantification + all types of spectra files
https://github.com/PRIDE-Toolsuite/
PRIDE Inspector Toolsuite
New visualisation functionality for Protein Groups
https://github.com/PRIDE-Toolsuite/
PRIDE Inspector Toolsuite
Private review of files submitted to PRIDE
https://github.com/PRIDE-Toolsuite/
PRIDE Components: Submission Process (step 3)
• PRIDE Converter 2
• PRIDE Inspector
• PX Submission Tool
File formats: mzIdentML, PRIDE XML
PX submission tool
• Capture the mappings between the different types of files.
• Make the file upload process straightforward to the submitter (it transfers all the files using Aspera or FTP).
• Command line alternative: Using the Aspera file transfer protocol.
http://www.proteomexchange.org/submission
Manuscript published detailing the process
Ternent et al., Proteomics, 2014
http://www.proteomexchange.org/submission
Example dataset: PXD000764
- Title: “Discovery of new CSF biomarkers for meningitis in children”
- 12 runs: 4 controls and 8 infected samples
- Identification and quantification data
PRIDE Archive submitted datasets up until 1st November, 2015
• 1,259 submitted datasets by November 1st
• 923 submitted datasets in 2014
• In the last 6 months, 155 submitted datasets per month
• Size: ~ 160 TB.
PRIDE: Size comparison with other EBI resources (May 2015)
[Chart: Data accumulation by resource. Bytes (logarithmic axis, 1E+07 to 1E+17) against date (2004 to 2016) for Metabolites, PRIDE, EGA, ENA (less AE) and AE.]
Chart generated by Guy Cochrane
Data access to PRIDE Archive
• Look for particular datasets of interest:
• For data reuse: which particular proteins and peptides (including PTMs) have been detected.
• Data reinterpretation or re-analysis.
• Validation of the experimental results reported.
• Specific use cases for proteomics: spectral libraries, fragmentation models, SRM transitions,…
RSS feed for public datasets
http://groups.google.com/group/proteomexchange/feed/rss_v2_0_msgs.xml
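A feed like this can be read with the Python standard library. The sketch below assumes a plain RSS 2.0 layout (item elements with title and link children); the sample feed content is hypothetical, and the feed URL shown on the slide may no longer be served in this form.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(rss_text):
    """Return a list of {'title', 'link'} dicts, one per RSS <item>."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


# Hypothetical feed content used only to exercise the parser
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>proteomexchange</title>
    <item>
      <title>New ProteomeXchange dataset PXD000764</title>
      <link>http://example.org/announcement/1</link>
    </item>
  </channel>
</rss>"""

announcements = parse_rss_items(SAMPLE_FEED)
for entry in announcements:
    print(entry["title"], entry["link"])

# To read the live feed instead (requires network access):
# from urllib.request import urlopen
# with urlopen(FEED_URL) as response:  # FEED_URL: the RSS address given on the slide
#     announcements = parse_rss_items(response.read())
```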
Ways to access data in PRIDE Archive
• PRIDE web interface
• File repository
• REST web service
• PRIDE Inspector tool
ProteomeCentral: Portal for all PX datasets
http://proteomecentral.proteomexchange.org/cgi/GetDataset
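A link to a single dataset page can be built from the address above. This is a sketch under one assumption that does not come from the slides: that the accession is passed to GetDataset as a query parameter named ID.

```python
from urllib.parse import urlencode

# Base address taken from the slide
PROTEOMECENTRAL_GETDATASET = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def dataset_url(accession):
    """Build a ProteomeCentral link for one dataset accession.

    Assumption: the accession is passed as the "ID" query parameter.
    """
    return f"{PROTEOMECENTRAL_GETDATASET}?{urlencode({'ID': accession})}"


print(dataset_url("PXD000764"))
```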
PRIDE Proteomes and PRIDE Cluster
• Provide an aggregated and QC filtered peptide-centric and protein-centric view on PRIDE Archive data.
http://www.ebi.ac.uk/pride/cluster/
http://wwwdev.ebi.ac.uk/pride/proteomes/
Conclusions
• Main characteristics of PRIDE Archive and ProteomeXchange (PX)
• PX/PRIDE submission workflow for MS/MS data
  • PRIDE Inspector
  • PX submission tool
• PRIDE/ProteomeXchange has become the de facto standard for data submission and data availability in proteomics
Do you want to know a bit more…?
http://www.slideshare.net/JuanAntonioVizcaino
Acknowledgements: The PRIDE Team
Attila Csordas
Tobias Ternent
Noemi del Toro
Gerhard Mayer (Bochum, de.NBI)
Johannes Griss
Yasset Perez-Riverol
Henning Hermjakob
Former team members: Rui Wang, Florian Reisinger and Jose A. Dianes
Future webinars:
• 9 December – UniProt website updates
• 16 December – Ensembl release 83
All webinars @ 4:00pm GMT unless stated
For details see: http://www.ebi.ac.uk/training/webinars