Data Integration and Management: A PDB Perspective
What is PDB?
• Single international repository of three-dimensional data for biological macromolecules
• Public community resource
• Established at Brookhaven in 1971 (7 structures)
• Moved to RCSB in 1998
• wwPDB established in 2004
• > 25,000 structures in PDB
Community
• Scientific community, at all levels
  – Structural biologists (crystallography, NMR, cryo-EM)
  – Biologists
  – Computational biologists
• Journals
• General community
  – Secondary school
  – General public
• Internal
  – RCSB PDB staff
  – wwPDB members
Data Representation
• Macromolecular Crystallographic Information Framework
• XML DTD/Schema mapping
• SQL schema mapping
• CORBA IDL mapping
• Supporting emerging ontology representations (OWL)
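The idea behind these mappings can be sketched in a few lines: one set of category/attribute items is rendered both as XML and as relational tables. This is a minimal illustration, not the RCSB tooling. The sample data block, the `parse_items`, `to_xml`, and `to_sql` helpers, and the element and table layout are all assumptions made for the example, and the parser handles only single-valued items (no `loop_` blocks or multi-line text).

```python
import shlex
import sqlite3
import xml.etree.ElementTree as ET
from collections import defaultdict

# Simplified mmCIF-style fragment (illustrative values, not a real PDB entry).
SAMPLE = """\
data_DEMO
_cell.length_a   50.0
_cell.length_b   60.0
_cell.length_c   70.0
_exptl.method    'X-RAY DIFFRACTION'
"""


def parse_items(text):
    """Parse single-valued '_category.attribute value' lines only.

    Real mmCIF also has loop_ blocks and multi-line text fields,
    which this sketch deliberately ignores.
    """
    categories = defaultdict(dict)
    for line in text.splitlines():
        if not line.startswith("_"):
            continue  # skip the data_ header and anything else
        tokens = shlex.split(line)  # keeps the quoted value as one token
        name, value = tokens[0], tokens[1]
        category, attribute = name[1:].split(".", 1)
        categories[category][attribute] = value
    return dict(categories)


def to_xml(categories):
    """Hypothetical XML mapping: category -> element, attribute -> child element."""
    root = ET.Element("datablock")
    for category, items in categories.items():
        cat_el = ET.SubElement(root, category)
        for attribute, value in items.items():
            ET.SubElement(cat_el, attribute).text = value
    return ET.tostring(root, encoding="unicode")


def to_sql(categories, conn):
    """Hypothetical SQL mapping: category -> table, attribute -> TEXT column."""
    for category, items in categories.items():
        columns = ", ".join(f'"{a}" TEXT' for a in items)
        conn.execute(f'CREATE TABLE "{category}" ({columns})')
        marks = ", ".join("?" for _ in items)
        conn.execute(f'INSERT INTO "{category}" VALUES ({marks})',
                     list(items.values()))


categories = parse_items(SAMPLE)
xml_text = to_xml(categories)
conn = sqlite3.connect(":memory:")
to_sql(categories, conn)
```

Because both renderings are derived from the same parsed categories, adding an item to the source data changes the XML and the SQL output together, which is the point of keeping one reference representation.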
Elements of Dictionary Metadata
• Data attributes
  – Definition
  – Examples
  – Data type (primitive type/regular expression patterns)
  – Range or allowed values
• Classes
  – Categories
  – Subcategories
  – Category groups
• Associations
  – Parent-child relationships
  – Interdependencies/exclusivity
  – Methods
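A small sketch may help show how such metadata can drive validation. The `AttributeDef` record, the `DICTIONARY` excerpt, and the `validate` function below are hypothetical; the item names follow mmCIF naming style, but the definitions, patterns, and allowed values are invented for illustration and are not copied from the actual dictionary.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeDef:
    """Hypothetical in-memory form of one dictionary data attribute."""
    definition: str
    pattern: str                          # regular expression for the data type
    minimum: Optional[float] = None       # lower bound of the allowed range
    maximum: Optional[float] = None       # upper bound of the allowed range
    allowed: Optional[frozenset] = None   # enumerated allowed values
    parent: Optional[str] = None          # parent item in a parent-child link


# Illustrative dictionary excerpt (assumed content, not the real dictionary).
DICTIONARY = {
    "entity.id": AttributeDef("Identifier of an entity", r"[A-Za-z0-9_]+"),
    "entity.type": AttributeDef(
        "Kind of entity", r"[a-z-]+",
        allowed=frozenset({"polymer", "non-polymer", "water"})),
    "cell.length_a": AttributeDef(
        "Unit-cell length a", r"-?\d+(\.\d+)?", minimum=0.0),
    "atom_site.label_entity_id": AttributeDef(
        "Entity that an atom belongs to", r"[A-Za-z0-9_]+",
        parent="entity.id"),
}


def validate(name, value, parent_values=None):
    """Return a list of problems; an empty list means the value passed."""
    definition = DICTIONARY.get(name)
    if definition is None:
        return [f"{name}: not defined in dictionary"]
    if not re.fullmatch(definition.pattern, value):
        return [f"{name}: '{value}' does not match the data type pattern"]
    problems = []
    if definition.minimum is not None and float(value) < definition.minimum:
        problems.append(f"{name}: {value} is below {definition.minimum}")
    if definition.maximum is not None and float(value) > definition.maximum:
        problems.append(f"{name}: {value} is above {definition.maximum}")
    if definition.allowed is not None and value not in definition.allowed:
        problems.append(f"{name}: '{value}' is not an allowed value")
    if definition.parent is not None and value not in (parent_values or set()):
        problems.append(
            f"{name}: '{value}' has no matching {definition.parent}")
    return problems
```

For example, `validate("atom_site.label_entity_id", "2", {"1"})` reports a missing parent, because no `entity.id` with the value `2` exists. All rules live in the dictionary data, so the checking code does not change when items are added.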
Difficult Issues
• Resolving semantic ambiguities – encoding meaning
• Integrating controlled vocabularies
• Separation of primary and derived information
• Supporting rapid evolution of science
What’s Driving Data Definition
• IUCr-sponsored community effort
• Automated data acquisition
• Data management and data exchange for PDB
• New technologies (e.g. cryo-electron microscopy)
• High-throughput structure determination and structural genomics
Typical Project Deposition Data Flow
[Diagram. Labels: Target Selection, Protein Production, Crystal Production, Structure Determination, PDB Deposition, Project Database, Merged Project Data, Exchange Dictionary]
Data Sharing Nightmare
Incremental Data Pipeline
Current Integration Strategy
• Provide software tools to collect data from the output of each program step
• Convert data in log and output files to a common representation
• Merge the data corresponding to the successful outcome
• Provide an editor tool to enter remaining data and check consistency of results
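The steps above can be sketched as a small extract, convert, and merge routine. Everything here is an assumption made for illustration: the log excerpts, the regular expressions, and the `extract`, `merge`, and `missing` helpers are hypothetical, and real program logs differ. The target item names follow mmCIF naming style and should be treated as examples only.

```python
import re

# Hypothetical log excerpts, one per program step (invented values).
LOGS = {
    "data_reduction": "Resolution range: 30.0 - 1.8\nRmerge = 0.065\n",
    "refinement": "R-work = 0.195\nR-free = 0.231\n",
}

# One set of extraction rules per step: pattern -> common item names,
# one name per capture group.
EXTRACTORS = {
    "data_reduction": {
        r"Resolution range:\s*([\d.]+)\s*-\s*([\d.]+)":
            ("reflns.d_resolution_low", "reflns.d_resolution_high"),
        r"Rmerge\s*=\s*([\d.]+)": ("reflns.pdbx_Rmerge_I_obs",),
    },
    "refinement": {
        r"R-work\s*=\s*([\d.]+)": ("refine.ls_R_factor_R_work",),
        r"R-free\s*=\s*([\d.]+)": ("refine.ls_R_factor_R_free",),
    },
}

# Items the deposition is assumed to need; some cannot come from logs.
REQUIRED = [
    "reflns.d_resolution_high",
    "refine.ls_R_factor_R_free",
    "exptl.method",
]


def extract(step, text):
    """Convert one step's log text into the common item/value form."""
    items = {}
    for pattern, names in EXTRACTORS[step].items():
        match = re.search(pattern, text)
        if match:
            items.update(zip(names, match.groups()))
    return items


def merge(per_step_items):
    """Merge the per-step dictionaries; later steps override earlier ones."""
    merged = {}
    for items in per_step_items:
        merged.update(items)
    return merged


def missing(merged, required):
    """List required items that still have to be entered by hand."""
    return [name for name in required if name not in merged]


merged = merge(extract(step, text) for step, text in LOGS.items())
missing_items = missing(merged, REQUIRED)
```

In this sketch the items left in `missing_items` are what an editor tool would ask the depositor to supply, and the merged dictionary is what a consistency check would run over.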
Data Deposition and Annotation
[Diagram. Labels: Depositor, ADIT, Validate, Annotate, Validation Report, Corrections, Depositor Approval, Functional Annotation, PDB ID, PDB Entry, Core DB, Archival Data, Distribution Site; Steps 1 to 5]
Integrated Data Processing System
[Diagram. Labels: Data Assembled by Depositor, Client Input Tool, ADIT, ADITsrv, MAXIT, Validation, Database Loader, Metadata Dictionaries, Data Views, Reports, Final Files]
Features of System
• Different dictionaries without software changes
• Metadata customization of both functionality and content
• Automatically scales with changes in content
• Can be distributed to multiple deposition sites
• Reference data and standard nomenclature (ERFs)
• Self-monitoring
Data Distribution
[Diagram. Labels: mmCIF Data Files (Data Reference Standard), mmCIF Parsers, XML Files, Relational Database, API Servers, Applications]
Automatic Production of Macromolecular Structure API Components
[Diagram. Labels: PDB Exchange Dictionary + API-Specific Data Dictionaries; Metamodel Framework; CORBA IDL, SQL Schema, XML DTD/Schemas, Data Loaders, Database Access Classes]
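As one illustration of producing a component from dictionary metadata, the sketch below generates SQL table definitions from a dictionary excerpt. It is not the RCSB metamodel framework. The `DICTIONARY` layout, the `TYPE_MAP` from dictionary type codes to SQL types, and the `create_table_sql` and `schema_sql` helpers are assumptions made for the example.

```python
import sqlite3

# Hypothetical dictionary excerpt:
# category -> {attribute: (dictionary type code, is_key)}
DICTIONARY = {
    "entity": {
        "id": ("code", True),
        "type": ("line", False),
    },
    "cell": {
        "entry_id": ("code", True),
        "length_a": ("float", False),
        "angle_alpha": ("float", False),
    },
}

# Assumed mapping from dictionary type codes to SQL column types.
TYPE_MAP = {
    "code": "VARCHAR(32)",
    "line": "VARCHAR(255)",
    "float": "REAL",
    "int": "INTEGER",
}


def create_table_sql(category, attributes):
    """Build one CREATE TABLE statement from a category's attribute metadata."""
    lines = []
    keys = []
    for name, (type_code, is_key) in attributes.items():
        column = f'  "{name}" {TYPE_MAP[type_code]}'
        if is_key:
            column += " NOT NULL"
            keys.append(f'"{name}"')
        lines.append(column)
    if keys:
        lines.append(f"  PRIMARY KEY ({', '.join(keys)})")
    body = ",\n".join(lines)
    return f'CREATE TABLE "{category}" (\n{body}\n);'


def schema_sql(dictionary):
    """Generate the full schema, one table per dictionary category."""
    return "\n\n".join(
        create_table_sql(category, attributes)
        for category, attributes in dictionary.items()
    )


ddl = schema_sql(DICTIONARY)

# Load the generated schema into an in-memory database to confirm it is valid.
conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
```

The same walk over the dictionary could emit XML schema elements or loader code instead of SQL, so a dictionary change propagates to every generated component without hand editing.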
Management
• Complex challenges in technology and sociology
• Communicate and work with diverse community
• Help create and enforce community policies and standards
• Must take advantage of the most current innovations in new technologies
• New technologies must be introduced so as to enable and not disrupt the users of the resource
• Beyond all else is the need for good data and a robust data representation
Access
• RCSB Protein Data Bank Site: http://www.pdb.org/
• OpenMMS site (Java implementation): http://openmms.sdsc.edu/
• RCSB PDB Software Download Site (C++ and Python implementation, NDB server): http://deposit.pdb.org/mmcif/FILM/
• RCSB PDB Dictionary Resource Site: http://deposit.pdb.org/mmcif/
• RCSB PDB Beta Data Site: ftp://beta.rcsb.org/pub/pdb/uniformity/data/
http://www.pdb.org/ • info@rcsb.org
Operated by three members of the RCSB: Rutgers, The State University of New Jersey; San Diego Supercomputer Center at the University of California, San Diego; Center for Advanced Research in Biotechnology/UMBI/NIST
The RCSB PDB is supported by funds from the National Science Foundation (NSF), the National Institute of General Medical Sciences (NIGMS), the Office of Science, Department of Energy (DOE), the National Library of Medicine (NLM), the National Cancer Institute (NCI), the National Center for Research Resources (NCRR), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), and the National Institute of Neurological Disorders and Stroke (NINDS).
The RCSB PDB is a member of the wwPDB (http://www.wwpdb.org/)