Presentation on Biological database By Elufer Akram @ University Of Science And technology...
PRESENTATION ON
BIOLOGICAL DATABASE
By Elufer Akram (14/BBT/06), University of Science and Technology, Meghalaya
What is the Database?
Databases Architecture
Variants Of Biological Database
Nucleotide sequence database
GenBank
NCBI
DDBJ
Protein Sequence Database
PDB (Protein Data Bank)
TrEMBL, PIR, UniPROT
Collaboration
Main Objectives of Biological Databases
Contents
A database is a convenient system for properly storing, searching, and retrieving any type of data.
A database makes it easy to handle and share large amounts of data, and it supports large-scale analysis through easy access and updating. Databases also link information drawn from different sources of knowledge about the subject under consideration.
What is the Database?
Biological databases are libraries of life sciences information, collected from scientific experiments, published literature, high-throughput experimental technology, and computational analysis. They contain information from genomics, proteomics, and microarray gene expression.
Information contained in biological databases includes gene function, structure, localization (both cellular and chromosomal), and biological sequences and structures.
What is a Biological Database?
Layers: information system, query system, storage system, data
Examples of data: GenBank flat file, PDB file, interaction record, title of a book, book
Examples of storage systems: boxes, Oracle, MySQL, PC binary files, Unix text files, bookshelves
Examples of information systems: Google, Entrez, SRS
Databases Architecture
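The layering on the architecture slides can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: the class names (Record, Storage, QuerySystem, InformationSystem) and the record identifiers are hypothetical, and an in-memory dictionary stands in for a real storage system such as Oracle, MySQL, or flat files.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    """Data layer: one entry (hypothetical fields, for illustration only)."""
    identifier: str
    description: str
    sequence: str


@dataclass
class Storage:
    """Storage layer: a dict standing in for files or a relational database."""
    records: Dict[str, Record] = field(default_factory=dict)

    def put(self, record: Record) -> None:
        self.records[record.identifier] = record

    def get(self, identifier: str) -> Optional[Record]:
        return self.records.get(identifier)


class QuerySystem:
    """Query layer: searches the storage layer by keyword."""

    def __init__(self, storage: Storage) -> None:
        self.storage = storage

    def search(self, keyword: str) -> List[Record]:
        needle = keyword.lower()
        return [
            record
            for record in self.storage.records.values()
            if needle in record.description.lower()
        ]


class InformationSystem:
    """Outer layer: bundles storage and query behind one interface."""

    def __init__(self) -> None:
        self.storage = Storage()
        self.query = QuerySystem(self.storage)

    def submit(self, record: Record) -> None:
        self.storage.put(record)

    def find(self, keyword: str) -> List[str]:
        return [record.identifier for record in self.query.search(keyword)]


# Made-up records; the identifiers are not real accession numbers.
system = InformationSystem()
system.submit(Record("TOY0001", "toy lipase gene", "ATGGCC"))
system.submit(Record("TOY0002", "toy kinase gene", "ATGAAA"))
print(system.find("lipase"))
```

Keeping the query layer separate from the storage layer means the storage back end could be swapped without changing how users search.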
1. Primary database
2. Secondary database
3. Composite database
Variants Of Biological Database
These are the primary repositories of data, used to store nucleic acid sequences, protein sequences, and structural information on biological macromolecules.
Some primary databases:
NCBI (National Center for Biotechnology Information)
GenBank
DDBJ (DNA Data Bank of Japan)
SWISS-PROT (the manually annotated and reviewed section of the UniProt Knowledgebase, UniProtKB)
PIR (Protein Information Resource)
PDB (Protein Data Bank)
The sequence collections in these databases come from the efforts of basic research in academic, industrial, and sequencing laboratories.
Primary Database
These repositories are developed in collaboration with each other and, as a result, contain similar data. However, each database has a different user interface for querying and searching the information it holds.
Primary Database
A secondary database contains additional information derived from the analysis of data available in primary repositories. The primary data are analysed in a variety of ways, so secondary databases contain different information in different formats. SWISS-PROT, one of the major primary databases, is used to derive several secondary databases.
Some secondary databases: TrEMBL, Pfam, PROSITE, Profiles, SCOP, CATH
Secondary Database
A composite database combines information from various primary databases, making it convenient to search for the desired information without querying each primary database separately.
Composite databases make searching much simpler because information from different resources is gathered in a single database. Each has its own format and its own strategy for storing data from the various primary databases.
Some composite databases: OWL, MIPSX, NRDB (Non-Redundant Database)
Composite database
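The idea of gathering several primary databases into one searchable collection can be sketched in Python. This is a minimal illustration, not how OWL, MIPSX, or NRDB are actually built: the function name build_composite, the source names, and the entry identifiers are all hypothetical, and the merging strategy assumed here is simply to keep one entry per identical sequence while remembering where it came from.

```python
def build_composite(sources):
    """Merge several source databases into one non-redundant collection.

    sources maps a source name to a dict of {entry_id: sequence}.
    The result maps each distinct sequence to the list of
    (source name, entry id) pairs that carry it.
    """
    composite = {}
    for source_name, entries in sources.items():
        for entry_id, sequence in entries.items():
            key = sequence.upper()  # treat upper and lower case as the same sequence
            composite.setdefault(key, []).append((source_name, entry_id))
    return composite


# Made-up toy data; these are not real database entries.
toy_sources = {
    "source_a": {"A1": "MKTAYIAK", "A2": "MSLLT"},
    "source_b": {"B1": "mktayiak", "B2": "MPPRG"},
}

composite = build_composite(toy_sources)
print(len(composite))         # number of distinct sequences
print(composite["MKTAYIAK"])  # where the shared sequence was found
```

A single lookup in the merged collection then replaces one query per primary database.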
The National Center for Biotechnology Information
Created in 1988 as a part of the National Library of Medicine at NIH
– Establish public databases
– Research in computational biology
– Develop software tools for sequence analysis
– Disseminate biomedical information
Bethesda, MD
GenBank, the EMBL Nucleotide Sequence Database, and DDBJ are the major sequence repositories from which various other databases have been derived.
Nucleotide sequence database
GenBank File format
GenBank
GenBank is the most comprehensive annotated collection of publicly available DNA sequences and is a part of the International Nucleotide Sequence Database Collaboration (INSDC), which consists of the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at the National Center for Biotechnology Information (NCBI, USA). A new release of GenBank is made every two months.
GenBank
Traditional GenBank Record
ACCESSION U07418
VERSION U07418.1 GI:466461
Accession: stable, reportable, universal
Version: tracks changes in sequence
GI number: NCBI internal use
Well annotated
The sequence is the data
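The ACCESSION and VERSION lines from the record above can be read with a few lines of Python. This is a minimal sketch that handles only these two keywords as they appear on the slide; the function name parse_header is hypothetical, and real GenBank flat files contain many more fields that are ignored here.

```python
def parse_header(text):
    """Extract accession, version, and GI number from flat-file header lines."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank lines and keywords without a value
        keyword, value = parts[0], parts[1]
        if keyword == "ACCESSION":
            fields["accession"] = value
        elif keyword == "VERSION":
            fields["version"] = value
            # The GI number, when present, follows as a "GI:<integer>" token.
            for token in parts[2:]:
                if token.startswith("GI:"):
                    fields["gi"] = int(token[3:])
    return fields


header = """ACCESSION   U07418
VERSION     U07418.1  GI:466461"""

print(parse_header(header))
```

The version string repeats the accession and adds a numeric suffix, which is why the accession alone stays stable when the sequence is revised.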
The NCBI (National Center for Biotechnology Information) was established on November 4, 1988 as a part of the National Library of Medicine (NLM) at the National Institutes of Health (NIH), USA. Its multidisciplinary research group consists of scientists from diverse fields (computer science, mathematics, biochemistry, physics, etc.).
NCBI
NCBI HOMEPAGE
LIPASE Sequence in NCBI
PRIMARY VS. DERIVATIVE SEQUENCE DATABASES
[Figure: sequencing centers and labs submit sequences to GenBank, the primary database, which is updated ONLY by submitters. The derivative databases (UniGene, produced by algorithms; RefSeq, produced by curators; Genome Assembly) are updated continually by NCBI.]
The DNA Data Bank of Japan (DDBJ) was established in 1986 at the National Institute of Genetics (NIG), Japan, with the support of the Ministry of Education, Science, Sports and Culture, Japan. DDBJ serves as one of the three collaborating international DNA databases.
DDBJ
DDBJ Homepage
A wide range of protein databases exists, such as SWISS-PROT, TrEMBL, Protein Information Resource (PIR), and UniProt.
SWISS-PROT: a database of protein sequences that provides high-quality annotation with minimal redundancy. It was created in 1986 at the Department of Medical Biochemistry, University of Geneva. SWISS-PROT is cross-referenced with several other databases, including nucleic acid and protein structure databases. It classifies its data in two ways: i) core data, ii) annotation.
Protein Sequence Database
PDB (Protein Data Bank)
TrEMBL is a computer-annotated supplement of SWISS-PROT that contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT. These databases are developed by the SWISS-PROT groups at SIB and at EBI.
It was created in 1996 with the objective of filling the gap between the flow of genomic data and annotated protein sequences.
TrEMBL (Translated EMBL)
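What "translation of a nucleotide sequence entry" means can be shown with a short Python sketch. It is a simplified illustration, not the TrEMBL pipeline: it assumes the standard genetic code, reads only the first forward frame, stops at the first stop codon, and writes X for codons it cannot resolve. The function name translate is hypothetical.

```python
# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA coding sequence into a protein sequence (frame 1 only)."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")  # X = unknown codon
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # ATG -> M, GCC -> A, TAA -> stop
```

Because translation of this kind is automatic, the resulting entries can be produced as fast as nucleotide data arrives, while manual annotation follows later.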
PIR HomePage
PIR (Protein Information Resource)
The Protein Information Resource (PIR), located at Georgetown University Medical Center (GUMC), is an integrated public bioinformatics resource to support genomic and proteomic research and scientific studies.
PIR was established in 1984 by the National Biomedical Research Foundation (NBRF) as a resource to assist researchers and customers in the identification and interpretation of protein sequence information.
PIR
UniProt is a freely accessible database of protein sequence and functional information, many entries being derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the research literature.
UniPROT
The UniProt consortium comprises the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR). EBI, located at the Wellcome Trust Genome Campus in Hinxton, UK, hosts a large resource of bioinformatics databases and services. SIB, located in Geneva, Switzerland, maintains the ExPASy (Expert Protein Analysis System) servers, which are a central resource for proteomics tools and databases. PIR, hosted by the National Biomedical Research Foundation (NBRF) at the Georgetown University Medical Center in Washington, DC, USA, is heir to the oldest protein sequence database.
UniPROT
Some keywords used in the NCBI GenBank database
LOCUS: Unique string of 10 letters and numbers in the database. It is not maintained across databases and is therefore a poor sequence identifier.
ACCESSION: A unique identifier to that record, citable entity; does not change when record is updated. A good record identifier, ideal for citation in publication.
VERSION: New system where the accession and version play the same function as the accession and gi number.
Nucleotide gi: Geninfo identifier (gi), a unique integer which will change every time the sequence changes.
PID: Protein Identifier: g, e or d prefix to gi number. Can have one or two on one CDS.
Protein gi: Geninfo identifier (gi), a unique integer which will change every time the sequence changes.
protein_id: Identifier which has the same structure and function as the nucleotide accession.version identifier.
Differences…..
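The difference between a stable accession and a changing version can be made concrete with a short Python sketch. The function names are hypothetical, and "U07418.2" is an invented later version used only to illustrate the comparison; it is not taken from GenBank.

```python
def split_accession_version(identifier):
    """Split 'ACCESSION.VERSION' into (accession, version number)."""
    accession, separator, version = identifier.partition(".")
    if not separator or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version identifier: {identifier!r}")
    return accession, int(version)


def same_record(first, second):
    """True when both identifiers cite the same record (same accession)."""
    return split_accession_version(first)[0] == split_accession_version(second)[0]


def sequence_changed(first, second):
    """True when both cite the same record but a different sequence version."""
    acc_a, ver_a = split_accession_version(first)
    acc_b, ver_b = split_accession_version(second)
    return acc_a == acc_b and ver_a != ver_b


print(split_accession_version("U07418.1"))
print(same_record("U07418.1", "U07418.2"))       # "U07418.2" is hypothetical
print(sequence_changed("U07418.1", "U07418.2"))
```

Citing the accession alone points at the record as a whole, while citing accession.version pins down one exact sequence.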
International Nucleotide Sequence Database Collaboration
GenBank EMBL DDBJ
Collaboration
Recognize various data formats, and know their primary use.
Know, understand and utilize all types of sequence identifiers.
Know and understand various feature types present in the GenBank flat files.
Know and understand the various GenBank divisions.
Main Objectives of Biological Databases
Wikipedia, NCBI, DDBJ, PDB, GenBank, PIR, SWISS-PROT/UniProt
Sources
THANK YOU