Presentation on Biological database By Elufer Akram @ University Of Science And technology...

PRESENTATION ON

BIOLOGICAL DATABASE

By Elufer Akram (14/BBT/06), University of Science and Technology, Meghalaya

- What is a Database?
- Database Architecture
- Variants of Biological Databases
- Nucleotide Sequence Databases: GenBank, NCBI, DDBJ
- Protein Sequence Databases: PDB (Protein Data Bank), TrEMBL, PIR, UniProt
- Collaboration
- Main Objectives of Biological Databases

Contents

Databases are convenient systems for properly storing, searching and retrieving any type of data.

A database makes it easy to handle and share large amounts of data, and supports large-scale analysis through easy access and updating of data. Databases also link information generated from various sources of knowledge about the subject under consideration.

What is a Database?
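To make "store, search, retrieve and update" concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table layout, the accessions HYP0001/HYP0002 and their sequences are hypothetical, chosen only for illustration; real biological databases use far richer schemas.

```python
import sqlite3
from typing import List, Optional


def make_db() -> sqlite3.Connection:
    """Create an in-memory database holding a few hypothetical sequence records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences ("
        " accession TEXT PRIMARY KEY,"
        " description TEXT NOT NULL,"
        " sequence TEXT NOT NULL)"
    )
    # Store: hypothetical records, not real database entries
    conn.executemany(
        "INSERT INTO sequences VALUES (?, ?, ?)",
        [
            ("HYP0001", "hypothetical lipase, partial cds", "ATGAAACTG"),
            ("HYP0002", "hypothetical kinase gene", "ATGGCCATT"),
        ],
    )
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, keyword: str) -> List[str]:
    """Search: accessions whose description contains the keyword (ASCII case-insensitive)."""
    rows = conn.execute(
        "SELECT accession FROM sequences WHERE description LIKE ? ORDER BY accession",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]


def retrieve(conn: sqlite3.Connection, accession: str) -> Optional[str]:
    """Retrieve: the sequence stored under an accession, or None if absent."""
    row = conn.execute(
        "SELECT sequence FROM sequences WHERE accession = ?", (accession,)
    ).fetchone()
    return row[0] if row else None


def update_sequence(conn: sqlite3.Connection, accession: str, sequence: str) -> None:
    """Update: replace the stored sequence for an existing accession."""
    conn.execute(
        "UPDATE sequences SET sequence = ? WHERE accession = ?", (sequence, accession)
    )
    conn.commit()


conn = make_db()
print(search(conn, "lipase"))     # ['HYP0001']
print(retrieve(conn, "HYP0001"))  # ATGAAACTG
update_sequence(conn, "HYP0001", "ATGAAACTGTAA")
print(retrieve(conn, "HYP0001"))  # ATGAAACTGTAA
```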

Biological databases are libraries of life-science information collected from scientific experiments, published literature, high-throughput experimental technologies and computational analysis. They contain information from genomics, proteomics and microarray gene expression studies.

Information contained in biological databases includes gene function, structure, localization (both cellular and chromosomal), biological sequences and structures.

What is a Biological Database?

[Figure: layered database architecture. From top to bottom the layers are Information system, Query system, Storage system and Data. Examples shown for the layers:
Data: GenBank flat file, PDB file, interaction record, title of a book, book
Storage system: Oracle, MySQL, PC binary files, Unix text files, bookshelves, boxes
Information/query systems: Google, Entrez, SRS]

Database Architecture
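The layered picture can be sketched in code: the query layer talks to storage only through a small interface, so the same query works whether records sit in memory or in a tab-separated text file. Everything here (the class names, the `records` method, the HYP accessions) is a hypothetical illustration, not the design of any real system such as Entrez or SRS.

```python
from typing import Dict, Iterable, List, Protocol, Tuple

Record = Tuple[str, str]  # (accession, description)


class Storage(Protocol):
    """Storage layer: anything that can yield (accession, description) pairs."""

    def records(self) -> Iterable[Record]: ...


class InMemoryStorage:
    """Storage backed by a Python dict (standing in for a database server)."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = data

    def records(self) -> Iterable[Record]:
        return list(self._data.items())


class TextFileStorage:
    """Storage backed by 'accession<TAB>description' lines, like a Unix text file."""

    def __init__(self, text: str) -> None:
        self._text = text

    def records(self) -> Iterable[Record]:
        for line in self._text.splitlines():
            if line.strip():
                accession, description = line.split("\t", 1)
                yield accession, description


def query(storage: Storage, keyword: str) -> List[Record]:
    """Query layer: case-insensitive keyword search, independent of the backend."""
    kw = keyword.lower()
    return [(acc, desc) for acc, desc in storage.records() if kw in desc.lower()]


def report(hits: List[Record]) -> str:
    """Information layer: turn query hits into text a user can read."""
    return "\n".join(f"{acc}: {desc}" for acc, desc in hits) or "no hits"


data = {"HYP0001": "hypothetical lipase", "HYP0002": "hypothetical kinase"}
memory = InMemoryStorage(data)
text_file = TextFileStorage("HYP0001\thypothetical lipase\nHYP0002\thypothetical kinase\n")
print(report(query(memory, "Lipase")))     # HYP0001: hypothetical lipase
print(report(query(text_file, "Lipase")))  # HYP0001: hypothetical lipase
```

Swapping the storage backend leaves the query and report functions untouched, which is the point of separating the layers.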


1. Primary database
2. Secondary database
3. Composite database

Variants of Biological Databases

These are the primary repositories of data, used to store nucleic acid and protein sequences and structural information on biological macromolecules.

Some primary databases:

NCBI (National Center for Biotechnology Information), GenBank, DDBJ (DNA Data Bank of Japan), SWISS-PROT (the manually annotated and reviewed section of the UniProt Knowledgebase, UniProtKB), PIR (Protein Information Resource) and PDB (Protein Data Bank). The sequences collected in these databases come from basic research in academic, industrial and sequencing laboratories.

Primary Database

These repositories are developed in collaboration with each other and, as a result, contain similar data. However, each database has a different user interface for querying and searching the information it holds.

Primary Database

A secondary database contains additional information derived from the analysis of data available in primary repositories. Because the primary data are analysed in a variety of ways, secondary databases contain different information in different formats. SWISS-PROT, one of the major primary databases, is used to derive several secondary databases.

Some secondary databases: TrEMBL, Pfam, PROSITE, Profiles, SCOP, CATH

Secondary Database
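PROSITE, one of the secondary databases listed above, describes protein families and functional sites with sequence patterns. A minimal sketch of turning a PROSITE-style pattern into a Python regular expression and scanning a sequence with it follows. It assumes only the common syntax (x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, < and > for the N- and C-terminus) and does not handle < or > inside brackets. The pattern N-{P}-[ST]-{P} is the well-known N-glycosylation motif; the protein sequence is made up.

```python
import re
from typing import List, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern such as 'C-x(2,4)-C' into a regex string."""
    p = pattern.rstrip(".")                      # PROSITE patterns end with '.'
    p = p.replace("{", "[^").replace("}", "]")   # {P} = any residue except P
    p = p.replace("(", "{").replace(")", "}")    # x(2,4) = 2 to 4 repeats
    p = p.replace("-", "")                       # '-' only separates elements
    p = p.replace("x", ".")                      # x = any residue
    p = p.replace("<", "^").replace(">", "$")    # N- and C-terminal anchors
    return p


def scan(pattern: str, sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for every match, overlapping ones included."""
    # A lookahead makes each match zero-width, so overlapping sites are all reported
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))       # N[^P][ST][^P]
print(scan("N-{P}-[ST]-{P}.", "ANSTANPTANTSG"))  # [(2, 'NSTA'), (10, 'NTSG')]
```

The N at position 6 of the toy sequence is skipped because it is followed by P, which the {P} element forbids.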

A composite database combines information from various primary databases, making it convenient to search for the desired information without querying each of those primary databases.

Composite databases make searching much simpler because information from different resources is gathered in a single database. Each composite database has its own format and its own strategies for storing data from the various primary databases.

Some composite databases: OWL (a composite, non-redundant protein sequence database), MIPSX (from the Munich Information Center for Protein Sequences), NRDB (Non-Redundant DataBase)

Composite database
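The idea of a composite database can be sketched as merging entries from several primary sources and collapsing identical sequences into one non-redundant record that remembers where each copy came from. This is a simplified illustration under stated assumptions: real composite databases apply their own rules for redundancy and annotation, and the source names, accessions and sequences below are hypothetical.

```python
from typing import Dict, List, Tuple


def build_composite(sources: Dict[str, List[Tuple[str, str]]]) -> List[dict]:
    """Merge (accession, sequence) entries from several sources into non-redundant records.

    Identical sequences (compared case-insensitively) become a single record whose
    'ids' list keeps every 'source:accession' that contributed it.
    """
    merged: Dict[str, dict] = {}
    for source, entries in sources.items():
        for accession, sequence in entries:
            key = sequence.upper()
            record = merged.setdefault(key, {"sequence": key, "ids": []})
            record["ids"].append(f"{source}:{accession}")
    return list(merged.values())


# Hypothetical primary sources and entries, for illustration only
sources = {
    "sourceA": [("A1", "MKVLAT"), ("A2", "MSTNPK")],
    "sourceB": [("B7", "mkvlat")],
}
for rec in build_composite(sources):
    print(rec["sequence"], rec["ids"])
# MKVLAT ['sourceA:A1', 'sourceB:B7']
# MSTNPK ['sourceA:A2']
```

One query against the merged list now covers both sources, which is the convenience the slide describes.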

The National Center for Biotechnology Information

Created in 1988 as part of the National Library of Medicine at NIH

- Establish public databases
- Research in computational biology
- Develop software tools for sequence analysis
- Disseminate biomedical information

Bethesda, MD

GenBank, the EMBL Nucleotide Sequence Database and DDBJ are the major sequence repositories from which various other databases have been derived.

Nucleotide sequence database

GenBank File format

GenBank

GenBank is a comprehensive, annotated collection of publicly available DNA sequences and is part of the International Nucleotide Sequence Database Collaboration (INSDC), which consists of the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL) and GenBank at the National Center for Biotechnology Information (NCBI, USA). A new release of GenBank is made every two months.

GenBank

Traditional GenBank Record

ACCESSION   U07418
VERSION     U07418.1  GI:466461

Accession: stable, reportable, universal

Version: tracks changes in the sequence

GI number: NCBI internal use

Well annotated

The sequence is the data
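The split between a stable accession, a version that tracks sequence changes, and a GI number can be shown by parsing the VERSION line of the record above. The function and field names are illustrative choices, not part of any NCBI tool; the GI part is treated as optional so the same code also accepts a VERSION line that carries only accession.version.

```python
from typing import NamedTuple, Optional


class SeqVersion(NamedTuple):
    accession: str      # stable identifier, unchanged across record updates
    version: int        # incremented whenever the sequence itself changes
    gi: Optional[int]   # GenInfo identifier, if the line carries one


def parse_version_line(line: str) -> SeqVersion:
    """Parse a GenBank VERSION line such as 'VERSION     U07418.1  GI:466461'."""
    tokens = line.split()
    if not tokens or tokens[0] != "VERSION":
        raise ValueError(f"not a VERSION line: {line!r}")
    accession, _, version = tokens[1].partition(".")
    gi = None
    for token in tokens[2:]:
        if token.startswith("GI:"):
            gi = int(token[3:])
    return SeqVersion(accession, int(version), gi)


print(parse_version_line("VERSION     U07418.1  GI:466461"))
# SeqVersion(accession='U07418', version=1, gi=466461)
```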

The NCBI (National Center for Biotechnology Information) was established on November 4, 1988 as part of the National Library of Medicine (NLM) at the National Institutes of Health (NIH), USA. Its multidisciplinary research group consists of scientists from diverse fields (computer science, mathematics, biochemistry, physics, etc.).

NCBI

NCBI HOMEPAGE

LIPASE Sequence in NCBI

PRIMARY VS. DERIVATIVE SEQUENCE DATABASES

[Figure: sequencing centers and labs submit short sequence fragments (e.g. GAGA, ATTCC, TTGACA, ACGTGC, TATAGCCG) to GenBank, the primary database, which is updated ONLY by submitters. Algorithms and curators then derive UniGene, RefSeq and genome assemblies from these data; these derivative databases are updated continually by NCBI.]
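This slide shows many short, overlapping submissions being turned by algorithms into longer derived sequences such as genome assemblies. As a toy illustration of that idea only (not NCBI's actual RefSeq, UniGene or genome-assembly pipeline), a greedy overlap merge repeatedly joins the two fragments with the longest suffix-prefix overlap. The fragments are made up, and greedy merging can go wrong on repeated sequence, which real assemblers handle with far more sophisticated methods.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: List[str], min_len: int = 3) -> List[str]:
    """Greedily merge fragments by maximal overlap; returns the remaining contigs."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments already contained inside another fragment
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: keep the contigs separate
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags


print(greedy_assemble(["ATTGACT", "GACTAGC", "TAGCCGTA"]))  # ['ATTGACTAGCCGTA']
```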

The DNA Data Bank of Japan (DDBJ) was established in 1986 at the National Institute of Genetics (NIG), Japan, with the support of the Ministry of Education, Science, Sports and Culture, Japan. DDBJ has served as one of the three collaborating international DNA databases.

DDBJ

DDBJ Homepage

There is a wide range of protein sequence databases, such as SWISS-PROT, TrEMBL, the Protein Information Resource (PIR) and UniProt.

SWISS-PROT: a database of protein sequences that provides high-quality annotation with minimal redundancy. It was created in 1986 at the Department of Medical Biochemistry, University of Geneva. SWISS-PROT is cross-referenced with several other databases, including nucleic acid and protein structure databases. It classifies its data into two types: (i) core data and (ii) annotation.

Protein Sequence Database

PDB (Protein Data Bank)

TrEMBL is a computer-annotated supplement of SWISS-PROT that contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT. These databases are developed by the SWISS-PROT groups at SIB and at EBI.

It was created in 1996 with the objective of filling the gap between the flow of genomic data and the availability of annotated protein sequences.

TrEMBL (Translated EMBL)
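TrEMBL entries are, at their core, conceptual translations of the coding sequences annotated in EMBL nucleotide entries. A minimal sketch of such a translation with the standard genetic code (NCBI translation table 1) follows. It assumes the input is already an in-frame coding sequence and ignores alternative genetic codes, introns and reading-frame detection, all of which a real annotation pipeline has to handle.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated in T, C, A, G order
# at each position, so the i-th amino acid below belongs to the i-th codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str, to_stop: bool = False) -> str:
    """Translate an in-frame DNA/RNA coding sequence; '*' marks stops, 'X' unknown codons."""
    seq = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):  # ignore a trailing partial codon
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))                # MAIVMGR*KGAR*
print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", to_stop=True))  # MAIVMGR
```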

PIR HomePage

PIR (Protein Information Resource)

The Protein Information Resource (PIR), located at Georgetown University Medical Center (GUMC), is an integrated public bioinformatics resource that supports genomic and proteomic research and scientific studies.

PIR was established in 1984 by the National Biomedical Research Foundation (NBRF) as a resource to assist researchers and customers in the identification and interpretation of protein sequence information.

PIR

UniProt is a freely accessible database of protein sequence and functional information, many entries being derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the research literature.

UniProt

The UniProt consortium comprises the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). EBI, located at the Wellcome Trust Genome Campus in Hinxton, UK, hosts a large resource of bioinformatics databases and services. SIB, located in Geneva, Switzerland, maintains the ExPASy (Expert Protein Analysis System) servers, which are a central resource for proteomics tools and databases. PIR, hosted by the National Biomedical Research Foundation (NBRF) at the Georgetown University Medical Center in Washington, DC, USA, is heir to the oldest protein sequence database.

UniProt
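Because Swiss-Prot and TrEMBL form the reviewed and unreviewed sections of UniProtKB, a UniProtKB FASTA header records which section an entry belongs to: it starts with db|accession|entry_name, where db is sp for Swiss-Prot and tr for TrEMBL, followed by the protein name and KEY=value tags such as OS (organism) and OX (taxonomy ID). A minimal parser for that header layout follows. The entries are hypothetical, and the splitting rule (a tag starts at a space followed by two capital letters and '=') is a simplifying assumption.

```python
import re
from typing import Dict


def parse_uniprot_header(header: str) -> Dict[str, object]:
    """Split a UniProtKB FASTA header into its identifier parts and KEY=value tags."""
    ids, _, rest = header.lstrip(">").partition(" ")
    db, accession, entry_name = ids.split("|")
    # Assumption: each tag starts at a space followed by two capitals and '='
    parts = re.split(r"\s(?=[A-Z]{2}=)", rest)
    tags = dict(part.split("=", 1) for part in parts[1:])
    return {
        "reviewed": db == "sp",  # sp = Swiss-Prot (reviewed), tr = TrEMBL (unreviewed)
        "accession": accession,
        "entry_name": entry_name,
        "protein_name": parts[0],
        "tags": tags,
    }


# Hypothetical header, made up for illustration
info = parse_uniprot_header(
    ">sp|Q0TEST|TEST1_HUMAN Example protein OS=Homo sapiens OX=9606 GN=TEST1 PE=1 SV=1"
)
print(info["reviewed"], info["accession"], info["tags"]["OS"])  # True Q0TEST Homo sapiens
```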

Some keywords used in the NCBI GenBank database

LOCUS: a unique string of 10 letters and numbers within the database. It is not maintained across databases and is therefore a poor sequence identifier.

ACCESSION: a unique identifier for the record and a citable entity; it does not change when the record is updated. It is a good record identifier, ideal for citation in publications.

VERSION: a newer system in which the accession and version number together play the same role as the accession and gi number.

Nucleotide gi: GenInfo identifier (gi), a unique integer that changes every time the sequence changes.

PID: protein identifier, formed by adding a g, e or d prefix to the gi number. A CDS can have one or two.

Protein gi: GenInfo identifier (gi), a unique integer that changes every time the sequence changes.

protein_id: an identifier with the same structure and function as the nucleotide accession.version.

Differences
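The keywords above sit in a fixed layout: in a GenBank flat file the keyword starts in column 1, its value starts in column 13, and continuation lines leave the keyword columns blank. A minimal sketch that reads the header keywords and the ORIGIN sequence of a toy record follows. The record is hypothetical (its LOCUS line is simplified), the FEATURES table is skipped, and indented sub-keywords are stored under their own names; for real work a dedicated parser such as Biopython's SeqIO is the safer choice.

```python
from typing import Dict, List, Optional

# Hypothetical toy record; real LOCUS lines follow a stricter column layout.
TOY_RECORD = "\n".join([
    "LOCUS       TOY0001                   24 bp    DNA     linear   SYN",
    "DEFINITION  Hypothetical toy record used only to illustrate the flat-file",
    "            layout.",
    "ACCESSION   TOY0001",
    "VERSION     TOY0001.1",
    "FEATURES             Location/Qualifiers",
    "     source          1..24",
    "ORIGIN",
    "        1 attgactagc cgtaattgac taag",
    "//",
])


def parse_genbank_header(text: str) -> Dict[str, str]:
    """Collect keyword values (columns 1-12 = keyword, 13+ = value) and the sequence."""
    record: Dict[str, str] = {}
    key: Optional[str] = None
    in_features = in_origin = False
    sequence: List[str] = []
    for line in text.splitlines():
        if line.startswith("//"):          # end of the entry
            break
        if line.startswith("ORIGIN"):
            in_features, in_origin = False, True
            continue
        if in_origin:                       # keep letters, drop position numbers and spaces
            sequence.append("".join(ch for ch in line if ch.isalpha()))
            continue
        if line.startswith("FEATURES"):
            in_features = True
        if in_features or not line.strip():
            continue                        # the feature table is skipped in this sketch
        head, value = line[:12].strip(), line[12:].strip()
        if head:                            # new keyword (or indented sub-keyword)
            key = head
            record[key] = value
        elif key is not None:               # continuation line: append to the last keyword
            record[key] += " " + value
    record["SEQUENCE"] = "".join(sequence).upper()
    return record


rec = parse_genbank_header(TOY_RECORD)
print(rec["ACCESSION"], rec["VERSION"], len(rec["SEQUENCE"]))  # TOY0001 TOY0001.1 24
```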

International Nucleotide Sequence Database Collaboration

GenBank, EMBL, DDBJ

Collaboration

Recognize the various data formats and know their primary uses.

Know, understand and utilize all types of sequence identifiers.

Know and understand various feature types present in the GenBank flat files.

Know and understand the various GenBank divisions.

Main Objectives of Biological Databases

Wikipedia, NCBI, DDBJ, PDB, GenBank, PIR, SWISS-PROT/UniProt

Sources

THANK YOU
